To improve CEPH performance for VMware, install SSDs in VMware hosts, NOT OSD hosts.
SSDs deployed in CEPH OSD servers, whether for caching, journaling, or primary storage, are not very effective at improving performance. The problem lies not in the SSDs themselves but in where they are deployed: at a point in the IO path that is downstream (relative to the VMs that run user applications) of the IO bottleneck. This post looks at this performance shortcoming of CEPH when connected to ESXi hosts, and at its solution.
There are two options for improving the performance of CEPH.
Option 1 is to deploy SSDs in CEPH OSD servers, whether for journaling, read caching, or primary storage (all-flash CEPH).
Option 2 is to deploy SSDs and host side caching software in the VMware hosts that are connected to CEPH. The host side caching software then caches, on the SSD in the VMware host, reads and writes for VMware Datastores that reside on CEPH volumes.
Below are the reasons we recommend Option 2.
SSDs in CEPH OSD servers don’t help much because that’s NOT where the storage IO bottleneck is.
Option 1 – SSD in the OSD server.
Option 2 – SSD in the VMware host.
An SSD deployed in an OSD server, whether it's for caching and journaling or for primary storage, improves the performance of the OSD server’s Linux filesystem. However, the storage bottleneck is further upstream (closer to the VMs), primarily in CEPH’s iSCSI gateway and secondarily in the CEPH layer that replicates data across OSD servers. As a result, accelerating the Linux filesystem of the OSD server is not the most effective way to improve the performance of VMs and the VMware hosts that connect to CEPH.
With in-VMware host caching, the SSD is in the VMware host, and hence closer to the host CPU that consumes ‘hot’ data and upstream of the CEPH iSCSI gateway and CEPH replication layer, where the bottlenecks are. Secondly, the SSD is in an NVME/PCIe slot, which is the highest bandwidth slot on the motherboard. For these reasons, host side caching in VMware is the most effective way of improving VM level storage performance from CEPH, even if you are using an all-flash CEPH.
Write IO Path with and without Host Side Caching software.
High write latency is the main problem in CEPH. As you can see from the diagrams below, the write IO path is much shorter when host side caching is installed in ESXi hosts, and hence this option has lower write latency.
With Host Side Cache in ESXi, VM write latency for all writes = Host cache media latency + Latency of VMware network (used by host side caching software to mirror write cache).
Without Host Side Cache, VM write latency for all writes = Latency of iSCSI network + iSCSI gateway software and server latency + OSD SSD latency + OSD software and server latency + CEPH replication latency.
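The two write latency sums above can be sketched as a short Python calculation. The function names are hypothetical, and the microsecond values are placeholders chosen only to make the arithmetic concrete; they are assumptions, not measurements from this test, so substitute numbers from your own environment.

```python
# Illustrative model of the two write-latency sums described above.
# All latency values are HYPOTHETICAL placeholders (microseconds), not measurements.

def write_latency_with_host_cache(cache_media_us, vmware_network_us):
    """Host cache media latency + VMware network latency (write cache mirroring)."""
    return cache_media_us + vmware_network_us


def write_latency_without_host_cache(iscsi_network_us, iscsi_gateway_us,
                                     osd_ssd_us, osd_server_us, replication_us):
    """Sum of every hop a write crosses on its way through CEPH."""
    return (iscsi_network_us + iscsi_gateway_us + osd_ssd_us
            + osd_server_us + replication_us)


# Assumed example values; replace with numbers measured in your own setup.
with_cache = write_latency_with_host_cache(cache_media_us=30, vmware_network_us=100)
without_cache = write_latency_without_host_cache(
    iscsi_network_us=100, iscsi_gateway_us=500, osd_ssd_us=30,
    osd_server_us=300, replication_us=1000)

print(f"with host side cache:    {with_cache} us")
print(f"without host side cache: {without_cache} us")
```

Whatever values are plugged in, the second sum has five terms against two, which is the point the diagrams make: the host side cache removes the iSCSI gateway, OSD, and replication hops from the write path.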
Read IO Path for Cached Reads: As you can see from the diagrams below, the read path is shorter as well when host side caching is deployed in the ESXi host.
When Host Side Cache is installed in ESXi, VM read latency for cached reads = Host cache media latency.
Without Host Side Cache, read latency for reads cached in CEPH = iSCSI network latency + iSCSI gateway software and server latency + OSD SSD latency + OSD software and server latency.
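The cached read paths can be laid out the same way, as a list of hops per path. This is only a sketch: the dictionary names and the `path_latency` helper are hypothetical, and the per-hop values are assumed placeholders in microseconds, not results from this test.

```python
# Hops on each cached-read path, following the two sums above.
# Per-hop values are HYPOTHETICAL placeholders (microseconds), not measurements.
CEPH_CACHED_READ_PATH = {
    "iSCSI network": 100,
    "iSCSI gateway software and server": 500,
    "OSD SSD": 30,
    "OSD software and server": 300,
}
HOST_CACHED_READ_PATH = {
    "host cache media": 30,
}


def path_latency(path):
    """Total latency of a path is the sum of the latencies of its hops."""
    return sum(path.values())


for label, path in (("read cached in CEPH", CEPH_CACHED_READ_PATH),
                    ("read cached in ESXi host", HOST_CACHED_READ_PATH)):
    print(f"{label}: {len(path)} hop(s), {path_latency(path)} us")
```

With the same SSD in both positions, the media latency term is identical in both paths; the difference comes entirely from the network, gateway, and OSD server hops that a host-cached read never crosses.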
Even if you have an all-flash CEPH, you should evaluate VirtuCache with in-host NVME SSD or RAM to verify the above argument.
Iometer test results when caching SSD is in OSD server versus when caching SSD is in VMware host
Below are latency and throughput charts for a standard Iometer test when an Intel NVME SSD was used as a cache device in an OSD server versus when the same SSD was used as a cache device in a VMware host with VirtuCache.
A straightforward Iometer test was run from within a VM, with an 80/20 read/write ratio, a 100% random IO pattern, a 4KB block size, and a 10GB test file.
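For readers unfamiliar with these parameters, the access pattern they describe can be sketched in Python. This is not Iometer and issues no real IO; it only generates the sequence of requests such a workload would produce. The function name and the seed are hypothetical, and the 10GB file is assumed to be measured in binary units.

```python
import random

BLOCK_SIZE = 4 * 1024        # 4KB block size
FILE_SIZE = 10 * 1024**3     # 10GB test file (assumption: binary units)
READ_FRACTION = 0.8          # 80/20 read/write ratio


def generate_requests(count, seed=0):
    """Yield (operation, byte_offset) pairs for a 100% random, block-aligned workload."""
    rng = random.Random(seed)
    blocks_in_file = FILE_SIZE // BLOCK_SIZE
    for _ in range(count):
        # Each request is a read with probability 0.8, otherwise a write.
        op = "read" if rng.random() < READ_FRACTION else "write"
        # 100% random: every request targets a uniformly chosen block in the file.
        offset = rng.randrange(blocks_in_file) * BLOCK_SIZE
        yield op, offset


requests = list(generate_requests(10_000))
reads = sum(1 for op, _ in requests if op == "read")
print(f"reads: {reads / len(requests):.1%}, writes: {1 - reads / len(requests):.1%}")
```

Small random requests like these cannot be coalesced into large sequential transfers, so each one pays the full per-request cost of the IO path.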
The test setup had 2 ESXi hosts running ESXi 6.7 connected to CEPH over iSCSI (6 OSD servers + 3 MON servers + 3 iSCSI gateway servers). The Iometer test was run in only one VM.
Option 1 – Two Intel P4600 NVME SSDs were installed in each CEPH OSD server (SUSE12.2), one to cache reads and another to journal the writes. Read caching on the OSD server was done using Intel CAS software. Writes were natively journaled to the SSD in CEPH.
Option 2 – VirtuCache was installed in each ESXi 6.7 host, caching reads and writes to an Intel P4600 NVME SSD installed in the same ESXi host.