To Improve CEPH Performance for VMware, Install SSDs in VMware Hosts, NOT OSD Hosts.
SSDs deployed for caching in CEPH OSD servers are not very effective. The problem lies not in the SSDs themselves but in where they are deployed: at a point in the IO path that is downstream of the IO bottleneck (relative to the VMs that run end-user applications). This post looks at this performance shortcoming of CEPH and its solution.
There are two options for improving the performance of CEPH.
Option 1 is to deploy SSDs in CEPH OSD servers for journaling (write caching) and read caching.
Option 2 is to deploy SSDs in the VMware hosts (which connect to CEPH over iSCSI) along with host side caching software. The software automatically caches reads and writes for VMware Datastores created on CEPH volumes to the SSD in the VMware host.
Below are the reasons we recommend Option 2.
SSDs in CEPH OSD servers don’t help much because that’s NOT where the storage IO bottleneck is.
Option 1 – SSD + Caching in the OSD server.
A caching SSD deployed in an OSD server improves the performance of that server's Linux filesystem; however, the storage bottleneck is further upstream (closer to the VMs), in CEPH's iSCSI gateway and in the CEPH layer that replicates data across OSD servers. As a result, accelerating the Linux filesystem of the OSD server doesn't improve storage performance for applications running in VMs.
Option 2 – SSD + Caching in the VMware host.
With in-VMware host caching, the SSD is in the VMware host, hence closer to the host CPU that consumes 'hot' data, and upstream of the CEPH iSCSI gateway and the CEPH replication layer, where the bottleneck is. Second, the SSD sits in an NVMe/PCIe slot, which is the highest-bandwidth slot on the motherboard. For these reasons, host side caching in VMware is effective in improving VM-level storage performance from CEPH.
Write IO Path.
The longer the write IO path, the higher the write latency.
Write IO path when SSD + Caching is in the OSD server.
All writes go from VM > VMware iSCSI adapter > NIC in VMware host > Switch > CEPH iSCSI gateway > Journaling SSD (CEPH write cache) in OSD server + another Journaling SSD in another OSD server (because of 2X replication factor).
Write IO path when SSD + Caching is in the VMware host.
All writes go from VM > Host side caching SSD in VMware host + another SSD in another VMware host (2X replication factor in VirtuCache). Note: To prevent data loss in case of host failure, VirtuCache replicates cached writes to another SSD in another VMware host in the cluster.
As you can see, the write path is much shorter in the case of VMware host side caching, and hence this option has lower write latency.
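To make the comparison concrete, here is a minimal sketch that counts the hops in the two write paths described above. It only counts path segments as listed in this post; it is not a latency model, and it leaves out the replica copy that each option sends to a second SSD. The constant and function names are illustrative.

```python
# Primary write paths, copied from the descriptions in this post.
# The replica write to a second SSD (2X replication) is omitted for both options.
OPTION_1_WRITE_PATH = [
    "VM",
    "VMware iSCSI adapter",
    "NIC in VMware host",
    "Switch",
    "CEPH iSCSI gateway",
    "Journaling SSD in OSD server",
]
OPTION_2_WRITE_PATH = [
    "VM",
    "Host side caching SSD in VMware host",
]


def hop_count(path):
    """Number of transitions between consecutive components in an IO path."""
    return len(path) - 1


for name, path in (("Option 1", OPTION_1_WRITE_PATH), ("Option 2", OPTION_2_WRITE_PATH)):
    print(f"{name}: {hop_count(path)} hop(s): {' > '.join(path)}")
```

A hop count is only a rough proxy: the network hops (NIC, switch, iSCSI gateway) typically cost far more than a local device access, so the real latency gap depends on the environment.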
Read IO Path.
Read IO path when SSD + Caching is in the OSD server.
Reads go from SSD in OSD server > CEPH iSCSI gateway > Switch > NIC in VMware host > iSCSI adapter in VMware > VM.
Read IO path when SSD + Caching is in the VMware host.
Assuming a 90% cache hit ratio, 90% of the reads go from host side caching SSD in the VMware host > VM. Only the remaining 10% (cache misses) have to be fetched from CEPH over the network.
Read path in the case of VMware host side caching is much shorter as well.
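The effect of the cache hit ratio on read latency can be sketched as a weighted average of the hit path and the miss path. The two latency values below are hypothetical placeholders chosen only to illustrate the arithmetic; they are not measurements from this test, and the function name is illustrative.

```python
def expected_read_latency_us(hit_ratio, hit_us, miss_us):
    """Weighted average read latency for a cache with the given hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_us + (1.0 - hit_ratio) * miss_us


# Hypothetical placeholder latencies in microseconds, NOT measured values.
LOCAL_SSD_READ_US = 100.0    # read served by the SSD inside the VMware host
CEPH_ISCSI_READ_US = 2000.0  # read that crosses NIC > switch > iSCSI gateway > OSD

# Option 1: every read crosses the iSCSI path, even when the OSD-side SSD serves it.
option_1_us = CEPH_ISCSI_READ_US

# Option 2: 90% of reads hit the in-host SSD, the other 10% go to CEPH.
option_2_us = expected_read_latency_us(0.9, LOCAL_SSD_READ_US, CEPH_ISCSI_READ_US)

print(f"Option 1: {option_1_us:.0f} us, Option 2: {option_2_us:.0f} us")
```

With these made-up inputs the average for Option 2 is dominated by the 10% of reads that miss the cache, which is why the hit ratio matters as much as the speed of the SSD.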
SSD capacity required.
Total number of SSDs required when SSD + Caching is in the OSD server = Number of OSD servers.
Total number of SSDs required when SSD + Caching is in the VMware host = Number of VMware hosts.
We assume that your use case is storage-capacity intensive (hence the choice of CEPH) rather than compute intensive, i.e. there are more CEPH OSD servers than VMware hosts. In that case, caching in OSD servers requires more SSDs than caching in VMware hosts.
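As a small worked example of the SSD count, the sketch below applies the one-SSD-per-server rule stated above to the server counts of the test setup described later in this post (6 OSD servers, 3 ESXi hosts). The function name is illustrative.

```python
def ssds_required(num_servers, ssds_per_server=1):
    """Total caching SSDs needed when each server gets the same number of SSDs."""
    if num_servers < 0 or ssds_per_server < 0:
        raise ValueError("counts must be non-negative")
    return num_servers * ssds_per_server


# Server counts taken from the test setup in this post.
osd_side = ssds_required(6)   # Option 1: one SSD per CEPH OSD server
host_side = ssds_required(3)  # Option 2: one SSD per VMware host

print(f"Option 1 needs {osd_side} SSDs, Option 2 needs {host_side} SSDs")
```

If your cluster has more VMware hosts than OSD servers, the comparison flips, so plug in your own counts.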
For all the reasons above, you will benefit from lower latency, higher throughput, and lower cost if you deploy host side caching software and SSDs in VMware hosts rather than deploying caching and SSDs in CEPH.
Below are latency and throughput charts for a standard Iometer test when an Intel NVMe SSD was used as a cache device in an OSD server versus when the same SSD was used as a cache device in a VMware host with VirtuCache.
[80-20 read-write ratio, 100% random IO, 4KB block size, 10GB test file]
3 ESXi hosts running ESXi 6.7 connected to CEPH over iSCSI (6 OSD servers + 3 MON & iSCSI gateway servers).
Option 1 – Intel P4600 NVMe SSD in each CEPH OSD server, to cache reads and writes. Read caching with Intel CAS software. Write caching with CEPH journaling.
Option 2 – VirtuCache was installed in each ESXi host, caching reads and writes to an Intel P4600 NVMe SSD installed in the same ESXi host.