
To improve CEPH performance for VMware, install SSDs in VMware hosts, NOT in OSD hosts.

SSDs deployed for caching in CEPH OSD servers are not very effective. The problem lies not in the SSDs themselves, but in the fact that they sit at a point in the IO path that is downstream (relative to the VMs that run user applications) of where the IO bottleneck is. This post looks at this performance shortcoming of CEPH and its solution.

There are two options for improving the performance of CEPH.

Option 1 is to deploy SSDs in CEPH OSD servers for journaling (write caching) and read caching.

Option 2 is to deploy SSDs and host side caching software in the VMware hosts (that connect to CEPH over iSCSI). The host side caching software then automatically caches reads and writes from VMware Datastores created on CEPH volumes to the SSD inside the VMware host.

Below are the reasons why we recommend that you go with Option 2.

  1. SSDs in CEPH OSD servers don’t help much because that’s NOT where the storage IO bottleneck is.

    Option 1 – Caching SSD in the OSD server: A caching SSD deployed in an OSD server improves the performance of that server’s Linux filesystem. However, the storage bottleneck is further upstream (closer to the VMs), in CEPH’s iSCSI gateway and in the CEPH layer that replicates data across OSD servers. As a result, accelerating the Linux filesystem of the OSD server doesn’t improve storage performance for the applications that run in VMs.

    Option 2 – Caching SSD in the VMware host: With in-VMware host caching, the SSD is in the VMware host, hence closer to the host CPU that consumes ‘hot’ data, and upstream of the CEPH iSCSI gateway and the CEPH replication layer, where the bottleneck is. Secondly, the SSD sits in an NVMe/PCIe slot, which is the highest bandwidth slot on the motherboard. For these reasons, host side caching in VMware is effective in improving VM level storage performance from CEPH.

  2. Write IO Path.

    The longer the write IO path, the higher the write latency.

    Write IO path when the SSD + caching is in the OSD server: All writes go from VM > VMware iSCSI adapter > NIC in VMware host > switch > CEPH iSCSI gateway > journaling SSD (CEPH write cache) in the OSD server + another journaling SSD in another OSD server (because of the 2X replication factor).

    Write IO path when the SSD + caching is in the VMware host: All writes go from VM > host side caching SSD in the VMware host + another SSD in another VMware host (2X replication factor in VirtuCache). Note: to prevent data loss in case of host failure, VirtuCache replicates cached writes to an SSD in another VMware host in the cluster.

    As you can see, the write path is much shorter in the case of VMware host side caching, and hence this option has lower write latency. (A rough hop-by-hop model of both the write and read paths appears after this list.)

  3. Read IO Path.

    Read IO path when the caching SSD is in the OSD server: Cached reads go from SSD in the OSD server > CEPH iSCSI gateway > switch > NIC in VMware host > iSCSI adapter in VMware > VM.

    Read IO path when the caching SSD is in the VMware host: Cached reads go from SSD in the VMware host > VM.

    The read path in the case of VMware host side caching is much shorter as well.

  4. SSD capacity required.

    Total number of SSDs required when caching SSD is in the OSD server is equal to the number of OSD servers.

    Total number of SSDs required when caching SSD is in the VMware host is equal to the number of VMware hosts.

    We are assuming that your use case is storage capacity intensive (hence the choice of CEPH) rather than compute intensive, i.e. there are more CEPH OSD servers than there are VMware hosts. In that case, you’d need more SSDs to do caching in the OSD hosts than to do caching in the VMware hosts.

  5. For all the reasons above, you will benefit from lower latencies, higher throughput, and lower costs if you deploy host side caching software and SSDs in VMware hosts instead of deploying caching and SSDs in CEPH.
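To make the hop counts in points 2, 3, and 4 concrete, here is a rough back-of-the-envelope model. It is a sketch for illustration only: every per-hop latency below is an assumed round number for a typical 10GbE iSCSI setup with NVMe SSDs, not a measurement from the Iometer test later in this post, and the hop names simply mirror the paths listed above.

```python
# Back-of-the-envelope model of the write and read IO paths from points 2 and 3.
# All per-hop latencies are ASSUMED illustrative values (microseconds), not
# measurements; substitute figures from your own environment.

ASSUMED_HOP_LATENCY_US = {
    "vm_to_iscsi_adapter": 20,   # VM -> VMware software iSCSI adapter
    "nic_and_switch": 50,        # NIC in ESXi host + switch hop (10GbE)
    "ceph_iscsi_gateway": 300,   # CEPH iSCSI gateway processing
    "ceph_replication": 400,     # replication to a second OSD server (2X factor)
    "journal_ssd_write": 30,     # write landing on the journaling NVMe SSD
    "host_ssd_write": 30,        # write landing on the in-host NVMe SSD
    "host_ssd_replica": 80,      # VirtuCache-style replica to an SSD in another host
    "host_ssd_read": 20,         # cached read served from the in-host NVMe SSD
    "osd_ssd_read": 30,          # cached read served from the SSD in the OSD server
}

# Paths, mirroring the hop lists in points 2 and 3.
write_path_osd_cache = ["vm_to_iscsi_adapter", "nic_and_switch",
                        "ceph_iscsi_gateway", "ceph_replication",
                        "journal_ssd_write"]
write_path_host_cache = ["host_ssd_write", "host_ssd_replica"]

read_path_osd_cache = ["osd_ssd_read", "ceph_iscsi_gateway",
                       "nic_and_switch", "vm_to_iscsi_adapter"]
read_path_host_cache = ["host_ssd_read"]


def path_latency_us(path):
    """Sum the assumed per-hop latencies along a path."""
    return sum(ASSUMED_HOP_LATENCY_US[hop] for hop in path)


for label, path in [
    ("write, caching SSD in OSD server", write_path_osd_cache),
    ("write, caching SSD in ESXi host ", write_path_host_cache),
    ("read,  caching SSD in OSD server", read_path_osd_cache),
    ("read,  caching SSD in ESXi host ", read_path_host_cache),
]:
    print(f"{label}: ~{path_latency_us(path)} us over {len(path)} hops")

# Point 4: the SSD count scales with the number of hosts that do the caching.
# Cluster sizes here are taken from the test setup described later in this post.
osd_servers, esxi_hosts = 6, 3
print(f"SSDs needed for OSD-side caching:  {osd_servers}")
print(f"SSDs needed for host-side caching: {esxi_hosts}")
```

The exact numbers don’t matter much; the point the model makes is that the host side cache path never crosses the iSCSI gateway or the CEPH replication layer, so its total latency is dominated by the SSD itself.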


    Iometer test results when caching SSD is in OSD server versus when caching SSD is in VMware host

    Below are latency and throughput charts for a standard Iometer test when an Intel NVMe SSD was used as a cache device in an OSD server versus when the same SSD was used as a cache device in a VMware host with VirtuCache.

    A very straightforward Iometer test was run: 80/20 read/write ratio, 100% random IO pattern, 4KB block size, against a 10GB test file.
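If you want to approximate this access pattern without Iometer, the sketch below issues 4KB IOs at 100% random offsets with an 80/20 read/write mix against a 10GB test file. It is only a rough stand-in, not the actual Iometer configuration used for the charts: the file path is a placeholder, the test file must already exist at full size on the datastore being tested, and since the IOs go through the guest OS page cache the absolute numbers will differ from Iometer’s (Iometer or fio with direct IO remain the right tools for real measurements).

```python
# Rough approximation of the Iometer workload described above:
# 4KB block size, 100% random offsets, 80% reads / 20% writes, 10GB test file.
import os
import random
import time

TEST_FILE = "/path/to/10gb_testfile"   # placeholder; must already exist at full size
FILE_SIZE = 10 * 1024**3               # 10GB
BLOCK_SIZE = 4 * 1024                  # 4KB
READ_RATIO = 0.80                      # 80/20 read/write mix
NUM_IOS = 100_000


def run():
    fd = os.open(TEST_FILE, os.O_RDWR)
    buf = os.urandom(BLOCK_SIZE)
    blocks = FILE_SIZE // BLOCK_SIZE
    latencies = []
    try:
        for _ in range(NUM_IOS):
            offset = random.randrange(blocks) * BLOCK_SIZE   # 100% random
            start = time.perf_counter()
            if random.random() < READ_RATIO:
                os.pread(fd, BLOCK_SIZE, offset)             # 80% reads
            else:
                os.pwrite(fd, buf, offset)                   # 20% writes
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)

    latencies.sort()
    total = sum(latencies)
    print(f"IOPS:             {NUM_IOS / total:,.0f}")
    print(f"avg latency:      {total / NUM_IOS * 1e6:,.0f} us")
    print(f"99th pct latency: {latencies[int(NUM_IOS * 0.99)] * 1e6:,.0f} us")


if __name__ == "__main__":
    run()
```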

    The test setup had 3 ESXi hosts running ESXi 6.7 connected to CEPH over iSCSI (6 OSD servers + 3 MON & iSCSI gateway servers).

    Option 1 – An Intel P4600 NVMe SSD was installed in each CEPH OSD server (SUSE 12.2) to cache reads and writes. Read caching on the OSD server was done using Intel CAS software; write caching used native CEPH journaling.

    Option 2 – VirtuCache was installed in each ESXi 6.7 host, caching reads and writes to an Intel P4600 NVMe SSD installed in the same ESXi host.

    Chart: Throughput comparison when the caching SSD is in the OSD server vs. when the same SSD is in the ESXi host with VirtuCache.

    Chart: Latency comparison when the caching SSD is in the OSD server vs. when the same SSD is in the ESXi host with VirtuCache.