Improving the performance of CEPH storage for VMware

CEPH is fast becoming the most popular open source storage software. However, one drawback is its high latency. Host side caching software installed in VMware hosts, which caches 'hot' data from CEPH volumes to in-host SSD or RAM, can overcome this deficiency. We believe that this use case for host side caching software will allow CEPH to be used in latency sensitive on-premise situations.

Pros of CEPH storage.

CEPH storage software can be installed on commodity servers. It clusters these servers together and presents the cluster as an iSCSI or NAS appliance. You can build CEPH storage with any server, SSD, HDD, or NIC, so there is no vendor lock-in for hardware, and hardware costs are low as a result. If you want to scale up, you keep adding servers to the cluster, with storage capacity and performance scaling linearly as you do so. You manage the cluster, however large, from a single GUI. All in all, it offers better reliability and deployment flexibility at a lower cost than big brand storage appliances.
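As a sketch of what day-to-day management of such a cluster looks like from the standard Ceph command line (hostnames here are placeholders; the orchestrator command assumes a cephadm-managed cluster):

```shell
# Check overall cluster health and raw vs. used capacity
ceph -s
ceph df

# See how OSDs (individual disks) map onto the servers in the cluster
ceph osd tree

# Scale out: add another commodity server to the cluster
# ("node4" is a hypothetical hostname)
ceph orch host add node4
```

Once a new host's disks are brought in as OSDs, Ceph rebalances data across the enlarged cluster automatically, which is what makes the linear scaling described above practical.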

One drawback of CEPH.

The one drawback of CEPH is that write latencies are high even if one uses SSDs.
By deploying VirtuCache, which caches hot data to in-host SSDs, we are able to get all-flash-array-like latencies for CEPH storage built using slow SATA hard drives.

VirtuCache and CEPH deployment at Klickitat Valley Health (KVH).

KVH runs a hospital and a few clinics in Klickitat Valley, WA.

CEPH deployment: We deployed a 3-server cluster at KVH, with each server carrying 24TB (3x 8TB HDD) of raw storage and a 480GB SSD (for journaling), for a total raw capacity of 72TB. CEPH was presented over iSCSI to the VMware hosts. Since a replication factor of 2 was used, the 72TB of raw storage amounted to 36TB of usable capacity.
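A replication factor of 2 is set per pool in Ceph. A minimal sketch of creating such a pool with the standard Ceph CLI (the pool name and placement-group count are illustrative, not the actual values used at KVH):

```shell
# Create an RBD pool; "128" is an example placement-group count
ceph osd pool create kvh-pool 128 128
rbd pool init kvh-pool

# Replication factor 2: each object is stored on two OSDs,
# which is why 72TB raw yields 36TB usable
ceph osd pool set kvh-pool size 2
```

Note that size 2 keeps only two copies of each object, so it trades some fault tolerance for usable capacity compared to Ceph's default of three replicas.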

VirtuCache was configured to cache the datastores created in CEPH to striped 3TB SSDs in each host. Both reads and writes were cached to these host-based SSDs. As a result, latencies at the VM level stayed under 10ms, regardless of how large or random the workload got.

The chart below shows CEPH latencies (blue) versus VM latencies (yellow). VM latencies are much lower than SAN/CEPH latencies because of VirtuCache.


Chart: Write latencies at the SAN versus at the VM.

VM level latencies are much lower than SAN latencies because of VirtuCache caching to in-host SSDs.