Improving the performance of CEPH storage using host side caching software for VMware
CEPH is popular open-source storage software. However, it exhibits high write latencies. VirtuCache caches CEPH volumes to in-host SSDs and, by doing so, considerably reduces VM-level latencies.
CEPH first gained popularity among cloud service providers. It is now catching on in on-premises enterprise IT because it offers better reliability and deployment flexibility than big-brand storage appliances, at a lower cost.
Pros of CEPH storage.
CEPH runs on commodity servers. It clusters servers together and can present the cluster as an iSCSI appliance. Considering that a 3.5-inch hard drive now holds 10TB and costs only $300, a server with 200TB of storage costs just $20K. To scale up capacity, you keep adding servers to the cluster, and capacity and performance scale roughly linearly as you do. You manage the cluster, however large, from a single GUI. CEPH is open source and its client drivers are part of the mainline Linux kernel, which keeps costs low.
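The cost figure above is easy to sanity-check. The sketch below uses only the drive size and price quoted in the text to count the drives a 200TB server needs and what they cost. Chassis, CPU, memory and networking are deliberately left out, so the remainder of the quoted $20K covers the rest of the server. The function names are illustrative, not from any tool.

```python
import math


def drives_needed(target_tb: float, drive_tb: float) -> int:
    """Whole drives required to reach target_tb of raw capacity."""
    return math.ceil(target_tb / drive_tb)


def drive_cost_usd(target_tb: float, drive_tb: float, price_per_drive: float) -> float:
    """Cost of the hard drives only; chassis, CPU, RAM and NICs are excluded."""
    return drives_needed(target_tb, drive_tb) * price_per_drive


# Figures from the text: 10TB drives at $300 each, 200TB in one server.
n = drives_needed(200, 10)
cost = drive_cost_usd(200, 10, 300)
print(f"{n} drives, ${cost:,.0f} in drives")  # 20 drives, $6,000 in drives
```

At these prices the drives come to about $6K, leaving roughly $14K of the $20K figure for the server itself.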
One drawback of CEPH.
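CEPH's main drawback is that its write latencies are high, even when SSDs are used. Part of the reason is that, in a replicated pool, a write is acknowledged only after every replica has persisted it, which adds network round trips and extra disk writes to each I/O.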
By deploying VirtuCache, which caches hot data to in-host SSDs, we get latencies for CEPH storage similar to those of an all-flash array, even though our CEPH deployments use slower 7200 RPM SATA hard drives.
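Why a host-side cache can hide slow backend storage is captured by the standard expected-latency formula for a two-tier cache: with hit ratio h, each I/O costs the cache latency on a hit and the backend latency on a miss. The sketch below is a simplified model under stated assumptions (it ignores cache-fill cost, queuing and background flushing of cached writes), and the numbers in the example are hypothetical, not measurements from any deployment described here.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, backend_ms: float) -> float:
    """Expected latency per I/O in a simple two-tier cache model.

    Hits are served at cache_ms (host SSD), misses at backend_ms (CEPH).
    Simplified model: ignores cache-fill cost, queuing and background flushing.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * backend_ms


# Hypothetical inputs, not measured values: 90% hits, 1 ms SSD, 30 ms backend.
print(f"{effective_latency_ms(0.9, 1.0, 30.0):.1f} ms")  # 3.9 ms
```

The miss term dominates quickly: with the same hypothetical latencies, a 50% hit ratio gives 15.5 ms, which is why keeping the hot working set on the host SSD matters.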
VirtuCache and CEPH deployment at Klickitat Valley Health (KVH).
KVH runs a hospital and a few clinics in Klickitat Valley, WA.
CEPH deployment: We deployed a 3-server cluster at KVH, with each server holding 24TB of raw storage (3x 8TB HDDs) and a 480GB SSD for journaling, for a total raw capacity of 72TB. CEPH was presented to the VMware hosts over iSCSI. Because a replication factor of 2 was used, the 72TB of raw storage yielded 36TB of usable capacity.
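The capacity arithmetic for this deployment can be written out as a small helper. It is a sketch that assumes a simple replicated pool, where usable capacity is raw capacity divided by the replica count. It ignores the headroom Ceph needs before OSDs reach their configured full ratios, so real usable space is somewhat lower. The function name is illustrative.

```python
from typing import Tuple


def cluster_capacity_tb(nodes: int, drives_per_node: int, drive_tb: float,
                        replicas: int) -> Tuple[float, float]:
    """Return (raw_tb, usable_tb) for a replicated pool.

    Assumes every object is stored `replicas` times, so usable = raw / replicas.
    Does not reserve headroom for Ceph's nearfull/full ratios.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    raw_tb = nodes * drives_per_node * drive_tb
    return raw_tb, raw_tb / replicas


# KVH figures from the text: 3 servers, 3x 8TB HDDs each, replication factor 2.
raw, usable = cluster_capacity_tb(3, 3, 8, 2)
print(f"raw {raw:.0f}TB, usable {usable:.0f}TB")  # raw 72TB, usable 36TB
```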
VirtuCache was configured to cache the datastores created on CEPH to striped 3TB SSDs in each host. Both reads and writes were cached to these in-host SSDs. As a result, VM-level latencies stayed under 10ms regardless of how heavy or random the workload was.
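The read-and-write caching described above can be illustrated with a toy write-back cache. This is a hypothetical sketch, not VirtuCache's implementation: a dict stands in for the CEPH datastore, the cache holds a fixed number of blocks in LRU order, writes are acknowledged as soon as they land in the cache, and dirty blocks are written back to the backend on eviction or on an explicit flush. The class and method names are made up for illustration.

```python
from collections import OrderedDict
from typing import Dict, Optional, Set


class WriteBackCache:
    """Toy host-side block cache in front of a slower backend (hypothetical)."""

    def __init__(self, backend: Dict[int, bytes], capacity: int) -> None:
        self.backend = backend                     # stands in for the CEPH datastore
        self.capacity = capacity                   # max blocks held in the "SSD"
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()  # LRU order, oldest first
        self.dirty: Set[int] = set()               # written but not yet on the backend

    def _evict_if_needed(self) -> None:
        while len(self.blocks) > self.capacity:
            lba, data = self.blocks.popitem(last=False)  # least recently used block
            if lba in self.dirty:
                self.backend[lba] = data                 # write back before dropping
                self.dirty.discard(lba)

    def read(self, lba: int) -> Optional[bytes]:
        if lba in self.blocks:                     # hit: served from the host cache
            self.blocks.move_to_end(lba)
            return self.blocks[lba]
        data = self.backend.get(lba)               # miss: fetch from the backend
        if data is not None:
            self.blocks[lba] = data                # populate the cache
            self._evict_if_needed()
        return data

    def write(self, lba: int, data: bytes) -> None:
        # Acknowledged once cached; the backend is updated later (write-back).
        self.blocks[lba] = data
        self.blocks.move_to_end(lba)
        self.dirty.add(lba)
        self._evict_if_needed()

    def flush(self) -> None:
        # Push every dirty block to the backend, keeping it cached.
        for lba in list(self.dirty):
            self.backend[lba] = self.blocks[lba]
        self.dirty.clear()
```

A real host-side write cache also has to protect dirty blocks that have not yet reached the backend, for example against a host or SSD failure; this toy keeps them only in memory.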
The chart below shows CEPH (SAN) latencies in blue versus VM latencies in yellow.
Chart: Write latencies at the SAN versus at the VM.
VM-level latencies are much lower than SAN latencies because VirtuCache caches to in-host SSDs.