Reducing Write Latencies in CEPH Storage
CEPH is popular open source storage software. However, its write latencies are high. VirtuCache caches CEPH volumes to in-host SSDs and, by doing so, reduces VM-level latencies considerably.
CEPH is open source storage software that has become popular because it offers better reliability and deployment flexibility at a lower cost than big-brand storage appliances.
CEPH runs on commodity servers. It clusters servers together and presents this cluster as an iSCSI appliance. Considering that each 3.5-inch hard drive now carries 10TB of capacity and costs only $300, a server with 200TB of storage costs just $15K. The advantage of CEPH over big-brand storage appliances is that you can keep adding servers to the cluster, with storage capacity and performance scaling linearly as you add them. You manage this cluster, however large, with a single GUI. It is open source and part of mainline Linux, hence cheap.
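As a rough check on the hardware figures above, the short sketch below works out how many 10TB drives a 200TB server needs and what those drives cost at $300 each. The function name and the split between drive cost and "everything else" are illustrative assumptions; the article only gives the $15K total and does not itemize the rest of the server.

```python
import math


def drive_cost_for_capacity(target_tb, drive_tb=10, drive_price_usd=300):
    """Return (number of drives, total drive cost in USD) for a target raw capacity."""
    drives = math.ceil(target_tb / drive_tb)  # whole drives only
    return drives, drives * drive_price_usd


drives, drive_cost = drive_cost_for_capacity(200)
server_price_usd = 15_000                      # total quoted in the article
non_drive_cost = server_price_usd - drive_cost  # chassis, CPU, etc. (not itemized in the article)

print(f"{drives} drives, ${drive_cost} in drives, ${non_drive_cost} for the rest of the server")
```

Under these numbers, the drives account for well under half of the quoted server price.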
The one drawback of CEPH is that write latencies are high, even when SSDs are used for journaling.
VirtuCache + CEPH
By deploying VirtuCache, which caches hot data to in-host SSDs, we have been able to get all-flash-array-like latencies for CEPH-based storage, even though our CEPH deployments use slower (7200RPM) SATA drives.
Use Case at Klickitat Valley Health (KVH)
KVH runs a hospital and a few clinics in Klickitat Valley, WA.
CEPH deployment: We deployed a 3-server cluster at KVH, with each server carrying 24TB (3x 8TB HDD) of raw storage and a 480GB SSD (for journaling). In total, 72TB of raw storage capacity was deployed with CEPH. CEPH was presented over iSCSI to VMware hosts. Since a replication factor of 2 was used, the 72TB of raw storage amounted to 36TB of usable capacity.
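The capacity arithmetic for this deployment can be sketched as follows. The function names are illustrative, and the calculation assumes usable capacity is simply raw capacity divided by the replication factor, ignoring any filesystem or metadata overhead.

```python
def raw_capacity_tb(servers, drives_per_server, drive_tb):
    """Total raw capacity across the cluster, in TB."""
    return servers * drives_per_server * drive_tb


def usable_capacity_tb(raw_tb, replication_factor):
    """Usable capacity when every block is stored replication_factor times."""
    return raw_tb / replication_factor


raw = raw_capacity_tb(servers=3, drives_per_server=3, drive_tb=8)
usable = usable_capacity_tb(raw, replication_factor=2)
print(f"raw: {raw}TB, usable: {usable:.0f}TB")
```

With the same hardware, a replication factor of 3 would leave 24TB usable, which shows how the choice of replication factor trades capacity for redundancy.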
VirtuCache was configured to cache datastores created in CEPH to a 3TB SSD in each host. Both reads and writes were cached to the host-based SSD. As a result, latencies at the VM level stayed under 10ms, regardless of how heavy or random the workload was.
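To illustrate the general idea of caching both reads and writes in front of slower shared storage, here is a toy write-back cache with least-recently-used eviction. It is a generic sketch, not VirtuCache's implementation: the class, its methods, and the use of a Python dict as a stand-in for the CEPH backend are all assumptions made for illustration.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy LRU write-back cache in front of a slower backing store (illustrative only)."""

    def __init__(self, backend, capacity):
        self.backend = backend        # dict standing in for the slower shared storage
        self.capacity = capacity      # maximum number of blocks held in the cache
        self.blocks = OrderedDict()   # block_id -> (data, dirty), oldest first
        self.backend_reads = 0
        self.backend_writes = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # hit: served without touching the backend
            return self.blocks[block_id][0]
        self.backend_reads += 1                # miss: fetch from the backend and keep a copy
        data = self.backend[block_id]
        self._insert(block_id, data, dirty=False)
        return data

    def write(self, block_id, data):
        # The write completes once it is in the cache; the backend is updated later.
        self._insert(block_id, data, dirty=True)

    def flush(self):
        """Write all dirty blocks to the backend and mark them clean."""
        for block_id, (data, dirty) in list(self.blocks.items()):
            if dirty:
                self._write_backend(block_id, data)
                self.blocks[block_id] = (data, False)

    def _insert(self, block_id, data, dirty):
        self.blocks[block_id] = (data, dirty)
        self.blocks.move_to_end(block_id)
        while len(self.blocks) > self.capacity:
            old_id, (old_data, old_dirty) = self.blocks.popitem(last=False)
            if old_dirty:                      # evicted dirty blocks must reach the backend
                self._write_backend(old_id, old_data)

    def _write_backend(self, block_id, data):
        self.backend[block_id] = data
        self.backend_writes += 1
```

In this sketch, repeated reads and writes to hot blocks never touch the backend, which is the property that lets VM-level latency track the cache device rather than the shared storage.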
The chart below shows CEPH latencies (blue) versus VM latencies (yellow). VM latencies are much lower than SAN/CEPH latencies because of VirtuCache.
Chart 1: Write latencies at the SAN versus at the VM
VM level latencies are much lower than SAN latencies because of VirtuCache caching to in-host SSDs.