Improving VM write latency in VMware Metro Storage Cluster (vMSC) – By caching writes to in-host NVMe / RAM.

By caching all writes from VMs to VMware host SSD or RAM, VirtuCache reduces write latency for VMs deployed in a VMware Metro / Stretched Storage Cluster (vMSC) and keeps WAN link latency out of VM write latencies entirely, improving the performance of VMs regardless of how slow your WAN link is.

This article discusses how VirtuCache does this by explaining the read and write IO path in vMSC before and after installing VirtuCache.

Introduction to VirtuCache

VirtuCache is software you install in the VMware host. It automatically caches frequently and recently used reads and all writes from VMs to in-ESXi-host SSD or RAM. By servicing most reads and all writes from ESXi host SSD / RAM, it greatly reduces the impact of the storage appliance and network on VM-level storage latencies.

Storage IO path in uniform VMware Metro / Stretched Storage Cluster (vMSC) configurations

All the major storage vendors offer vMSC in two flavors – uniform and non-uniform, with uniform architecture being the preferred option. The difference between the two architectures is that in uniform architecture, ESXi hosts in both datacenter locations are connected to the storage appliances at both sites, whereas in non-uniform architecture, ESXi hosts in each datacenter location are directly connected only to the storage array in their own location. Both architectures require a high-speed WAN link between the two geographically dispersed datacenters to connect the two storage arrays. This is to keep the data in sync between the two arrays.

Failure of the storage array in one site causes VMs to HA restart on hosts at the other site in the case of non-uniform architecture, whereas in uniform architecture, the VMs keep running (i.e. no HA restart happens), and it is for this reason that uniform architecture is the preferred deployment model for vMSC. The one drawback with uniform architecture is that after a storage array failure, reads and writes now traverse the WAN link to the array at the other location. And it is these WAN-link-related read and write latencies that VirtuCache suppresses.

There are other differences between uniform and non-uniform architecture but I won’t delve into those since that’s not the purpose of this post.

Also, I will only focus on uniform architecture since that is the preferred deployment model for vMSC.

In vMSC uniform configuration, the two storage arrays in two different datacenter locations are clustered together, so they are seen as a single storage target by ESXi hosts at both datacenter locations. And every volume on one array has a mirrored volume on the other. The WAN link between the two datacenter locations is used for two purposes – to keep the data in sync between the two arrays, and cross-connect the ESXi hosts in each location to storage arrays in both locations.

Since the IO path from ESXi hosts to the array at the same site is shorter than the IO path to the array at the remote site, all storage IO from the hosts flows to the array at the local site only. Hence these paths are called preferred (also called ALUA optimized) paths, and the paths from the hosts to the array at the remote site (which is separated by a WAN link) are the standby (or ALUA unoptimized) paths. The standby paths come into play only if the storage array with the preferred (ALUA optimized) paths fails.
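The path preference described above can be sketched as a simple selection rule. This is a minimal Python illustration; the path names and dictionary layout are invented for this example and are not VMware's actual Path Selection Policy code:

```python
# Toy sketch of ALUA-style path selection (illustrative only, not VMware code).
def select_path(paths):
    """Prefer an ALUA-optimized (local-array) path; fall back to an
    unoptimized (remote-array, over-WAN) path only if no optimized
    path is alive."""
    alive = [p for p in paths if p["state"] == "active"]
    optimized = [p for p in alive if p["alua"] == "optimized"]
    if optimized:
        return optimized[0]
    return alive[0] if alive else None

paths = [
    # Path to the array at the local site
    {"name": "vmhba1:C0:T0:L0", "alua": "optimized", "state": "active"},
    # Path to the array at the remote site, separated by the WAN link
    {"name": "vmhba1:C0:T1:L0", "alua": "unoptimized", "state": "active"},
]

print(select_path(paths)["name"])  # local (optimized) path while the local array is up
```

If the local array fails (its path state goes to "dead"), the same rule falls through to the standby path over the WAN, which is exactly when WAN latency starts to hurt.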

When there is a storage array failure in Datacenter / Site 1, all VM writes and reads from that site go to the remote storage array over the WAN link. This will increase VM read and write latencies.

Completely eliminating (not just reducing) WAN link latency from factoring into VM write latencies

Write IO path in vMSC with VirtuCache installed on every ESXi host: When VirtuCache is installed on all hosts at both datacenters, all writes from VMs are written to SSD / RAM in the local ESXi host, and a second copy of each write is written to SSD / RAM in another host in the same datacenter. This happens regardless of whether the local storage array is operational or has failed. In other words, VirtuCache sends a write acknowledgement back to VMware as soon as it commits the VM's writes to cache media in the local ESXi hosts. This also means that VMware (and hence the applications running in VMs) see a write commit well before the write is committed to the backend storage array(s).

A VirtuCache background job continuously syncs the locally cached writes to the backend storage array, but this write flush process doesn't contribute to VM write latency. It is for this reason that neither local storage network latency nor inter-datacenter WAN latency contributes to VM write latency when VirtuCache is in the IO path. During regular operation, VirtuCache syncs the write cache to the local array, and when the local storage array fails, VirtuCache syncs the write cache to the remote array. Whether VirtuCache flushes the writes to the local array or the remote array, VM write latencies remain the same. Since WAN latency doesn't factor into VM write latency, a lower bandwidth / higher latency link between datacenters will work just fine.
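The write path described above can be sketched roughly as follows. This is a toy Python model of generic write-back caching with asynchronous flushing; the class and method names are invented for illustration and are not VirtuCache internals:

```python
import queue
import threading

class WriteBackCacheSketch:
    """Toy model of a write-back cache: the write is acknowledged once it is
    committed to local cache media (plus a replica on a peer host in the same
    datacenter), and is flushed to the storage array later, off the latency path.
    Illustrative only -- not actual VirtuCache behavior or code."""

    def __init__(self, array_write):
        self.array_write = array_write   # slow backend write (local or remote array)
        self.flush_q = queue.Queue()     # locally cached writes awaiting flush
        threading.Thread(target=self._flusher, daemon=True).start()

    def write(self, block, data):
        self._commit_to_local_cache(block, data)   # in-host SSD / RAM
        self._replicate_to_peer_host(block, data)  # second copy, same datacenter
        self.flush_q.put((block, data))            # flushed later, in the background
        return "ack"                               # the VM sees the write complete here

    def _flusher(self):
        # Background job: WAN / array latency lands here, not on the VM's write.
        while True:
            block, data = self.flush_q.get()
            self.array_write(block, data)
            self.flush_q.task_done()

    def _commit_to_local_cache(self, block, data):
        pass  # placeholder for committing to host cache media

    def _replicate_to_peer_host(self, block, data):
        pass  # placeholder for the second in-datacenter copy
```

The key point the sketch shows: `write()` returns without ever touching `array_write`, so swapping the flush target from the local array to the remote array (over the WAN) changes nothing about the latency the VM observes.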

When VirtuCache is installed, VM write latencies stay the same whether the storage appliance at the local site is operational or not.

Read IO path in vMSC with VirtuCache installed on every ESXi host

VirtuCache caches frequently and recently used reads to in-host cache media. We expect that over 90% of reads will be serviced from in-host cache media. Any reads that miss the in-host cache are serviced from the local storage array during regular operation. If the local storage array fails, these reads are serviced from the remote array. Since the volume of reads reaching the backend array is small, a lower bandwidth / higher latency link between datacenters suffices.
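The effect of the read hit ratio on average read latency can be shown with a back-of-the-envelope calculation. The latency figures below are assumptions chosen for illustration, not measured or vendor-published values:

```python
def effective_read_latency_us(hit_ratio, cache_us, backend_us):
    """Average read latency with a host-side cache (standard weighted average)."""
    return hit_ratio * cache_us + (1 - hit_ratio) * backend_us

# Assumed figures: 90% of reads hit host NVMe at ~100 microseconds; the
# remaining 10% go to a remote array over the WAN at ~10 milliseconds.
avg = effective_read_latency_us(0.90, 100, 10_000)
print(round(avg))  # roughly 1090 microseconds on average
```

Even with a slow 10 ms WAN path on misses, a high in-host hit ratio keeps the average read latency near a milli­second, which is why a modest WAN link suffices for the read traffic that remains.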

Summary

With VirtuCache installed, you can buy a lower bandwidth WAN link, stretch the cluster across longer distances than what the storage vendor recommends, and tolerate WAN latency peaks of up to 200 milliseconds (much more than the 5-10 millisecond WAN latency that the storage vendor recommends), without adversely impacting VM read and write latencies.
