Our main design principles were to develop host-side caching software that is both the highest-performing storage tier on the market and the easiest to deploy and use.
VirtuCache is software for VMware that clusters together any in-VMware host storage media (SSD, RAM, PMEM) installed across the hosts in a VMware cluster, and then caches all reads and writes from any SAN-based storage appliance to this clustered pool of host-based high-speed media. By automatically serving all VM reads and writes from and to in-host Flash, RAM, or PMEM, VirtuCache substantially improves the storage performance of VMware and SAN infrastructure, without requiring any changes to existing storage, server, or networking infrastructure.
Using in-VMware host Flash or RAM to solve storage I/O bottleneck
From a cost/MBps throughput and cost/millisecond latency point of view, NVMe SSD is the ideal media for solving storage throughput and latency issues for random workloads, which is what VMware workloads are. RAM is an even higher-performing media, but it is much more expensive than SSD (Flash memory), and host RAM is often itself a constrained resource. If one does have spare in-VMware host RAM, however, it should definitely be used as cache media, either exclusively or in combination with an in-host SSD.
Secondly, cache media in the VMware host is the ideal place in the storage I/O path to solve the storage bottleneck, compared with cache media in the storage appliance. Host-side cache media sits on the same motherboard as the VMware host CPU that consumes ‘hot’ data, and it is connected to that CPU via a high-speed memory bus or a PCIe bus. In comparison, cache media in the storage appliance sits behind the storage controllers and the network.
Note: All references to in-VMware host cache media in the sections below apply to in-VMware host SATA/SAS/NVMe SSD, RAM, and Persistent Memory, since VirtuCache can cache to any of these media types in a VMware host.
All read requests from VMs are intercepted by VirtuCache software in the VMware kernel. VirtuCache first looks for the requested data in the local cache media. If the data is in the in-host cache media, it is served to the VM from there (called a ‘cache hit’). If the data is not in the in-host cache media, the I/O path proceeds along its original course and VMware retrieves the data from the backend LUN/Volume. At that point, VirtuCache also copies the data to the in-host cache media. If the same data is requested again by any VM on the host, it is then served from the local in-host cache media instead of from the backend storage appliance. In this way, VirtuCache accelerates read operations by serving frequently and recently used data from in-host cache media.
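The read path described above can be sketched in a few lines. This is only an illustration of the lookup, miss, and populate sequence, not VirtuCache's implementation: the `ReadCache` class, its fields, and the dictionaries standing in for the cache media and the backend LUN are all hypothetical.

```python
class ReadCache:
    """Toy read cache keyed by block address (all names are hypothetical)."""

    def __init__(self, backend):
        self.backend = backend   # dict standing in for the backend LUN/Volume
        self.cache = {}          # dict standing in for the in-host cache media
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        # Cache hit: serve the block from in-host media.
        if block_id in self.cache:
            self.hits += 1
            return self.cache[block_id]
        # Cache miss: fetch from the backend, then keep a copy locally.
        self.misses += 1
        data = self.backend[block_id]
        self.cache[block_id] = data
        return data


backend = {0: b"boot", 1: b"data"}
cache = ReadCache(backend)
first = cache.read(1)    # miss: fetched from the backend and cached
second = cache.read(1)   # hit: served from the in-host cache
```

The second read of the same block never touches `backend`, which is the whole point of the read cache.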
All writes from VMs are written to the in-host cache media without synchronously writing to the backend storage appliance. Writing only to the in-host cache media substantially accelerates writes. However, because the writes are not synchronously committed to the backend storage appliance, there is a risk of data loss if the local host were to fail. To guard against this possibility, VirtuCache protects the local cache by replicating (mirroring) the writes to cache media across hosts in a VMware cluster.
Cache replication to protect against data loss in case of host failure
One of the main benefits of clustering cache media across VMware hosts is being able to mirror the write cache across VMware hosts in a distributed fashion. The administrator specifies the number (0, 1, or 2) of copies of write cache that need to be on separate hosts for each local VM write cache in the cluster. We call each such copy a ‘write replica’. The number of write replicas is the maximum number of host failures that can be sustained without data loss in the cluster. If an administrator chooses to keep, say, one write replica, VirtuCache automatically replicates the writes to cache media in one additional VMware host in the same cluster. VirtuCache defaults to using the vMotion network for such replication, but a separate network can be configured as well. Reads are not replicated, since the backend storage appliance always holds a current copy of any data that is only read. In the event of a host failure, VirtuCache syncs the backend storage appliance with a backup copy of the write cache from other hosts (write replicas).
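As a rough sketch of the replication and recovery flow described above, the following toy model mirrors each cached write to a configurable number of peer hosts and flushes the replicas to the backend when the originating host fails. The `Host` class, the function names, and the "first N peers" placement rule are assumptions made for illustration; how VirtuCache actually chooses replica hosts is not stated here.

```python
class Host:
    """Toy VMware host with a local write cache and a replica store."""

    def __init__(self, name):
        self.name = name
        self.local_writes = {}    # un-synced writes from VMs on this host
        self.replica_store = {}   # {origin_host_name: {block_id: data}}


def cached_write(origin, peers, replicas, block_id, data):
    """Write to the local cache and mirror to `replicas` peer hosts."""
    if replicas > len(peers):
        raise ValueError("not enough peer hosts for the requested replica count")
    origin.local_writes[block_id] = data
    # Assumption: simply take the first N peers as replica targets.
    for peer in peers[:replicas]:
        peer.replica_store.setdefault(origin.name, {})[block_id] = data


def recover_failed_host(failed_name, survivors, backend):
    """Flush surviving replicas of the failed host's writes to the backend."""
    for host in survivors:
        replica = host.replica_store.pop(failed_name, {})
        for block_id, data in replica.items():
            backend[block_id] = data


backend = {}
a, b, c = Host("esx-a"), Host("esx-b"), Host("esx-c")
cached_write(a, [b, c], replicas=1, block_id=7, data=b"payload")
# esx-a fails before block 7 was synced; the replica on esx-b is flushed instead.
recover_failed_host("esx-a", [b, c], backend)
```

With `replicas=0` the same failure would lose block 7, which is why the replica count equals the number of host failures the cluster can tolerate.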
In addition, at no point will more than two minutes of writes be stored on the in-host cache media. This is to avoid network congestion if a host were to fail. If a host fails, VirtuCache immediately syncs the write replica data (stored on other hosts in the cluster) for all the VMs on the failed host to the backend SAN appliance, and we don’t want large amounts of write replica data flushed over the SAN to choke the network, hence the two-minute limit for write data. The two minutes are calculated in terms of SAN speed, so if the SAN speed is, say, 100 MBps, two minutes of writes work out to 12 GB of data (100 MBps × 120 seconds). In this case, no more than 12 GB of writes will be cached to the in-host cache media. This constraint doesn’t apply to cached reads.
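The arithmetic behind the write cap is simple enough to check directly. The helper below is a hypothetical illustration, not a VirtuCache API; it assumes decimal units (1 GB = 1,000 MB), which is what the 12 GB figure implies.

```python
def max_cached_write_mb(san_throughput_mb_per_s, window_seconds=120):
    """Upper bound on un-synced writes held in cache, in megabytes."""
    return san_throughput_mb_per_s * window_seconds


cap_mb = max_cached_write_mb(100)   # 100 MBps SAN, two-minute window
cap_gb = cap_mb / 1000              # decimal gigabytes
```

Here `cap_mb` is 12,000 MB, so `cap_gb` is 12 GB, matching the figure in the text. A faster SAN raises the cap proportionally.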
Syncing writes to backend storage
VirtuCache has a background task that continuously syncs the write cache to the backend SAN storage. VirtuCache adjusts the speed and frequency at which writes are synced based on the latency of the storage network and appliance, so as not to overwhelm either by syncing too quickly.
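One common way to build a latency-aware sync rate is an additive-increase, multiplicative-decrease controller: speed up gently while the backend looks healthy, and back off sharply when latency rises. The sketch below illustrates that general idea only. The function name, the 5 ms target, the step sizes, and the rate limits are all invented for the example and are not VirtuCache's actual policy.

```python
def next_sync_rate(current_mbps, observed_latency_ms,
                   target_latency_ms=5.0, min_mbps=10.0, max_mbps=500.0):
    """Return the next background sync rate given the latest latency sample."""
    if observed_latency_ms > target_latency_ms:
        proposed = current_mbps * 0.5     # backend is struggling: halve the rate
    else:
        proposed = current_mbps + 10.0    # backend is healthy: step up slowly
    # Keep the rate inside the configured bounds.
    return max(min_mbps, min(max_mbps, proposed))


rate = 100.0
rate = next_sync_rate(rate, observed_latency_ms=2.0)
rate_after_congestion = next_sync_rate(rate, observed_latency_ms=20.0)
```

A low-latency sample nudges the rate from 100 to 110 MBps, and a single high-latency sample then cuts it to 55 MBps.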
Flow control to prevent write intensive VMs from taking over the cache media
Since the amount of cache media installed in VMware hosts will typically be a small percentage of the total LUN capacity of the backend storage appliance, care needs to be taken to prevent write-intensive VMs from using up the entire cache media. VirtuCache allows bursty writes from VMs to be written to the cache media at the native write speed of the cache media, without synchronously writing the data to the backend disk. However, if there is prolonged write-intensive activity from only a few VMs, VirtuCache’s flow control feature throttles back write speeds to the in-host cache. This helps ensure fair allocation of in-host cache capacity to other VMs on the host and ensures orderly de-staging of writes from the cache media to the backend LUN.
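A token bucket is a standard technique for allowing short bursts at full speed while limiting the sustained rate, so it serves as a reasonable mental model for the behavior described above. The text does not say which mechanism VirtuCache uses; the `TokenBucket` class and the numbers below are assumptions for illustration only.

```python
class TokenBucket:
    """Per-VM write budget: bursts drain it, time refills it."""

    def __init__(self, burst_mb, sustained_mb_per_s):
        self.capacity = burst_mb
        self.tokens = burst_mb
        self.refill_rate = sustained_mb_per_s

    def advance(self, seconds):
        # Refill the budget as time passes, up to the burst capacity.
        self.tokens = min(self.capacity, self.tokens + seconds * self.refill_rate)

    def admit(self, write_mb):
        """Return how many MB of this write may go to cache at full speed now."""
        admitted = min(write_mb, self.tokens)
        self.tokens -= admitted
        return admitted


bucket = TokenBucket(burst_mb=100, sustained_mb_per_s=20)
burst = bucket.admit(80)       # a short burst fits entirely
sustained = bucket.admit(80)   # continued heavy writing gets only what is left
bucket.advance(1)              # one second later, 20 MB of budget has refilled
later = bucket.admit(80)
```

The first 80 MB is admitted in full, the next request gets only the remaining 20 MB, and after that the VM is held to the 20 MB per second refill rate.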
Keeping the cache fresh
All writes from VMs are first written only to the cache media, without synchronously committing the writes to the backend SAN appliance. Regarding reads, as blocks are read from the backend appliance by the VMs, they are immediately copied to the in-host cache media. We do this because a block that has been read once is much more likely to be read again than other blocks on the backend storage appliance.
We use a combination of Least Recently Used (LRU) and First-in-First-Out (FIFO) algorithms to replace older data with newer data in the cache.
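The text does not describe how the two algorithms are combined. One well-known way to combine them is a simplified 2Q-style scheme: newly read blocks enter a FIFO queue, and blocks that are touched a second time are promoted to an LRU queue. The sketch below shows that scheme purely as an illustration. The `TwoQueueCache` class and its sizes are hypothetical and should not be read as VirtuCache's actual replacement policy.

```python
from collections import OrderedDict


class TwoQueueCache:
    """Simplified 2Q-style cache: FIFO for new blocks, LRU for re-used ones."""

    def __init__(self, fifo_size, lru_size):
        self.fifo = OrderedDict()   # blocks seen once, evicted in arrival order
        self.lru = OrderedDict()    # blocks seen again, evicted by recency
        self.fifo_size = fifo_size
        self.lru_size = lru_size

    def access(self, block_id):
        """Record an access and return True on a cache hit."""
        if block_id in self.lru:
            self.lru.move_to_end(block_id)      # refresh recency
            return True
        if block_id in self.fifo:
            del self.fifo[block_id]             # second touch: promote to LRU
            self.lru[block_id] = True
            if len(self.lru) > self.lru_size:
                self.lru.popitem(last=False)    # evict the least recently used
            return True
        self.fifo[block_id] = True              # first touch: enter the FIFO
        if len(self.fifo) > self.fifo_size:
            self.fifo.popitem(last=False)       # evict the oldest arrival
        return False


cache = TwoQueueCache(fifo_size=2, lru_size=2)
results = [cache.access(b) for b in [1, 2, 1, 3, 4, 1, 2]]
```

In this run, block 1 is promoted on its second access and stays cached, while block 2 is pushed out of the FIFO by newer arrivals and misses when it is requested again.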
VirtuCache software on each host is managed using the VirtuCache management VM. Only one instance of the management VM needs to be deployed per vCenter. The management VM carries our web user interface and vCenter plug-in, which let you view performance statistics and configure VirtuCache across the cluster. The management VM is not in the storage I/O path and can be powered off without affecting caching behavior.
VirtuCache is certified by VMware through VMware’s VAIO (‘VMware API for IO Filters’) program. We use VAIO APIs from VMware to develop VirtuCache.
By caching all VM reads and writes, VirtuCache substantially improves VM performance, at a much lower cost than an upgrade to a high-end All-Flash array or hyper-converged hardware.