Our main design principles were to build host-side caching software that is both the highest-performing storage tier on the market and the easiest to deploy and use.
VirtuCache is kernel-mode software for VMware that clusters together in-host SSDs and/or in-host RAM installed across the hosts in a VMware cluster, and then caches frequently and recently used data (both reads and writes) from any SAN-based storage appliance to this clustered pool of host-based high-speed media. By automatically serving more and more data from in-host SSDs or RAM, VirtuCache substantially improves VMware storage performance from any SAN-based storage appliance. This in turn improves the performance of applications running within VMs and allows a higher VM-to-host ratio, without requiring an expensive upgrade to SSD-based storage appliances or hyper-converged hardware.
Using in-VMware host Flash or RAM to solve storage I/O bottleneck
From a cost-per-MBps-throughput and cost-per-millisecond-latency point of view, Flash memory (SSD) is the ideal medium for solving storage throughput and latency issues for random workloads, which is what VMware workloads typically are. RAM is an even higher-performing medium, but it is much more expensive than Flash, and host RAM is often itself a constrained resource. That said, if spare in-VMware-host RAM is available, it should definitely be used as cache media, either exclusively or, if an adequate quantity of spare RAM is not available, in combination with an in-VMware-host SSD.
Secondly, cache media in the VMware host is the ideal place in the storage I/O path to solve the storage bottleneck, versus cache media in the storage appliance, because host-side cache media sits on the motherboard of the VMware host whose CPU consumes 'hot' data, connected to that CPU via a high-speed memory bus or PCIe bus. In comparison, cache media in the storage appliance sits behind storage controllers and behind the network.
Note: All references to in-VMware-host cache media in the sections below apply to in-VMware-host SSD, RAM, and the newer persistent memory, since VirtuCache can cache to any of these media types in a VMware host.
All read requests from VMs are intercepted by VirtuCache in the VMware kernel. VirtuCache first looks up the requested data in the local cache media. If the data is in the in-VMware-host cache media, it is served to the VM from there (a 'cache hit'). If it is not, the I/O proceeds along its original path and VMware retrieves the data from the backend LUN/Volume; at that point VirtuCache also copies the data to the in-VMware-host cache media. If the same data is later requested by any VM on the host, it is served from the local in-host cache media instead of from the backend storage appliance. In this way VirtuCache accelerates reads by serving more and more data from in-host cache media.
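The read path above is a classic look-aside cache: check local media first, fall back to the backend on a miss, then populate the cache. A minimal sketch of this logic (the class and variable names are hypothetical illustrations, not VirtuCache's actual kernel code):

```python
class ReadCache:
    """Look-aside read cache: serve hits locally, populate on misses."""

    def __init__(self, backend):
        self.backend = backend   # stands in for the SAN LUN/Volume
        self.cache = {}          # stands in for in-host SSD/RAM
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:      # cache hit: serve from local media
            self.hits += 1
            return self.cache[block_id]
        self.misses += 1                # cache miss: fetch from backend
        data = self.backend[block_id]
        self.cache[block_id] = data     # copy into cache for future reads
        return data

backend = {1: b"blockA", 2: b"blockB"}
rc = ReadCache(backend)
rc.read(1)                  # first read: miss, fetched from backend
rc.read(1)                  # second read: hit, served from local cache
print(rc.hits, rc.misses)   # 1 1
```

The key property is that the miss path both serves the request and warms the cache, so repeated reads of the same block never touch the SAN again.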
All writes from VMs are written to the in-host cache media without synchronously writing to the backend storage appliance. Writing only to the in-host cache media substantially accelerates writes; however, because the writes are not synchronously committed to the backend storage appliance, there is a risk of data loss if the local host fails. To guard against this possibility, VirtuCache protects the local cache by replicating (mirroring) the writes across hosts in the VMware cluster.
Cache replication to protect against data loss in case of host failure
One of the main benefits of clustering cache media across VMware hosts is the ability to mirror the write cache across hosts in a distributed fashion. For each host's local write cache, the administrator specifies the number (0, 1, or 2) of copies, called 'write replicas', to keep on separate hosts in the cluster. The number of write replicas is the maximum number of host failures the cluster can sustain without data loss. If an administrator chooses, say, one write replica, VirtuCache automatically replicates the dirty writes (writes in the in-host cache that have not yet been synced to backend storage) to cache media on one additional VMware host in the same cluster. By default the vMotion network is used for this replication, though a separate network can be configured. Reads are not replicated, since the backend storage appliance is always in sync as far as reads go. In the event of a host failure, VirtuCache syncs the backend storage appliance with the backup copy of the write cache (the write replica) from another host.
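The replica-count rule above can be sketched as a simple placement function: given the configured replica count, pick that many other hosts to hold copies of the local dirty write cache. This is a hypothetical illustration (host names and the selection policy are made up; VirtuCache's actual placement logic is not documented here):

```python
def place_write_replicas(local_host, hosts, replica_count):
    """Choose `replica_count` (0, 1, or 2) other hosts to hold copies
    of this host's dirty write cache. The replica count equals the
    number of host failures tolerable without losing dirty writes."""
    if replica_count not in (0, 1, 2):
        raise ValueError("replica count must be 0, 1, or 2")
    candidates = [h for h in hosts if h != local_host]
    if replica_count > len(candidates):
        raise ValueError("not enough hosts for requested replica count")
    return candidates[:replica_count]

cluster = ["esxi-01", "esxi-02", "esxi-03"]
print(place_write_replicas("esxi-01", cluster, 1))  # ['esxi-02']
```

With one replica, losing host esxi-01 leaves its dirty writes recoverable from esxi-02; with zero replicas, a host failure loses any unsynced writes.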
Also, at no point will more than two minutes' worth of dirty writes be stored on the in-host cache media. This limit exists to avoid network congestion if a host fails: VirtuCache then immediately syncs the write replica data (stored on other hosts in the cluster) for all VMs on the failed host to the backend SAN appliance, and flushing a large amount of write replica data over the SAN could choke the network, hence the 2-minute cap on dirty write data. The two minutes are measured in terms of SAN speed, so if the SAN speed is, say, 100MBps, two minutes of dirty writes works out to 12GB of data.
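The arithmetic behind the cap is simply SAN throughput multiplied by the two-minute window (the function name is our own, used here only to make the calculation explicit):

```python
def max_dirty_mb(san_mbps, window_seconds=120):
    """Cap on dirty (unsynced) writes held in the in-host cache:
    two minutes' worth of data at the SAN's throughput, in MB."""
    return san_mbps * window_seconds

# At 100 MBps SAN speed: 100 MB/s * 120 s = 12,000 MB = 12 GB
print(max_dirty_mb(100))  # 12000
```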
Syncing dirty writes to backend storage
VirtuCache has a background task that continuously syncs dirty writes to the backend SAN storage. It adjusts the speed and frequency of these syncs based on the latency of the SAN and storage appliance, so as not to choke the SAN by syncing too quickly.
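One way such latency-aware destaging can work is to scale the flush rate down as observed SAN latency rises above a target. The proportional policy below is purely an assumed illustration of the idea; VirtuCache's actual control algorithm, target values, and parameter names are not described in this document:

```python
def destage_rate(base_mbps, san_latency_ms, target_latency_ms=5.0):
    """Back off the background flush rate when SAN latency rises,
    so syncing dirty writes does not itself congest the SAN.
    (Hypothetical proportional back-off for illustration only.)"""
    if san_latency_ms <= target_latency_ms:
        return base_mbps                       # SAN is healthy: full speed
    # Latency above target: reduce rate proportionally, floor at 1 MBps
    return max(base_mbps * target_latency_ms / san_latency_ms, 1.0)

print(destage_rate(200, 5))   # 200.0 -> at target latency, flush at full rate
print(destage_rate(200, 10))  # 100.0 -> latency doubled, flush rate halved
```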
Flow control to prevent write intensive VMs from taking over the cache media
Since the amount of cache media installed in VMware hosts will typically be a small percentage of the total LUN capacity of the backend storage appliance, care must be taken to prevent write-intensive VMs from consuming the entire cache. VirtuCache allows bursty writes from VMs to land on the cache media at the media's native write speeds, without synchronously committing the data to the backend disk. However, if only a few VMs sustain write-intensive activity for a prolonged period, VirtuCache's flow control feature throttles their write speeds to the in-host cache. This ensures fair allocation of in-host cache capacity to the other VMs on the host and orderly de-staging of writes from the cache media to the backend LUN.
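This burst-then-throttle behavior resembles a per-VM token bucket: a full bucket lets bursts through at native cache speed, while sustained writers are held to a refill rate that represents their fair share. The sketch below is an assumed model of the behavior described above, not VirtuCache's actual flow control implementation:

```python
class VMWriteThrottle:
    """Per-VM token bucket: bursts are admitted at full speed, but a
    sustained write-heavy VM is throttled to its fair share per tick."""

    def __init__(self, fair_share_mb, burst_mb):
        self.fair_share_mb = fair_share_mb  # tokens refilled each tick
        self.burst_mb = burst_mb            # bucket capacity (burst allowance)
        self.tokens = burst_mb              # bucket starts full

    def tick(self):
        """Refill once per time interval, capped at the burst allowance."""
        self.tokens = min(self.burst_mb, self.tokens + self.fair_share_mb)

    def admit(self, write_mb):
        """Return the MB admitted to cache now; the rest must wait."""
        granted = min(write_mb, self.tokens)
        self.tokens -= granted
        return granted

t = VMWriteThrottle(fair_share_mb=10, burst_mb=50)
print(t.admit(40))  # 40 -> the burst is fully admitted
print(t.admit(40))  # 10 -> only the remaining tokens are admitted
t.tick()
print(t.admit(40))  # 10 -> sustained writer held to its fair share
```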
Keeping the cache ‘fresh’
All writes from VMs are first written only to the cache media, without synchronously committing them to the backend SAN appliance. As for reads, when blocks are read from the backend appliance by VMs, they are immediately copied to the in-host cache media. We do this because the chances of a block that has been read once being read again are much higher than for other blocks on the backend storage appliance.
We use a combination of Least Recently Used (LRU) and First-In-First-Out (FIFO) algorithms to replace older data in the cache with newer data, much as traditional operating systems have long done for disk-to-memory caching.
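The LRU half of that policy can be illustrated with a fixed-capacity cache that evicts the block touched least recently. This is a textbook sketch of LRU (using Python's `collections.OrderedDict`), not VirtuCache's kernel implementation, which also blends in FIFO behavior:

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # insertion order tracks recency

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

c = LRUCache(2)
c.put("a", 1)
c.put("b", 2)
c.get("a")           # touching 'a' makes 'b' the least recently used
c.put("c", 3)        # capacity exceeded: 'b' is evicted
print(list(c.data))  # ['a', 'c']
```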
Our solution is managed through the VirtuCache management VM. Only one instance of the management VM needs to be deployed per vCenter instance. It lets administrators centrally manage VirtuCache on all ESXi hosts managed by that vCenter, using either our web user interface or our vCenter plug-in. The management VM is not in the I/O path and can be powered off without affecting caching behavior.
By accelerating reads and writes with software that runs entirely in the VMware kernel, VirtuCache brings performance to our customers' existing storage appliances that rivals an all-flash array, without the cost of upgrading to one.