Why do Snapshots affect VM Performance?
The VM snapshot is a commonly used feature in VMware vSphere environments. A VM snapshot preserves the state and data of a VM at a particular point in time, including the VM's power state, its disks, (optionally) its memory, and its virtual devices. This makes snapshots particularly useful when one wants to preserve a known “good” state of a VM. Popular use cases include backup and replication software such as Veeam and Nakivo, VDI environments such as VMware Horizon, and direct use by end users who want to preserve a VM's state before making major changes to it.
When a VM snapshot is taken, the state of the VM's virtual disks is preserved. This is done by making the virtual disks read-only so that the VM no longer writes to them. Instead, delta disks are created, and subsequent VM writes go to the delta disk rather than the base disk. The format of the delta disk depends on the underlying storage and datastore format. On VMFS6 datastores, the default and most commonly used format is SEsparse. The rest of this article focuses on this category of snapshots.
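The redirect-on-write behavior described above can be sketched as a toy Python model. It is a minimal illustration under simplifying assumptions, not VMware's on-disk format. The class name `SnapshotChain` is hypothetical, and each disk layer is a plain dictionary mapping block number to data. "Read-only" is modeled by never writing to any layer except the newest one.

```python
class SnapshotChain:
    """Toy model of a VM disk with a snapshot chain (hypothetical, not VMware's format)."""

    def __init__(self, base_blocks):
        # layers[0] is the base disk; later entries are delta disks, newest last.
        self.layers = [dict(base_blocks)]

    def take_snapshot(self):
        # Existing layers are frozen simply by never writing to them again;
        # a new, empty delta disk is placed on top of the chain.
        self.layers.append({})

    def write(self, block, data):
        # Every VM write is redirected to the newest (topmost) layer only.
        self.layers[-1][block] = data

    def read(self, block):
        # The newest layer holding the block wins; otherwise fall back toward the base.
        for layer in reversed(self.layers):
            if block in layer:
                return layer[block]
        return None


disk = SnapshotChain({0: "boot", 1: "os"})
disk.take_snapshot()
disk.write(1, "os-patched")
print(disk.read(1))        # os-patched (served from the delta disk)
print(disk.layers[0][1])   # os (the base disk is unchanged)
```

Taking a snapshot leaves the base layer untouched. The VM still sees its latest data because reads consult the newest layer first.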
VM Performance
Performance benchmarks such as fio and HammerDB show a dramatic drop in performance for VMs with snapshots. The drop appears as lower IOPS (as much as 85% lower) and higher latencies, as shown in the figure below. Moreover, the more snapshots there are in the VM's snapshot chain, the more severe the performance degradation.
Reasons for Performance Degradation
The reason for this dramatic drop in VM performance lies in how disk reads and writes are handled after a snapshot is created. As mentioned above, once a snapshot is taken, the existing base virtual disk is made read-only and writes go to the delta disk. The delta disk therefore contains all updates made to the virtual disk since its parent's snapshot was taken. It maintains an index identifying the location and size of each update, along with the actual data of the update. As a result, each VM write translates to more than one physical disk write: a) one to update the index contained in the delta disk, and b) one to actually write the data to the delta disk.
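To make this write amplification concrete, here is a minimal sketch of a delta disk that counts physical writes. The `DeltaDisk` class and its cost model are assumptions for illustration. The model charges exactly one index write plus one data write per VM write. The real SEsparse format manages its metadata differently and may cache or batch index updates.

```python
class DeltaDisk:
    """Toy delta disk: an index plus a data area (hypothetical layout and cost model)."""

    def __init__(self):
        self.index = {}            # guest block -> (offset, size) of its latest update
        self.data = []             # data area; updates are appended in write order
        self.physical_writes = 0   # physical disk writes issued so far

    def guest_write(self, block, payload):
        """Handle one VM write, assumed here to cost two physical writes."""
        # Data write: append the actual data to the delta disk's data area.
        offset = len(self.data)
        self.data.append(payload)
        self.physical_writes += 1
        # Index write: record the location and size of this update.
        self.index[block] = (offset, len(payload))
        self.physical_writes += 1


disk = DeltaDisk()
for block in range(10):
    disk.guest_write(block, b"\x00" * 4096)
print(disk.physical_writes)   # 20 physical writes for 10 VM writes
```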
Similarly, each VM read translates to multiple physical disk reads. In case a), the delta disk index is read to find out whether the region being read has been updated in the delta disk, and if so, additional read(s) are issued on the delta disk to fetch the actual data. In case b), where the region has not been updated in the delta disk, the base disk is read to get the actual data. Furthermore, if there is more than one snapshot in the VM's snapshot chain, such index lookups can be triggered at every level of the chain until the region is found.
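The read path can be sketched the same way. The function below is a hypothetical model, not VMware code. It walks the chain from the newest delta disk toward the base disk. It charges one physical read for each delta index it consults and one more for fetching the data, assuming each index lookup costs a single disk read.

```python
def read_block(chain, block):
    """Resolve one VM read through a snapshot chain, counting physical reads.

    chain: list of layers, base disk first, newest delta disk last. Each layer
    is a dict mapping block -> data (for a delta disk, its keys act as the index).
    Hypothetical cost model: 1 read per index lookup, 1 read per data fetch.
    """
    base, deltas = chain[0], chain[1:]
    physical_reads = 0
    for delta in reversed(deltas):          # newest delta disk first
        physical_reads += 1                 # read this delta disk's index
        if block in delta:
            physical_reads += 1             # fetch the data from this delta disk
            return delta[block], physical_reads
    physical_reads += 1                     # not in any delta: read the base disk
    return base.get(block), physical_reads


chain = [{7: "base data"}, {}, {}, {}]      # base disk + three empty delta disks
print(read_block(chain, 7))                 # ('base data', 4)
```

In this model, a block that was never rewritten costs one read per delta disk plus one for the base disk. The cost therefore grows with the length of the snapshot chain.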
Thus, for both VM reads and VM writes, multiple IO operations are triggered on the physical disk when a VM has snapshots. VMs with multi-level snapshot chains trigger even more disk IO operations, depending on the length of the chain. This can cause the latency of each VM IO operation to more than double, resulting in a fairly dramatic drop in the VM's storage IOPS. Additionally, if most of the running VMs in an environment have snapshots (as would be the case in, say, VDI environments), the total number of IO operations on the storage appliance increases several-fold. This degrades the appliance's IO performance and increases the latency of every storage operation it serves.
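A back-of-envelope calculation shows why extra physical IOs per VM IO hurt so much. The sketch below assumes that the physical IOs behind one VM IO are serialized and that each has the same latency. It then applies Little's law (IOPS = outstanding IOs / latency). The function names and the 0.5 ms latency figure are illustrative assumptions, not measurements.

```python
def estimated_iops(per_io_latency_ms, physical_ios_per_vm_io, queue_depth=1):
    """Approximate VM IOPS when each VM IO triggers several serialized physical IOs.

    Assumes every physical IO has the same latency (a hypothetical simplification).
    """
    vm_io_latency_ms = per_io_latency_ms * physical_ios_per_vm_io
    return queue_depth * 1000.0 / vm_io_latency_ms   # Little's law, per second


def appliance_load(n_vms, vm_iops, physical_ios_per_vm_io):
    """Physical IOPS the storage appliance must serve for n_vms similar VMs."""
    return n_vms * vm_iops * physical_ios_per_vm_io


for k in (1, 2, 4):
    print(k, estimated_iops(0.5, k))
# 1 2000.0
# 2 1000.0
# 4 500.0
```

Under these assumptions, going from one to two physical IOs per VM IO halves the VM's IOPS. The appliance's total load also scales with both the number of VMs and the per-IO multiplier.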
VirtuCache alleviates IO performance issues of VMs with snapshots
VirtuCache’s caching software significantly improves the performance of VMs with snapshots by caching both the read and write IO operations of such VMs. IO latency is reduced by an order of magnitude because reads and writes are served from local high-speed SSD or RAM instead of the storage appliance. Serving IO locally also reduces the load on the storage appliance, improving overall storage performance.
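The general idea of host-side caching can be illustrated with a small least-recently-used (LRU) cache that absorbs reads and writes before they reach the storage appliance. This is a generic write-back cache sketch, not VirtuCache's implementation. The `HostCache` name, the capacity in blocks, and the dict standing in for the appliance are all hypothetical. A production write cache would also have to protect dirty data against host failure, which this sketch ignores.

```python
from collections import OrderedDict


class HostCache:
    """Minimal LRU write-back cache in front of a storage appliance (illustrative only)."""

    def __init__(self, backend, capacity):
        self.backend = backend          # dict-like stand-in for the storage appliance
        self.capacity = capacity        # maximum number of cached blocks
        self.entries = OrderedDict()    # block -> data, least recently used first
        self.dirty = set()              # blocks written locally but not yet flushed
        self.backend_reads = 0
        self.backend_writes = 0

    def _evict_if_full(self):
        while len(self.entries) > self.capacity:
            block, data = self.entries.popitem(last=False)   # drop the LRU entry
            if block in self.dirty:                          # flush dirty data first
                self.backend[block] = data
                self.backend_writes += 1
                self.dirty.discard(block)

    def read(self, block):
        if block in self.entries:                # hit: served from local SSD/RAM
            self.entries.move_to_end(block)
            return self.entries[block]
        self.backend_reads += 1                  # miss: fetch once from the appliance
        data = self.backend.get(block)
        self.entries[block] = data
        self._evict_if_full()
        return data

    def write(self, block, data):
        # Acknowledged locally; the appliance sees this write only on eviction.
        self.entries[block] = data
        self.entries.move_to_end(block)
        self.dirty.add(block)
        self._evict_if_full()


appliance = {1: "a", 2: "b"}
cache = HostCache(appliance, capacity=2)
for _ in range(5):
    cache.read(1)
print(cache.backend_reads)   # 1
```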
Additionally, for VDI deployments that use linked or instant clones, VirtuCache also caches the IOs of the base disk. A cache entry for a region of the base disk therefore benefits every VDI VM in a pool created from the same parent base disk, not just the VM whose read originally caused the entry to be cached. Pooling the cache for the parent disk across all VDI VMs in the pool significantly improves their storage IO performance and alleviates the boot storm issues associated with such deployments.
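The benefit of sharing base-disk cache entries across clones can be sketched by keying the cache on the parent base disk rather than on the individual VM. This is a hypothetical model: `SharedBaseCache` is an illustrative name, the cache is unbounded for simplicity, and clone-private delta data is not modeled. Only the first clone to read a given base-disk block goes to the storage appliance.

```python
class SharedBaseCache:
    """Cache keyed by (base disk id, block) so clones of one image share entries."""

    def __init__(self, base_disks):
        self.base_disks = base_disks    # base disk id -> {block: data}
        self.entries = {}               # (base disk id, block) -> data
        self.backend_reads = 0          # reads that reached the storage appliance

    def read_base(self, base_id, block):
        key = (base_id, block)
        if key not in self.entries:     # only the first clone to read this block misses
            self.backend_reads += 1
            self.entries[key] = self.base_disks[base_id][block]
        return self.entries[key]


bases = {"gold-image": {b: f"block-{b}" for b in range(10)}}
cache = SharedBaseCache(bases)
for clone in range(50):                 # 50 clones booting from the same image
    for b in range(10):
        cache.read_base("gold-image", b)
print(cache.backend_reads)              # 10
```

In this model, 50 clones reading the same 10 boot blocks cause only 10 reads on the appliance instead of 500. Every other read is a cache hit.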