VDI Boot Storm: What Causes It And How To Fix It
VDI deployments are often plagued by boot storms as they scale up. A boot storm occurs when many VDI users boot their virtual desktops at the same time, for example first thing in the morning when they arrive at work. When a large number of VDI VMs are powered on simultaneously and their operating systems boot, they put enormous stress on every computing resource: CPU, memory, networking and storage. VDI deployments have faced this problem ever since they were introduced in 2007-08. It led to many early VDI deployments being abandoned, and the technology almost failed to take off. Now that servers have faster multicore CPUs, hundreds of GB of RAM, and 10 Gbps networking, many of these resource pressures have gone away, but contention for storage IOPS continues to plague VDI deployments.
Virtual Desktop Boot Storms caused by Storage Bottlenecks
Storage speeds have gone up significantly in the last 10 years with the widespread availability of SAS and NVMe SSD based storage, but the price-performance gains in storage have still lagged behind those of other computing resources such as CPU, memory and networking. As a result, storage becomes the performance bottleneck in any high-load situation such as a VDI boot storm. The problem is aggravated by the resource requirements peculiar to VDI deployments. VDI users typically need few storage IOPS for normal work, but only after the VM has booted. Occasional spikes in disk usage are easy to handle as long as they are random and not synchronized across many VDI VMs. During a boot storm, however, each VDI VM needs to read large areas of the disk where the OS is stored, and at certain times, such as when employees arrive at work in the morning, these boots all happen at around the same time. This places a high load on the storage system, and if the storage is not equipped to handle it, boot times for the VDI VMs become unacceptably long.
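The difference between synchronized and random load can be illustrated with a rough back-of-the-envelope calculation. The sketch below is only illustrative: the function name and every number in it (desktop count, per-VM IOPS while booting and during normal work) are assumptions chosen for the example, not measurements from any deployment.

```python
def aggregate_read_iops(total_vms, booting_vms, boot_iops_per_vm, steady_iops_per_vm):
    """Estimate the IOPS demanded from shared storage at one instant.

    booting_vms desktops are in the middle of booting; the rest are doing
    normal work. All inputs are caller-supplied assumptions.
    """
    if not 0 <= booting_vms <= total_vms:
        raise ValueError("booting_vms must be between 0 and total_vms")
    steady_vms = total_vms - booting_vms
    return booting_vms * boot_iops_per_vm + steady_vms * steady_iops_per_vm


# Hypothetical figures, chosen only to show the shape of the problem.
TOTAL_VMS = 500
BOOT_IOPS = 300    # assumed IOPS per VM while its OS is booting
STEADY_IOPS = 10   # assumed IOPS per VM during normal work

# Boots spread through the day: only 25 desktops booting at any moment.
staggered = aggregate_read_iops(TOTAL_VMS, 25, BOOT_IOPS, STEADY_IOPS)
# Boot storm: every desktop boots at the same time.
storm = aggregate_read_iops(TOTAL_VMS, TOTAL_VMS, BOOT_IOPS, STEADY_IOPS)

print(f"staggered boots: {staggered} IOPS")
print(f"boot storm:      {storm} IOPS")
```

With these assumed inputs the synchronized case demands roughly twelve times the IOPS of the staggered case, even though the number of desktops and the work each one does are unchanged.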
How to handle VDI Boot Storms
The obvious way to eliminate the storage bottleneck during a VDI boot storm is to upgrade the storage to an all-flash array. However, some all-flash arrays are better than others at alleviating boot storms. Cheaper arrays may lack powerful x86 controller processors, may not use persistent memory or RAM on the controller, or may exhibit high latency in their controller firmware.
To compensate for the performance differences between all-flash arrays, VMware Horizon and Citrix XenDesktop provide host-side RAM caching capabilities.
The Horizon Storage Accelerator feature of VMware Horizon caches only read data that is shared across all Horizon View VDI VMs (common blocks), which means that only a subset of the reads from the master / replica VM in Horizon is cached. It does not cache VM writes. It is also capped at 32 GB of RAM per ESXi server. The details of the Horizon Storage Accelerator feature and how it compares with the Virtunet Systems VirtuCache caching solution can be read here.
The Citrix MCS Storage Optimization feature for the Citrix XenDesktop VDI solution caches only VM writes, and only in VM RAM. This feature does not affect the boot times of virtual desktops, though it does improve the write performance of a virtual desktop after it boots. It also carries a risk of VM data loss, corruption or instability if the host fails or the RAM cache fills up. The details of the Citrix MCS Storage Optimization feature and how it compares with the VirtuCache caching solution can be read here.
VirtuCache solves this problem by automatically caching frequently and recently used VM data (reads and writes) from SAN based storage to RAM or SSD in the VMware host. As the cache warms up, VirtuCache services more and more storage IO from this in-host cache media, so fewer IO requests need to traverse the storage network and be served by the storage appliance. This alleviates the load on the storage infrastructure and improves VM level latencies considerably.
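The effect of serving a growing share of IO from in-host cache can be approximated with a simple weighted average. This is a generic model of host-side caching, not a description of VirtuCache internals; the function name and the latency figures are assumptions made for illustration.

```python
def average_latency_us(hit_ratio, cache_latency_us, san_latency_us):
    """Average IO latency when a fraction hit_ratio of requests is served
    from in-host cache and the remainder goes to the SAN."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_latency_us + (1.0 - hit_ratio) * san_latency_us


# Hypothetical latencies, for illustration only.
CACHE_US = 100    # assumed latency of a hit in host RAM or SSD
SAN_US = 1000     # assumed latency of a request served by the storage appliance

for hit_ratio in (0.0, 0.5, 0.9):
    avg = average_latency_us(hit_ratio, CACHE_US, SAN_US)
    print(f"hit ratio {hit_ratio:.0%}: about {avg:.0f} us per IO")
```

The same hit ratio also describes the share of IO that never reaches the storage network, which is why a warm cache relieves the array during a boot storm.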
In the case of linked clones, VirtuCache recognizes that storage blocks residing in the original replica need to be cached only once, even though they are accessed from multiple virtual desktop VMs. At the same time, it recognizes that for persistent linked clones, blocks containing changes specific to one desktop VM need their own cache copies, and it keeps separate copies only for those changed blocks.
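The idea of caching shared replica blocks once while keeping per-desktop copies of changed blocks can be sketched as follows. This is a simplified illustration of the concept, not VirtuCache's actual implementation; the class, its methods and the `read_from_replica` callback are all hypothetical names.

```python
class LinkedCloneCache:
    """Toy cache: one shared copy per replica block, plus per-VM copies
    only for blocks that a specific desktop VM has changed."""

    def __init__(self):
        self.replica_blocks = {}   # block_no -> data, shared by all clones
        self.delta_blocks = {}     # (vm_id, block_no) -> data changed by one VM
        self.backend_reads = 0     # reads that had to go to backend storage

    def write(self, vm_id, block_no, data):
        # A write from a persistent linked clone is private to that VM.
        self.delta_blocks[(vm_id, block_no)] = data

    def read(self, vm_id, block_no, read_from_replica):
        # The VM's own changed block, if any, takes precedence.
        key = (vm_id, block_no)
        if key in self.delta_blocks:
            return self.delta_blocks[key]
        # Otherwise serve the shared replica block, fetching it only once.
        if block_no not in self.replica_blocks:
            self.backend_reads += 1
            self.replica_blocks[block_no] = read_from_replica(block_no)
        return self.replica_blocks[block_no]


def fake_replica_read(block_no):
    # Stand-in for a read from the replica disk on the SAN.
    return f"replica-block-{block_no}"


cache = LinkedCloneCache()
for vm in ("desktop-1", "desktop-2", "desktop-3"):
    cache.read(vm, 7, fake_replica_read)
print("backend reads after three desktops read block 7:", cache.backend_reads)

cache.write("desktop-1", 7, "desktop-1-private-data")
print(cache.read("desktop-1", 7, fake_replica_read))
print(cache.read("desktop-2", 7, fake_replica_read))
```

Three desktops reading the same replica block cause a single backend read, and a write from one desktop leaves the shared copy seen by the others untouched.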
In the case of non-persistent VDI using instant clones, VirtuCache can deliver even faster boot times and more responsive VMs through the ‘Write-back no replica’ caching policy. For server VMs, the ‘Write-back one replica’ caching policy is typically recommended, so that if a host fails abruptly, the VM write data is still preserved correctly through replication (mirroring) of the write cache across ESXi hosts. With non-persistent VDI using instant clones, however, there is no need to preserve write data in case of host failure, since write data is not preserved anyway. The ‘Write-back no replica’ caching policy can therefore be used, which further reduces VM latencies because it avoids the additional network latency required to replicate the write data to other hosts in the VMware cluster.
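The reasoning behind the two policies can be expressed as a small decision helper with a rough latency model. The policy names come from the text above; everything else is an assumption made for illustration, including the function names, the latency figures, and the simplification that a replicated write is acknowledged only after one network round trip to a peer host. It is not VirtuCache's configuration interface.

```python
WRITE_BACK_ONE_REPLICA = "Write-back one replica"
WRITE_BACK_NO_REPLICA = "Write-back no replica"


def choose_write_policy(writes_discarded_on_reset):
    """Pick a caching policy from whether the VM's writes are thrown away
    anyway (non-persistent instant clones) or must survive a host failure."""
    if writes_discarded_on_reset:
        return WRITE_BACK_NO_REPLICA
    return WRITE_BACK_ONE_REPLICA


def write_ack_latency_us(policy, local_cache_write_us, replication_round_trip_us):
    """Assumed model: a replicated write pays the local cache write plus
    one round trip to mirror the data on another host."""
    if policy == WRITE_BACK_ONE_REPLICA:
        return local_cache_write_us + replication_round_trip_us
    if policy == WRITE_BACK_NO_REPLICA:
        return local_cache_write_us
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical latencies, for illustration only.
LOCAL_US = 50
REPLICATION_US = 200

for label, discarded in (("server VM", False), ("instant clone desktop", True)):
    policy = choose_write_policy(discarded)
    latency = write_ack_latency_us(policy, LOCAL_US, REPLICATION_US)
    print(f"{label}: {policy}, about {latency} us per write")
```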
VirtuCache is easy to deploy and manage. It can be installed while production VMs are running on the host, without requiring maintenance mode. Details of how the VirtuCache software improves VDI performance can be read here.
We consider VirtuCache far superior to other host-side caching software and to the caching features built into XenDesktop or Horizon, so a comparison with them is not a like-for-like comparison. The only meaningful competition is all-flash arrays. When VirtuCache is used with host RAM as the cache media, it can outperform even all-flash arrays, at a fraction of their cost.