VMware Host Side Caching to improve the performance of HP MSA Storage
MSA storage arrays from HP are quite popular in small and medium-sized businesses. They are possibly the cheapest appliances from any big-brand OEM that have all the enterprise-grade features expected of such arrays. Their only drawback versus more expensive arrays is performance. For instance, the hybrid MSAs don’t cache writes to SSDs, they only cache reads.1 Also, the storage controllers, even in the all-flash MSA, use lower-powered RAID controller processors (and not the beefier x86 processors), and so they choke on high-throughput, small-block IO.
This article has more details on how VirtuCache improves the performance of these arrays.
Quick introduction to VirtuCache software.
VirtuCache is software installed in VMware hosts that automatically caches recently and frequently used data (reads and writes) from any SAN based storage appliance to any in-VMware host SSD (and/or RAM). By doing so we improve the storage performance of VMware VMs, without requiring you to upgrade or replace your existing storage appliance or network.
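To make the idea of caching "recently used data (reads and writes)" concrete, here is a minimal sketch of a host-side block cache that evicts the least recently used block. It is a toy illustration only, not VirtuCache's actual algorithm: the class name `ToyHostCache`, the dict standing in for the SAN, and the block-count capacity are all assumptions made for this example.

```python
from collections import OrderedDict


class ToyHostCache:
    """Toy LRU block cache in front of a slow backing store (illustrative only)."""

    def __init__(self, backing_store, capacity_blocks):
        self.backing_store = backing_store   # dict standing in for the SAN array
        self.capacity_blocks = capacity_blocks
        self.blocks = OrderedDict()          # block_id -> (data, dirty flag)
        self.hits = 0
        self.misses = 0

    def _insert(self, block_id, data, dirty):
        self.blocks[block_id] = (data, dirty)
        self.blocks.move_to_end(block_id)    # mark as most recently used
        while len(self.blocks) > self.capacity_blocks:
            old_id, (old_data, old_dirty) = self.blocks.popitem(last=False)
            if old_dirty:                    # write dirty data back before dropping it
                self.backing_store[old_id] = old_data

    def read(self, block_id):
        if block_id in self.blocks:          # fast path: served from host cache
            self.hits += 1
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id][0]
        self.misses += 1                     # slow path: fetch from the array
        data = self.backing_store[block_id]
        self._insert(block_id, data, dirty=False)
        return data

    def write(self, block_id, data):
        # The write is absorbed by the cache and reaches the array later.
        self._insert(block_id, data, dirty=True)

    def flush(self):
        for block_id, (data, dirty) in list(self.blocks.items()):
            if dirty:
                self.backing_store[block_id] = data
                self.blocks[block_id] = (data, False)
```

Repeated reads of the same block are served from the cache after the first miss, and writes only reach the backing store on eviction or `flush()`. A production write cache also has to protect cached writes against host failure, which this sketch deliberately ignores.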
The arguments below are broken into two sections. The first describes how VirtuCache differs from storage appliance based caching in general, regardless of the appliance make/model. The second covers MSA specific differentiators.
VirtuCache versus storage appliance caching (for any appliance, not just MSA).
The same SSD will perform better in the ESXi host than in the storage appliance – With VirtuCache, the cache media sits in the same VMware host whose CPU consumes the hot data, and is connected to that CPU over a dedicated PCIe/NVMe bus (if caching to a PCIe / NVMe SSD) or the memory bus (if caching to host RAM). In a storage appliance, by contrast, the SSDs sit behind the shared storage network and the storage controllers. Hence the same cache media will perform better with VirtuCache than in the storage appliance.
Higher performing than any appliance if you use host RAM with VirtuCache – RAM is the fastest storage medium there is, and storage appliances are not designed to use large amounts of RAM in the storage IO path. So when caching to host RAM, VirtuCache will be higher performing than any storage appliance.
Lower Cost – VirtuCache costs $3000 per host. An enterprise grade NVMe / PCIe SSD, like the 2TB Intel P4600, which does an impressive 600K / 200K IOPS for random small block reads / writes, costs $1200 (2020 prices). That is $4200 per host for software plus cache media. As a result, VirtuCache will be higher performing and lower cost than any controller based caching or tiering solution in any appliance, including HP MSA.
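The cost arithmetic above can be sketched in a few lines. The two prices are the ones quoted in this article. The function name, the default of one SSD per host, and the host counts are assumptions for illustration. No appliance-side price is modeled, since the article does not quote one.

```python
VIRTUCACHE_LICENSE_USD = 3000   # per host, as quoted in the article
NVME_SSD_USD = 1200             # 2TB Intel P4600, 2020 price quoted in the article


def host_side_cache_cost(num_hosts, ssds_per_host=1):
    """Total cost in USD of license plus cache SSDs for a cluster.

    Assumes one license per host and, by default, one cache SSD per host.
    """
    per_host = VIRTUCACHE_LICENSE_USD + ssds_per_host * NVME_SSD_USD
    return num_hosts * per_host


print(host_side_cache_cost(1))  # one host
print(host_side_cache_cost(3))  # hypothetical three-host cluster
```

If host RAM is used as the cache media instead of an SSD, pass `ssds_per_host=0` and account for the RAM separately.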
Additional VirtuCache differentiators specific to HP MSA.
Controller bottleneck results in high latencies for small block IO – If you have HP’s highest performing all-flash MSA and are still experiencing high VM latencies, it could be that the MSA controller is choked because your application is doing large amounts of small block read / write IO. HP MSA controllers don’t use the x86 processors that are now standard in storage appliances. Instead they use a RAID controller processor, which is much lower performing than an x86 processor (this also makes the MSA cheaper than other appliances). This processor chokes on large amounts of random small block IO because such IO is CPU intensive, for the reasons listed in footnote 2. VirtuCache uses ESXi host CPUs for caching operations, not storage appliance CPUs. As a result, VirtuCache has access to more CPU than the MSA does, which makes it very effective at accelerating small block IO.
The MSA uses Seagate SAS SSDs. With VirtuCache you can use higher performing NVMe SSDs, or even higher performing host RAM, as cache media.
Internal caching in Hybrid MSA only caches reads to SSDs (not writes).1 VirtuCache caches reads and writes.
1 – Pages 5 and 6 of this document (https://h20195.www2.hpe.com/v2/getpdf.aspx/A00015961ENW.pdf?) mention that the MSA has an SSD read cache and a 4GB read+write memory cache. So the MSA has no ability to cache writes to SSD. It does use the 2GB of RAM on the controller for caching writes, but this is grossly inadequate, since your storage utilization is possibly in the tens of terabytes.
2 – Three reasons why small block IO is CPU intensive: Firstly, because the block size is small, VMs can issue large amounts of small block IO quickly (compared to large block IO). Secondly, whether the block is 1MB or 4KB, the same number of storage appliance processor cycles is used to process it. Thirdly, if the IO is random, processor usage is aggravated further, since large amounts of metadata need to be scanned to read / write random blocks.
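The second reason in footnote 2 can be made concrete with some arithmetic. If the controller spends roughly the same number of cycles on every IO regardless of its size, then CPU load scales with the number of IOs, not with the bytes moved. The sketch below counts the IOs needed to sustain a given throughput at two block sizes. The 100 MB/s figure and the function name are assumptions chosen for illustration, not measurements from an MSA.

```python
def ios_per_second(throughput_mb_per_s, block_size_kb):
    """Number of IOs per second needed to sustain a throughput at a block size."""
    return throughput_mb_per_s * 1024 / block_size_kb


THROUGHPUT_MB_PER_S = 100  # hypothetical workload, same for both block sizes

small = ios_per_second(THROUGHPUT_MB_PER_S, 4)      # 4KB blocks
large = ios_per_second(THROUGHPUT_MB_PER_S, 1024)   # 1MB blocks

# With a fixed per-IO processing cost, this ratio is also the ratio of
# controller CPU work between the two cases.
print(small, large, small / large)
```

At the same throughput, 4KB IO requires 256 times as many IOs as 1MB IO, and under the fixed-cost-per-IO assumption, 256 times the controller CPU work.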