VMware Host Side Caching to improve the performance of Dell PowerVault ME4 storage appliance
VirtuCache is ESXi software that automatically caches ‘hot’ data from any SAN storage to in-host SSD or RAM. By doing so it improves the storage performance of VMware VMs, without requiring you to upgrade your storage appliance or network.
VirtuCache is a superior solution to the hybrid Dell ME4’s internal Read Caching and Auto-Tiering features. For instance, the Read Caching and Auto-Tiering features in the hybrid ME4 improve only VM read performance [1], whereas VirtuCache improves the performance of both VM reads and writes. VirtuCache also improves the performance of small block storage IO, even for the high-end all-SSD ME4. Below are a few more ways VirtuCache enhances the performance of the Dell ME4.
- SSD in the ESXi host will perform better than SSD in the appliance. With VirtuCache, the SSD is attached directly to the motherboard of the VMware host whose CPU consumes the hot data. In the ME4, by contrast, the SSD sits behind the shared storage network and the storage controllers.
- VirtuCache caches reads and writes, whereas the ME4’s Read Caching and Auto-Tiering features cache or tier VM reads only [1].
- With VirtuCache, you can cache to host RAM or to NVMe / PCIe SSD. Both options perform much better than the Seagate SAS SSDs that the hybrid and all-flash ME4 use.
- Controller bottleneck results in high latencies for small block IO. If you have Dell’s highest-performing all-flash ME4 and still experience high VM latencies, the most likely cause is that the ME4 controller is choked because your application issues large amounts of small block read/write IO. Dell ME4 controllers don’t use the x86 processors that are now standard in storage appliances. Instead, they use a RAID controller processor, which is lower performing than an x86 processor (and which also makes the ME4 cheaper than other appliances). This processor chokes on large amounts of small block IO because small block IO is CPU intensive [2]. VirtuCache uses ESXi host CPUs, not storage appliance processors, for caching operations. As a result, VirtuCache has access to more CPU than the ME4 does, which makes it very effective at accelerating small block IO.
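To make the idea of caching both reads and writes in the host more concrete, here is a minimal Python sketch of a generic write-back LRU cache placed in front of a slower backing store. It is not VirtuCache's implementation. The class name `HostSideCache`, the dictionary standing in for the SAN, and the block-count capacity are all assumptions made purely for illustration.

```python
from collections import OrderedDict


class HostSideCache:
    """Toy write-back LRU cache in front of a slower backing store.

    Hypothetical illustration only: `backend` is a plain dict standing in
    for the SAN, and capacity is counted in blocks rather than bytes.
    """

    def __init__(self, backend, capacity):
        self.backend = backend        # stand-in for the storage appliance
        self.capacity = capacity      # maximum number of cached blocks
        self.blocks = OrderedDict()   # block_id -> data, least recently used first
        self.dirty = set()            # cached blocks not yet written to the backend
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            # Hot block: served from the host without touching the backend.
            self.hits += 1
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id]
        # Cold block: fetch from the backend, then keep a copy in the cache.
        self.misses += 1
        data = self.backend[block_id]
        self._insert(block_id, data)
        return data

    def write(self, block_id, data):
        # The write lands in the cache first and is marked dirty.
        self._insert(block_id, data)
        self.dirty.add(block_id)

    def _insert(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        while len(self.blocks) > self.capacity:
            victim, victim_data = self.blocks.popitem(last=False)
            if victim in self.dirty:
                # A dirty block must reach the backend before it is dropped.
                self.backend[victim] = victim_data
                self.dirty.discard(victim)

    def flush(self):
        # Write every remaining dirty block back to the backend.
        for block_id in list(self.dirty):
            self.backend[block_id] = self.blocks[block_id]
        self.dirty.clear()
```

In this toy model, a repeated read of the same block counts as a hit and never reaches the backend, and a write only reaches the backend when its block is evicted or `flush()` is called. A read-only cache would instead send every write straight to the backend.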
Cross-references.
1 – Page 6 of this official document on delltechnologies.com mentions that the ME4 has an SSD read cache, and on line 6 it mentions that this feature doesn’t improve write performance. So the hybrid ME4 has no ability to write to SSD.
2 – Three reasons why small block IO is CPU intensive. First, because the block size is small, VMs can issue large amounts of small block IO quickly (compared to large block IO). Second, whether the block is 1MB or 4KB, the storage appliance processor spends the same number of cycles processing it. Third, if the IO is random, processor usage gets worse still, since large amounts of metadata need to be scanned to read or write random blocks.
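The second reason in footnote 2 can be checked with simple arithmetic. The Python sketch below takes the footnote's premise at face value (each IO costs the processor about the same regardless of block size) and computes how many IOs per second are needed to move the same amount of data at different block sizes. The 400 MiB/s workload and the function name `iops_for_throughput` are hypothetical values chosen for illustration, not measurements of any ME4.

```python
KIB = 1024
MIB = 1024 * KIB


def iops_for_throughput(throughput_bytes_per_s, block_size_bytes):
    """IOs per second needed to sustain a throughput at a given block size."""
    return throughput_bytes_per_s / block_size_bytes


# Hypothetical workload: 400 MiB/s of sustained traffic (assumed figure).
throughput = 400 * MIB

for label, block_size in (("4 KiB", 4 * KIB), ("64 KiB", 64 * KIB), ("1 MiB", MIB)):
    iops = iops_for_throughput(throughput, block_size)
    print(f"{label:>6} blocks: {iops:>9,.0f} IOs per second")

# If every IO costs roughly the same number of processor cycles, the
# controller's CPU load scales with these IO counts, not with bytes moved.
ratio = iops_for_throughput(throughput, 4 * KIB) / iops_for_throughput(throughput, MIB)
print(f"4 KiB needs {ratio:.0f}x as many IOs as 1 MiB for the same throughput")
```

Under these assumptions, moving the same data in 4 KiB blocks takes 256 times as many IOs as moving it in 1 MiB blocks, which is why a fixed per-IO processing cost weighs so much more heavily on small block workloads.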