Performance Tiering and Read Caching in HPE MSA versus Virtunet VirtuCache
HPE MSA's internal caching and tiering features improve only read performance.1 VirtuCache, by contrast, improves both read and write performance for the MSA, and it does so by caching to SSD or RAM inside the VMware host. There are other differences as well.
SSD in the ESXi host performs better than SSD in the appliance. With VirtuCache, the cache media sits on the motherboard of the VMware host, right next to the CPU that consumes the hot data. With the storage appliance's own caching and tiering, the SSD sits behind the shared storage network and the storage controllers.
With VirtuCache you can cache to host RAM or to NVMe / PCIe SSD. Both options perform far better than the Seagate SAS SSDs that the MSA's internal caching / tiering uses.
The controller becomes a bottleneck, resulting in high latencies for small block IO. HPE uses RAID controller processors even in its highest-performing all-flash MSAs. These RAID controller processors are less powerful than the x86 CPUs that are now standard in storage appliance controllers. They do make the MSA cheaper, but they also cause it to choke under large amounts of small block IO, because small block IO is CPU intensive.2 VirtuCache, on the other hand, uses the ESXi host CPUs for its caching operations, not storage appliance CPUs. As a result, VirtuCache has access to far more CPU than the MSA, which makes it very effective at accelerating small block IO.
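A back-of-the-envelope sketch makes the small-block point concrete. The 500 MB/s throughput figure below is an illustrative assumption, not an MSA specification; the point is that at a fixed throughput, shrinking the block size multiplies the number of IOs the controller CPU must process.

```python
# Sketch: IOs per second a controller must process to sustain a fixed
# throughput, as a function of block size. The 500 MB/s figure is an
# illustrative assumption, not a measured or vendor-published number.

THROUGHPUT_MBPS = 500  # assumed sustained throughput

def iops_for_block_size(block_size_kb: int, throughput_mbps: int = THROUGHPUT_MBPS) -> int:
    """IOs per second needed to move throughput_mbps of data at this block size."""
    return (throughput_mbps * 1024) // block_size_kb

# If each IO costs roughly the same number of controller CPU cycles
# regardless of block size, CPU load scales with IOPS, not with MB/s.
for kb in (4, 64, 1024):
    print(f"{kb:>5} KB blocks -> {iops_for_block_size(kb):>7} IOPS")
# 4 KB blocks need 256x the IOPS (and hence roughly 256x the controller
# CPU work) of 1 MB blocks to move the same amount of data.
```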
VirtuCache can cache to much larger amounts of SSD. The HPE MSA can cache to only 4TB of SSD per pool, whereas VirtuCache can cache to 6TB of SSD per host. Because VirtuCache can cache to much larger SSD capacities, we aim to service almost all storage IO from cache.
1 – Pages 5 and 6 of this document (https://h20195.www2.hpe.com/v2/getpdf.aspx/A00015961ENW.pdf?) mention that the MSA has an SSD read cache and a 4GB read+write memory cache. So the MSA cannot cache writes to SSD. It does use the 2GB of RAM on the controller to cache writes, but this is grossly inadequate: with storage utilization likely in the tens of terabytes, you would need cache capacity in TBs to achieve a high cache hit ratio.
2 – Three reasons why small block IO is CPU intensive: Firstly, because the blocks are small, VMs can issue small block IO at a much higher rate than large block IO. Secondly, whether the block is 1MB or 4KB, the storage appliance processor spends roughly the same number of cycles processing it, so more IOs means proportionally more CPU work. Thirdly, if the IO is also random, processor usage rises further, since large amounts of metadata need to be scanned to read / write random blocks.