- ESXi 6.7 on a 4-host cluster.
- Datastores on an EMC Unity 300 array connected to the ESXi hosts over 10Gbps iSCSI.
- VirtuCache was deployed on each host in the cluster, caching to one Intel P4600 2TB PCIe SSD in each host.
- All datastores on the EMC Unity array were cached with the 'Write Back 1 Replica' caching policy. This policy caches reads and writes, and mirrors the write cache to cache media in another host, so that data is protected against loss if a host fails.
- To ensure low write latencies, the VMware network configured for VirtuCache write replication was a 10Gbps network with jumbo frames, LACP, and LLDP.
- VirtuCache ensured that VM write latencies were consistently under 5ms for the customer's write intensive applications.
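The 'Write Back 1 Replica' policy described above can be sketched in a few lines of Python. This is an illustrative model only; the class and method names are hypothetical and are not VirtuCache's actual API. The key idea it shows: a write is acknowledged to the VM once it lands on the local host's cache SSD and on one replica host's cache SSD, and the backend array is updated asynchronously later.

```python
class HostCache:
    """Cache media (SSD/RAM) on one ESXi host, modeled as a dict."""
    def __init__(self, name):
        self.name = name
        self.ssd = {}  # block address -> data held on this host's cache media

    def store(self, addr, data):
        self.ssd[addr] = data


class WriteBackOneReplica:
    """Hypothetical sketch of a write-back cache with one mirror copy."""
    def __init__(self, local, replica, array):
        self.local = local      # cache on the host running the VM
        self.replica = replica  # cache on another host (mirror copy)
        self.array = array      # backend datastore, updated lazily
        self.dirty = set()      # blocks not yet flushed to the array

    def write(self, addr, data):
        self.local.store(addr, data)    # low-latency local SSD write
        self.replica.store(addr, data)  # mirrored over the replication network
        self.dirty.add(addr)
        return "ack"                    # the VM sees the write complete here

    def flush(self):
        # Background sync of dirty blocks to the backend array.
        for addr in sorted(self.dirty):
            self.array[addr] = self.local.ssd[addr]
        self.dirty.clear()

    def recover_after_local_failure(self):
        # If the local host dies before flushing, the replica still holds
        # every acknowledged write, so no acknowledged data is lost.
        return {a: self.replica.ssd[a] for a in self.dirty}
```

Because the acknowledgement happens after two in-host SSD writes rather than a round trip to the array, write latency is bounded by the local SSD and the replication network, which is why that network's quality (10Gbps, jumbo frames) matters.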
The Virtunet Difference
The biggest drawback of the caching functionality in EMC Unity hybrid arrays is that neither the SSD layer (FAST Cache) nor the memory layer (System Cache) caches small block random writes. If your workload is write intensive, you may therefore see performance issues even with large amounts of system RAM or FAST Cache SSDs in your EMC Unity array.
Reasons for poor random write performance of the EMC Unity array.
Page 31 of this link lists two reasons. First, FAST Cache caches only reads, so random writes are not cached by FAST Cache. Second, the System Cache (EMC Unity's RAM cache) does cache writes, but not small block random writes. EMC also has a separate tiering technology called FAST VP, but tiering by definition moves frequently accessed (read) data to a faster tier, so it is of no use for random writes either. Small block random writes therefore miss both the memory and SSD layers in EMC Unity, and adding more RAM or SSD to an EMC Unity array will not help if your performance issues are caused by a large volume of small block random writes.
In VMware, storage IO is mostly random due to the 'IO blender' problem. We won't get into the details of the 'IO blender' in this article, since it is well documented on the web. If you run traditional IT applications like ERP, MS Exchange, file servers, VDI, or application virtualization in Windows VMs, then the application block sizes are 8KB or less. So typical IT workloads running in Windows VMs use small block sizes, are largely random, and exhibit little locality, since different VMs write to different areas (sectors) of the array. If such a workload is also write intensive, the various caching and tiering technologies in EMC Unity arrays might not be able to deliver the storage performance you require. Conversely, the reason EMC Unity hybrid appliances do work well for many small to medium sized businesses (SMBs) is that most SMB IT workloads are read intensive, so accelerating only reads suffices.
How does VirtuCache fix this issue with random writes?
VirtuCache caches frequently used reads and all writes to in-host SSD or RAM. Whenever a block is read by a VM from the SAN array, VirtuCache copies it to the in-host cache media (SSD/RAM); if the same block is read again, it is served from the in-host cache. As for writes, all writes (whether random or sequential) from VMs are written to the in-host cache media without being synchronously committed to the backend array, so writes are accelerated as well.
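The read path described above is a standard read-through cache, sketched below in Python. This is an illustrative model under assumed names, not VirtuCache code: the first read of a block is served from the array and copied into the in-host cache, and repeat reads are served from the cache.

```python
class ReadCache:
    """Hypothetical sketch of a read-through in-host cache."""
    def __init__(self, array):
        self.array = array  # backend LUN, modeled as a dict
        self.cache = {}     # in-host SSD/RAM cache
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        if addr in self.cache:
            self.hits += 1            # fast path: served from in-host media
            return self.cache[addr]
        self.misses += 1
        data = self.array[addr]       # slow path: fetched over iSCSI
        self.cache[addr] = data       # populate cache for future reads
        return data
```

In a real deployment the cache would also need an eviction policy (e.g. evicting the least frequently used blocks when the SSD fills), which is omitted here for brevity.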
For more details on the VirtuCache technical architecture, including how it caches reads and writes, syncs writes to the backend storage appliance, and protects against data loss in case of host failure, click on this link.
Why would VirtuCache be able to cache small block random writes but not EMC Unity?
Of all the types of storage IO that stress storage systems, the worst kind is highly random, small block, write IO. First, because the blocks are small, a VM can issue a large number of small block write IOs quickly (compared to large block IO), and if the number of VMs on the host is high, the total volume of small block write IO adds up quickly. Second, whether a block is 1MB or 4KB, roughly the same number of CPU cycles is spent processing it, and the randomness of the IO further increases CPU usage, since a large portion of the metadata index must be scanned to read or write random blocks. As a result, a lot of CPU cycles are required to process small block random write IO. VirtuCache uses the ESXi host CPUs for caching operations. Each ESXi host typically has two or more CPUs, each with 16 or more cores; multiply this by the number of hosts VirtuCache is installed on, and VirtuCache has access to a large amount of CPU. That is not the case with a storage appliance, which typically has two controllers, each with one or two 4 or 8 core processors, so storage appliance based caching has access to far fewer CPU cores. Small block random write IO is therefore handled better by VirtuCache than by any storage appliance caching functionality, simply because VirtuCache can throw much more CPU at the problem than any storage appliance can. This argument also holds true for mid-market all-flash arrays.
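The arithmetic behind this argument can be made concrete. The numbers below are illustrative assumptions, not measurements: at the same throughput, small blocks mean many more IOs, and since per-IO CPU cost is roughly independent of block size, many times more CPU work; meanwhile a cluster of hosts has several times the core count of a dual-controller appliance.

```python
def iops_at_throughput(throughput_bytes_per_s, block_size_bytes):
    """IOs per second needed to sustain a given throughput at a given block size."""
    return throughput_bytes_per_s // block_size_bytes

GIB = 1024 ** 3

# To move 1 GiB/s of data:
small = iops_at_throughput(1 * GIB, 4 * 1024)     # 4KB blocks -> 262,144 IOPS
large = iops_at_throughput(1 * GIB, 1024 * 1024)  # 1MB blocks ->   1,024 IOPS
# 256x more IOs (and roughly 256x more per-IO CPU work) with 4KB blocks.

# CPU available for caching (example figures from the text, not measured):
host_cores = 4 * 2 * 16      # 4 hosts x 2 CPUs x 16 cores = 128 cores
appliance_cores = 2 * 2 * 8  # 2 controllers x 2 CPUs x 8 cores = 32 cores
```

Under these example figures, host-side caching has four times the cores available to absorb a workload that demands hundreds of thousands of small write IOs per second.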
Cache media closer to the host CPU with VirtuCache.
Then there is the standard VirtuCache advantage over storage appliance based caching: with VirtuCache, the cache media sits close to the host CPU that consumes the hot data, whereas SSDs in a storage appliance sit behind the storage controllers and the network. The same SSDs will therefore perform better in the host than they would in the storage appliance.