Storage IO path in VSAN versus VirtuCache Host Side Caching.
The storage IO path in VSAN and VirtuCache is similar to a large extent, since both service storage IO from media inside the VMware host. However, storage latency with VirtuCache is lower than with VSAN for four reasons:
- Reads are always serviced from local cache media in VirtuCache. In VSAN there is a high chance that reads will be serviced over the network from another host;
- In addition to SSD, VirtuCache lets you cache to RAM, the highest performing media there is, which is not possible in VSAN;
- The write cache flush rate to a backend storage array is typically higher than to locally attached storage. As a result, write latencies are lower with VirtuCache, because it flushes writes to the SAN array;
- VirtuCache is block based, VSAN is object based. Block-based storage has lower latency than object-based storage.
VSAN clusters together locally attached storage (SSD/HDD) in ESXi hosts and presents it to VMs as shared storage. VirtuCache clusters together locally attached high-speed storage (SSD/RAM) in ESXi hosts and presents it as a shared cache for any SAN storage appliance. Since both VSAN and VirtuCache use locally attached storage to service storage IO, the storage IO path is similar to a large extent.
NB: In the sections below, cache media refers to SSD in the case of VSAN, and to SSD or RAM in the case of VirtuCache, since both can be used as cache media in VirtuCache, but only SSDs can be used in VSAN.
Read latency in VSAN is higher than in VirtuCache because of additional network overhead in VSAN.
Regarding the read IO path: when a VM reads a block for the first time, VMware gets it from the SAN array, and at the same time VirtuCache copies it to the local cache media. The next time the same block is read, VirtuCache delivers it from the local cache media. If the SSD is large enough, all reads will come from the locally attached SSD. When a VM is vMotioned, its read cache follows it. So in all cases, reads almost always come from the SSD or RAM of the host where the VM currently resides. In VSAN, VM data stays on the locally attached SSD of the host where it was first written, regardless of subsequent vMotions. So the chances are high that data will be read over the network from an SSD on another host.(1)
Thus there is additional network overhead in servicing reads in VSAN versus VirtuCache.
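To make the read-path difference concrete, below is a minimal sketch in Python. It only illustrates the behavior described above; the class and method names are assumptions, not actual VirtuCache or VSAN code.

# Minimal sketch of the two read paths described above.
# All names are illustrative assumptions, not real VirtuCache/VSAN APIs.

class VirtuCacheReadPath:
    def __init__(self, local_cache, san_array):
        self.local_cache = local_cache   # dict standing in for host-local SSD/RAM cache
        self.san_array = san_array       # object standing in for the backend SAN array

    def read(self, block_id):
        data = self.local_cache.get(block_id)
        if data is not None:
            return data                        # cache hit: served from local SSD/RAM
        data = self.san_array.read(block_id)   # first read: fetched from the SAN
        self.local_cache[block_id] = data      # ...and copied into the local cache,
        return data                            # so subsequent reads stay local

class VSANReadPath:
    def __init__(self, local_ssd, peer_hosts):
        self.local_ssd = local_ssd       # SSD on the host currently running the VM
        self.peer_hosts = peer_hosts     # hosts that own the VM's data components

    def read(self, block_id, owning_host):
        if owning_host == "local":
            return self.local_ssd.read(block_id)
        # Data stays where it was first written, so after a vMotion the read
        # often crosses the network to another host's SSD, adding latency.
        return self.peer_hosts[owning_host].read(block_id)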
For high-throughput bursty writes, VirtuCache shows lower latency than VSAN because it takes advantage of higher flush rates to the SAN array.
In VirtuCache, the write acknowledgment is sent back to the VM doing the writes only after the data has been written to cache media local to the host where the VM currently runs and, simultaneously, a copy of that write has been written to cache media in another host in the same cluster ('Write-Back' caching policy). In VSAN as well, writes go to cache media in two hosts. Both VSAN and VirtuCache make two copies of each write on two different hosts to protect against data loss if a host were to fail. The one difference is that in VSAN, neither of the two copies might be on local media.(2)
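A rough sketch of that write-back acknowledgment sequence in Python follows. The function and parameter names are assumptions for illustration, not VirtuCache's actual implementation.

# Sketch of the 'Write-Back' acknowledgment sequence described above.
# Names are illustrative assumptions; caches are modeled as plain dicts.

def write_back(block_id, data, local_cache, peer_cache):
    """The VM's write is acknowledged only after two cached copies exist."""
    local_cache[block_id] = data   # copy 1: cache media on the host running the VM
    peer_cache[block_id] = data    # copy 2: cache media on another host in the cluster
    # Only now is the write acknowledged to the VM. Flushing to the backing
    # store (the SAN array) happens later, in the background.
    return "ACK"

# In the VSAN case the same two-copy rule applies, but neither copy is
# guaranteed to land on the host running the VM.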
Also, with both VSAN and VirtuCache, writes in cache are flushed to backing storage, which is a SAN array in the case of VirtuCache and locally RAIDed SSDs in the case of VSAN. Since a SAN array can typically sustain higher write throughput than locally attached SSDs, VirtuCache can flush writes faster and hence exhibits lower write latency for high-throughput write bursts.
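The effect of the flush rate on burst handling can be shown with a back-of-the-envelope model in Python. The throughput and cache-size numbers below are hypothetical, chosen only to illustrate the point.

# During a write burst, dirty data accumulates in the write cache at
# (incoming rate - flush rate). Once the cache is full, new writes are
# throttled to the flush rate and latency rises. Numbers are hypothetical.

def seconds_until_write_cache_full(cache_gb, incoming_mbps, flush_mbps):
    fill_rate = incoming_mbps - flush_mbps       # MB/s of dirty-data growth
    if fill_rate <= 0:
        return float("inf")                      # flushing keeps up; no build-up
    return (cache_gb * 1000) / fill_rate

# A 500 MB/s burst against 12GB of write cache:
print(seconds_until_write_cache_full(12, 500, 400))  # SAN flushing at 400 MB/s -> 120 s
print(seconds_until_write_cache_full(12, 500, 150))  # local SSDs at 150 MB/s -> ~34 s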
When a host fails, VirtuCache is in a degraded state for a maximum of only 2 minutes.
With both VSAN and VirtuCache, if an ESXi host fails, performance is impacted. In the case of VSAN, performance is cut in half since only half the disks are available until all the data from the failed host is rebuilt on another host.(3) By default the rebuild process starts an hour after the failure, and then, depending on the maximum throughput that the RAIDed disks in VSAN are capable of, it could take from a few hours to a day or more to rebuild storage, during which time VSAN runs in a degraded state. In the case of VirtuCache, performance is degraded only while the mirrored copy of the write cache is being flushed to the backend SAN array. Because at no point does VirtuCache keep more than 2 minutes' worth of write cache in the local cache media, it is in a degraded state for at most 2 minutes. The 2 minutes are defined in terms of the speed of the SAN array: if the SAN array can absorb a maximum throughput of 100MBps, then 2 minutes equals 12GB of writes. The reason we went with this design (instead of a flat 70-30 capacity split between reads and writes, as is the case with VSAN)(4) was that, in any failure situation, we wanted to restore regular caching operation as quickly as possible. So once VirtuCache flushes its writes to the backend SAN array, in 2 minutes or less, the VMs are back in their original cached mode. VSAN, in contrast, reserves 30% of the SSD for write cache, so it takes much longer to flush writes to the backing store and hence much longer for VSAN to move out of its degraded state.
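The 2-minute cap works out as follows; the 100MBps figure is the example from the text above, and the rest is simple arithmetic.

# Worked example of the 2-minute write-cache cap described above.
# The 100 MB/s SAN throughput is the example figure from the text.

san_flush_mbps = 100        # max throughput the SAN array can absorb
cap_seconds = 2 * 60        # VirtuCache keeps at most 2 minutes of writes

max_dirty_write_cache_mb = san_flush_mbps * cap_seconds
print(max_dirty_write_cache_mb / 1000, "GB")   # 12.0 GB of write cache, at most

# Worst-case recovery after a host failure is therefore bounded by the time
# needed to flush that cache to the SAN: 12 GB / 100 MB/s = 120 s = 2 minutes.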
A BETTER COMPARISON: VSAN VERSUS VIRTUCACHE + SAN ARRAY
To be fair, a more complete comparison of recovery times and features would be VSAN versus the combination of a SAN array + VirtuCache. In VSAN the storage controller is the ESXi host motherboard, whereas in the VirtuCache + SAN array case, the storage controller function is split between the backend SAN array controller (for capacity) and the ESXi host motherboard (for performance). Since VirtuCache only deals with the performance aspects of storage, for this article I decided to compare only the performance aspects of both by explaining the storage IO path.
Cross References:
(1) Slides 30, 31, and 32 of this post by Duncan Epping explain the read IO path, which might be local or over the network from another host. Reads are serviced from local cache only if client cache is enabled, and the client cache is capped at a maximum of 1GB of RAM.
(2) Slide 29 at the above link explains write mirroring in VSAN.
(3) Yellow Bricks post on VSAN and host failure.
(4) Post by VMware on VSAN disk groups.