How to Select SSDs for Host Side Caching for VMware – Interface, Model, Size, Source and RAID Level?
Post updated on February 18, 2020.
In terms of price/performance, enterprise NVME SSDs have now become the best choice for in-VMware host caching media. They perform better and cost only a little more than their slower SATA counterparts. The Intel P4600/P4610 NVME SSDs are my favorites. If your ESXi host has no spare 2.5” NVME slot or conventional PCIe slot, you cannot use NVME SSDs, and you could use enterprise SATA SSDs instead. If you choose to go with SATA SSDs, you will also need a high queue depth RAID controller in the ESXi host. In the enterprise SATA SSD category, the Intel S4600/S4610 or Samsung SM863a are good choices. If you don't have a spare PCIe, NVME, SATA, or SAS slot in the host, then the only choice is to use the much more expensive but higher performing host RAM as cache media.
This blog article will cover the following topics:
- A few good SSDs and their performance characteristics.
- Write IOPS rating and lifetime endurance of SSDs.
- Sizing the SSD.
- How many SSDs are needed in a VMware host and across the VMware cluster?
- In the case of SATA SSDs, the need to configure the SSD as RAID0.
- Queue depths.
- Where to buy SSDs?
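Why the RAID controller's queue depth matters can be sketched with Little's law, which relates the average number of outstanding IOs to throughput and latency (outstanding IOs = IOPS × latency). The Python below is a minimal illustration; the IOPS, latency, and queue depth figures in it are hypothetical, not specifications of any SSD or controller named in this post.

```python
# Little's law applied to a storage device: outstanding IOs = IOPS x latency.
# All figures used below are hypothetical, not measurements of real hardware.


def required_queue_depth(target_iops: float, latency_us: float) -> float:
    """Average number of IOs that must be in flight to sustain target_iops
    when each IO takes latency_us microseconds to complete."""
    return target_iops * latency_us / 1_000_000


def max_iops(queue_depth: int, latency_us: float) -> float:
    """Upper bound on IOPS when at most queue_depth IOs can be outstanding
    and each IO takes latency_us microseconds."""
    return queue_depth * 1_000_000 / latency_us


if __name__ == "__main__":
    # Hypothetical: a cache SSD expected to sustain 200,000 IOPS at 100 us per IO.
    print(required_queue_depth(200_000, 100))  # 20.0 IOs in flight on average
    # Hypothetical: a shallow controller queue of 8 at the same 100 us latency.
    print(max_iops(8, 100))  # 80000.0 IOPS ceiling
```

In this toy model, a controller that allows only a few outstanding IOs caps IOPS no matter how fast the SSD behind it is, which is why a high queue depth controller is worth having when SATA SSDs are used.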
Although a wide range of high-speed media (SATA, SAS, and NVME SSDs, as well as RAM) can be used for server side caching in VMware, my preferred in-host media is an enterprise grade NVME SSD. Below are the SSDs I recommend (as of 2020):
- If you have a conventional PCIe slot in your hosts, my first choice is the Intel P4600 SSD (cost ~ $0.6/GB), and my second choice is the Samsung PM1725 (cost ~ $0.4/GB).
- If you have a 2.5" U.2 slot in your hosts, use the Intel P4610 SSD (cost ~ $0.5/GB).
A little-known fact is that the Samsung PM1725 and Intel P4600 come in a conventional PCIe form factor, so they can be installed in older servers that have a traditional PCIe slot (see table below).
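To turn per-GB prices into a rough budget, multiply capacity by price per GB and by the number of hosts. The sketch below uses the approximate $/GB figures quoted for these drives; it assumes one cache SSD per host, and the capacity and host count in the example are hypothetical. Real drives come in fixed capacity points and prices change, so treat the result as a ballpark only.

```python
# Approximate 2020 prices in $/GB, as quoted in this post.
PRICE_PER_GB = {
    "Intel P4600 (PCIe add-in card)": 0.6,
    "Samsung PM1725 (PCIe add-in card)": 0.4,
    "Intel P4610 (2.5in U.2)": 0.5,
}


def cache_cost(capacity_gb: float, hosts: int, price_per_gb: float) -> float:
    """Rough cost of one cache SSD of capacity_gb in each of `hosts` hosts.

    Assumes one SSD per host (a hypothetical layout) and ignores the fact
    that drives are sold only in fixed capacities.
    """
    return round(capacity_gb * price_per_gb * hosts, 2)


if __name__ == "__main__":
    capacity_gb = 1600  # hypothetical per-host cache size
    hosts = 4           # hypothetical cluster size
    # List the models from cheapest to most expensive per GB.
    for model, price in sorted(PRICE_PER_GB.items(), key=lambda kv: kv[1]):
        print(f"{model}: ${cache_cost(capacity_gb, hosts, price):,.2f}")
```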
HP MSA's internal caching and tiering features improve only read performance.[1] In comparison, VirtuCache improves the performance of both reads and writes for the MSA, and it does so by caching to in-VMware host SSD or RAM. There are other differences as well.
Why are Write Latencies High Even When the EMC Unity Appliance has Plenty of System Cache Memory and Fast Cache SSDs?
If you are experiencing high write latencies in VMs when using an EMC Unity Hybrid array, it could be because the caching and tiering solutions in Unity don’t cache random small block writes, and your applications might be generating large volumes of such storage IO.
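One way to check whether a workload matches this pattern is to measure what share of its IOs are random, small-block writes. The sketch below assumes a hypothetical trace format (operation, byte offset, size) and a hypothetical 64 KiB cutoff for "small block". An IO is treated as sequential only if it starts exactly where the previous IO ended, and the first IO in the trace counts as random.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

SMALL_BLOCK_BYTES = 64 * 1024  # hypothetical cutoff for a "small block" IO


@dataclass
class IO:
    op: str      # "R" for read, "W" for write (hypothetical trace format)
    offset: int  # starting byte offset on the volume
    size: int    # IO size in bytes


def random_small_write_fraction(trace: Iterable[IO]) -> float:
    """Fraction of all IOs in the trace that are small, random writes."""
    total = 0
    matches = 0
    prev_end: Optional[int] = None
    for io in trace:
        total += 1
        # Sequential means this IO starts where the previous one ended.
        sequential = prev_end is not None and io.offset == prev_end
        if io.op == "W" and io.size <= SMALL_BLOCK_BYTES and not sequential:
            matches += 1
        prev_end = io.offset + io.size
    return matches / total if total else 0.0


if __name__ == "__main__":
    sample = [
        IO("W", 0, 4096),                  # small write, first IO -> random
        IO("W", 4096, 4096),               # small write, sequential
        IO("W", 1_048_576, 8192),          # small write, random
        IO("R", 0, 4096),                  # read, not counted
        IO("W", 500_000_000, 1_048_576),   # large write, not counted
    ]
    print(random_small_write_fraction(sample))
```

A high fraction in such a measurement would suggest that much of the write traffic falls into the category described above.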
High speed storage is back in the VMware Host with both HCI and Host Side Caching, but the similarities end there.
The main advantage that hyper-converged infrastructure (HCI) has over traditional SAN based infrastructure is that HCI has put high speed storage back in the compute nodes. This is also true for host side caching software, with the added benefit that host side caching keeps the flexibility that SAN based infrastructure has always had over HCI: the ability to scale and maintain compute and storage hardware independently of each other.
Other pros of host side caching versus hyper-converged infrastructure are:
The storage IO paths in VSAN and VirtuCache are similar to a large extent, since both service storage IO from in-VMware host media. However, storage latencies are lower with VirtuCache than with VSAN, for four reasons:
- Reads are almost always serviced from local cache media in VirtuCache. In VSAN, there is a high chance that reads are serviced over the network from another host.
- In addition to SSD, VirtuCache can cache to RAM, which is the highest performing media there is; this is not possible in VSAN.
- The write cache flush rate will typically be higher for a backend storage array than for locally attached storage. Since VirtuCache flushes writes to the SAN array, write latencies will be lower with VirtuCache.
- VirtuCache is block based, whereas VSAN is object based.
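The first of these reasons can be illustrated with a simple average-latency model: every read pays the cache media latency, and a read served from another host also pays a network round trip. This is a toy sketch; the latency figures in the example are hypothetical, and real systems add queuing and software overheads that it ignores.

```python
def avg_read_latency_us(local_fraction: float, media_us: float, network_us: float) -> float:
    """Average read latency when local_fraction of reads are served from media
    in the same host and the remainder cross the network to a peer host,
    which still pays the media latency on that peer."""
    remote_us = media_us + network_us
    return local_fraction * media_us + (1 - local_fraction) * remote_us


if __name__ == "__main__":
    # Hypothetical: 100 us cache media latency, 200 us network round trip.
    for fraction in (1.0, 0.5, 0.0):
        print(fraction, avg_read_latency_us(fraction, 100, 200))
```

Under these assumptions, the share of reads that must cross the network directly sets how much latency is added on top of the media itself.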
VMware will discontinue VFRC (vSphere Flash Read Cache) starting in ESXi 7.0, to be released in Q4 2019.
Despite the end-of-life announcement for VFRC, if you still want to review the differences between VFRC and VirtuCache, below are the three most important ones.
We cache reads and writes; VMware's VFRC caches only reads. Caching writes improves the performance of not only writes but also reads.[1]
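The claim that caching writes also speeds up reads can be illustrated with a generic write-back LRU cache. A block that was just written is already in the cache when it is read back, and dirty blocks reach the backend only when they are evicted or explicitly flushed. This is a toy sketch of the general technique, not VirtuCache's implementation; the class name and the dict standing in for a SAN volume are illustrative.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy LRU block cache in front of a slower backend. The backend is a
    plain dict (block number -> data) standing in for a SAN volume."""

    def __init__(self, capacity_blocks, backend):
        self.capacity = capacity_blocks
        self.backend = backend
        self.blocks = OrderedDict()  # block number -> (data, dirty flag)
        self.hits = 0
        self.misses = 0

    def _evict_if_full(self):
        while len(self.blocks) > self.capacity:
            # popitem(last=False) removes the least recently used block.
            blk, (data, dirty) = self.blocks.popitem(last=False)
            if dirty:
                self.backend[blk] = data  # flush before dropping it

    def write(self, blk, data):
        # Writes complete in the cache and are marked dirty.
        self.blocks[blk] = (data, True)
        self.blocks.move_to_end(blk)
        self._evict_if_full()

    def read(self, blk):
        if blk in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(blk)
            return self.blocks[blk][0]
        self.misses += 1
        data = self.backend.get(blk)
        self.blocks[blk] = (data, False)  # populate cache with a clean copy
        self._evict_if_full()
        return data

    def flush(self):
        # Iterate over a snapshot so the dict is not mutated mid-iteration.
        for blk, (data, dirty) in list(self.blocks.items()):
            if dirty:
                self.backend[blk] = data
                self.blocks[blk] = (data, False)


if __name__ == "__main__":
    volume = {}
    cache = WriteBackCache(capacity_blocks=2, backend=volume)
    cache.write(1, "a")
    print(cache.read(1), cache.hits, cache.misses)  # served from cache
```

In this sketch, a read-only cache would have had to fetch block 1 from the backend on its first read, while the write-back cache serves it immediately because the preceding write populated it.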
We require no ongoing administration. Caching in our case is fully automated, and all VMware features are seamlessly supported. In contrast, VFRC requires administrator intervention for vMotion, creating a new VM, entering maintenance mode, and restoring a VM from backup; it also requires knowledge of the application block size and assignment of SSD capacity per vdisk. Many other tasks require administrator oversight as well.
We provide easy-to-understand VM, cache, network, and storage appliance level metrics for throughput, IOPS, and latencies, and alerting to forewarn of failure events. VFRC doesn't.
Below is a longer list of differences, cross-referenced with VMware-authored content:
VirtuCache is ESXi software that automatically caches 'hot' data from any SAN storage to in-host SSD or RAM. By doing so it improves the storage performance of VMware VMs, without requiring you to upgrade your storage appliance or network.
VirtuCache both competes with and complements the hybrid MSA's internal Read Caching and Performance Tiering features, as well as the All-Flash MSA. For instance, the Read Caching and Performance Tiering features in the hybrid MSA improve only VM read performance,[1] whereas VirtuCache improves the performance of both VM reads and writes. VirtuCache also improves the performance of small block storage IO, even for the high-end All-SSD MSA. Here are a few more ways VirtuCache enhances the performance of HPE MSA.
The big difference between the two is that VSA caches only 2GB of reads from the Master VM.[1,2] VirtuCache caches reads and writes from all server and desktop VMs, and it can cache to TBs of in-host SSD/RAM, so that all storage IO is serviced from in-host cache.
More details are in the table below.
SSDs deployed for caching in CEPH OSD servers are not very effective. The problem lies not in the SSDs themselves but in where they are deployed: at a point in the IO path that is downstream (relative to the VMs that run user applications) of the IO bottleneck. This post looks at this performance shortcoming of CEPH and its solution.
There are two options for improving the performance of CEPH.
Option 1 is to deploy SSDs in CEPH OSD servers for journaling (write caching) and read caching.
Option 2 is to deploy SSDs and host side caching software in the VMware hosts (that connect to CEPH over iSCSI). The host side caching software then automatically caches reads and writes to the in-VMware host SSD from VMware Datastores created on CEPH volumes.
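The difference between the two options comes down to where in the IO path a cache hit is completed. The toy model below sums per-hop latencies up to the stage that serves the IO. The stage names and latency figures are hypothetical, cache misses are not modeled, and the point is only that a cache inside the OSD servers still leaves every IO paying for the hops upstream of it.

```python
# Hypothetical per-hop latencies in microseconds, ordered from the VM outward.
OPTION_1_PATH = [  # SSD cache inside the CEPH OSD servers
    ("network + iSCSI gateway", 500),
    ("OSD SSD cache", 100),
    ("OSD HDD", 8000),
]
OPTION_2_PATH = [  # SSD cache inside the VMware host
    ("in-host cache SSD", 100),
    ("network + iSCSI gateway", 500),
    ("OSD HDD", 8000),
]


def served_latency_us(path, serve_at):
    """Latency of an IO that walks `path` (a list of (stage, latency_us))
    and is completed at the stage named serve_at."""
    total = 0
    for stage, latency in path:
        total += latency
        if stage == serve_at:
            return total
    raise ValueError(f"stage {serve_at!r} is not on this path")


if __name__ == "__main__":
    print("Option 1 cache hit:", served_latency_us(OPTION_1_PATH, "OSD SSD cache"))
    print("Option 2 cache hit:", served_latency_us(OPTION_2_PATH, "in-host cache SSD"))
```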
Below are the reasons why we recommend that you go with Option 2.