Although a wide range of high speed media (SATA/SAS/NVMe SSDs, RAM, and the newer NVDIMM) can be used for caching or server side storage in VMware, our preferred in-host media by far is an enterprise grade NVMe SSD. The specific SSD I recommend (as of 2019) is the Intel P4600 (if you have a spare PCIe slot in the host) or the Intel P4610 (if you have a spare U.2 NVMe slot).
It is not well known that some enterprise grade NVMe SSDs, like the Intel P4600, come in a conventional PCIe form factor, so they can be installed in older servers that have an x4 (or wider) PCIe slot (more on that later in this article).
MSA storage arrays from HP are very popular in small and medium size businesses. Over 500,000 arrays have been sold worldwide [1]. They are possibly the cheapest enterprise grade arrays from a big brand OEM. They have High Availability features like dual controllers, four NIC ports per controller, RAID, and Erasure Coding. They scale to 840TB of storage per array. The only area where they fall short of more expensive arrays is performance. For instance, even the hybrid MSAs don’t cache writes to SSDs [2]; they cache only reads. This article has more details on the different ways VirtuCache improves the performance of all-hard drive, hybrid, and all-flash MSA.
Why are Write Latencies High Even When the EMC Unity Appliance has Plenty of System Cache Memory and Fast Cache SSDs?
If you are experiencing high write latencies in VMs when using an EMC Unity Hybrid array, it could be because the caching and tiering solutions in Unity don’t cache random small block writes, and your applications might be generating large volumes of such storage IO.
High Speed Storage is Back in the VMware Host with Hyper Converged Infrastructure and Host Side Caching, but the similarities end there….
The main advantage that hyper-converged infrastructure (HCI) has over traditional converged infrastructure (separate hardware for compute and storage) is that HCI has put high speed storage back in the compute nodes. This is also true of Host Side Caching software, with the added benefit that host side caching keeps the flexibility that Converged Infrastructure (CI) always had over HCI: compute and storage hardware can be scaled and maintained independently of each other.
Other pros of host side caching + converged versus hyper-converged infrastructure are listed below.
The storage IO paths in VSAN and VirtuCache are similar to a large extent, since both service storage IO from in-VMware host media. However, storage latencies are lower with VirtuCache than with VSAN, for four reasons:
Reads are almost always serviced from local cache media in VirtuCache. In VSAN there is a high chance that reads are serviced over the network from another host;
In addition to SSD, with VirtuCache you can cache to RAM, which is the highest performing media there is, something that's not possible in VSAN;
The write cache flush rate will typically be higher for a backend storage array than for locally attached storage. As a result, write latencies will be lower with VirtuCache, because it flushes writes to a SAN array;
VirtuCache is block based, whereas VSAN is object based.
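The first reason above (local reads versus reads over the network) can be illustrated with a simple weighted average. This is only a sketch: the function name and the two latency figures are hypothetical placeholders chosen for illustration, not measurements of VirtuCache or VSAN.

```python
def expected_read_latency_us(local_fraction: float,
                             local_us: float,
                             remote_us: float) -> float:
    """Average read latency when a fraction of reads is served from local media."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_us + (1.0 - local_fraction) * remote_us


# Hypothetical latencies, for illustration only (not measured values).
LOCAL_MEDIA_US = 100.0          # read served from in-host cache media
REMOTE_OVER_NETWORK_US = 500.0  # read served from another host over the network

# Assumed scenario A: almost all reads are served locally.
mostly_local = expected_read_latency_us(0.95, LOCAL_MEDIA_US, REMOTE_OVER_NETWORK_US)
# Assumed scenario B: most reads cross the network.
mostly_remote = expected_read_latency_us(0.25, LOCAL_MEDIA_US, REMOTE_OVER_NETWORK_US)

print(f"mostly local reads:  {mostly_local:.0f} us")
print(f"mostly remote reads: {mostly_remote:.0f} us")
```

Whatever the real figures are in a given environment, the average moves toward the network latency as the share of remote reads grows.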
VMware will discontinue VFRC starting in ESXi 7.0, which is to be released in Q4 2019 [0].
Despite the end-of-life announcement for VFRC, if you still want to review the differences between VFRC and VirtuCache, below are the three most important ones.
We cache reads and writes; VMware's VFRC caches only reads. Caching writes improves the performance of not only writes, but also of reads [1].
We require no ongoing administration. Caching in our case is fully automated, and all VMware features are seamlessly supported. VFRC, by contrast, requires administrator intervention for vMotion, for creating a new VM, for maintenance mode, and for VM restore from backup. It also requires knowledge of the application block size and an SSD capacity assignment per vdisk. Many other tasks require admin oversight as well.
We provide easy to understand VM, cache, network, and storage appliance level metrics for throughput, IOPS, and latencies, and alerting to forewarn of failure events. VFRC doesn't.
Below is a longer list of differences, cross-referenced with VMware authored content:
The big difference between the two is that VSA caches only 2GB of reads from the Master VM [1,2]. VirtuCache caches reads + writes from all server & desktop VMs, and it can cache to TBs of in-host SSD/RAM, so that all storage IO is serviced from in-host cache.
More details in the table below.
SSDs deployed for caching in CEPH OSD servers are not very effective. The problem lies not in the SSDs themselves, but in the fact that they are deployed at a point in the IO path that is downstream (in relation to the VMs that run user applications) of where the IO bottleneck is. This post looks at this performance shortcoming of CEPH and its solution.
There are two options for improving the performance of CEPH.
Option 1 is to deploy SSDs in CEPH OSD servers for journaling (write caching) and read caching.
Option 2 is to deploy SSDs and host side caching software in the VMware hosts (that connect to CEPH over iSCSI). The host side caching software then automatically caches reads and writes to the in-VMware host SSD from VMware Datastores created on CEPH volumes.
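To make the idea of host side caching of both reads and writes concrete, here is a toy model. It is a minimal sketch and not VirtuCache's implementation: the class `HostCache`, the `cache_writes` switch, and the use of a plain dictionary as the backend are all assumptions made for illustration. Cache capacity, eviction, and protection of not-yet-flushed writes are deliberately left out.

```python
class HostCache:
    """Toy in-host cache in front of a slower backend (a dict stands in for the array)."""

    def __init__(self, backend: dict, cache_writes: bool):
        self.backend = backend
        self.cache_writes = cache_writes
        self.cache = {}        # block number -> data held on in-host media
        self.dirty = set()     # blocks written to cache but not yet flushed
        self.backend_reads = 0
        self.backend_writes = 0

    def read(self, block):
        if block in self.cache:          # cache hit: no trip to the backend
            return self.cache[block]
        self.backend_reads += 1          # cache miss: fetch and keep a copy
        data = self.backend[block]
        self.cache[block] = data
        return data

    def write(self, block, data):
        if self.cache_writes:
            # Write caching: acknowledge from in-host media, flush later.
            self.cache[block] = data
            self.dirty.add(block)
        else:
            # Read-only caching (assumed write-around): the write goes to the
            # backend and any stale cached copy is dropped.
            self.backend[block] = data
            self.backend_writes += 1
            self.cache.pop(block, None)

    def flush(self):
        """Copy dirty blocks to the backend, as a background flusher would."""
        for block in sorted(self.dirty):
            self.backend[block] = self.cache[block]
            self.backend_writes += 1
        self.dirty.clear()


def backend_reads_after_write_then_read(cache_writes: bool) -> int:
    """Write four blocks, read them back, and count reads that reach the backend."""
    cache = HostCache(backend={}, cache_writes=cache_writes)
    for block in range(4):
        cache.write(block, f"data-{block}")
    for block in range(4):
        cache.read(block)
    return cache.backend_reads
```

In this toy model, a workload that reads back recently written blocks never touches the backend when writes are cached, while with read-only caching every one of those first reads goes to the backend.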
Below are the reasons why we recommend that you go with Option 2.
How to Select SSDs for Host Side Caching for VMware – Interface, Model, Size, Source and RAID Level?
In terms of price/performance, enterprise NVMe SSDs have now become the best choice for in-VMware host caching media. They are higher performing and cost just a little more than their lower performing SATA counterparts. The Intel P4600/P4610 NVMe SSDs are my favorites. If you don’t have a spare 2.5” NVMe or PCIe slot in your ESXi host, which precludes you from using NVMe SSDs, you could use enterprise SATA SSDs. If you choose to go with SATA SSDs, you will also need a high queue depth RAID controller in the ESXi host. In the enterprise SATA SSD category, the Intel S4600/S4610 or Samsung SM863a are good choices. If you don't have a spare PCIe, NVMe, SATA, or SAS slot in the host, then the only choice is to use host RAM as cache media, which is much more expensive but higher performing.
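The selection order described above can be written down as a small decision function. This is only a sketch of the paragraph's logic: `HostSlots` and `recommend_cache_media` are hypothetical names introduced here, and the function does not query a real ESXi host.

```python
from dataclasses import dataclass


@dataclass
class HostSlots:
    """Which spare slots an ESXi host has (filled in by hand; hypothetical type)."""
    spare_pcie: bool = False          # conventional PCIe slot
    spare_nvme_bay: bool = False      # 2.5" (U.2) NVMe bay
    spare_sata_or_sas: bool = False   # SATA or SAS drive bay


def recommend_cache_media(slots: HostSlots) -> str:
    """Return the cache media class suggested by the text for the given slots."""
    if slots.spare_pcie or slots.spare_nvme_bay:
        # Preferred choice: enterprise NVMe SSD (for example Intel P4600/P4610).
        return "NVMe SSD"
    if slots.spare_sata_or_sas:
        # Fallback: enterprise SATA SSD (for example Intel S4600/S4610 or
        # Samsung SM863a), which also needs a high queue depth RAID controller.
        return "SATA SSD + high queue depth RAID controller"
    # No spare slot of any kind: host RAM is the only remaining option.
    return "host RAM"


print(recommend_cache_media(HostSlots(spare_nvme_bay=True)))
print(recommend_cache_media(HostSlots(spare_sata_or_sas=True)))
print(recommend_cache_media(HostSlots()))
```

The order of the checks mirrors the preference in the text: NVMe first, SATA only when no NVMe-capable slot is free, and RAM as the last resort.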
This blog article covers the topics below.
- A few good SSDs and their performance characteristics.
- Write IOPS rating and lifetime endurance of SSDs.
- Sizing the SSD.
- How many SSDs are needed in a VMware host and across the VMware cluster?
- In the case of SATA SSDs, the need to configure the SSD as RAID0.
- Queue Depths.
- Where to buy SSDs?
CEPH is a great choice for deploying large amounts of storage. Its biggest drawbacks are high storage latencies and the difficulty of making it work for VMware hosts.
The Advantages of CEPH.
CEPH can be installed on ordinary servers. It clusters these servers together and presents the cluster as an iSCSI target. Clustering is a key feature: it lets CEPH sustain component failures without causing a storage outage, and it lets capacity scale linearly by simply hot adding servers to the cluster. You can build CEPH storage with off the shelf components (servers, SSDs, HDDs, NICs), essentially any commodity server or server components. There is no vendor lock-in for hardware, so hardware costs are low. All in all, it offers better reliability and deployment flexibility at a lower cost than big brand storage appliances.
CEPH has Two Drawbacks - High Storage Latencies and Difficulty Connecting to VMware.