How to Select SSDs for Host Side Caching for VMware – Interface, Model, Size, Source, and RAID Level?

In terms of price/performance, enterprise NVME SSDs have now become the best choice for in-VMware host caching media. They are higher performing and cost just a little more than their lower performing SATA counterparts. The Samsung PM1725b (PCIe form factor) and the Intel P4610 (2.5″ U.2 form factor) NVME SSDs are my favorites. If you don’t have a spare U.2 NVME slot or a conventional PCIe slot in your ESXi host (which precludes you from using NVME SSDs), you could use enterprise SATA SSDs. If you choose to go with SATA SSDs, you will also need a high Queue Depth RAID controller in the ESXi host. In the enterprise SATA SSD category, the Intel S4610 or Samsung SM883 are good choices. If you don’t have any slots in the host to install an SSD, then the only choice is to use the more expensive but higher performing host RAM as cache media.

This blog article covers the topics below.

– A few good SSDs and their performance characteristics.

– Write IOPS rating and lifetime endurance of SSDs.

– Sizing the SSD.

– How many SSDs are needed in a VMware host and across the VMware cluster?

– In the case of SATA / SAS SSDs, the need to RAID0 the SSD.

– Importance of Queue Depths.

– Where to buy SSDs?

Question – What problem are you trying to solve?

You have no storage performance issues if you have less than 5 milliseconds (ms) latency at the VM level at all times. So there’s no need for you to go through this article if you don’t have this problem. 🙂

There are two types of storage performance problems in the VMware world. If the aggregate peak storage throughput from all VMs on a single ESXi host is greater than 100MBps, then the high latencies you are experiencing are a direct consequence of high throughput. Such high throughput is rare, however, so we don’t often run into this problem at customer sites. The more prevalent problem is high VM latencies (> 20ms peak) even at low throughput (say < 1MBps peak). Please note that you want to track VM level stats, not appliance or datastore level stats.

Both these problems can be solved by host side caching software that caches to an SSD inside the VMware host. If you are experiencing high latency at low throughput, then either a SATA or NVME SSD will work. If you require low latencies at very high storage throughput (say > 100MBps of storage IO per host), then definitely go with write-intensive enterprise NVME SSDs, not SATA SSDs. More on SSD selection in later sections.
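As a rough sketch, the triage above can be captured in a few lines of Python. The function name and structure are mine, for illustration only; the thresholds are the ones used in this article, not part of any VirtuCache API.

```python
def classify_storage_problem(peak_vm_latency_ms: float,
                             peak_host_throughput_mbps: float) -> str:
    """Triage using this article's thresholds (illustrative, not an API)."""
    if peak_vm_latency_ms < 5:
        # Consistently under 5 ms at the VM level: nothing to fix.
        return "no problem"
    if peak_host_throughput_mbps > 100:
        # High latency driven by high aggregate throughput: prefer NVME SSDs.
        return "high latency at high throughput"
    # The common case: high latency even at low throughput. SATA or NVME works.
    return "high latency at low throughput"

# Example: 25 ms peak VM latency at under 1 MBps peak throughput
print(classify_storage_problem(25, 1))  # high latency at low throughput
```

Remember to feed it VM level stats, not appliance or datastore level stats.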

Selection Criteria for SSDs – Endurance and Random Write IOPS ratings.

Write IOPS rating for random 4KB block size is the single most important parameter when selecting an in-VMware host SSD, because most storage IO from within VMware is random and uses small block sizes. All enterprise SSDs do well on reads, but only a few do well on small block (4KB) random writes. It’s also almost always the case that an SSD with a higher random write IOPS rating is higher performing on random reads as well. Lastly, since VirtuCache accelerates both reads and writes, we pay closer attention to the SSD’s write IOPS specs.

The next most important parameter is the endurance of the SSD. Endurance is measured as the total amount of lifetime writes, in petabytes, that the SSD OEM warrants the SSD for. Since all enterprise SSDs are warranted for 5 years, this parameter is expressed either in petabytes written over 5 years or as DWPD (Drive Writes Per Day): the number of times the entire capacity of the drive can be written every day, warranted by the OEM for 5 years. Host side caching continuously replaces older, less frequently used data with newer, more frequently used data, and both the deletes and the new writes are write operations, so you need a high endurance SSD. The SSD OEM warrants the SSD until it is 5 years old or reaches its write endurance limit, whichever comes first.
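The two endurance ratings are interchangeable with simple arithmetic. This sketch (function names are mine) converts between DWPD and total petabytes written over the warranty period:

```python
def pbw_from_dwpd(dwpd: float, capacity_tb: float, years: int = 5) -> float:
    """Total petabytes written over the warranty implied by a DWPD rating."""
    return dwpd * capacity_tb * 365 * years / 1000

def dwpd_from_pbw(pbw: float, capacity_tb: float, years: int = 5) -> float:
    """Drive Writes Per Day implied by a total petabytes-written rating."""
    return pbw * 1000 / (capacity_tb * 365 * years)

# Example: a 1.6TB drive rated for 12.3 PB written over 5 years
print(round(dwpd_from_pbw(12.3, 1.6), 1))  # 4.2 drive writes per day
```

So a drive rated around 12 PB of lifetime writes at 1.6TB capacity works out to roughly 4 full drive writes per day for 5 years.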

My favorite SSDs are the Samsung PM1725b NVME SSD (PCIe x8 form factor) and the Intel P4610 NVME SSD (2.5″ U.2 form factor). Datacenter NVME SSDs come in either the 2.5″ U.2 form factor or the conventional PCIe form factor. If your server doesn’t have either of these slots but has a spare SATA / SAS slot, you could go with the Intel S4610 SATA SSD or the Samsung SM883 SATA SSD. However, if you decide to go with SATA SSDs, ensure that the RAID controller in the host has a high Queue Depth. If you don’t have any slot in the server for an SSD, then the only choice is to use the more expensive but higher performing host RAM as cache media.

Below is a table comparing key metrics for these SSDs and for host RAM, as tested by us. Storage IO was generated from within a VMware VM using Iometer: 100% random, 100% read, 4KB block size, with VirtuCache caching the entire Iometer test file to the in-VMware host caching media (100% cache hit ratio).

In-VMware Host Cache Media                   | Read Throughput (MBps) | Read Latency (ms) | Cost $/GB (in 2020) | Endurance (Petabyte Writes) | Standard Deviation for Latencies
---------------------------------------------|------------------------|-------------------|---------------------|-----------------------------|---------------------------------
Host RAM                                     | 630                    | 0.4               | $7                  | Not a concern (very high)   | Very low
Intel P4610 NVME 1.6TB, U.2 form factor      | 400                    | 0.5               | $0.40               | 12.3                        | Very low
Samsung PM1725b NVME 1.6TB, PCIe form factor | 300                    | 0.7               | $0.40               | 14.6                        | Low
Intel S4610 SATA 1.9TB                       | 120                    | 5                 | $0.20               | 10.8                        | Low
Samsung SM883 SATA 1.9TB                     | 110                    | 6                 | $0.20               | 10.5                        | High

What about SAS SSDs?

SAS SSDs tout better error control and lower failure rates than other SSDs. However, since VirtuCache implements its own error control techniques to ensure lower SSD failure rates, SAS SSDs do not add value over the cheaper SATA or NVME SSDs.

Consumer SSDs are even cheaper and have high IOPS ratings, so why not use consumer SSDs?

First of all, consumer SSDs have low endurance (less than 100TB of lifetime writes in most cases). Also, consumer SSDs are warranted for only 3 years, not 5 years like their enterprise counterparts.

Secondly, looking at the IOPS ratings of some consumer SSDs, you might get the impression that they are higher performing than enterprise SSDs. However, in a VMware environment you are better served by low latencies and a low standard deviation for latencies than by simply comparing IOPS ratings across SSDs. Unfortunately, SSD OEMs don’t list latencies; they only list IOPS or MBps throughput. For instance, the Samsung PM871b (a consumer SSD) has higher throughput / IOPS ratings than the Samsung SM883, but the SM883 has much lower latencies than the PM871b and is far more consistent (low standard deviation for latencies) as well.

Where should I buy SSDs from?

You can buy host side SSDs from your server vendor or from retailers like Amazon.com, CDW, Newegg, etc. An SSD costs much less when bought from a retailer than when the same SSD is bought from the server vendor. Retailers also pass through the SSD OEM’s 5 year warranty, whereas the same SSD rebranded by the server vendor is warranted by the server vendor for only 3 years. The one “advantage” of a server vendor branded SSD is that it makes the server management console light go green, while the same SSD bought from Amazon.com might not.

What size SSD?

My rule of thumb for generic IT workloads is that 10-20% of media serves 80-90% of storage requests. So the SSD should be sized at about 20% of the storage used by all the VMs on that host. While evaluating VirtuCache, if you notice (on the VirtuCache stats screen) that the cache hit ratio is low and the SSD is full, you should increase the SSD capacity to get to a > 90% cache hit ratio. You could do that by replacing your existing SSD with a single higher capacity SSD (preferred) or by getting two smaller, equal size SSDs and creating a RAID0 array across them.
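The sizing rule of thumb above is just multiplication; this sketch (function name is mine) makes it explicit:

```python
def suggested_cache_size_gb(used_storage_gb: float, fraction: float = 0.2) -> float:
    """Rule of thumb from this article: size the cache SSD at ~20% of the
    storage used by all the VMs on the host (not provisioned capacity)."""
    return used_storage_gb * fraction

# Example: VMs on a host use 10 TB of storage in total
print(suggested_cache_size_gb(10_000))  # 2000.0 GB, i.e. a ~2TB SSD
```

Treat the result as a starting point; the cache hit ratio observed during the evaluation is what should drive the final size.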

Please note that as of July 2020, I wouldn’t recommend using RAID controllers for NVME SSDs. First of all, RAID controllers for NVME SSDs are rarely fitted in servers, because a single NVME SSD is now as large as 8TB and NVME SSDs are also quite fast, so RAID is not needed for either capacity or performance reasons. Secondly, the current generation of NVME RAID controllers is not very good; they increase the latencies of the underlying NVME SSDs quite a bit.

How many SSDs do you need?

With VirtuCache, if you are caching only reads (Write-Through Caching), then you need only one SSD per host, and only for those hosts that need to be accelerated. If you are caching writes as well (Write-Back Caching), you will need one SSD per host for all the hosts in the VMware cluster. This is because in the case of write caching, VirtuCache commits a write to the local SSD and synchronously copies that same write to an SSD in another host in the same ESXi cluster. Writes are mirrored across two hosts in this fashion to protect against data loss in case of a host failure.
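The SSD count rule above can be sketched as a small helper (the function and its signature are mine, for illustration):

```python
def ssds_needed(hosts_in_cluster: int, hosts_to_accelerate: int,
                write_back: bool) -> int:
    """One SSD per host. Write-back caching mirrors every write to a peer
    host's SSD, so every host in the cluster needs one; write-through
    (read-only) caching needs SSDs only in the accelerated hosts."""
    if write_back:
        return hosts_in_cluster
    return hosts_to_accelerate

# Example: a 6-host cluster where only 2 hosts need acceleration
print(ssds_needed(6, 2, write_back=False))  # 2
print(ssds_needed(6, 2, write_back=True))   # 6
```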

Also, I don’t recommend using multiple SSDs in a single host; use a single SSD instead. This is because a RAID0 array of a single SSD is higher performing than a RAID0 array of multiple SSDs.

Why is Queue Depth so important?

Queue Depth is a number assigned by the device vendor to their storage device (or software) that advertises, to the component above it in the storage IO path, the maximum number of IO requests the device (or software) can process in parallel. Every device or software component in the storage IO path has a Queue Depth associated with it. IO requests sent to a device in excess of its Queue Depth get queued. You don’t want any queueing on any device in the storage IO path, else latencies go up. A higher Queue Depth means that the device is lower latency and generally higher performing.
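To see why a low Queue Depth hurts, consider a toy model (mine, not a benchmark): a device services at most QD requests at once, each taking one service time, so with N outstanding IOs the last one waits roughly ceil(N / QD) service times:

```python
import math

def latency_multiplier(outstanding_ios: int, queue_depth: int) -> int:
    """Requests are served in batches of `queue_depth`, so the worst-case
    latency grows in steps of ceil(N / QD) service times."""
    return math.ceil(outstanding_ios / queue_depth)

# 600 outstanding IOs against a low-QD vs a high-QD device
print(latency_multiplier(600, 32))   # 19 -> ~19x the device's service time
print(latency_multiplier(600, 512))  # 2  -> ~2x
```

Real devices overlap and reorder IO, so this overstates the effect, but the step-function behavior is why you want Queue Depths above the host's peak outstanding IO count.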

If using SATA SSDs, please check the Queue Depth of the SSD device and of the RAID controller. The esxtop command in ESXi shows the Adapter Queue Depth (field called AQLEN) for the RAID controller and the Disk Queue Depth (field called DQLEN) for the RAID0 SSD device. A cheap (low Queue Depth) RAID controller becomes a bottleneck for the RAID0 SSD behind it, hence it is very important that both AQLEN and DQLEN be greater than 512 for the RAID controller and the RAID0 SSD device.

In the case of NVME SSDs, Queue Depths are almost always greater than 900. But it’s always a good idea to confirm the queue depth (AQLEN and DQLEN) of the NVME SSD regardless.

For a SATA/SAS SSD, do I need to RAID the SSD?

This section is not applicable to NVME SSDs.

Yes, you should RAID the SATA/SAS SSD in each host, but not for the conventional reason of protecting data. You need to RAID0 the SATA / SAS SSD to assign the SSD a higher Queue Depth than the default VMware SATA driver is capable of assigning to it; creating a RAID0 array is the only way to do so. With a higher Queue Depth, the SSD can process a larger number of requests in parallel, improving throughput and reducing latencies.

You shouldn’t do RAID1 or a higher RAID level, since VirtuCache takes care of protecting against data loss if an SSD or host fails. For read caching, the cached data on the local SSD is kept in sync with the storage array at all times. In the case of write caching, VirtuCache protects the writes on the SSD by mirroring them over the network to an SSD in another host, so even if a host were to fail, you don’t lose any data. Also, having multiple SSDs in RAID1 (or a higher RAID level) deteriorates SSD latencies considerably.

Summary

-Use enterprise SSDs, not consumer.

-Use an NVME SSD if you have a spare PCIe slot or U.2 NVME slot in your server. If you don’t have these slots, use a SATA SSD behind a high Queue Depth RAID controller. If you don’t even have a spare SATA or SAS slot, then the only choice is to use some amount of host RAM as cache media.

-If using SATA / SAS SSD, make sure that the RAID controller has a Queue Depth higher than 512. For NVME SSD, Queue Depth is not a concern.

-For accelerating reads + writes, you need cache media (host RAM / SSD) in every host in the ESXi cluster. For accelerating only reads, cache media is needed for only those hosts needing acceleration.

-If using SATA / SAS SSD, RAID0 a single SSD. Do not RAID1 (or use higher RAID levels) multiple SSDs. If using NVME SSD, the topic of RAID is not relevant. 

-Though you can use multiple SSDs in a host, it’s preferable to use only one SSD.

Disclaimer: The author and Virtunet have no affiliation with Samsung, Intel, or any other SSD OEM. No monetary compensation was paid, and no free SSD samples were sent, to the author or Virtunet by Samsung or Intel.
