
How to Select SSDs for Host Side Caching for VMware – Interface, Model, Size, Source and RAID Level?

In terms of price/performance, enterprise NVME SSDs have now become the best choice for in-VMware host caching media. They are higher performing than their SATA counterparts and cost only a little more. The Intel P4600/P4610 NVME SSDs are my favorites; the Samsung PM1725a is my second choice. If you don’t have a spare 2.5” NVME bay or PCIe slot in your ESXi host, which precludes you from using NVME SSDs, you could use enterprise SATA SSDs instead. If you choose to go with SATA SSDs, you will also need a high queue depth RAID controller in the ESXi host. In the enterprise SATA SSD category, the Intel S4600/S4610 and the Samsung SM863a are good choices. If you don’t have a spare PCIe, NVME, SATA, or SAS slot in the host, then the only option is to use the much more expensive, and higher performing, host RAM as the cache media.

This blog article will cover the topics below.

– Write IOPS rating and lifetime endurance of SSDs.

– Sizing the SSD.

– How many SSDs are needed in a VMware host and across the VMware cluster?

– For SATA SSDs, the need to RAID-0 the SSD.

– Queue Depths.

– Where to buy SSDs?

Question – What problem are you trying to solve?

You have no storage performance issues if latencies at the VM level stay below 10 milliseconds (ms) at all times, so there is no need to go through this article if that describes your environment. :-)

There are two types of storage performance problems in the VMware world. If the aggregate peak storage throughput from all VMs on a single ESXi host is upwards of 100 MBps, then the high latencies you are experiencing are a direct consequence of high throughput. Such high throughput is rare, however, so we don’t often run into this problem at customers. The more prevalent problem is high VM latencies (> 20 ms peak latencies at the VM level) even at low throughput (say < 1 MBps peak throughput at the VM level).

Both these problems can be solved by host side caching software that caches to an in-VMware host SSD. If you are experiencing high latencies at low throughput, then either a SATA or NVME SSD will work. However, if you require low latencies at very high storage throughput (say > 100 MBps of storage IO per host), then definitely go with a write-intensive enterprise NVME SSD, not a SATA SSD. More on SSD selection in later sections.

Selection Criteria for SSDs – Endurance and Random Write IOPS ratings.

Write IOPS rating for random 4KB block size is the single most important parameter when selecting an in-VMware host SSD, because most storage IO from within VMware is random and at small block sizes. All enterprise SSDs do well on reads, but only a few do well on small block (4KB) random writes. It is also almost always the case that an SSD with a higher random write IOPS rating performs better on random read IOPS as well. Lastly, since VirtuCache accelerates both reads and writes, we pay closer attention to the SSD’s write IOPS specs.

The next most important parameter is the endurance of the SSD. Endurance is measured as the total amount of lifetime writes, in petabytes, that the SSD OEM warrants the SSD for. Since all enterprise SSDs are warranted for 5 years, this is expressed either as petabytes written over 5 years or as DWPD (Drive Writes Per Day), the number of times the entire capacity of the drive can be written every day for those 5 years. Host side caching continuously replaces older, less frequently used data with newer, more frequently used data; both the deletes and the new writes are write operations, so you need a high endurance SSD. The SSD OEM warrants the SSD until the earlier of 5 years or the point at which its write endurance limit is reached.
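
To make the relationship between DWPD and lifetime petabytes written concrete, here is a minimal sketch of the conversion; the 2 TB capacity and 3 DWPD rating used in the example are illustrative assumptions, not figures from any vendor’s spec sheet.

```python
# Illustrative conversion between DWPD and lifetime petabytes written (PBW).
# The capacity and DWPD values below are assumptions for the example, not vendor specs.

def dwpd_to_pbw(capacity_tb: float, dwpd: float, warranty_years: int = 5) -> float:
    """Lifetime writes in PB = capacity * DWPD * 365 days * warranty years."""
    total_tb_written = capacity_tb * dwpd * 365 * warranty_years
    return total_tb_written / 1000  # TB -> PB, using decimal units as SSD OEMs do

# Example: a 2 TB drive rated at 3 DWPD with a 5-year warranty
print(dwpd_to_pbw(2.0, 3.0))  # ~10.95 PB of warranted lifetime writes
```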

My favorite SSDs are the Intel P4600 or P4610 NVME SSDs; a cheaper second choice is the Samsung PM1725a NVME SSD. NVME SSDs come in a 2.5” form factor or a PCIe add-in card form factor. If you don’t have a 2.5” NVME bay or a PCIe slot, you could go with the Intel S4600/S4610 or the Samsung SM863a SATA SSDs. However, if you decide to go with a SATA SSD, ensure that the RAID controller in the host has a high queue depth. If you don’t have a spare PCIe, NVME, SATA, or SAS slot in the host, then the only option is to use the much more expensive, and higher performing, host RAM as the cache media.

Below is a table comparing key metrics for these SSDs and host RAM as tested by us, for storage IO generated within a VMware VM, using a 100% random, 100% read Iometer test at 4KB block size, with VirtuCache caching the entire Iometer test file to the in-VMware host caching media (100% cache hit ratio).

| In-VMware Host Cache Media | Read Throughput (MBps) | Read Latency (ms) | Cost $/GB (in 2018) | Endurance (Petabytes Written) | Standard Deviation for Latencies |
|---|---|---|---|---|---|
| Host RAM | 630 | 0.4 | $8 | Not a concern. Very high. | Very low |
| Intel P4600 NVME 2TB | 400 | 0.5 | $0.80 | 11 | Very low |
| Samsung PM1725a NVME 1.6TB | 250 | 1 | $0.60 | 14.6 | High |
| Intel S4600 SATA 1.9TB | 120 | 5 | $0.50 | 10.8 | Low |
| Samsung SM863a SATA 1.9TB | 100 | 8 | $0.40 | 6.2 | High |

What about SAS SSDs?

SAS SSDs tout better error control and lower failure rates than other SSDs. However, since VirtuCache implements its own error control techniques to keep SSD failure rates low, SAS SSDs do not add value over the cheaper SATA or NVME SSDs.

Consumer SSDs are even cheaper and perform well, so why not consumer SSDs?

First of all, consumer SSDs have low endurance. Most consumer SSDs are warranted for 3 years and for lifetime writes of less than 100TB.

Secondly, looking at IOPS ratings for some consumer SSDs might give the impression that they are higher performing than enterprise SSDs. However, in a VMware environment you are better served by lower latencies and a low standard deviation in latencies than by simply comparing IOPS ratings across SSDs. Unfortunately, SSD OEMs don’t list latencies; they only list IOPS or MBps throughput. For instance, the Samsung 860 Pro has a higher throughput/IOPS rating than the Samsung SM863a, but the SM863a has much lower and far more consistent latency.
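
To illustrate the point about consistency, here is a small sketch comparing two hypothetical latency traces; the samples are made-up numbers, not measurements of any particular drive.

```python
# Hypothetical latency samples (not measured data) showing why the standard deviation
# of latency matters, and why a good average alone can be misleading.
from statistics import mean, stdev

consistent_ssd_ms = [0.9, 1.0, 1.1, 1.0, 0.9, 1.1, 1.0, 1.0, 0.9, 1.1]  # steady
bursty_ssd_ms     = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 5.0]  # spiky

for name, samples in [("consistent", consistent_ssd_ms), ("bursty", bursty_ssd_ms)]:
    print(f"{name}: mean = {mean(samples):.2f} ms, stdev = {stdev(samples):.2f} ms")
# The "bursty" drive has the better average (and so would post higher IOPS numbers),
# but its occasional multi-millisecond spikes are what the VMs actually feel.
```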

Where should I buy SSDs from?

You can buy host side SSDs from your server vendor or from retailers like Amazon.com. The same SSD costs much less when bought from a retailer than from the server vendor. Retailers also pass through the SSD OEM’s 5-year warranty, whereas the same SSD rebranded by the server vendor is typically warranted by the server vendor for only 3 years. Also, Amazon.com and other retailers sell the latest SSDs, while server vendors sell SSDs that are a year or so old, since qualification cycles at server vendors are that long. SSD technology is evolving at a rapid clip, so SSDs that are a year old are lower performing than more recent models. The one “advantage” of a server vendor branded SSD is that it makes the server management console light go green, whereas the same SSD bought from Amazon.com might not.

What size SSD?

My rule of thumb for generic IT workloads is that 20% of the media serves 80% of storage requests, so your in-VMware host SSD should be sized at 20% of the storage used by all VMs on that host (see the sketch below). While evaluating VirtuCache, if you notice (on the VirtuCache stats screen) that the cache hit ratio is low and the SSD is full, you should increase the SSD capacity to get above an 80% cache hit ratio. You can do that by replacing your existing SSD with a single higher capacity SSD (preferred), or by getting two smaller but equal size SSDs and creating a RAID-0 array across them. Please note that, as of October 2018, you can only RAID-0 SATA SSDs using the in-host RAID controller; RAID controllers for NVME SSDs are not generally available yet.
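
Here is a minimal sketch of this sizing rule of thumb; the per-VM used-capacity numbers are hypothetical placeholders.

```python
# Rough cache sizing sketch based on the 20% rule of thumb described above.
# The per-VM used-capacity figures are hypothetical placeholders.

def recommended_cache_size_gb(vm_used_gb, cache_fraction=0.20):
    """Suggested in-host SSD capacity = 20% of storage used by all VMs on the host."""
    return cache_fraction * sum(vm_used_gb)

# Example: VMs on one host that together use 10,000 GB of storage
print(recommended_cache_size_gb([4000, 3500, 2500]))  # -> 2000.0 GB, i.e. ~2 TB of SSD
```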

How many SSDs do you need?

With VirtuCache, if you are caching only reads, then you need only one SSD per host, and only for those hosts that need to be accelerated. If you are caching writes as well, you need one SSD per host for all the hosts in the VMware cluster. This is because, in the case of write caching, VirtuCache commits a write to the local SSD and synchronously copies that same write to another SSD in another host in the same ESXi cluster. Writes are mirrored across two hosts in this fashion to protect against data loss in case of host failure.
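
The fault-tolerance reasoning can be sketched as follows; this is a conceptual illustration of synchronous write mirroring, not VirtuCache’s actual implementation, and the class and function names are made up for the example.

```python
# Conceptual illustration (not VirtuCache's actual code) of synchronous write mirroring:
# a cached write exists on the local host's SSD and on a peer host's SSD before it is
# acknowledged, so a single host failure cannot lose data.

class FakeSSD:
    """Stand-in for a cache SSD: simply collects written blocks in memory."""
    def __init__(self):
        self.blocks = []
    def write(self, block):
        self.blocks.append(block)

def cache_write(block, local_ssd, peer_ssd):
    local_ssd.write(block)  # commit to the SSD in this host
    peer_ssd.write(block)   # synchronously mirror to an SSD in another host
    # only after both copies exist would the write be acknowledged to the VM

local, peer = FakeSSD(), FakeSSD()
cache_write(b"dirty block", local, peer)
print(len(local.blocks), len(peer.blocks))  # 1 1 -> two copies, on two different hosts
```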

For SATA SSDs, do I need to RAID SSDs?

This section is not applicable to NVME SSDs.

Ideally you should RAID the SATA SSD in each host, but not for the conventional reason of protecting data. You need to RAID-0 the SATA SSD to assign it a higher Queue Depth than the default VMware SATA driver is capable of assigning. With a higher Queue Depth, the SSD can process a larger number of requests in parallel, improving throughput and reducing latencies. Putting the SATA SSD behind a RAID controller as a RAID-0 volume is the only way to give it a Queue Depth higher than what the default VMware SATA driver allows.

You don’t need RAID-1 or a higher RAID level. For read caching, the cached data on the local SSD is always kept in sync with the storage array. In the case of write caching, VirtuCache protects the writes on the SSD by mirroring them over the network to another SSD in another host, so even if a host were to fail, you don’t lose any data. In fact, we recommend against RAID-1 or higher RAID levels for the in-host SSD since they degrade SSD performance substantially.

Why is Queue Depth so important?

Queue Depth is a number assigned to a device by its vendor that advertises, to the component above it in the storage IO path, the maximum number of IO requests the device can process in parallel. Every device or software component in the storage IO path has a Queue Depth associated with it. IO requests sent to a device in excess of its Queue Depth get queued, and you don’t want queueing on any device in the storage IO path, because that drives latencies up. A higher Queue Depth generally means the device delivers lower latency and higher performance.
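
As a purely illustrative toy model of this effect (the queue depths and service time below are assumptions, not benchmark results):

```python
# Toy model (illustrative assumptions, not measurements) of why a low Queue Depth
# drives up latency: the device services at most queue_depth requests at a time,
# and everything beyond that waits in line.

def avg_latency_ms(outstanding_io: int, queue_depth: int, service_time_ms: float) -> float:
    """Average completion time when requests are drained in waves of queue_depth."""
    total = 0.0
    for i in range(outstanding_io):
        wave = i // queue_depth + 1        # which wave this request completes in
        total += wave * service_time_ms
    return total / outstanding_io

# 1024 outstanding IOs, each taking 0.5 ms at the device
print(avg_latency_ms(1024, 32, 0.5))     # low Queue Depth  -> ~8.25 ms average
print(avg_latency_ms(1024, 1024, 0.5))   # high Queue Depth -> 0.5 ms average
```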

If using SATA SSDs, check the Queue Depth of both the SSD device and the RAID controller. The esxtop command in ESXi shows the Adapter Queue Depth (AQLEN field) for the RAID controller and the Disk Queue Depth (DQLEN field) for the RAID-0 SSD device. With a cheap (low Queue Depth) RAID controller, the controller becomes a bottleneck for the RAID-0 SSD behind it, so it is very important that both AQLEN (for the RAID controller) and DQLEN (for the RAID-0 SSD device) be greater than 512.

In the case of NVME SSDs, Queue Depths are almost always greater than 900. But it’s always a good idea to confirm the queue depth (AQLEN and DQLEN) of the NVME SSD regardless.

Summary

– Use enterprise SSDs, not consumer SSDs.

– Use an NVME SSD if you have a spare PCIe slot or 2.5” NVME bay in your server. If you don’t have either, use a SATA SSD behind a high queue depth RAID controller. If you don’t even have a spare SATA or SAS slot, then the only option is to use some amount of host RAM as the cache media.

– If using a SATA SSD, configure it as RAID-0 and use a RAID controller with a Queue Depth higher than 512, else the RAID controller becomes the bottleneck. For NVME SSDs, Queue Depths are not a concern.

– For accelerating reads + writes, you need one SSD per host for every host in the ESXi cluster.

– For accelerating only reads, an SSD is needed only for those hosts needing acceleration.

– RAID-1 or higher RAID levels for the SSD are not recommended.

Disclaimer: The author and Virtunet have no affiliation with Samsung, Intel, or any other SSD OEM. No monetary compensation was paid, and no free SSD samples were sent, to the author or Virtunet by Samsung or Intel.