Archive: Posts

How to Select SSDs for Host-Side Caching for VMware – Interface, Model, Size, Source, and RAID Level?

A high queue depth RAID controller and an enterprise grade SSD in the ESXi host are the key to a low latency ESXi + storage environment that is also reasonably priced.

This blog article covers the following topics:

– Selecting SSD interface type – SATA, SAS, or PCIe (also called NVME)?

– Sizing the SSD?

– How many SSDs are needed in a VMware host and across the VMware cluster?

– The need to RAID0 the SSD?

– Where to buy the SSD from?

Question – What problem are you trying to solve?

If your VM-level latencies stay below 10 milliseconds (ms) at all times, you have no storage performance issue, and there is no need to go through this article. :-)

There are two types of storage performance problems in the VMware world. If the aggregate peak storage throughput from all VMs on a single ESXi host is upwards of 100MBps, then the high latencies you are experiencing are a direct consequence of high throughput. This problem gets compounded further when many ESXi hosts send such high throughput to the same storage appliance. However, throughput this high is rare, so we don't often run into this problem at customers.

The more prevalent problem is high VM latencies (> 20ms peak latencies at the VM level) even at low throughput (say < 1MBps peak throughput at the VM level). In this second situation, any enterprise grade SATA SSD will solve the problem. Even the cheapest SATA SSD shows lower latencies at higher throughput than an entire mid-range HDD based storage appliance. In such an array, each hard drive provides around 400 IOPS, and the IOPS rating of the array increases linearly as you add HDDs, typically topping off at 50K IOPS. Even a cheap enterprise grade SATA SSD (costing ~50 US cents/GB) does around 50K IOPS. So a single enterprise grade SATA SSD installed in the VMware host, caching data from such a mid-range storage array, is sure to boost the storage performance of the array many times over. In the rare case that each host is pushing very high throughput, you could use a good NVME SSD instead of a SATA SSD.
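As a rough back-of-the-envelope check of the comparison above, here is a small Python sketch. The figures are the illustrative ones quoted in this paragraph, not benchmarks of any particular model:

```python
# Back-of-the-envelope comparison using the illustrative figures quoted above.
hdd_iops = 400                 # random IOPS of one mid-range hard drive
array_iops_ceiling = 50_000    # where a mid-range HDD array typically tops out
sata_ssd_iops = 50_000         # random IOPS of a cheap enterprise SATA SSD

print(f"HDDs needed to match one SATA SSD: {sata_ssd_iops / hdd_iops:.0f}")              # ~125
print(f"One SATA SSD vs. the array IOPS ceiling: {sata_ssd_iops / array_iops_ceiling:.1f}x")  # ~1.0x
```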

Selection Criteria for SSDs – Endurance and Random Write IOPS ratings.

The write IOPS rating for random 4KB blocks is the single most important parameter for selecting an in-VMware-host SSD. This is because most storage IO from within VMware is random and at small block sizes. The write IOPS rating matters more than the read IOPS rating because caching involves continuously writing new data to the SSD and evicting old data from it, both of which are write operations.

The next most important parameter is the endurance of the SSD. Endurance is the total amount of lifetime writes, in petabytes, that the SSD can sustain and that the SSD OEM warrants. Since all enterprise SSDs are warranted for 5 years, this parameter is expressed either in petabytes or as DWPD (Drive Writes Per Day). DWPD is the number of times the entire capacity of the drive can be written every day within the OEM warranty. Say an SSD's published endurance rating is 3 DWPD and it is a 1 TB SSD. Over a 5 year period, the OEM warrants that you can write a total of 3 DWPD x 1 TB x 365 days x 5 years to the SSD, i.e. a lifetime endurance of about 5.5 PB. Host side caching involves writing large amounts of data to the SSD, since older, less frequently used data is continuously replaced with newer data, and hence you need a high endurance SSD.
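The arithmetic above can be captured in a small helper. This is just the endurance formula from this paragraph, assuming a 5-year warranty and decimal units (1 PB = 1000 TB):

```python
def lifetime_writes_pb(capacity_tb: float, dwpd: float, warranty_years: int = 5) -> float:
    """Total warranted lifetime writes in petabytes (1 PB = 1000 TB assumed)."""
    return dwpd * capacity_tb * 365 * warranty_years / 1000

# Example from above: a 1 TB SSD rated at 3 DWPD over a 5-year warranty.
print(lifetime_writes_pb(capacity_tb=1, dwpd=3))   # 5.475 -> ~5.5 PB
```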

My favorite SSDs are the 960GB Samsung SM863a and the 960GB Intel S4600. The SM863a costs 60 US cents/GB and does 95K read IOPS and 25K write IOPS for random 4KB blocks. Samsung warrants it for 5 years or 6 petabytes of lifetime writes, whichever comes first. A comparable SSD from Intel is the S4600. It costs 50 US cents/GB and does 72K read IOPS and 65K write IOPS for random 4KB blocks. Intel warrants it for 5 years or 5 petabytes of lifetime writes, whichever comes first.

What about other SSD interface types – SAS or NVME SSDs?

SAS SSDs tout better error control and lower failure rates than SATA SSDs. Because VirtuCache implements its own error control techniques to keep SSD failure rates low, SAS SSDs do not add value over the cheaper SATA SSDs. A few years ago SAS SSDs were also higher performing than SATA, but that's not true anymore.

NVME SSDs are the highest performing SSDs on the market. However, for most VMware customers we recommend NVME SSDs not for performance reasons, but for cases where there are no SATA slots available on the host, or where the host has a cheap, low queue depth RAID controller that cannot keep up with a good SATA SSD. NVME SSDs support a few hundred thousand IOPS, but most VMware deployments do not benefit from such performance. One reason is that there is only so much IO you can do from a VM or a host; VMware's flow control mechanisms ensure that a single VM or host does not consume large amounts of storage bandwidth. In summary, go with an NVME SSD if you don't have a spare SATA slot on the host or if you have a low queue depth RAID controller.

Consumer SSDs are even cheaper and perform well, so why not use consumer SSDs?

First of all, consumer SSDs have low endurance. Most consumer SSDs are warranted for 3 years and for lifetime writes of less than 100TB.

Secondly, looking at the IOPS ratings of some consumer SSDs, you might get the impression that they are higher performing than enterprise SSDs. However, keep in mind that in a VMware environment you are better served by low and consistent latencies than by simply comparing IOPS ratings across SSDs. Unfortunately, SSD OEMs don't list latencies; they only list IOPS or MBps throughput. For instance, the Samsung 860 Pro has higher throughput/IOPS ratings than the Samsung SM863a, but the SM863a has much lower and far more consistent latencies than the 860 Pro.

If we use an NVME SSD, which one?

NVME SSDs are PCIe SSDs that support the NVME standard agreed upon by SSD manufacturers.

The Samsung PM1725a and the Intel P4600 are our go-to NVME SSDs. Also, starting with 6.0, VMware has an in-box software driver for NVME, so installing an NVME SSD is as easy as installing a SATA SSD.

Where should I buy SSDs from?

You can buy host side SSDs from your server vendor or from retailers like Amazon.com. The same SSD costs much less when bought from a retailer than when bought from the server vendor. Retailers also pass through the SSD OEM warranty of 5 years, whereas the same SSD rebranded by the server vendor is warranted by the server vendor for only 3 years. Also, Amazon.com and other retailers sell the latest SSDs, while server vendors sell SSDs that are a year or so old, since the qualification cycles at the server vendor are that long. SSD technology is evolving at a rapid clip, so SSDs that are a year old are lower performing than more recent models. The one "advantage" of a server vendor branded SSD is that it makes the server management console light go green, whereas the same SSD bought from Amazon.com might not.

What size SSD?

My rule of thumb for generic IT workloads is that 20% of media serves 80% of storage requests. So the in-VMware-host SSD capacity should be 20% of the storage used by all VMs on that host. While evaluating VirtuCache, if you notice (using the VirtuCache stats screen) that the cache hit ratio is low and the SSD is full, you should increase the SSD capacity to get to a > 80% cache hit ratio. You could do that by replacing your existing SSD with a single higher capacity SSD (preferred), or by getting two smaller, equally sized SSDs and creating a RAID 0 array across the two.
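As a small Python sketch of this rule of thumb (the 4 TB figure below is a hypothetical example, not a recommendation):

```python
def recommended_cache_gb(used_storage_gb: float, hot_data_fraction: float = 0.20) -> float:
    """SSD capacity suggested by the 20% rule of thumb for host-side caching."""
    return used_storage_gb * hot_data_fraction

# Hypothetical example: the VMs on one host use 4 TB (4000 GB) of storage.
print(recommended_cache_gb(4000))   # 800.0 -> a 960 GB SSD is a reasonable fit
```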

How many SSDs do you need?

With VirtuCache, if you are caching only reads, then you need only one SSD per host, and only for those hosts that need to be accelerated. If you are caching writes as well, you will need one SSD per host for all hosts in the VMware cluster. This is because, for write caching, VirtuCache commits a write to the local SSD and synchronously copies the same write to another SSD in another host in the same ESXi cluster. Writes are mirrored across two hosts in this fashion to protect against data loss in case of host failure.
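The toy Python sketch below only illustrates the idea of synchronous write mirroring described above; it is not VirtuCache code, and the dictionaries standing in for SSDs are purely hypothetical:

```python
# Toy illustration of synchronous write mirroring across two hosts.
# A write is considered committed only after it is on the local SSD
# *and* on a peer host's SSD, so a single host failure cannot lose
# cached writes. Dictionaries stand in for the two SSDs.

def cached_write(block_id: int, data: bytes, local_ssd: dict, peer_ssd: dict) -> None:
    local_ssd[block_id] = data   # commit to the SSD in this host
    peer_ssd[block_id] = data    # synchronously copy to an SSD in another host
    # only now would the write be acknowledged back to the VM

local_ssd, peer_ssd = {}, {}
cached_write(7, b"example data", local_ssd, peer_ssd)
assert local_ssd[7] == peer_ssd[7]
```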

Do I need to RAID SSDs?

Yes, ideally you should RAID the SSD in each host, but not for the conventional reason of protecting data. You need to RAID-0 the SSD to assign it a higher Queue Depth than the default VMware SATA driver can assign. With a higher Queue Depth, the SATA SSD can process a larger number of requests concurrently, improving throughput and reducing latencies. Putting the SATA SSD behind the RAID controller in a RAID-0 is the only way to give it a Queue Depth higher than the default VMware SATA driver allows.
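To see why Queue Depth matters, Little's Law (outstanding IOs = IOPS x per-IO latency) gives a rough upper bound on the IOPS a device path can sustain. The 1 ms per-IO latency below is only an illustrative assumption, not a measured value for any particular SSD or controller:

```python
# Little's Law: max sustainable IOPS is roughly queue_depth / per-IO latency.
def max_iops(queue_depth: int, per_io_latency_s: float) -> float:
    return queue_depth / per_io_latency_s

per_io_latency_s = 0.001   # assumed ~1 ms per IO under load (illustrative only)
print(max_iops(32, per_io_latency_s))    # 32000.0  -> a low default Queue Depth caps IOPS
print(max_iops(256, per_io_latency_s))   # 256000.0 -> a high-QD RAID controller does not
```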

You don't need RAID-1 or a higher RAID level. For read caching, the data on the local SSD is always just a copy of data that is already on the storage array. And for write caching, VirtuCache protects the writes on the SSD by mirroring them over the network to another SSD in another host, so even if a host were to fail, you don't lose any data. In fact, we recommend against RAID-1 or higher RAID levels for the in-host SSD since they deteriorate SSD performance substantially.

Summary

-Use enterprise SSDs, not consumer SSDs.

-Use an NVME SSD only if there is no SATA slot on the host or the host RAID controller has a low Queue Depth.

-For accelerating reads + writes, you need one SSD per host for every host in the ESXi cluster.

-For accelerating only reads, an SSD is needed only in those hosts that need acceleration.

-RAID the SSD as RAID 0 and use a RAID controller with a Queue Depth higher than 256, else the RAID controller becomes the bottleneck.

-RAID 1 or higher for SSD is not recommended.

SATA SSDs from tier-1 OEMs are the best value for money and are high performing enough that storage infrastructure is no longer the bottleneck. Stated differently, enterprise SATA SSDs are sufficiently high performing to shift the infrastructure bottleneck to another hardware component (memory, network, CPU), meaning an even higher performing SSD will not necessarily result in better application performance or higher VM densities, since some other hardware component now becomes the bottleneck.

Disclaimer: Neither the author nor Virtunet has any affiliation with Samsung, Intel, or any other SSD OEM. No monetary compensation was paid and no free SSD samples were sent to the author or Virtunet by Samsung or Intel.