Blog

VMware’s vFlash Read Cache (VFRC) versus Virtunet’s Read+Write Caching

VMware has discontinued VFRC in ESXi 7.0.

Despite the end-of-life for VFRC, if you still want to review the differences between VFRC and VirtuCache, below are the four most important ones.

  1. We cache reads and writes, VMware's VFRC caches only reads.

  2. We improve the performance of any and all VMs (server and VDI). VFRC doesn't support VDI.

  3. We require no ongoing administration. Caching in our case is fully automated, and all VMware features are seamlessly supported. VFRC requires administrator intervention for vMotion, for creating a new VM, for maintenance mode, and for VM restore from backup; it also requires knowledge of the application block size and SSD capacity assignment per vdisk. Many other tasks require admin oversight as well.

  4. We provide VM, cache, network, and storage appliance level metrics for throughput, IOPS, and latencies, and alerts to forewarn of failure events. VFRC doesn't.

Below is a longer list of differences, cross-referenced with VMware-authored content:

VMware Host Side Caching to improve the performance of HP MSA Storage

VirtuCache is ESXi software that automatically caches 'hot' data from any SAN storage to in-host SSD or RAM. By doing so it improves the storage performance of VMware VMs, without requiring you to upgrade your storage appliance or network.

VirtuCache competes with and also complements the Hybrid MSA's internal Read Caching and Performance Tiering features, and the All-Flash MSA. For instance, the Read Caching and Performance Tiering features in the hybrid MSA only improve VM read performance,1 whereas VirtuCache improves the performance of VM reads and writes. VirtuCache also helps improve the performance of small block storage IO even for the high-end All-SSD MSA. Here are a few more ways VirtuCache enhances the performance of HPE MSA.

View Storage Accelerator vs. VirtuCache – VMware Host Side Storage Caching

The big difference between the two is that VSA caches only 2GB of reads from the Master VM1,2. VirtuCache caches reads + writes from all server & desktop VMs, and it can cache to TBs of in-host SSD/RAM, so that all storage IO is serviced from in-host cache.

More details in the table below.

How to simulate production workloads running in VMware VMs using FIO or Iometer?

Here is a quick way to reproduce your entire ESXi cluster-wide production workload using only one VM running a storage IO testing tool like FIO or Iometer. This exercise is useful when you evaluate new storage technologies to see how they might perform with your real-life workload, but without actually deploying those in production.

The focus of this post is to do this in under 30 minutes, using the freely available Iometer or FIO tools.

Step 1: If most of your workload is running in Linux VMs, use FIO. If Windows VMs, use Iometer.

Step 2: The table below lists the three most important characteristics of your production workload that you need in order to simulate it in Iometer or FIO, along with instructions for how to collect this data. The fields denoted with <> need to be replaced with appropriate values in the FIO command line or the Iometer GUI.

Storage IO parameters taken from any one of your ESXi hosts, preferably the host that's doing the most IO.

SSH to one of your production hosts > At the ESXi shell prompt, type esxtop, then type d, and calculate the values using the equations listed below.

<Block Size>: Payload carried by each IO. (KiloBytes)

(MBREAD/s + MBWRTN/s) ÷ (READS/s + WRITES/s) × 1000

<Read/Write Mix>: Percentage of read IOPS out of total IOPS.

READS/s ÷ (READS/s + WRITES/s) × 100

<IO Depth>: Number of simultaneous IO requests generated by your workload.

On the same esxtop screen, type u and calculate the sum of all ACTV and QUED values for all your LUNs. If the ACTV and QUED values for a Datastore are zero while you are watching this screen, then assume the value of 1 for ACTV and 0 for QUED for every LUN.
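As a rough illustration of the arithmetic, here is a minimal shell sketch using made-up esxtop readings (the numbers below are hypothetical, chosen to match the FIO example later in this post, not taken from a real host):

READS_S=690; WRITES_S=310          # hypothetical READS/s and WRITES/s from esxtop
MBREAD_S=2.8; MBWRTN_S=1.2         # hypothetical MBREAD/s and MBWRTN/s from esxtop
echo "scale=1; ($MBREAD_S + $MBWRTN_S) * 1000 / ($READS_S + $WRITES_S)" | bc   # <Block Size> = ~4 (KB)
echo $(( READS_S * 100 / (READS_S + WRITES_S) ))                               # <Read/Write Mix> = 69 (% reads)
echo $(( (3 + 0) + (5 + 1) + (2 + 1) ))                                        # <IO Depth> = sum of ACTV + QUED across LUNs = 12

With these hypothetical readings you would plug in a 4K block size, a 69% read mix, and an IO depth of 12 - the same values used in the example FIO command further below.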

You will also need the below parameters to plug into FIO and Iometer:

  1. <Test File Size>: A 5GB test file is a good balance between two competing objectives - quicker test completion times and a dataset large enough to overflow storage appliance memory buffers.

  2. <Random Workload>: Use 100% random workload since VMware workloads are random. Also, random workloads stress the storage infrastructure much more than sequential.

  3. <Number of Test VM Cores>: Since real-life applications are multi-threaded, you need to define the number of threads that Iometer or FIO will spawn to generate storage IO. The parameter is called ‘numjobs’ in FIO and ‘Workers’ in Iometer. It should be set to the number of CPU cores assigned to the VM. Since the number of threads that can execute simultaneously cannot exceed the number of CPU cores assigned to the VM, a higher value will just choke the CPU and not be of much use.

Step 3: Run the test by following Step 3a if you are using FIO in Linux, or Step 3b if you are using Iometer in Windows.

Step 3a: If using FIO in a Linux VM, run the below command from within your Linux VM:

sudo fio -size=<Test File Size> -direct=1 -rw=randrw -rwmixread=<Read/Write Mix> -bs=<Block Size> -ioengine=libaio -iodepth=<IO Depth> -runtime=1200 -numjobs=<Number of Test VM Cores> -time_based -group_reporting -name=my_production_workload_profile

for instance:

sudo fio -size=5GB -direct=1 -rw=randrw -rwmixread=69 -bs=4K -ioengine=libaio -iodepth=12 -runtime=1200 -numjobs=4 -time_based -group_reporting -name=my_production_workload_profile

The above syntax means that FIO will run against a 5GB test file per thread, the workload is fully random, 69% read and 31% write, the storage payload per IO is 4KB, with 12 outstanding IO requests per thread, the test will run for 20 minutes, and it is a multi-threaded test that uses 4 threads (processes) running in parallel.

Step 3b: If using Iometer in a Windows VM:

First, install the older 2006.07.27 edition. Don’t use the latest 1.1.0 edition, which has bugs.

Here is a blog post with screenshots on how to run Iometer. If you are already familiar with Iometer, skip this link and proceed to the section below, which lists the fields and associated values to plug into the Iometer GUI.

  1. Create as many ‘Workers’ in Iometer as there are <Number of Test VM Cores>. An Iometer ‘Worker’ is a thread (process).
  2. For each 'Worker', on the 'Disk Targets' tab, set the ‘# of Outstanding I/Os’ equal to <IO Depth> ÷ <Number of Test VM Cores>, and set the 'Maximum Disk Size' to 10 million sectors, which equals a ~5GB test file (see the worked example after this list). Configure each worker with the same values.
  3. Assign ‘Access Specification’ to each ‘Worker’. ‘Access Specification’ is the workload pattern generated by each ‘Worker’. ‘Transfer Request Size’ should be equal to the <Block Size>. Set ‘Percent Random/Sequential Distribution’ to 100% Random. Set the 'Percent Read/Write Distribution’ per the <Read/Write Mix>. Assign the same ‘Access Specification’ to all ‘Workers’.
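For illustration, here is the same arithmetic as a quick shell sketch, using the hypothetical values from the FIO example above (an <IO Depth> of 12 and a 4-core test VM); substitute your own numbers:

IODEPTH=12; CORES=4                                                 # hypothetical values
echo "# of Outstanding I/Os per Worker: $(( IODEPTH / CORES ))"     # 3 per Worker
echo "10 million sectors = $(( 10000000 * 512 / 1000000000 )) GB"   # 'Maximum Disk Size' in 512-byte sectors, ~5GB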
Step 4: Measuring Performance

Run the FIO or Iometer test for 20 minutes and then collect the read & write throughput and latency from the FIO or Iometer output. If the latencies are under 5ms, then your storage infrastructure is performing fine.

If you are evaluating VirtuCache, wait till the ‘cache hit ratio’ field (VirtuCache GUI > ‘Performance’ tab) reaches 99%, before you take throughput and latency readings.

Step 5: Scaling Iometer and FIO to mimic the workload of your entire ESXi cluster.

The above steps showed how to mimic the workload of one of the hosts in your cluster. To scale this test to represent the storage workload of your entire ESXi cluster, simply multiply the <IO Depth> by the number of hosts in your cluster and run the test again using the new <IO Depth> value. If the resulting <IO Depth> exceeds 64, clone the test VM and distribute the <IO Depth> equally across these test VMs, keeping the <IO Depth> under 64 per test VM. A single VM cannot properly generate an <IO Depth> higher than 64, so your production workload would not be simulated accurately.
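As a sketch of the scaling arithmetic, assuming a hypothetical per-host <IO Depth> of 12 and an 8-host cluster:

PER_HOST_IODEPTH=12; HOSTS=8                       # hypothetical values
CLUSTER_IODEPTH=$(( PER_HOST_IODEPTH * HOSTS ))    # 96 for the whole cluster
TEST_VMS=$(( (CLUSTER_IODEPTH + 63) / 64 ))        # round up: 2 test VMs needed
PER_VM_IODEPTH=$(( CLUSTER_IODEPTH / TEST_VMS ))   # 48 per test VM, under the 64 limit
echo "Run $TEST_VMS cloned test VMs, each with -iodepth=$PER_VM_IODEPTH"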

Steps to Run Iometer in a VMware VM

Iometer is a great storage IO testing tool. It is easy to use, flexible, accurate, and free. Below are steps to run Iometer from within a Windows VM running on VMware.

Step 1: Install the older 2006.07.27 edition. Don’t use the latest 1.1.0 edition, which has bugs. Download link is here.

Change only the parameters listed below; keep everything else at its default.

Step 2: After you install Iometer, run it with the Windows ‘Run as administrator’ option. Each ‘Worker’ (screenshot below) is a process (or thread) that Iometer creates to generate storage IO. The number of ‘Workers’ should be equal to the number of CPU cores assigned to the VM. Each CPU core can process only one Worker at a time, so there is no point having more ‘Workers’ than that; the VM's CPU would just be choked. You don't need a large number of VM CPU cores to generate a lot of load either. Four to eight CPU cores assigned to the Windows VM is fine.

Iometer Screenshot: Configure Iometer Workers (Processes).

Step 3: For each worker, assign it a ‘Maximum Disk Size’ as shown below. This will create one file in C:\ that all the ‘Workers’ generate read and write IO to. The file should be large enough to overflow any memory buffers in the storage IO path, and small enough so the test hits all the sectors on the file quickly (say in under 30 minutes). 5-10GB is a good file size. Anything over 20GB is overkill and under 2GB might be served entirely from storage appliance memory.

‘# of Outstanding I/Os’ defines the number of concurrent IOs that each Iometer worker will generate. The higher the ‘# of Outstanding I/Os’, the higher the throughput that Iometer tries to push against storage. A Windows VM cannot push more than 64 outstanding IOs, so if you multiply the number of Workers by the ‘# of Outstanding I/Os’, the resulting value should always be kept under 64. A total IO depth of 64 is high. If you add up the IO depth of all your production VMs (assuming an ESXi cluster of 6-12 hosts), it is unlikely that the total at any point will exceed 64. If you do want to test with higher IO depths, for instance to see the maximum throughput your storage is capable of, you can clone the test VM and run the same Iometer test in multiple VMs simultaneously.
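As a quick sanity check of that 64 limit, using hypothetical values of 4 Workers and 8 outstanding IOs per Worker:

WORKERS=4; OUTSTANDING_IOS=8                                # hypothetical values
echo "Total IO depth: $(( WORKERS * OUTSTANDING_IOS ))"     # 32, safely under the 64-per-VM limit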

Ensure that '# of Outstanding I/Os' and ‘Maximum Disk Size’ are set the same for all 'Workers'.

As an example, for Flash based storage, a 4-threaded (i.e. 4 workers) workload pushing 3-6 IOs per thread should exercise the entire 5 GB test file in 20-30 minutes. So, reviewing the Iometer stats at the end of 20-30 minutes should give you an accurate picture of your storage performance.

Iometer Screenshot: Set Test File Size and IO Depth Per Iometer Worker.

Step 4: Now assign each ‘Worker’ a workload, called an ‘Access Specification’ in Iometer. As shown in the screenshot below, click on the ‘Access Specifications’ tab and assign a Read/Write mix (80/20 is typical for a traditional enterprise-wide IT workload). Ensure that the workload is 100% random (since VMware workloads are random) and that the 'Transfer Request Size', a.k.a. block size, is 4KB (4KB is the average block size that Windows uses). Random small block IO is the most stressful for any storage infrastructure, so that's another reason for these settings.

Iometer Screenshot: Set Block Size, Random IO, Read/Write Mix Per Worker.

Step 5: Click on ‘Results Display’ tab. Set the ‘Update Frequency’ and change the default ‘Display’ parameters to the ones shown below.

Then click the green flag icon at the top to start the test.

Iometer Screenshot: Display IOPS and Latency in Results Screen, and Run Test.

Wait for 20 minutes and then note the read / write throughput and latencies.

NB: If you want to mimic your real-life production workload using Iometer, then you need to plug the values for 'Transfer Request Size', '# of Outstanding I/Os', and 'Percent Read / Write Distribution' for your production workload in Iometer. Please review this blog post on how to quickly find those values for your real-life workload.

To Improve CEPH performance for VMware, Install SSDs in VMware hosts, NOT OSD hosts.

SSDs deployed in CEPH OSD servers, whether for caching / journaling or primary storage, are not very effective from a performance point of view. The problem lies not in the SSDs themselves, but in the fact that they are deployed at a point in the IO path that is downstream (relative to the VMs that run user applications) of where the IO bottleneck is. This post looks at this performance shortcoming of CEPH when connected to ESXi hosts, and its solution.

There are two options for improving the performance of CEPH.

Option 1 is to deploy SSDs in CEPH OSD servers, whether for journaling, read caching, or primary storage (all-flash CEPH).

Option 2 is to deploy SSDs and host side caching software in the VMware hosts that are connected to CEPH. The host side caching software then caches reads and writes to the in-host SSD from VMware Datastores that reside on CEPH volumes.

Below are the reasons why we recommend Option 2.

CEPH Storage for VMware vSphere

CEPH is a great choice for deploying large amounts of storage. Its biggest drawbacks are high storage latencies and the difficulty of making it work with VMware hosts.

The Advantages of CEPH.

CEPH can be installed on ordinary servers. It clusters these servers together and presents the cluster as an iSCSI target. Clustering is a key feature: it lets CEPH sustain component failures without causing a storage outage, and it lets you scale capacity linearly by simply hot-adding servers to the cluster. You can build CEPH storage with off-the-shelf components - servers, SSDs, HDDs, NICs, essentially any commodity server or server components. There is no vendor lock-in for hardware. As a result, hardware costs are low. All in all, it offers better reliability and deployment flexibility at a lower cost than big brand storage appliances.

CEPH has Two Drawbacks - High Storage Latencies and Difficulty Connecting to VMware.

Improving Storage Performance of Dell VRTX

Dell's PowerEdge VRTX hyper-converged appliance can either have all hard drive datastores or all SSD datastores, but you can't have SSDs act as tiering or caching media for VRTX volumes / virtual disks. That's where VirtuCache comes in.

Infinio’s Read Caching versus Virtunet’s Read+Write Caching Software

The biggest differences are:

  1. Virtunet's VirtuCache accelerates reads and writes. Infinio Accelerator accelerates only reads.

  2. Infinio doesn't support Linked Clones or Instant Clones, hence VDI is not supported. We support all VDI features in VMware Horizon and Citrix Xendesktop.

  3. With us you can apply caching policy at the datastore and/or VM level versus only at the VM level with Infinio.

  4. Infinio doesn't accelerate IO that originates in the VMware OS. VirtuCache accelerates all storage IO originating in ESXi and in the VMs. The significance of this is explained in the table below.

Citrix MCS IO vs. VirtuCache – Server Side Storage IO Caching

Both cache 'hot' data to in-host RAM, but the differences between Citrix MCS Storage Optimization and VirtuCache are many. The top three are:

- MCSIO works only for non-persistent Xenapp / Xendesktop VMs.1 VirtuCache works for all VMs on the ESXi host;

- Citrix MCSIO can cache only VM writes, and only to VM RAM.1 VirtuCache can cache VM reads and writes to in-host SSD or RAM;

- With MCSIO there will be VM data loss / instability if the host fails or the RAM cache is full,1 not so with VirtuCache.

For detailed differences, please review the table below.
