Shared Disaster Recovery Infrastructure using CEPH storage.

Why deploy Disaster Recovery (DR) infrastructure when you already have backups?

The main reason to deploy DR infrastructure is to protect against the rare case where you lose your entire datacenter. In a datacenter-wide outage, you can flip over to your offsite DR datacenter. The DR datacenter is a replica of your entire production infrastructure, from applications running in VMs to servers, storage, and network configuration.

DR and backups, though related, are two different things. They involve slightly different processes and underlying technologies, which result in different times to restore VM state and data.

Other than the rare datacenter-wide outage, there are more frequent failure situations where leveraging DR infrastructure is a better choice than restoring from backups. For instance, even if a single VM fails, you can flip over to the replica of that VM in your DR site and have end users use the replica as the interim production VM until you restore your primary production VM. You can restore your production VM from the replica while end users continue to work on the replica.

With DR infrastructure, you can also search for and replace corrupt or deleted files, databases, and mailboxes across all VMs without restoring the VMs.

Provided you have available rack space and already have a process for doing backups, the incremental cost to deploy DR is modest. You can repurpose older servers as both DR VMware hosts and DR SAN storage (how to repurpose older servers to build SAN storage is described in later sections). Beyond that, you need VMware Essentials licenses and backup/DR software. Software like Veeam, which you might already be using for backups, includes DR functionality in its basic license, so no additional licensing costs are incurred there.

What is CEPH storage, and what is its relationship with Virtunet Systems?

CEPH is open source storage software that runs on commodity servers. It clusters servers together and presents the cluster as an iSCSI appliance. Virtunet Systems has enhanced CEPH with an iSCSI module to interface with VMware and Hyper-V, developed software for VAAI and ODX (storage offload from VMware and Hyper-V), and built an easy-to-use GUI. Virtunet's version of CEPH is called VirtuStor.

Servers of any make, model, size, and antiquity can be ‘hot’ added to an existing CEPH cluster to add capacity or improve performance.

From a pricing point of view, CEPH gives smaller hospitals the enterprise storage features they need at a much lower cost than traditional SAN storage appliances.

Why is CEPH suitable for shared DR storage?

CEPH storage has its origins at cloud service providers (SPs). The fact that commodity servers can be used to build SAN storage was important to cloud SPs to keep their hardware costs low. Low cost storage is a key requirement for on-premises DR storage.

Since CEPH is used by cloud SPs, it also has features to isolate and encrypt the data and storage I/O paths of multiple organizations that use the same CEPH storage cluster, a requirement if different organizations are to share the same storage hardware.

As you scale out the storage cluster, the cost per GB drops dramatically. Starting at $2/GB for 5TB of raw storage, the cost drops to 20 cents/GB for 300TB of storage. This makes it cost effective for smaller IT departments to pool their DR/backup budgets to get larger amounts of storage for their DR infrastructure.
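To make the scaling claim concrete, the short Python sketch below multiplies out the two price points quoted above. It assumes decimal units (1 TB = 1,000 GB) and that both per-GB prices apply to raw capacity; the function name is illustrative only.

```python
GB_PER_TB = 1_000  # assumption: decimal terabytes


def cluster_cost(raw_tb: float, usd_per_gb: float) -> float:
    """Total cost in USD of a cluster with the given raw capacity."""
    return raw_tb * GB_PER_TB * usd_per_gb


small = cluster_cost(5, 2.00)    # 5TB at $2/GB
large = cluster_cost(300, 0.20)  # 300TB at 20 cents/GB

print(f"5TB cluster:   ${small:,.0f}")
print(f"300TB cluster: ${large:,.0f}")
print(f"Capacity grows {300 / 5:.0f}x while cost grows {large / small:.0f}x")
```

Under these assumptions, sixty times the capacity costs roughly six times as much, which is the economic argument for pooling budgets across organizations.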

Sharing compute using VMware.

VMware also lends itself well to a shared DR infrastructure. Since each VMware physical server can host a maximum of 512 VMs, a large number of VMs can be deployed on just a 2-host VMware cluster.

DR Infrastructure at St. James Hospital.

Currently, the DR infrastructure at St. James consists of:

  1. A 2-host compute cluster running on repurposed servers. The VMware Essentials license costs $600.

  2. A 3-server VirtuStor CEPH cluster for iSCSI storage, using older servers with new storage media. It has a raw capacity of 24TB (12TB usable). The cost for the 12TB usable is $15K.

  3. Veeam, which replicates data from St. James’ production VMware cluster to this DR infrastructure. The cost of Veeam Enterprise Plus Essentials is ~$7K.

  4. The cost of the entire infrastructure, including a one-time services fee to put it together, was $30K.

By simply adding a few more hard drives and SSDs, this infrastructure had the capacity to accommodate the DR workload for two more hospitals of the same size as St. James.

The incremental cost to share St. James’ DR infrastructure was $10K per year for replicating 30 or so VMs and 10TB of data.
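The figures above can be cross-checked with a few lines of Python. This is only an arithmetic sketch: it treats the approximate Veeam price as exactly $7K, assumes decimal terabytes, and labels whatever is left after the three itemized costs as a remainder, since the text does not break the services fee out separately.

```python
# Figures quoted in the text (USD).
vmware_license = 600
storage_12tb_usable = 15_000
veeam_license = 7_000          # "~$7K" in the text, treated as exact here
total_with_services = 30_000

itemized = vmware_license + storage_12tb_usable + veeam_license
# Assumption: the remainder covers services and anything not itemized.
remainder = total_with_services - itemized

usable_gb = 12 * 1_000         # assumption: decimal terabytes
storage_usd_per_usable_gb = storage_12tb_usable / usable_gb

partner_fee_per_year = 10_000
partner_vms = 30               # "30 or so" in the text
fee_per_vm_per_year = partner_fee_per_year / partner_vms

print(f"Itemized: ${itemized:,}  Remainder: ${remainder:,}")
print(f"Storage: ${storage_usd_per_usable_gb:.2f} per usable GB")
print(f"Partner fee: about ${fee_per_vm_per_year:,.0f} per VM per year")
```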

Furthermore, if this idea of shared backup and DR infrastructure were to become popular among other hospitals in the collaborative that St. James belonged to, both the VMware and CEPH storage clusters could be scaled out by adding more servers to each cluster. This configuration could support hundreds of organizations with a single clustered DR environment.

VirtuCache doubles VM densities at Myers Briggs Testing Institute

MBTI is a not-for-profit organization that improves the performance of people and organizations. They are best known for Myers-Briggs tests, the world’s foremost personality assessment tests used by institutions to help employees better understand themselves and how they interact with others.

Main Challenges

MBTI wanted to repurpose a few of their expensive Dell C6420 servers to run additional applications, which meant that they would need to increase the number of VMs deployed on each remaining host. As is often the case, there was plenty of CPU, memory, and networking capacity available on each of the servers, and only storage latencies started to increase disproportionately with higher VM densities.

MBTI decided to look for the cheapest possible solution that would improve storage throughput and latencies, which in turn would facilitate the migration of additional VMs to each VMware host.

IT Infrastructure
  • VMs and Physical Servers - MBTI’s IT had four Dell 4-node C6420 servers running Windows Server VMs on VMware 6.5. Before VirtuCache, there were about 80 VMs running MS Exchange, MS Dynamics, and other enterprise applications in this cluster.

  • Storage - These servers used 64 TB of storage on an iSCSI Hitachi storage appliance.

  • Workload Characteristics - On average, less than 16 TB of data changed every day, and the read-write mix varied widely, from 40-60 to 80-20.

  • VMware’s Distributed Resource Scheduler (DRS) functionality was configured to be automatic and aggressive, which ensured that workloads were equally distributed at all times across these 4 physical hosts.
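As a rough sizing aid, the daily change figure in the list above can be converted into a sustained write rate. The sketch below assumes decimal units and a change rate spread evenly over 24 hours, which real workloads rarely are; it gives an upper bound on the average, not a peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12  # assumption: decimal terabytes
BYTES_PER_MB = 10**6


def sustained_mb_per_s(tb_per_day: float) -> float:
    """Average write rate in MB/s if the daily change were spread evenly."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY / BYTES_PER_MB


changed_tb_per_day = 16  # "less than 16 TB" in the text, so an upper bound
total_tb = 64

print(f"Up to {sustained_mb_per_s(changed_tb_per_day):.0f} MB/s on average")
print(f"Up to {changed_tb_per_day / total_tb:.0%} of the dataset changes daily")
```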

VirtuCache Deployment

MBTI decided to deploy VirtuCache on two of the four physical servers in the cluster. VirtuCache used a 1.6TB NVMe card to cache data from Datastores.

VirtuCache, along with the NVMe drive, was installed in the ESXi host in under 30 minutes.

Steady-state Cache Hit Ratio (the fraction of total IO served from the in-server SSD rather than from backend LUNs) was 75-80%, with a warm-up time of 10 minutes.
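To see why the hit ratio matters, a simple weighted-average model is sketched below. It is not a description of how VirtuCache works internally: it assumes the hit ratio is the fraction of all IO served from cache, and the two latency constants are hypothetical values chosen only for illustration.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, backend_ms: float) -> float:
    """Average latency when a fraction `hit_ratio` of IO is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * backend_ms


# Hypothetical latencies, not measurements from this deployment.
CACHE_MS = 0.2
BACKEND_MS = 20.0

for hit_ratio in (0.0, 0.75, 0.80):
    avg = effective_latency_ms(hit_ratio, CACHE_MS, BACKEND_MS)
    print(f"hit ratio {hit_ratio:.0%}: {avg:.2f} ms average")
```

In this model the average latency is dominated by the misses, so each additional point of hit ratio removes a proportional share of the backend latency.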

Guest Average Latency (GAVG) was measured before and after VirtuCache using the standard VMware utility esxtop. The table below shows the reduced GAVG after deploying VirtuCache, which resulted in higher VM densities. Since auto-DRS was enabled on the VMware cluster, VMware automatically sensed the improved storage performance on the servers that had VirtuCache installed and moved VMs from the other servers to these two VirtuCache-accelerated servers, increasing the number of VMs from 20 before VirtuCache to 42 after VirtuCache.

GAVG as measured using esxtop    Before VirtuCache    After VirtuCache
Read GAVG                        35 to 1500 ms        0.1 to 6 ms
Write GAVG                       20 to 600 ms         0.1 to 6 ms

Benefit to MBTI

Using VirtuCache, MBTI was able to reduce the number of physical servers in their VMware cluster from four to two, thus reducing VMware licensing costs and hardware costs.
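The consolidation follows from the density figures reported earlier. The sketch below assumes that 20 and 42 are per-host VM counts and that the cluster runs about 80 VMs in total; it ignores failover headroom, which a production sizing exercise would need to add.

```python
import math


def hosts_needed(total_vms: int, vms_per_host: int) -> int:
    """Minimum number of hosts to run total_vms at the given density."""
    return math.ceil(total_vms / vms_per_host)


TOTAL_VMS = 80  # "about 80 VMs" in the cluster

before = hosts_needed(TOTAL_VMS, 20)  # density before VirtuCache
after = hosts_needed(TOTAL_VMS, 42)   # density after VirtuCache
print(f"Hosts needed: {before} before, {after} after")
```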

Seagate Chooses VirtuCache for its 1200 Accelerator Product Line to Increase VM Densities for VDI Deployments

Seagate’s Enterprise SSD business unit manufactures and sells high-performance, enterprise-grade Solid State Drives (SSDs). Seagate realized that enterprise SSDs were well suited for the fast-growing Virtual Desktop Infrastructure (VDI) space. Since most VDI deployments use VMware vSphere, Seagate was interested in partnering with a software vendor that could complement their SSDs to address storage IO issues in VMware-based VDI deployments.

Listed below are the specific criteria that Seagate was looking for in a software solution.

  1. Since their SAS or SATA SSDs could be readily deployed in any server, Seagate was looking for a server side software solution that could be bundled with their SSDs.
  2. The software had to be as easy to install in the server as the SSD itself.
  3. The overall cost of the solution needed to be quite a bit cheaper than other alternatives.
  4. Since latencies are a big issue in VDI deployments, the solution needed to achieve, even at a high throughput of 2 Gbps, latencies at least as low as those achieved if the VMware server were connected to an all-flash array.
  5. This server side solution had to extract throughput and latencies from the Seagate SSD that were closer to raw SSD throughput and latencies than the rest of the competition could achieve. This was especially important because Seagate had recently announced 12 Gbps SAS SSDs, whose latencies were comparable to those of the more expensive PCIe Flash cards. If a high performance server side SAN acceleration software could be paired with these new Seagate SSDs, the combination could effectively compete with all-flash storage appliances on the one hand and in-server PCIe Flash cards on the other.
  6. Lastly, increased VM densities had to be demonstrated versus competition.

Caching from SAN to in-Host SSD to Improve Performance of Call Center Workflow Management

Call Center Workflow Management software is storage IO intensive since it ingests and analyzes large volumes of audio. By caching from slower storage to in-host SSDs/DRAM, VirtuCache improves storage performance considerably, which in turn improves the performance of Call Center Workflow software running within VMs.