Blog

Shared Disaster Recovery Infrastructure using CEPH storage.

Why deploy Disaster Recovery (DR) infrastructure when you already have backups?

The main reason to deploy DR infrastructure is to protect against the rare case where you lose your entire datacenter. During a datacenter-wide outage, you can fail over to your offsite DR datacenter, which is a replica of your entire production infrastructure, from the applications running in VMs down to the servers, storage, and network configuration.

Though related, DR and backups are two different things, involving different processes and underlying technologies, and resulting in different times to restore VM state and data.

Other than the rare datacenter-wide outage, there are more frequent failure scenarios where leveraging DR infrastructure is a better choice than restoring from backups. For instance, if even a single VM fails, you can fail over to that VM’s replica in your DR site and have end users work on the replica as the interim production VM until you restore your primary production VM. The restore of the production VM from the replica can proceed while end users continue to work on the replica in parallel.

With DR infrastructure, you can also search for and replace corrupt or deleted files, databases, and mailboxes across all VMs without restoring the VMs.

Provided you have available rack space, and if you already have a backup process in place, the incremental cost to deploy DR is modest. You can repurpose older servers as both DR VMware hosts and DR SAN storage (how to repurpose older servers to build SAN storage is described in later sections); beyond that, you need VMware Essentials licenses and backup/DR software. Software like Veeam, which you might already be using for backups, includes DR functionality in its basic license, so no additional licensing costs are incurred there.

What is CEPH storage? And what is its relationship with Virtunet Systems?

CEPH is open source storage software that runs on commodity servers. It clusters servers together and presents the cluster as a single iSCSI appliance. Virtunet Systems has enhanced CEPH with an iSCSI module to interface with VMware and Hyper-V; developed software for VAAI and ODX (storage offload from VMware and Hyper-V); and built an easy-to-use GUI. Virtunet’s version of CEPH is called VirtuStor.
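To illustrate how a CEPH cluster is carved into block storage, here is a minimal sketch using Ceph’s official Python bindings (the rados and rbd modules) to create a pool and a block-device image of the kind an iSCSI gateway would export to VMware as a LUN. The pool and image names are hypothetical; in a VirtuStor deployment the equivalent steps are driven from the GUI.

```python
# Minimal sketch: create a Ceph pool and an RBD block image that could
# back an iSCSI LUN. Assumes the official 'rados' and 'rbd' Python
# bindings are installed and /etc/ceph/ceph.conf points at the cluster.
# Pool and image names are hypothetical.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

pool_name = 'dr-pool'                 # hypothetical pool name
if not cluster.pool_exists(pool_name):
    cluster.create_pool(pool_name)

ioctx = cluster.open_ioctx(pool_name)
try:
    # Create a 1 TiB thin-provisioned image; this is what an iSCSI
    # gateway would export to VMware as a LUN.
    rbd.RBD().create(ioctx, 'dr-lun0', 1024**4)
finally:
    ioctx.close()
    cluster.shutdown()
```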

Servers of any make, model, size, and age can be ‘hot’ added to an existing CEPH cluster to add capacity or improve performance.
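One quick way to confirm that a hot-added server’s capacity has joined the cluster is to read the cluster-wide stats. A minimal sketch with the same rados Python bindings:

```python
# Sketch: report cluster-wide raw capacity before/after hot-adding a node.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
stats = cluster.get_cluster_stats()   # dict with kb, kb_used, kb_avail, num_objects
print('Raw capacity: %.1f TB, free: %.1f TB' %
      (stats['kb'] / 1024**3, stats['kb_avail'] / 1024**3))
cluster.shutdown()
```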

From a pricing point of view, CEPH gives smaller hospitals the enterprise storage features they need, at much lower cost than traditional SAN storage appliances.

Why is CEPH suitable for shared DR storage?

CEPH storage has its origins with cloud service providers (SPs). The ability to build SAN storage from commodity servers was important to cloud SPs because it kept their hardware costs low, and low-cost storage is likewise a key requirement for on-premises DR storage.

Since CEPH is used by cloud SPs, it also has features to isolate and encrypt each organization’s data and storage I/O path when multiple organizations use the same CEPH storage cluster, a prerequisite if different organizations are to share the same storage hardware.
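In stock CEPH, that per-organization isolation is typically enforced with cephx credentials scoped to a pool, so each tenant’s client key can only touch its own pool. Below is a minimal sketch that shells out to the standard ceph CLI; the pool and client names are hypothetical:

```python
# Sketch: give each tenant hospital a cephx key restricted to its own pool.
# Pool/client names are hypothetical; 'ceph osd pool create' and
# 'ceph auth get-or-create' are standard Ceph CLI commands.
import subprocess

def provision_tenant(tenant, pg_num=128):
    pool = 'dr-%s' % tenant
    subprocess.check_call(['ceph', 'osd', 'pool', 'create', pool, str(pg_num)])
    # Key can read cluster maps but only read/write objects in its own pool.
    key = subprocess.check_output([
        'ceph', 'auth', 'get-or-create', 'client.%s' % tenant,
        'mon', 'allow r',
        'osd', 'allow rwx pool=%s' % pool,
    ])
    return key.decode()

print(provision_tenant('st-james'))
```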

As you scale out the storage cluster, the cost per unit of capacity drops dramatically: from $2/GB for 5TB of raw storage down to 20 cents/GB at 300TB. It is therefore cost-effective for smaller IT departments to pool their DR/backup budgets and buy a larger amount of storage for their DR infrastructure.
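To make the scaling concrete, here is the arithmetic implied by those two price points (only the endpoints come from the article; everything in between would be interpolation):

```python
# The two cost-per-GB endpoints quoted above; everything in between
# would be interpolation, not published pricing.
tiers = {5: 2.00, 300: 0.20}    # raw capacity in TB -> price in $/GB
for tb, per_gb in sorted(tiers.items()):
    total = tb * 1024 * per_gb  # TB -> GB, times $/GB
    print('{:>3} TB raw at ${:.2f}/GB = ${:,.0f} total'.format(tb, per_gb, total))
# 5 TB at $2.00/GB -> ~$10,000; 300 TB at $0.20/GB -> ~$61,000:
# 60x the capacity for roughly 6x the spend.
```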

Sharing compute using VMware.

VMware lends itself well to shared DR infrastructure as well. Since each VMware physical server can host up to 512 VMs, a large number of VMs can be deployed on just a 2-host VMware cluster.
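For capacity planning against that per-host VM maximum, the VM count per host can be pulled with VMware’s pyVmomi Python SDK. A minimal sketch, with placeholder vCenter address and credentials:

```python
# Sketch: count VMs per ESXi host with pyVmomi (pip install pyvmomi).
# The vCenter address and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab use only; verify certs in production
si = SmartConnect(host='vcenter.example.com', user='administrator@vsphere.local',
                  pwd='secret', sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        print('%s: %d VMs' % (host.name, len(host.vm)))
    view.Destroy()
finally:
    Disconnect(si)
```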

DR Infrastructure at St. James Hospital.

Currently, the DR infrastructure at St. James consists of:

  1. A 2-host compute cluster running VMware Essentials on repurposed servers; the VMware Essentials license cost $600.

  2. A 3-server VirtuStor CEPH cluster providing iSCSI storage, built on older servers with new storage media. Raw capacity is 24TB (12TB usable); the 12TB usable cost $15K.

  3. Veeam, which replicates data from St. James’ production VMware cluster to this DR infrastructure. Veeam Enterprise Plus Essentials costs ~$7K.

  4. The total cost of the entire infrastructure, including a one-time services fee to put it together, came to $30K (the services fee can be inferred from the line items above, as in the sketch below).
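A quick sanity check on that $30K total, using only the line items listed above (the one-time services fee is not itemized, so it is inferred as the remainder):

```python
# Line items quoted above; the one-time services fee is the inferred remainder.
vmware_license = 600       # VMware Essentials
virtustor_storage = 15000  # 3-server VirtuStor CEPH cluster, 12TB usable
veeam_license = 7000       # Veeam Enterprise Plus Essentials (approx.)
total = 30000              # quoted all-in cost
services_fee = total - (vmware_license + virtustor_storage + veeam_license)
print('Inferred one-time services fee: ${:,}'.format(services_fee))  # $7,400
```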

By simply adding a few more hard drives and SSDs, this infrastructure had the capacity to accommodate the DR workload for two more hospitals of the same size as St. James.

The incremental cost to share St. James’ DR infrastructure was $10K per year for replicating 30 or so VMs and 10TB of data.

Furthermore, if this idea of shared backup and DR infrastructure were to become popular among the other hospitals in the collaborative that St. James belongs to, both the VMware and CEPH storage clusters could be scaled up by adding more servers to each cluster. This configuration could support hundreds of organizations on a single clustered DR environment.

VirtuCache doubles VM densities at Myers Briggs Testing Institute

MBTI is a not-for-profit organization that works to improve the performance of people and organizations. It is best known for the Myers-Briggs tests, the world’s foremost personality assessments, used by institutions to help employees better understand themselves and how they interact with others.

Seagate Chooses VirtuCache for its 1200 Accelerator Product Line to Increase VM Densities for VDI Deployments

Seagate’s Enterprise SSD business unit manufactures and sells high-performance, enterprise-grade Solid State Drives (SSDs). Seagate realized that enterprise SSDs were well suited to the fast-growing Virtual Desktop Infrastructure (VDI) space. Since most VDI deployments use VMware vSphere, Seagate was interested in partnering with a software vendor that could complement their SSDs to address storage I/O issues in VMware-based VDI deployments.

VirtuCache Host Side Caching versus Storage Controller Cache

Global Foundries is one of the world’s largest semiconductor foundries making chips for AMD, IBM, and Qualcomm.

Global Foundries’ IT had recently moved their Business Intelligence applications from physical servers to VMs. The migration resulted in high storage latencies and lower throughput for this application, which meant that user queries during the daytime and batch jobs during off-hours took much longer than when the workload ran on bare-metal CentOS 6.x servers. At peak throughput of a few hundred MBps, latencies were in the hundreds of milliseconds, which in turn caused the application to malfunction. As a result, they wanted to ensure that VM-level latencies at a sustained throughput of 100 MBps stayed under 15 ms, and that even at a peak throughput of 400 MBps, latencies would not exceed 30 ms.

Global Foundries IT Infrastructure
  • Physical Servers - Global Foundries had six HP BL460c G7 blades with 144GB RAM, running non-virtualized CentOS 6.x, that hosted their Business Intelligence applications on Oracle 11g.
  • Virtualization - For improved manageability, and specifically to leverage VMware features like DRS, High Availability, and load balancing, they wanted to deploy this application within VMs on VMware vSphere 5.1.
  • Storage - A total of 36 TB of storage on LUNs was provisioned across an 8 Gbps Fibre Channel EMC Clariion CX4 appliance and a 10 Gbps FCoE HP 3PAR 7200 appliance.
Workload Characteristics
  • On average, less than 8 TB of data changed every day. Global Foundries’ home-grown application used Oracle 11g as the underlying database and had a 60-40 read-write ratio.
  • At a peak throughput of 400 MBps they wanted latencies under 30 ms, and at a sustained throughput of 100 MBps they wanted 10-15 ms latencies.
Competing Solutions

In addition to Virtunet Systems, Global Foundries evaluated storage controller based caching/tiering from their existing storage appliance vendors.

Comparing VirtuCache with EMC FastCache and HP 3PAR Tiering

Global Foundries’ selection process involved comparing price per IOPS, price per GB, and latencies for the three competing approaches below:
  1. EMC proposed deploying their FastCache functionality with EMC branded SAS SSDs within the CX4 appliance. 2 TB of Fast Cache SSDs were deployed.
  2. HP proposed deploying SSDs within the StoreServ appliance and tiering data from disks to SSDs. Again, 2 TB of SSDs were deployed.
  3. VirtuCache using 400 GB of Micron SAS SSDs in each of the six ESXi blade servers.
                          EMC FastCache with     HP 3PAR Tiering to     VirtuCache and Micron
                          EMC MLC SAS SSD        HP MLC SAS SSD         SAS SSD on each blade
Avg Read MBps             86                     102                    286
Avg Write MBps            8                      9                      11
Avg Read Latency (ms)     18                     15                     5
Avg Write Latency (ms)    14                     10                     6
Cost*                     $160,000               $40,000                $19,000
                          (includes FastCache    (includes data         (includes VirtuCache
                          license)               tiering)               licensing for 6 hosts)
*Cost calculated based on publicly available pricing: EMC FastCache SSD = $80/GB; HP 3PAR SSD = $20/GB; Micron SAS SSD = $4/GB.

Benefit to Global Foundries

By deploying VirtuCache, Global Foundries was assured low latencies even at high throughput. As a result, they were able to successfully migrate their database-driven Business Intelligence workloads from non-virtualized CentOS servers to VMware VMs, at much lower cost than the storage array based caching and tiering solutions from HP and EMC.

VirtuCache Improves Latencies of VDI VMs at the University of California, Los Angeles

UCLA has deployed virtual desktops for students and staff using VMware Horizon View. End users of these virtual desktops complained about slow boot times, the Windows cursor and Start button freezing, and generally slow response times at various points during the day.

Caching from SAN to in-Host SSD to Improve Performance of Call Center Workflow Management

Call Center Workflow Management software is storage I/O intensive, since it ingests and analyzes large volumes of audio. By caching from slower SAN storage to in-host SSDs/DRAM, VirtuCache improves storage performance considerably, thereby improving the performance of Call Center Workflow software running within VMs.
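The core idea behind host-side caching can be illustrated with a toy write-through LRU read cache: reads are served from fast local media when possible, while writes go through to backing storage and keep the cache coherent. This is a conceptual sketch only, not VirtuCache’s actual implementation, which operates on block I/O inside the hypervisor:

```python
# Toy write-through LRU read cache illustrating host-side caching:
# reads are served from fast local media when possible; writes go to
# backing storage and update the cache. Conceptual only.
from collections import OrderedDict

class HostSideCache:
    def __init__(self, backing_store, capacity_blocks):
        self.backing = backing_store          # dict-like stand-in for slow SAN
        self.capacity = capacity_blocks
        self.cache = OrderedDict()            # block_id -> data, in LRU order

    def read(self, block_id):
        if block_id in self.cache:            # cache hit: local SSD/DRAM speed
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        data = self.backing[block_id]         # cache miss: fetch from SAN
        self._insert(block_id, data)
        return data

    def write(self, block_id, data):
        self.backing[block_id] = data         # write-through to SAN
        self._insert(block_id, data)          # keep cache coherent

    def _insert(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least-recently-used block

san = {n: b'audio-block-%d' % n for n in range(1000)}
cache = HostSideCache(san, capacity_blocks=100)
print(cache.read(42))   # miss on first read, hit on subsequent reads
```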
