VirtuCache Used to Improve Hadoop Performance within VMs at Stanford
Hadoop workloads are typically run on bare-metal servers. However, since Stanford’s School of Medicine was 100% virtualized, and because their monitoring and management tools were integrated with VMware, it was easier for them to deploy Hadoop within VMs than to provision new physical servers.
The biggest challenge Stanford faced with Hadoop VMs was low throughput and high latencies for writes to disk.
VirtuCache was deployed on the VMware hosts along with an 800 GB Intel S3500 SSD in each host. The 800 GB S3500 costs $800 and delivers about 60 MBps of writes and 180 MBps of reads at under 10 ms latency from within VMs. With VirtuCache, all recent writes were written to this SSD and all frequently used data was read from the same SSD. The writes were then asynchronously synced to SAS-based shared disks. As is standard configuration with VirtuCache, all writes to the local SSD were replicated to two other SSDs in two different VMware hosts to protect against data loss in case the local SSD or local host failed.
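The write path described above (write to the local SSD, mirror to two peer SSDs, sync to shared disks later) is a write-back cache with replication. The following is a minimal Python sketch of that idea only, not VirtuCache's actual implementation: the class `WriteBackCache`, its methods, the LRU eviction policy, and the use of plain dictionaries to stand in for the SSDs and the SAS array are all assumptions made for illustration.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy write-back cache; dicts stand in for the SSDs and the SAS array."""

    def __init__(self, backend, peers, capacity=4):
        self.backend = backend      # slow shared storage (hypothetical dict)
        self.peers = peers          # replica SSDs on other hosts (list of dicts)
        self.capacity = capacity    # max blocks held on the local SSD
        self.local = OrderedDict()  # local SSD contents, oldest first (LRU order)
        self.dirty = set()          # blocks not yet synced to the backend

    def write(self, block, data):
        # The write lands on the local SSD and on every peer replica;
        # the slow backend is not touched on this path.
        self.local[block] = data
        self.local.move_to_end(block)
        for peer in self.peers:
            peer[block] = data
        self.dirty.add(block)
        self._evict()

    def read(self, block):
        # Serve from the local SSD when possible, else fetch and cache.
        if block in self.local:
            self.local.move_to_end(block)
            return self.local[block]
        data = self.backend[block]
        self.local[block] = data
        self._evict()
        return data

    def flush(self):
        # A real system would run this asynchronously in the background.
        for block in sorted(self.dirty):
            self.backend[block] = self.local[block]
            for peer in self.peers:
                peer.pop(block, None)  # replica no longer needed once synced
        self.dirty.clear()

    def _evict(self):
        # Evict the least recently used clean block; dirty blocks are
        # never dropped before they have been synced to the backend.
        while len(self.local) > self.capacity:
            victim = next((b for b in self.local if b not in self.dirty), None)
            if victim is None:
                self.flush()
                continue
            del self.local[victim]
```

In this sketch, a call such as `cache.write(7, b"hadoop")` leaves the backend untouched until `cache.flush()` runs, while both peer dictionaries already hold the block. That ordering is what lets a write complete at SSD latency without risking the data if the local SSD fails.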
By directing most of the storage IO to the local in-host SSD, we kept latencies consistently low even for bursty, write-intensive workloads from within Hadoop VMs.
VirtuCache is the only solution in the market that can accelerate writes from local-disk-based storage with a kernel-only deployment. A kernel-only deployment ensures very low latencies compared with a VM-based solution (as is the case with most of our competition).
As a result, deploying VirtuCache improved write latencies for Hadoop workloads within VMs by 6-10X.