An Order of Magnitude Improvement in IOPS and Latencies using SQLIO Benchmark
Performance Benchmark using SQLIO that compares Storage I/O with and without VirtuCache
To simulate a real-life test, we ran SQLIO overnight on a relatively small server. The test covered random and sequential reads and writes, stepping through queue depths from 1 through 128 and thread counts from 1 through 64.
All tests were run from within a single Windows Server 2008 R2 VM configured with 4 cores on one virtual CPU and 4 GB of RAM.
Highlights of the test results:
- Read IOPS for both random and sequential reads with VirtuCache are generally in the 30,000-40,000 range, compared with 650-800 IOPS without VirtuCache, a 40-50X improvement in throughput.
- Cache warm-up time was negligible. The relatively small 25 GB load per test cycle contributed to the short warm-up times; however, with respect to both cache warm-up time and IOPS improvement, ours would be the fastest solution on the market for VMware.
- Write throughput improved negligibly because ours (like every other caching solution for VMware) is a ‘write-through’ cache that commits all writes to both the flash card and the disk simultaneously. Caching for VMware is currently restricted to read or write-through caching only.
Improving write speeds requires ‘write-back’ caching, which raises data consistency issues in case of hardware failure; this remains work in progress for caching to in-server solid-state storage.
- With a higher-end server with a larger multi-core CPU and more RAM, the performance multiples will be higher; however, the idea of these tests was to showcase real-life examples, not to claim bragging rights for scorching speeds.
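The write-through behavior described in the highlights can be illustrated with a toy model. This is only a sketch, not VirtuCache's implementation: the class name `WriteThroughCache` and the latency figures are hypothetical, chosen to show why reads benefit from the flash card while writes still run at disk speed.

```python
class WriteThroughCache:
    """Toy write-through cache (illustrative model, not VirtuCache's code)."""

    def __init__(self, disk_latency_ms, flash_latency_ms):
        # Hypothetical per-I/O latencies for the two devices.
        self.disk_latency_ms = disk_latency_ms
        self.flash_latency_ms = flash_latency_ms
        self.flash = {}  # block number -> data held on the flash card
        self.disk = {}   # block number -> data held on the backing disk

    def read(self, block):
        """Return (data, latency_ms): hits come from flash, misses from disk."""
        if block in self.flash:
            return self.flash[block], self.flash_latency_ms
        data = self.disk.get(block)
        self.flash[block] = data  # populate the cache on a miss
        return data, self.disk_latency_ms

    def write(self, block, data):
        """Commit to flash and disk together; return the write latency in ms."""
        self.flash[block] = data
        self.disk[block] = data
        # Both writes must complete before the write is acknowledged,
        # so the slower device (the disk) sets the latency.
        return max(self.disk_latency_ms, self.flash_latency_ms)


# Assumed latencies: 10 ms for the disk, 0.1 ms for the flash card.
cache = WriteThroughCache(disk_latency_ms=10.0, flash_latency_ms=0.1)
write_latency = cache.write(7, b"payload")  # pays the disk latency
_, read_latency = cache.read(7)             # served from flash
print(write_latency, read_latency)
```

In this model a write never completes faster than the disk, which matches the negligible write improvement reported above. A write-back design would acknowledge the write after the flash commit alone, which is where the consistency risk on hardware failure comes from.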
Hardware and Test Setup Details Below
To simulate a near real-life test, we ran the SQLIO test suite on a server with a 4-core Xeon 3400 processor, 8 GB of RAM, and an inexpensive 500 GB SCSI drive.
We cached data from this SCSI drive to a 300 GB in-server flash card installed in a PCIe slot.
SQLIO was configured to use a 64 KB block size. Each of the four test cases (random reads, random writes, sequential reads, and sequential writes) was cycled through 1 to 64 threads and 1 to 128 pending requests.
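A sweep like this can be generated programmatically. The sketch below builds one SQLIO command line per combination using the tool's `-k`, `-f`, `-t`, `-o`, `-b`, `-s`, and `-LS` options. The doubling step sizes, the 60-second run length, and the `testfile.dat` name are assumptions for illustration; the original test's exact steps and durations are not stated here.

```python
from itertools import product

BLOCK_KB = 64  # block size used in the tests

# Assumed step values: doubling within the stated ranges.
THREADS = [1, 2, 4, 8, 16, 32, 64]
OUTSTANDING = [1, 2, 4, 8, 16, 32, 64, 128]

# (SQLIO -k value, SQLIO -f value), in the order the results are presented.
WORKLOADS = [
    ("R", "random"),
    ("W", "random"),
    ("R", "sequential"),
    ("W", "sequential"),
]


def sqlio_commands(test_file="testfile.dat", seconds=60):
    """Build one SQLIO command line per workload/thread/queue-depth combination.

    test_file and seconds are hypothetical defaults, not values from the test.
    """
    commands = []
    for (kind, pattern), threads, depth in product(WORKLOADS, THREADS, OUTSTANDING):
        commands.append(
            f"sqlio -k{kind} -f{pattern} -t{threads} -o{depth} "
            f"-b{BLOCK_KB} -s{seconds} -LS {test_file}"
        )
    return commands


commands = sqlio_commands()
print(len(commands))
print(commands[0])
```

With these assumed steps the sweep contains 224 runs (4 workloads, 7 thread counts, 8 queue depths).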
The graphs below can be read in four sections: the first shows results for random reads, the second for random writes, the third for sequential reads, and the last for sequential writes.