Four Leading Causes of Storage I/O Bottlenecks in Virtualized Data Centers
The primary cause of storage I/O bottlenecks in data centers is the mechanical limitation of disk-based storage devices. Few data centers can afford to upgrade to solid state drives, and disk-based drives have not kept pace with the performance gains in other areas of computing.
Memory bandwidth from the two major processor vendors, Intel and AMD, improved 9.3 times and 4.8 times respectively between 2004 and 2009, going from 4.3 to 40 GB/sec and from 5.3 to 25.6 GB/sec: increases of roughly 830 and 380 percent. Intel's internal testing in November 2011 put its Xeon E5-2690 at 79.55 GB/sec of memory bandwidth, a full 18.5 times the 2004 figure. While CPU clock speed increases stalled around 2005, topping out near 3.8 GHz, multi-core designs that pack more transistors into a single chip and run multiple processing cores in parallel have continued the trend of better and faster I/O request processing.
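As a quick check of that arithmetic, using only the figures quoted above:

```python
# Quick arithmetic check on the bandwidth multiples quoted above.
intel_2004, intel_2009, intel_2011 = 4.3, 40.0, 79.55  # GB/sec
amd_2004, amd_2009 = 5.3, 25.6                         # GB/sec

print(f"Intel 2004->2009: {intel_2009 / intel_2004:.1f}x")  # ~9.3x
print(f"AMD   2004->2009: {amd_2009 / amd_2004:.1f}x")      # ~4.8x
print(f"Intel 2004->2011: {intel_2011 / intel_2004:.1f}x")  # ~18.5x
```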
In contrast, disk-based drive performance improved only 24 percent between 2004 and 2009 (1). Comparing the top-performing drive in a 2010 study (2) with the top-performing drive in a 2012 benchmark (3), read and write throughput has roughly doubled since 2009 (a 100 percent improvement), going from just over 100 MB/sec to just over 200 MB/sec on both measures. However, that improvement fades once drive capacity is considered alongside throughput. The larger the drive, the slower its relative performance, and drive capacity has grown by far more than a factor of two over the same period.
While it might seem as if the hard disks are responsible for all storage I/O bottlenecks, even when the disks perform well enough to keep up with demand, the standard storage array architecture can also limit data throughput. Of the 13 storage array systems compared on The StorageSavvy Blog (4) (as of the 9/14/2012 update), 8 use two or fewer controllers for the storage array. These controllers are often set up in a failover configuration, so only one is active at any given time. As a result, no matter how much traffic the network sends to the storage array, it must all funnel through a single storage controller.
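A rough sketch of that funnel (the controller ceiling, host count, and per-host demand below are illustrative assumptions, not measurements):

```python
# Illustrative only: all figures are assumptions, not benchmark data.
active_controllers = 1          # failover pair: one active at a time
controller_limit_mb_s = 1600    # assumed ceiling for a single controller
hosts = 12                      # assumed virtualization hosts on the array
demand_per_host_mb_s = 250      # assumed average storage demand per host

total_demand = hosts * demand_per_host_mb_s
capacity = active_controllers * controller_limit_mb_s
print(f"Demand {total_demand} MB/s vs capacity {capacity} MB/s")
if total_demand > capacity:
    print("Requests queue at the single active controller; latency climbs.")
```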
Both of the above issues apply to any data center that uses storage arrays. However, the very benefits of virtualization, maximized use of server resources and reduced demand for additional space and hardware, put increased pressure on the storage appliance's weak points.
According to the VMware documentation, a sufficiently powerful server can manage up to 300 virtual machines (5). Hyper-V lists 384 as the maximum number of virtual machines it can support (6). Jeff Victor, a technical blogger, reports running 1,000 virtual machines on a 64 GB Solaris server (7).
Since the goal of virtualization is to minimize the number of servers, the ideal environment keeps memory, CPU, and network usage high most of the time. If the applications running on these VMs (more often 20 to a server than 300-plus) are data-intensive, each server issues the I/O requests of every VM it hosts, so the request volume per server is multiplied by the number of VMs on that server.
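A quick back-of-the-envelope illustration (the per-VM I/O rate and array fan-in are assumptions chosen for round numbers):

```python
# Illustrative arithmetic: figures other than VM density are assumptions.
vms_per_host = 20         # typical density cited above
iops_per_vm = 150         # assumed average I/O rate per VM
hosts_per_array = 8       # assumed hosts sharing one storage array

per_host_iops = vms_per_host * iops_per_vm
array_iops = per_host_iops * hosts_per_array
print(f"Each host issues ~{per_host_iops} IOPS; "
      f"the array sees ~{array_iops} IOPS.")  # ~3000 and ~24000
```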
Since advances in memory, CPU, bus, and network speed have outstripped improvements in controllers and hard disks, VM workloads must be carefully balanced to distribute the demand for I/O resources, or those requests will back up. The consequences of more demand than the controllers and disks can handle range from higher latency to crashed applications and VMs when critical requests time out.
This is not to say that hardware manufacturers have been ignoring these issues. The introduction of solid state drives is one way to reduce the storage bottleneck, but until prices come down, that solution is out of reach for most data centers. Other improvements include adding a flash cache to the disk array or storage controller for quicker responses. Low-end storage controllers pass commands directly to the disks to process, but higher-end controllers, depending on the amount of cache memory, can cache frequent results, predict the next request and read ahead, and buffer write commands until there are free disk cycles. Two other performance techniques are queuing, where the controller holds pending requests instead of returning a failure code, and coalescing, where requests are reordered into sequential runs for faster reads and writes (8).
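As a rough illustration of the coalescing idea, here is a minimal Python sketch; the block numbers are made up, and real controllers implement this in firmware with far more sophistication:

```python
# Minimal sketch of request coalescing: merge requests that touch
# adjacent logical blocks into single sequential runs. An illustration
# of the idea only, not any vendor's implementation.
def coalesce(requests):
    """requests: list of (start_block, length) tuples."""
    runs = []
    for start, length in sorted(requests):
        if runs and start <= runs[-1][0] + runs[-1][1]:
            prev_start, prev_len = runs[-1]
            runs[-1] = (prev_start,
                        max(prev_len, start + length - prev_start))
        else:
            runs.append((start, length))
    return runs

pending = [(100, 8), (108, 8), (240, 4), (116, 8)]
print(coalesce(pending))  # [(100, 24), (240, 4)] -> two seeks, not four
```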
The fourth bottleneck is the way virtualization defeats coalescing, a phenomenon known as the I/O blender. Data centers have traditionally isolated applications on dedicated servers for better performance. Virtualization lets them preserve that isolation without additional physical servers, but the virtual machines must share the server's I/O request processing resources. Once an application's request leaves its virtual machine, it is mixed into the queue with all other application activity on the server, so sequential requests become randomized. Those combined requests are then batch-processed according to the queue time and queue depth settings, further increasing the chances that sequential requests are separated. By the time the calls reach the storage controller, even one capable of queuing and coalescing requests, the likelihood of contiguous segments existing in the queue is significantly reduced. Coalescing is what lets the controller minimize seek times; without it, the disks take longer to return results, or more requests must be held in the queue.
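To see why blending hurts, consider this small simulation; the VM streams, block ranges, and round-robin queuing are illustrative assumptions, not a model of any particular hypervisor:

```python
# Minimal sketch of the "I/O blender": each VM issues a perfectly
# sequential stream, but once the hypervisor interleaves the streams
# into one queue, back-to-back requests rarely stay adjacent on disk.
from itertools import zip_longest

# Three hypothetical VMs, each reading its own contiguous block range.
vm_streams = [
    [(1000 + i * 8, 8) for i in range(4)],  # VM 1
    [(5000 + i * 8, 8) for i in range(4)],  # VM 2
    [(9000 + i * 8, 8) for i in range(4)],  # VM 3
]

# Round-robin interleaving, a simple stand-in for hypervisor queuing.
blended = [r for group in zip_longest(*vm_streams)
           for r in group if r is not None]

adjacent = sum(1 for a, b in zip(blended, blended[1:])
               if a[0] + a[1] == b[0])
print(f"{adjacent} of {len(blended) - 1} consecutive requests "
      f"are still sequential after blending.")  # 0 of 11
```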
Storage I/O bottlenecks concerned data centers long before virtualization, and virtualization's advantages remain unmistakable even with the added bottleneck risks. Data centers therefore need to seek out methods that address these virtualization-specific concerns by providing better ways to manage storage I/O.
About the Author
Margaret Fisk has worked in the tech industry for 15 years in a variety of positions, including data center operations manager, professional services engineer, technical writer, programmer, and most recently, technical marketing for Virtunet.
References:
1. I/O Bottlenecks: Biggest Threat to Data Storage by Henry Newman
2. Unsung Heroes: 14 Years of Hard Drive Performance
http://hothardware.com/Reviews/Unsung-Heroes-14-Years-of-HDD-Performance/?page=1
3. Tom’s Hardware Hard Drive Performance Charts
4. Storage Array Comparison – Architecture
http://storagesavvy.com/storage-array-comparison-architecture/
5. Server Consolidation: Reduce IT Costs While Maintaining Control and Offering Choice
http://www.vmware.com/solutions/consolidation/consolidate.html
6. Requirements and Limits for Virtual Machines and Hyper-V in Windows Server 2008 R2
http://technet.microsoft.com/en-us/library/ee405267(v=ws.10).aspx
7. Spawning 0.5kZ/hr (Part 3)
https://blogs.oracle.com/JeffV/entry/title_spawning_0_5kz_hr
8. Storage Basics – Part V: Controllers, Cache and Coalescing by Joshua Townsend
http://vmtoday.com/2010/03/storage-basics-part-v-controllers-cache-and-coalescing/