Improving the Performance of Synology Storage for VMware VMs
SYNOLOGY: CHEAP AND VMWARE READY
Synology storage arrays have a few things going for them: they are among the most reasonably priced storage arrays on the market, and they are certified as 'VMware Ready' by VMware.
HIGH LATENCIES IN SYNOLOGY WHEN CONNECTED TO VMWARE
One issue with these appliances is that they exhibit high latencies when connected to ESXi hosts. Small block IO and high VM densities aggravate this issue further.
VIRTUCACHE HOST CACHE SSD VERSUS SSD IN STORAGE ARRAY
Generally speaking, and this argument holds for any storage array, VirtuCache will perform better than the same media installed in the storage array because it caches to flash or memory in the host. In a storage array, the SSD sits behind a controller and a network; with VirtuCache, the SSD is on the same server motherboard as the VMs.
VIRTUCACHE CAN USE NVME SSD OR HOST RAM, WHICH SYNOLOGY CAN’T
Synology's highest IOPS arrays use SAS or SATA SSDs in the array. VirtuCache can use higher performing NVMe SSDs, or even higher performing host RAM. The combination of such high performing media, and the fact that this media is on the same motherboard as the host CPU, makes VirtuCache-powered storage infrastructure higher IOPS and lower latency than any storage array.
LOW POWERED CONTROLLER CPU CHOKES WITH SMALL BLOCK IO
Specifically, Synology (and other similar arrays) use low powered processors, and these CPUs result in high latencies when you have large amounts of small block random IO.[1] Small block IO stresses the storage controller CPU more than large block IO because, as the name suggests, the payload (called block size) carried by each IO operation is small, so each IO conveys less data than a large block IO does. Since the storage controller CPU does about the same amount of work per IO operation whether it is small or large block IO, the CPU must process many more IOs, and hence do more work, to convey the same amount of data with small block IO. As a result, storage arrays with low powered processors choke on small block IO.
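The arithmetic behind this argument can be sketched as follows. This is a minimal illustration; the data volume and block sizes are assumed values, and the point is only the ratio of IO operations (and hence controller CPU work) needed to move the same amount of data:

```python
# Moving the same amount of data with small vs. large block IO.
# The controller does roughly constant work per IO operation, so the
# IO count is a proxy for controller CPU work.

data_mb = 1024          # assumed workload: transfer 1 GB of data
small_block_kb = 4      # small block IO (e.g. databases)
large_block_kb = 64     # large block IO (e.g. sequential backups)

small_ios = data_mb * 1024 // small_block_kb   # IOs needed at 4KB
large_ios = data_mb * 1024 // large_block_kb   # IOs needed at 64KB

print(small_ios)                 # 262144 IOs at 4KB
print(large_ios)                 # 16384 IOs at 64KB
print(small_ios // large_ios)    # 16x more IOs, so ~16x more controller work
```

At a 4KB block size the controller must service 16 times as many IO operations as at 64KB to deliver the same data, which is why a low powered controller CPU becomes the bottleneck under small block workloads.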
LOW CORE COUNT CONTROLLER CPU RESULTS IN HIGH LATENCIES FOR HIGHLY MULTI-THREADED WORKLOAD
Another reason for high storage latency is the small core count of storage array CPUs. A CPU core is required to process a thread, so multi-threaded workloads benefit from high core count CPUs, both on the ESXi host motherboard and in the storage array. As VM densities increase, so does the number of threads, which in turn requires more CPU cores to process those threads in parallel. Cheaper appliances like Synology have low core count processors, which in turn cause high latencies.
VIRTUCACHE USES HIGH CORE COUNT AND HIGHER GHZ ESXI HOST CPUS
VirtuCache uses ESXi host CPUs for caching operations, not storage appliance processors. As a result, VirtuCache has access to more CPUs, with higher clock speeds and higher core counts, than any storage array, which in turn makes it very good at accelerating small block and highly multi-threaded IO.
EVEN IF A STORAGE ARRAY IS HIGH IOPS IT MIGHT NOT BE LOW LATENCY
Note that high latencies don't necessarily mean low IOPS (or low MBps storage throughput). Synology's fastest appliances have a rating of 240K IOPS. You can reach this kind of IOPS even with low core count CPUs, because each core can process 60-80K random IOPS at a 4KB block size. So in a typical VMware environment you will experience high latencies (for all the reasons listed above) well before you reach the IOPS/throughput rating of the appliance.
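To put numbers on this, here is a minimal sketch using the 240K IOPS rating and the 60K-per-core figure from the text (the 4KB block size is an assumed typical value):

```python
# Relationship between an appliance's IOPS rating, block size, and throughput.

iops_rating = 240_000     # rating of Synology's fastest appliances (per the text)
block_kb = 4              # assumed typical small block size
per_core_iops = 60_000    # low end of the per-core random 4KB IOPS figure

# Throughput implied by the rating at a 4KB block size, in MB per second.
throughput_mbps = iops_rating * block_kb / 1024
print(throughput_mbps)    # 937.5 MBps

# Cores needed to hit the rating: a small core count suffices.
cores_needed = iops_rating / per_core_iops
print(cores_needed)       # 4.0
```

A four-core controller can, on paper, deliver the headline IOPS number, which is exactly why the rating says little about the per-IO latency a loaded VMware environment will actually see.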
[1] To check if you have small block IO, run esxtop and then type 'v'; this shows MBps and IOPS for reads and writes for all VMs on the host. Dividing MBps by IOPS gives you the average block size (payload) per IO. If the block size is less than 8KB, then you have small block IO.
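The division in the footnote can be sketched as a small helper. The esxtop readings below are hypothetical values chosen for illustration:

```python
# Deriving average block size from esxtop's per-VM MBps and IOPS counters.

def block_size_kb(mbps: float, iops: float) -> float:
    """Average payload per IO in KB: (MB/s) / (IO/s) = MB per IO, times 1024."""
    return mbps / iops * 1024

# Hypothetical esxtop readings for one VM: 40 MBps of reads at 10,000 read IOPS.
size = block_size_kb(mbps=40, iops=10_000)
print(size)        # 4.096 KB per IO
print(size < 8)    # True: below the 8KB threshold, so this is small block IO
```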