Why is VDI Slow?
“VDI is slow” is one of the most common complaints in VMware deployments. Perceived poor VDI performance can have many root causes, and administrators often waste precious time chasing the wrong ones. This article lists the usual suspects so that administrators can address end-user complaints quickly.
Long boot times, slow application response times, cursor freezes, and jittery audio or video are the typical problems reported. VDI is usually slow due to storage and network bottlenecks, inadequate resources or software issues. Insufficient planning, incorrect configuration, and ineffective monitoring lead to these problems.
Eliminate the most obvious cause: virtual desktops might have insufficient processor or memory resources provisioned. Monitor the alarms for virtual machine CPU and memory usage. Use this guide to estimate the memory requirements.
Unlike physical desktops or laptops, virtual desktop users have to share the storage, CPU and memory resources. Storage speeds have increased significantly. However, they have lagged behind advances in CPU, memory and network speeds. Storage bottlenecks, thus, are key factors for slow VDI. Find out if storage issues are the root cause by checking the disk latencies.
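As a rough illustration of the disk latency check, the sketch below summarizes a set of latency samples and flags storage as a suspect when the 95th percentile exceeds a threshold. The 20 ms threshold and the sample values are assumptions made for the example, not figures from this article. Choose a threshold that matches your own storage and service targets.

```python
from statistics import mean


def summarize_latency(samples_ms, threshold_ms=20.0):
    """Summarize disk latency samples (milliseconds).

    threshold_ms is a hypothetical cut-off chosen for illustration.
    """
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile, computed with integer arithmetic
    # (ceiling of 0.95 * n) to avoid floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    p95 = ordered[rank - 1]
    return {
        "avg_ms": mean(ordered),
        "p95_ms": p95,
        "storage_suspect": p95 > threshold_ms,
    }


# Hypothetical samples exported from a monitoring tool.
samples = [4, 5, 6, 5, 7, 30, 42, 6, 5, 38]
print(summarize_latency(samples))
```

Looking at a high percentile rather than only the average helps because a few very slow I/Os are enough for users to notice, even when the average looks healthy.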
Users often complain of long boot or log on times. Boot storms, in which many desktops start at the same time, are one of the main reasons, and storage bottlenecks aggravate them.
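The arithmetic behind a boot storm can be sketched as follows. All numbers here (IOPS per booting desktop, array capacity, number of desktops) are hypothetical placeholders. Measure your own environment before drawing conclusions.

```python
def boot_storm_demand(concurrent_boots: int, iops_per_boot: int) -> int:
    """Aggregate IOPS demanded when desktops boot at the same time."""
    return concurrent_boots * iops_per_boot


def max_concurrent_boots(array_iops: int, iops_per_boot: int) -> int:
    """Largest number of simultaneous boots the array can absorb."""
    if iops_per_boot <= 0:
        raise ValueError("iops_per_boot must be positive")
    return array_iops // iops_per_boot


# Hypothetical figures: 200 desktops booting together at 300 IOPS each,
# against an array that sustains 40,000 IOPS.
print(boot_storm_demand(200, 300))        # aggregate demand in IOPS
print(max_concurrent_boots(40_000, 300))  # boots the array can absorb
```

When the demand exceeds what the array can sustain, boots queue behind each other, which users experience as long boot and log on times.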
Upgrading to high-performance all-flash arrays is an obvious solution. However, it is expensive and requires a storage overhaul. Caching solutions such as VirtuCache provide an inexpensive alternative without disrupting the existing storage infrastructure. Caching is a natural fit for VDI because virtual desktops share the same base image (storage).
Cursor freezes and other lags typically result when the virtual desktops are not provisioned with enough bandwidth. Use Network I/O Control and traffic shaping to guarantee bandwidth to the virtual machines (unless dedicated NICs can be assigned).
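To reason about how share-based bandwidth control divides a congested link, a simplified proportional model can help. The sketch below is only an approximation for planning purposes, not VMware's actual scheduler. The traffic type names, share values and link speed are hypothetical.

```python
from typing import Dict


def allocate_by_shares(link_mbps: float, shares: Dict[str, int]) -> Dict[str, float]:
    """Split a congested link in proportion to each traffic type's shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total shares must be positive")
    return {name: link_mbps * value / total for name, value in shares.items()}


# Hypothetical 10 Gbps uplink shared by three traffic types.
allocation = allocate_by_shares(
    10_000, {"vdi": 100, "vmotion": 50, "management": 50}
)
for name, mbps in allocation.items():
    print(f"{name}: {mbps:.0f} Mbps under contention")
```

Dividing the virtual desktop allocation by the number of active sessions gives a first estimate of the bandwidth each user can count on when the link is saturated.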
Network latencies, along with storage latencies, effectively dictate the responsiveness of VDI. Follow these best practices.
Additionally, organizations need to test how VDI traffic and user experience differ between LAN and WAN connections. It would be a mistake for a VDI proof of concept (POC) to test only on LAN connections. Ensure that the group policy object (GPO) settings are optimized for WAN connections as per these recommendations.
Special attention also needs to be paid to the Active Directory configuration to ensure that network latencies are minimal. This configuration has a large impact on boot and log on times.
Windows was designed to run on physical hardware for single users. It is necessary to tweak the software and settings to enable efficient operation in virtual machines. Refer to these VMware and Microsoft guides for best practices to prepare the base images.
With VDI, antivirus (AV) scans are a heavy burden, especially when multiple scans happen at the same time on the same physical host. Virtualization-aware AV software mitigates this somewhat. Caching solutions like VirtuCache also decrease the AV scan burden, especially with linked-clone deployments, where the scanning of the base image is accelerated.
Organizations deploying VDI for the first time typically opt for roaming profiles. These tend to grow over time, especially when these profiles don’t have quotas. This can cause inordinately long log on times. Evaluate whether roaming profiles are really needed before implementing them. Folder redirection may be used in conjunction with roaming profiles to reduce the latencies.
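One way to see whether roaming profiles have grown out of hand is to measure each profile folder and list those above a chosen limit. The following is a minimal sketch, assuming the profiles are stored as one sub-folder per user under a single root. The function names and the quota value are hypothetical.

```python
import os


def directory_size_bytes(path):
    """Total size of all files under path, skipping unreadable files."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # File vanished or is inaccessible; ignore it.
                pass
    return total


def oversized_profiles(profile_root, quota_bytes):
    """Return (profile name, size) pairs above quota, largest first."""
    results = []
    for entry in os.scandir(profile_root):
        if entry.is_dir():
            size = directory_size_bytes(entry.path)
            if size > quota_bytes:
                results.append((entry.name, size))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

For example, `oversized_profiles(profile_root, 500 * 1024 * 1024)` would list the profiles larger than 500 MB under `profile_root`, which are good candidates for a quota or for folder redirection.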
In deployments with multiple domain controllers, misconfiguration often results in long log on times (or long boot times as per user complaints). Ensure that desktops connect to the correct domain controller. Use the VMware Logon Monitor to troubleshoot slow logons.
Custom log on scripts
Log on scripts that are not maintained can increase log on times for an entire group of desktops or users. Do not use custom log on scripts unless absolutely needed. If you do use them, spend time maintaining them, especially across software upgrades, which may cause these scripts to stop working or, in some cases, hang.
Security vs Performance Trade-offs
Many organizations have reported significant responsiveness gains after disabling certain security patches. Only consider this if you can trade off security for performance. Ensure that robust perimeter security measures are implemented in your organization before proceeding.
How to disable the Spectre patch (improves log on time and overall performance; a restart is required for the change to take effect):
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 1 /f
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverrideMask /t REG_DWORD /d 3 /f