Performance Tuning for Remote Desktop Virtualization Hosts

05/02/2016

Remote Desktop Virtualization Host (RD Virtualization Host) is a role service that supports Virtual Desktop Infrastructure (VDI) scenarios and lets multiple concurrent users run Windows-based applications in virtual machines that are hosted on a server running Windows Server 2012 R2 and Hyper-V.

Windows Server 2012 R2 supports two types of virtual desktops: personal virtual desktops and pooled virtual desktops.

In this topic:

General considerations
Performance optimizations

General considerations

Storage

Storage is the most likely performance bottleneck, and it is important to size your storage to properly handle the I/O load that is generated by virtual machine state changes. If a pilot or simulation is not feasible, a good guideline is to provision one disk spindle for four active virtual machines. Use disk configurations that have good write performance (such as RAID 1+0).
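The spindle guideline above is simple arithmetic, sketched here in Python. The 150-desktop figure is a hypothetical example, and the RAID 1+0 helper only encodes the fact that RAID 1+0 stripes across mirrored pairs, so it needs an even number of disks (at least four). Treat the result as a starting point to validate with a pilot, not as a sizing guarantee.

```python
import math

# Guideline from the text, for use when a pilot or simulation is not feasible:
# one disk spindle for every four active virtual machines.
VMS_PER_SPINDLE = 4


def spindles_needed(active_vms: int, vms_per_spindle: int = VMS_PER_SPINDLE) -> int:
    """Round up so that no spindle serves more than `vms_per_spindle` active VMs."""
    if active_vms < 0:
        raise ValueError("active_vms must be non-negative")
    return math.ceil(active_vms / vms_per_spindle)


def raid10_disk_count(spindles: int) -> int:
    """RAID 1+0 uses mirrored pairs, so round up to an even count (minimum 4)."""
    return max(4, spindles + spindles % 2)


if __name__ == "__main__":
    active = 150  # hypothetical number of concurrently active virtual desktops
    spindles = spindles_needed(active)
    print(spindles, raid10_disk_count(spindles))  # 38 38
```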

When appropriate, use Data Deduplication and caching to reduce the disk read load. Caching lets your storage solution serve a significant portion of the virtual desktop image from cache, which speeds up performance.

Data Deduplication and VDI

Starting with Windows Server 2012 R2, Data Deduplication supports optimization of open files. To run virtual machines from a deduplicated volume, the virtual machine files must be stored on a separate host from the Hyper-V host. If Hyper-V and deduplication run on the same machine, the two features contend for system resources and degrade overall performance.

The volume must also be configured to use the “Virtual Desktop Infrastructure (VDI)” deduplication optimization type. You can configure this by using Server Manager (File and Storage Services -> Volumes -> Dedup Settings) or by using the following Windows PowerShell command:

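# Run on the file server that stores the VDI files. "E:" is a placeholder for that volume; the HyperV usage type corresponds to the VDI option in Server Manager.
Enable-DedupVolume -Volume "E:" -UsageType HyperV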

Note

Data Deduplication optimization of open files is supported only for VDI scenarios with Hyper-V using remote storage over SMB 3.0.

For more info on Data Deduplication, see Performance Tuning for Storage Subsystems.

Memory

Server memory usage is driven by three main factors:

Operating system overhead
Hyper-V service overhead per virtual machine
Memory allocated to each virtual machine

For a typical knowledge worker workload, guest virtual machines running x86 Windows 8 or Windows 8.1 should be given ~512 MB of memory as the baseline. However, Dynamic Memory will likely increase the guest virtual machine’s memory to about 800 MB, depending on the workload. For x64 guests, expect about 800 MB at startup, increasing to about 1024 MB.

Therefore, it is important to provide enough server memory to satisfy the memory that is required by the expected number of guest virtual machines, plus enough memory for the host operating system and the per-virtual-machine Hyper-V overhead.
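The three memory factors listed above add up as shown in this Python sketch. The per-VM guest figures come from the text; the host operating system reserve (`host_os_mb`) and the per-VM Hyper-V overhead (`per_vm_overhead_mb`) are hypothetical placeholders, not published values, so replace them with numbers measured on your own hosts.

```python
# Guest memory figures from the text (MB) for a typical knowledge worker workload:
# "startup" is the baseline; "typical" is where Dynamic Memory tends to settle.
GUEST_MEMORY_MB = {
    "x86": {"startup": 512, "typical": 800},
    "x64": {"startup": 800, "typical": 1024},
}


def host_memory_mb(vm_count: int, arch: str = "x64",
                   host_os_mb: int = 4096, per_vm_overhead_mb: int = 32) -> int:
    """Estimate host RAM = OS overhead + per-VM Hyper-V overhead + guest memory.

    host_os_mb and per_vm_overhead_mb are hypothetical placeholder values.
    """
    guest_mb = GUEST_MEMORY_MB[arch]["typical"]
    return host_os_mb + vm_count * (per_vm_overhead_mb + guest_mb)


if __name__ == "__main__":
    # 100 x64 desktops: 4096 + 100 * (32 + 1024) MB
    print(host_memory_mb(100, "x64"))  # 109696
```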

CPU

When you plan server capacity for an RD Virtualization Host server, the number of virtual machines per physical core will depend on the nature of the workload. As a starting point, it is reasonable to plan 12 virtual machines per physical core, and then run the appropriate scenarios to validate performance and density. Higher density may be achievable depending on the specifics of the workload.

We recommend enabling hyper-threading, but be sure to calculate the oversubscription ratio based on the number of physical cores and not the number of logical processors. This ensures the expected level of performance on a per CPU basis.
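The CPU planning rule can be sketched in Python: start from 12 virtual machines per physical core and count physical cores only, even when hyper-threading doubles the number of logical processors. The host shape in the example (two sockets, eight cores each, two threads per core) is hypothetical.

```python
import math

# Starting point from the text; validate density with your own workload.
VMS_PER_PHYSICAL_CORE = 12


def planned_vms(sockets: int, cores_per_socket: int, threads_per_core: int) -> dict:
    """Plan VM density from physical cores; logical processors are shown only for contrast."""
    physical_cores = sockets * cores_per_socket
    logical_processors = physical_cores * threads_per_core
    return {
        "physical_cores": physical_cores,
        "logical_processors": logical_processors,
        # Oversubscription is based on physical cores, not logical processors.
        "planned_vms": physical_cores * VMS_PER_PHYSICAL_CORE,
    }


def physical_cores_needed(vm_count: int) -> int:
    """Minimum number of physical cores for a target number of virtual machines."""
    return math.ceil(vm_count / VMS_PER_PHYSICAL_CORE)


if __name__ == "__main__":
    print(planned_vms(2, 8, 2))
    # {'physical_cores': 16, 'logical_processors': 32, 'planned_vms': 192}
    print(physical_cores_needed(200))  # 17
```

Planning against the 32 logical processors instead would suggest 384 virtual machines, twice the intended oversubscription.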

Virtual GPU

Microsoft RemoteFX for RD Virtualization Host delivers a rich graphics experience for Virtual Desktop Infrastructure (VDI) through host-side remoting, a render-capture-encode pipeline, a highly efficient GPU-based encode, throttling based on client activity, and a DirectX-enabled virtual GPU. RemoteFX for RD Virtualization Host upgrades the virtual GPU from DirectX9 to DirectX11. It also improves the user experience by supporting more monitors at higher resolutions.

The RemoteFX DirectX11 experience is available without a hardware GPU, through a software-emulated driver. Although this software GPU provides a good experience, the RemoteFX virtual graphics processing unit (VGPU) adds a hardware accelerated experience to virtual desktops.

To take advantage of the RemoteFX VGPU experience on a server running Windows Server 2012 R2, you need a GPU driver on the host server that supports DirectX 11.1 or WDDM 1.2. For more information about GPU offerings to use with RemoteFX for RD Virtualization Host, contact your GPU provider.

If you use the RemoteFX virtual GPU in your VDI deployment, the deployment capacity will vary based on usage scenarios and hardware configuration. When you plan your deployment, consider the following:

Number of GPUs on your system
Video memory capacity on the GPUs
Processor and hardware resources on your system

RemoteFX server system memory

For every virtual desktop enabled with a virtual GPU, RemoteFX uses system memory in the guest operating system and in the RemoteFX-enabled server. The hypervisor guarantees the availability of system memory for a guest operating system. On the server, each virtual GPU-enabled virtual desktop needs to advertise its system memory requirement to the hypervisor. When the virtual GPU-enabled virtual desktop is starting, the hypervisor reserves additional system memory in the RemoteFX-enabled server for the VGPU-enabled virtual desktop.

The memory requirement for the RemoteFX-enabled server is dynamic because the amount of memory consumed on the RemoteFX-enabled server is dependent on the number of monitors that are associated with the VGPU-enabled virtual desktops and the maximum resolution for those monitors.

RemoteFX server GPU video memory

Every virtual GPU-enabled virtual desktop uses the video memory in the GPU hardware on the host server to render the desktop. In addition to rendering, the video memory is used by a codec to compress the rendered screen. The amount of memory needed depends directly on the number of monitors that are provisioned to the virtual machine.

The video memory that is reserved varies based on the number of monitors and the system screen resolution. Some users may require a higher screen resolution for specific tasks. There is greater scalability with lower resolution settings if all other settings remain constant.
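To illustrate why video memory grows with monitor count and resolution, this Python sketch computes only the raw frame-buffer size of each monitor, assuming 32-bit color (4 bytes per pixel). It is a lower bound for illustration: RemoteFX also uses video memory for the codec and other surfaces, so these numbers should not be used to size GPUs.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit color


def framebuffer_mb(monitors: list[tuple[int, int]]) -> float:
    """Raw frame-buffer size in MB for a desktop with the given (width, height) monitors."""
    total_bytes = sum(width * height * BYTES_PER_PIXEL for width, height in monitors)
    return total_bytes / (1024 * 1024)


if __name__ == "__main__":
    print(framebuffer_mb([(1024, 768)]))       # 3.0
    print(framebuffer_mb([(2560, 1600)] * 2))  # 31.25
```

Going from one 1024 x 768 monitor to two 2560 x 1600 monitors increases the raw frame buffer more than tenfold, which is why lower resolutions allow more desktops per GPU when all other settings stay constant.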

RemoteFX processor

The hypervisor schedules the RemoteFX-enabled server and the virtual GPU-enabled virtual desktops on the CPU. Unlike system memory, RemoteFX does not advertise any additional CPU requirement to the hypervisor. The additional CPU overhead that RemoteFX brings into the virtual GPU-enabled virtual desktop comes from running the virtual GPU driver and a user-mode Remote Desktop Protocol stack.

On the RemoteFX-enabled server, the overhead is increased, because the system runs an additional process (rdvgm.exe) per virtual GPU-enabled virtual desktop. This process uses the graphics device driver to run commands on the GPU. The codec also uses the CPUs for compressing the screen data that needs to be sent back to the client.

More virtual processors mean a better user experience. We recommend allocating at least two virtual CPUs per virtual GPU-enabled virtual desktop. We also recommend using the x64 architecture for virtual GPU-enabled virtual desktops because the performance on x64 virtual machines is better compared to x86 virtual machines.

RemoteFX GPU processing power

For every virtual GPU-enabled virtual desktop, there is a corresponding DirectX process running on the RemoteFX-enabled server. This process replays all the graphics commands that it receives from the RemoteFX virtual desktop onto the physical GPU. For the physical GPU, it is equivalent to simultaneously running multiple DirectX applications.

Typically, graphics devices and drivers are tuned to run a few applications on the desktop. RemoteFX uses the GPU in an unusual way, because a single GPU replays the graphics commands of many virtual desktops at once. To measure how the GPU is performing on a RemoteFX server, performance counters have been added that measure the GPU response to RemoteFX requests.

Usually, when a GPU is low on resources, read and write operations to the GPU take a long time to complete. By monitoring performance counters, administrators can take preventive action and reduce the risk of downtime for their end users.

The following performance counters are available on the RemoteFX server to measure the virtual GPU performance:

RemoteFX graphics

Counter	Description
Frames Skipped/Second - Insufficient Client Resources	Number of frames skipped per second due to insufficient client resources
Graphics Compression Ratio	Ratio of the number of bytes encoded to the number of bytes input

RemoteFX root GPU management

Counter	Description
Resources: TDRs in Server GPUs	Total number of TDR (timeout detection and recovery) timeouts that have occurred in the GPUs on the server
Resources: Virtual machines running RemoteFX	Total number of virtual machines that have the RemoteFX 3D Video Adapter installed
VRAM: Available MB per GPU	Amount of dedicated video memory that is not being used
VRAM: Reserved % per GPU	Percent of dedicated video memory that has been reserved for RemoteFX

RemoteFX software

Counter	Description
Capture Rate for monitor [1-4]	RemoteFX capture rate for monitors 1-4
Compression Ratio	Deprecated in Windows 8 and replaced by Graphics Compression Ratio
Delayed Frames/sec	Number of frames per second where graphics data was not sent within a certain amount of time
GPU response time from Capture	Latency (in microseconds) measured within RemoteFX Capture for GPU operations to complete
GPU response time from Render	Latency (in microseconds) measured within RemoteFX Render for GPU operations to complete
Output Bytes	Total number of RemoteFX output bytes
Waiting for client count/sec	Deprecated in Windows 8 and replaced by Frames Skipped/Second - Insufficient Client Resources

RemoteFX vGPU management

Counter	Description
Resources: TDRs local to virtual machines	Total number of TDRs that have occurred in this virtual machine (TDRs that the server propagated to the virtual machines are not included)
Resources: TDRs propagated by Server	Total number of TDRs that occurred on the server and that have been propagated to the virtual machine

RemoteFX virtual machine vGPU performance

Counter	Description
Data: Invoked presents/sec	Total number of present operations to be rendered to the desktop of the virtual machine per second
Data: Outgoing presents/sec	Total number of present operations sent by the virtual machine to the server GPU per second
Data: Read bytes/sec	Total number of bytes read from the RemoteFX-enabled server per second
Data: Send bytes/sec	Total number of bytes sent to the RemoteFX-enabled server GPU per second
DMA: Communication buffers average latency (sec)	Average amount of time (in seconds) spent in the communication buffers
DMA: DMA buffer latency (sec)	Amount of time (in seconds) from when a DMA buffer is submitted until it completes
DMA: Queue length	DMA queue length for a RemoteFX 3D Video Adapter
Resources: TDR timeouts per GPU	Count of TDR timeouts that have occurred per GPU on the virtual machine
Resources: TDR timeouts per GPU engine	Count of TDR timeouts that have occurred per GPU engine on the virtual machine

In addition to the RemoteFX virtual GPU performance counters, you can also measure the GPU utilization by using Process Explorer, which shows video memory usage and the GPU utilization.
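The counters above are most useful when compared over time. The following Python sketch shows one way to turn two successive samples into early warnings. It assumes the values have already been collected (for example, with Performance Monitor or the `Get-Counter` PowerShell cmdlet) into dictionaries keyed by the counter names listed above; the thresholds are hypothetical and should be derived from your own baseline.

```python
# Hypothetical thresholds; derive real values from your own baseline measurements.
THRESHOLDS = {
    "VRAM: Available MB per GPU": ("min", 256),
    "VRAM: Reserved % per GPU": ("max", 90),
}

# Cumulative totals: any increase between two samples is worth investigating.
CUMULATIVE_COUNTERS = ["Resources: TDRs in Server GPUs"]


def gpu_pressure_alerts(previous: dict, current: dict) -> list[str]:
    """Compare two counter samples and return human-readable warnings."""
    alerts = []
    for name in CUMULATIVE_COUNTERS:
        if name in previous and name in current and current[name] > previous[name]:
            alerts.append(f"{name} increased by {current[name] - previous[name]}")
    for name, (kind, limit) in THRESHOLDS.items():
        value = current.get(name)
        if value is None:
            continue  # counter not present in this sample
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            alerts.append(f"{name} = {value} (expected {kind} {limit})")
    return alerts


if __name__ == "__main__":
    before = {"Resources: TDRs in Server GPUs": 0}
    after = {
        "Resources: TDRs in Server GPUs": 2,
        "VRAM: Available MB per GPU": 180,
        "VRAM: Reserved % per GPU": 75,
    }
    for alert in gpu_pressure_alerts(before, after):
        print(alert)
```

Treating the TDR counter as a delta rather than an absolute value avoids alerting forever on a single past event.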

Performance optimizations

Dynamic Memory

Dynamic Memory enables more efficient utilization of the memory resources of the server running Hyper-V by balancing how memory is distributed between running virtual machines. Memory can be dynamically reallocated between virtual machines in response to their changing workloads.

Dynamic Memory enables you to increase virtual machine density with the resources you already have without sacrificing performance or scalability. The result is more efficient use of expensive server hardware resources, which can translate into easier management and lower costs.
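The following Python sketch illustrates the balancing idea with a deliberately simplified model; it is not Hyper-V's actual algorithm, which also takes memory weight into account and reacts continuously to memory pressure. Each VM's target is its current demand plus a buffer percentage (20 percent here, an assumption), clamped to its minimum and maximum. When the host cannot cover every target, each VM keeps its minimum and the remaining memory is shared in proportion to what each VM wanted above its minimum. The VM names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VirtualMachine:
    name: str
    minimum_mb: int
    maximum_mb: int
    demand_mb: int  # memory the guest currently needs


def balance_memory(vms: list[VirtualMachine], host_available_mb: int,
                   buffer_pct: int = 20) -> dict[str, int]:
    """Simplified Dynamic Memory-style balancing (illustration only)."""
    # Target = demand plus buffer, clamped to the VM's [minimum, maximum] range.
    targets = {
        vm.name: max(vm.minimum_mb,
                     min(vm.maximum_mb, vm.demand_mb * (100 + buffer_pct) // 100))
        for vm in vms
    }
    total_target = sum(targets.values())
    if total_target <= host_available_mb:
        return targets

    floor = sum(vm.minimum_mb for vm in vms)
    if floor > host_available_mb:
        raise ValueError("host memory cannot cover every VM's minimum")

    # Every VM keeps its minimum; spare memory is split in proportion to the amount
    # each VM wanted above its minimum. Integer division never over-commits the host.
    spare = host_available_mb - floor
    wanted_above_floor = total_target - floor
    return {
        vm.name: vm.minimum_mb
        + (targets[vm.name] - vm.minimum_mb) * spare // wanted_above_floor
        for vm in vms
    }


if __name__ == "__main__":
    desktops = [
        VirtualMachine("desktop-a", 512, 2048, 1000),
        VirtualMachine("desktop-b", 512, 2048, 600),
    ]
    print(balance_memory(desktops, 4096))  # {'desktop-a': 1200, 'desktop-b': 720}
    print(balance_memory(desktops, 1500))  # {'desktop-a': 877, 'desktop-b': 622}
```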

On guest operating systems running Windows 8 with virtual processors that span multiple logical processors, consider the tradeoff between running with Dynamic Memory to help minimize memory usage and disabling Dynamic Memory to improve the performance of a topology-aware application. Such an application can use the topology information (for example, the NUMA layout) to make scheduling and memory allocation decisions.

Tiered Storage

RD Virtualization Host supports tiered storage for virtual desktop pools. The parent disk that is shared by all pooled virtual desktops within a collection can use a small, high-performance storage solution, such as a mirrored solid-state drive (SSD). The pooled virtual desktops themselves can be placed on less expensive, traditional storage such as RAID 1+0.

The shared parent disk should be placed on an SSD because most of the read I/Os from pooled virtual desktops go to it. Therefore, the storage that holds the parent disk must sustain a much higher number of read I/Os per second.

This deployment configuration provides cost-effective performance where performance is needed. The SSD provides higher performance on a smaller disk (~20 GB per collection, depending on the configuration), while traditional storage for the pooled virtual desktops (RAID 1+0) uses about 3 GB per virtual machine.
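Using the approximate figures from the text (about 20 GB of SSD per collection and about 3 GB of traditional storage per pooled virtual machine), the capacity split can be sketched in Python. The collection sizes are hypothetical, and the mirror factor only reflects that a two-way mirrored SSD needs twice its usable capacity in raw disk.

```python
SSD_GB_PER_COLLECTION = 20   # approximate; depends on the configuration
HDD_GB_PER_POOLED_VM = 3     # approximate usable space on RAID 1+0 storage
SSD_MIRROR_COPIES = 2        # two-way mirror: raw SSD = 2 x usable SSD


def tiered_storage_gb(vms_per_collection: list[int]) -> dict[str, int]:
    """Estimate usable SSD, raw SSD, and usable traditional storage for pooled collections."""
    ssd_usable = SSD_GB_PER_COLLECTION * len(vms_per_collection)
    return {
        "ssd_usable_gb": ssd_usable,
        "ssd_raw_gb": ssd_usable * SSD_MIRROR_COPIES,
        "traditional_usable_gb": HDD_GB_PER_POOLED_VM * sum(vms_per_collection),
    }


if __name__ == "__main__":
    # Two hypothetical collections of 100 and 50 pooled virtual desktops.
    print(tiered_storage_gb([100, 50]))
    # {'ssd_usable_gb': 40, 'ssd_raw_gb': 80, 'traditional_usable_gb': 450}
```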

CSV cache

Failover Clustering in Windows Server 2012 and Windows Server 2012 R2 provides caching on Cluster Shared Volumes (CSV). This is especially beneficial for pooled virtual desktop collections, where the majority of the read I/Os go to the shared parent disk. The CSV cache keeps blocks that are read more than once in system memory and serves later reads from there, which reduces disk I/O; reads served from memory can be several orders of magnitude faster than reads from disk. For more info on CSV cache, see How to Enable CSV Cache.
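The following Python sketch models why a read cache helps pooled collections: desktops created from the same template read many of the same blocks of their shared parent virtual hard disk, so after the first read most requests can be served from memory. It uses a simple least-recently-used (LRU) cache as a stand-in; that eviction policy is an assumption for illustration and does not describe how the CSV cache is implemented. The boot-storm workload (50 desktops each reading the same 200 blocks) is hypothetical.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Toy read cache that keeps the most recently used blocks in memory."""

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity_blocks = capacity_blocks
        self._blocks = OrderedDict()  # block id -> None, ordered by recency
        self.hits = 0
        self.misses = 0

    def read(self, block_id: int) -> str:
        if block_id in self._blocks:
            self._blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return "memory"
        self.misses += 1
        self._blocks[block_id] = None
        if len(self._blocks) > self.capacity_blocks:
            self._blocks.popitem(last=False)  # evict the least recently used block
        return "disk"


if __name__ == "__main__":
    cache = LRUBlockCache(capacity_blocks=256)
    for desktop in range(50):        # 50 pooled desktops boot...
        for block in range(200):     # ...each reading the same 200 parent-disk blocks
            cache.read(block)
    print(cache.hits, cache.misses)  # 9800 200
```

Only the first desktop's reads go to disk; the other 49 desktops are served entirely from memory because the shared blocks fit in the cache.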

Pooled virtual desktops

By default, pooled virtual desktops are rolled back to the pristine state after a user signs out, so any changes made to the Windows operating system since the last user sign-in are abandoned.

Although it’s possible to disable the rollback, any persisted changes are still temporary, because a pooled virtual desktop collection is typically re-created whenever the virtual desktop template is updated.

It makes sense to turn off Windows features and services that depend on persistent state. Additionally, it makes sense to turn off services that are primarily for non-enterprise scenarios.

Each specific service should be evaluated appropriately prior to any broad deployment. The following are some initial things to consider:

Service	Why?
Auto update	Pooled virtual desktops are updated by re-creating the virtual desktop template.
Offline files	Virtual desktops are always online and connected from a networking point-of-view.
Background defrag	File-system changes are discarded after a user signs out (due to a rollback to the pristine state or re-creating the virtual desktop template, which results in re-creating all pooled virtual desktops).
Hibernate or sleep	No such concept for VDI.
Bug check memory dump	No such concept for pooled virtual desktops. A pooled virtual desktop that bug-checks restarts from the pristine state.
WLAN autoconfig	There is no Wi-Fi device interface for VDI.
Windows Media Player network sharing service	Consumer-centric service.
Home group provider	Consumer-centric service.
Internet connection sharing	Consumer-centric service.
Media Center extended services	Consumer-centric service.

Note

This list is not meant to be complete, and any change affects the intended goals and scenarios, so evaluate each one. For more info, see Hot off the presses, get it now, the Windows 8 VDI optimization script, courtesy of PFE!

Note

SuperFetch in Windows 8 is enabled by default. It is VDI-aware and should not be disabled. SuperFetch can further reduce memory consumption through memory page sharing, which is beneficial for VDI. For pooled virtual desktops running Windows 7, SuperFetch should be disabled; for personal virtual desktops running Windows 7, it should be left on.
