What's New in Data Deduplication in Windows Server

Applies To: Windows Server 2012 R2

This topic describes features that were added to the Data Deduplication feature in Windows Server 2012 R2, including support for the optimization of live virtual hard disks (VHDs) for Virtual Desktop Infrastructure (VDI) workloads.

Introduced in Windows Server 2012, Data Deduplication involves finding and removing duplication within data without compromising its fidelity or integrity. The goal is to store more data in less space by segmenting files into small variable-sized chunks (32–128 KB), identifying duplicate chunks, and maintaining a single copy of each chunk. Redundant copies of the chunk are replaced by a reference to the single copy. The chunks are compressed and then organized into special container files in the System Volume Information folder.
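The chunk-and-reference idea described above can be illustrated with a small, self-contained sketch. This is not how the feature is implemented: it assumes tiny fixed-size chunks and SHA-256 hashes purely for readability, whereas Data Deduplication uses variable-sized chunks, compression, and container files. All variable names are hypothetical.

```powershell
# Simplified illustration only: fixed 4-byte chunks instead of variable-sized 32-128 KB chunks.
$chunkSize = 4
$data = [System.Text.Encoding]::ASCII.GetBytes("ABCDABCDABCDWXYZ")
$sha = [System.Security.Cryptography.SHA256]::Create()

$store = @{}   # hash -> single stored copy of each unique chunk
$refs = @()    # the "file" becomes an ordered list of references into the store

for ($i = 0; $i -lt $data.Length; $i += $chunkSize) {
    $len = [Math]::Min($chunkSize, $data.Length - $i)
    $chunk = New-Object byte[] $len
    [Array]::Copy($data, $i, $chunk, 0, $len)

    # Identify the chunk by its hash; keep only the first copy.
    $hash = [BitConverter]::ToString($sha.ComputeHash($chunk))
    if (-not $store.ContainsKey($hash)) { $store[$hash] = $chunk }
    $refs += $hash
}

"{0} chunk references, {1} unique chunks stored" -f $refs.Count, $store.Count
```

In this sketch the three repeated "ABCD" chunks share one stored copy, so four references point at only two stored chunks.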

For more information, see Data Deduplication Overview. For recommended usage scenarios, see Plan to Deploy Data Deduplication.

What’s new in Data Deduplication in Windows Server 2012 R2

The following table describes the changes in Data Deduplication functionality in Windows Server 2012 R2.

| Functionality | New or updated? | Description |
| --- | --- | --- |
| Data deduplication for remote storage of VDI workloads | New | Optimize active VHDs for VDI workloads by implementing Data Deduplication on Cluster Shared Volumes (CSVs). |
| Expand an optimized file on its original path | New | Use the Expand-DedupFile Windows PowerShell cmdlet to expand optimized files on a specified path if needed for compatibility with applications, performance, or other requirements. |

Data deduplication for remote storage of VDI workloads

In Windows Server 2012 R2, Data Deduplication can be installed on a scale-out file server and used to optimize live VHDs for VDI workloads.

What value does this change add?

By optimizing CSV volumes for your VDI workloads, you can stretch the virtual machine capacity of your existing storage subsystem. Storage savings as great as 95 percent can be achieved by implementing Data Deduplication on live VHDs for VDI deployments.

Important

In Windows Server 2012 R2, the performance of VHDs that are optimized through Data Deduplication is fully tested and supported only on VDI workloads. The same performance gains are not guaranteed for non-VDI workloads on virtual machines running Hyper-V; nor does Microsoft offer support for these scenarios in Windows Server 2012 R2.

The space savings from Data Deduplication make it feasible to deploy solid-state drives (SSDs), with their improved I/O, for VDI, and to simplify supporting infrastructure such as “just a bunch of disks” (JBOD) enclosures, cooling, and power.

By consolidating files, Data Deduplication can also improve caching efficiency and, as a result, I/O on the storage subsystem for some types of operations.

For more information about the benefits of using data deduplication with VDI workloads, see Extending Data Deduplication to new workloads in Windows Server 2012 R2.
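To see what savings a given volume actually achieves, the Get-DedupVolume cmdlet can be queried. The following is a minimal sketch; the volume path is a hypothetical CSV mount point, and the figures it returns depend entirely on your data.

```powershell
# Assumption: C:\ClusterStorage\Volume1 is a hypothetical CSV path on the file server.
Get-DedupVolume -Volume "C:\ClusterStorage\Volume1" |
    Select-Object Volume, SavedSpace, SavingsRate
```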

What works differently?

This feature is implemented through the new HyperV usage type for the Enable-DedupVolume cmdlet, which enables optimization to be performed on active VHD files. To enable the use of scale-out file servers for VDI workloads, data deduplication of CSVs is supported.

Improvements in write efficiency and faster optimization speeds were implemented to make optimization on active VHD files feasible. However, when Data Deduplication is used with virtualized workloads, the computer on which Data Deduplication is enabled cannot be the same server that is running Hyper-V. This ensures that optimization does not compete with the virtual machines for resources on the Hyper-V management operating system. For more information, see Enable-DedupVolume.
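As a rough sketch of what enabling the HyperV usage type might look like, the commands below would be run on the file server that hosts the VHDs, not on the Hyper-V host. The volume path is a hypothetical CSV mount point, and the sketch assumes the Data Deduplication role service has not been installed yet.

```powershell
# Assumption: the Data Deduplication role service is not yet installed on this file server.
Install-WindowsFeature -Name FS-Data-Deduplication

# Assumption: C:\ClusterStorage\Volume1 is a hypothetical CSV path that stores the VDI VHDs.
Enable-DedupVolume -Volume "C:\ClusterStorage\Volume1" -UsageType HyperV

# Confirm that deduplication is enabled and which usage type is set.
Get-DedupVolume -Volume "C:\ClusterStorage\Volume1" |
    Format-List Volume, Enabled, UsageType
```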

Your Hyper-V and VDI infrastructure can remain the same, with one exception: all VHD files for the virtual machines must be stored on a file server running Windows Server 2012 R2. The storage on that file server can be provided by directly attached disks (such as JBOD enclosures used with Storage Spaces), or it can be provided by a storage area network (SAN) or an iSCSI storage device. To help ensure that the storage is highly available, we recommend that you use a clustered file server with CSVs that provide storage for the VHDs. For procedures that describe how to set up Data Deduplication for a VHD workload, see Deploying Data Deduplication for VDI storage in Windows Server 2012 R2.

Expand an optimized file on its original path

The new Expand-DedupFile Windows PowerShell cmdlet enables you to expand optimized files on a specified path if needed for application compatibility, performance, or other requirements. The files are expanded on the original path. For more information, see Expand-DedupFile.
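A minimal sketch of expanding a single file follows. The file path is hypothetical, and the example assumes the volume has enough free space to hold the fully expanded file.

```powershell
# Assumption: E:\Shares\Legacy\app.vhdx is a hypothetical optimized file on a deduplicated volume.
Expand-DedupFile -Path "E:\Shares\Legacy\app.vhdx"
```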

What value does this change add?

This gives you a way to expand individual files within an optimized volume if the optimized files cause compatibility or performance issues.

What works differently?

This capability is new in Windows Server 2012 R2.

See also