Running Data Deduplication

Applies to: Windows Server 2022, Windows Server 2019, Windows Server 2016, Azure Stack HCI, versions 21H2 and 20H2

Running Data Deduplication jobs manually

You can run every scheduled Data Deduplication job manually by using the following PowerShell cmdlets:

  • Start-DedupJob: Starts a new Data Deduplication job
  • Stop-DedupJob: Stops a Data Deduplication job already in progress (or removes it from the queue)
  • Get-DedupJob: Shows all the active and queued Data Deduplication jobs
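For example, the three cmdlets above can be combined to queue a job, inspect the queue, and cancel work for a volume. This is a sketch only; the drive letter D: is a placeholder for your deduplicated volume:

```powershell
# Queue a Garbage Collection job on the volume (returns immediately)
Start-DedupJob -Type GarbageCollection -Volume D:

# List all active and queued Data Deduplication jobs
Get-DedupJob

# Cancel every job currently running or queued for the volume
Get-DedupJob -Volume D: | Stop-DedupJob
```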

All settings that are available when you schedule a Data Deduplication job are also available when you start a job manually, except for the scheduling-specific settings. For example, to start an Optimization job manually with high priority, maximum CPU usage, and maximum memory usage, execute the following PowerShell command from an elevated (administrator) session:

Start-DedupJob -Type Optimization -Volume <Your-Volume-Here> -Memory 100 -Cores 100 -Priority High

Monitoring Data Deduplication

Job successes

Because Data Deduplication uses a post-processing model, it is important that Data Deduplication jobs succeed. An easy way to check the status of the most recent job is to use the Get-DedupStatus PowerShell cmdlet. Periodically check the following fields:

  • For the Optimization job, look at LastOptimizationResult (0 = Success), LastOptimizationResultMessage, and LastOptimizationTime (should be recent).
  • For the Garbage Collection job, look at LastGarbageCollectionResult (0 = Success), LastGarbageCollectionResultMessage, and LastGarbageCollectionTime (should be recent).
  • For the Integrity Scrubbing job, look at LastScrubbingResult (0 = Success), LastScrubbingResultMessage, and LastScrubbingTime (should be recent).
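The result fields listed above can be pulled for a volume in a single call. A minimal sketch, again assuming a deduplicated volume D::

```powershell
# Query the most recent job results for the volume
Get-DedupStatus -Volume D: |
    Select-Object Volume,
                  LastOptimizationResult, LastOptimizationTime,
                  LastGarbageCollectionResult, LastGarbageCollectionTime,
                  LastScrubbingResult, LastScrubbingTime
```

A result value of 0 in each field indicates that the corresponding job last completed successfully.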

Note

More detail on job successes and failures can be found in the Windows Event Viewer under \Applications and Services Logs\Microsoft\Windows\Deduplication\Operational.

Optimization rates

One indicator of Optimization job failure is a downward-trending optimization rate, which might indicate that the Optimization jobs are not keeping up with the rate of change, or churn, on the volume. You can check the optimization rate by using the Get-DedupStatus PowerShell cmdlet.

Important

Get-DedupStatus has two fields that are relevant to the optimization rate: OptimizedFilesSavingsRate and SavingsRate. These are both important values to track, but each has a unique meaning.

  • OptimizedFilesSavingsRate applies only to the files that are 'in-policy' for optimization (space used by optimized files after optimization / logical size of optimized files).
  • SavingsRate applies to the entire volume (space used by optimized files after optimization / total logical size of the volume).
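To illustrate the difference between the two rates, consider a hypothetical volume with 100 GB of logical data, of which 60 GB of in-policy files have been optimized down to 20 GB on disk. The numbers below are invented for illustration:

```powershell
# Hypothetical figures for a deduplicated volume
$logicalOptimized = 60GB    # logical size of the optimized (in-policy) files
$usedOptimized    = 20GB    # space those files occupy after optimization
$logicalVolume    = 100GB   # total logical size of all data on the volume

# Savings over the optimized files only: 1 - (20/60), about 67%
$optimizedFilesSavingsRate = 1 - ($usedOptimized / $logicalOptimized)

# Savings over the whole volume: (60 - 20) / 100 = 40%
$savingsRate = ($logicalOptimized - $usedOptimized) / $logicalVolume
```

Because SavingsRate is diluted by data that is not in-policy, it is normally lower than OptimizedFilesSavingsRate on the same volume.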

Disabling Data Deduplication

To disable Data Deduplication and undo optimization on a volume, run the Unoptimization job:

Start-DedupJob -Type Unoptimization -Volume <Desired-Volume>

Important

The Unoptimization job will fail if the volume does not have sufficient space to hold the unoptimized data.
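Because of this, it can help to compare free space against the space that deduplication is currently saving before starting the job. The sketch below assumes the FreeSpace and SavedSpace fields reported by Get-DedupStatus and a deduplicated volume D::

```powershell
# Rough pre-flight check: the rehydrated data needs roughly SavedSpace
# of additional room on the volume
$status = Get-DedupStatus -Volume D:
if ($status.FreeSpace -gt $status.SavedSpace) {
    Start-DedupJob -Type Unoptimization -Volume D:
} else {
    Write-Warning "Not enough free space on D: to hold the unoptimized data."
}
```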

Frequently Asked Questions

Is there a System Center Operations Manager Management Pack available to monitor Data Deduplication?

Yes. Data Deduplication can be monitored through the System Center Management Pack for File Server. For more information, see the Guide for System Center Management Pack for File Server 2012 R2 document.