Table of contents
TOC
Collapse the table of content
Expand the table of content

Known issues with Storage Replica

Ned pyle|Last Updated: 4/5/2017
|
1 Contributor

Applies To: Windows Server 2016

This section discusses known issues with Storage Replica in Windows Server 2016.

After removing replication, disks are offline and you cannot configure replication again

In Windows Server 2016, you may be unable to provision replication on a volume that was previously replicated or may find un-mountable volumes. This may occur when some error condition prevents removal of replication or when you reinstall the operating system on a computer that was previously replicating data.

To fix, you must clear the hidden Storage Replica partition off the disks and return them to a writeable state using the Clear-SRMetadata cmdlet.

  • To remove all orphaned Storage Replica partition databases slots and remount all partitions, use the -AllPartitions parameter as follows:

    Clear-SRMetadata -AllPartitions  
    
  • To remove all orphaned Storage Replica log data, use the -AllLogs parameter as follows:

    Clear-SRMetadata -AllLogs  
    
  • To remove all orphaned failover cluster configuration data, use the -AllConfiguration parameter as follows:

    Clear-SRMetadata -AllConfiguration  
    
  • To remove individual replication group metadata, use the -Name parameter and specify a replication group as follows:

    Clear-SRMetadata -Name RG01 -Logs -Partition  
    

The server may need to restart after cleaning the partition database; you can suppress this temporarily with -NoRestart but you should not skip restarting the server if requested by the cmdlet. This cmdlet does not remove data volumes nor data contained within those volumes.

During initial sync, see event log 4004 warnings

In Windows Server 2016, when configuring replication, both the source and destination servers may show multiple StorageReplica\Admin event log 4004 warnings each during initial sync, with a status of "insufficient system resources exist to complete the API". You are likely to see 5014 errors as well. These indicate that the servers do not have enough available memory (RAM) to perform both initial synchronization as well as run workloads. Either add RAM or reduce the used RAM from features and applications other than Storage Replica.

When using guest clusters with Shared VHDX and a host without a CSV, virtual machines stop responding after configuring replication

In Windows Server 2016, when using Hyper-V guest clusters for Storage Replica testing or demonstration purposes, and using Shared VHDX as the guest cluster storage, the virtual machines stop responding after you configure replication. If you restart the Hyper-V host, the virtual machines start responding but replication configuration will not be complete and no replication will occur.

This behavior occurs when you are using fltmc.exe attach svhdxflt to bypass the requirement for the Hyper-V host running a CSV. Use of this command is not supported and is intended only for test and demonstration purposes.

The cause of the slowdown is a by-design interoperability issue between the Storage QoS feature in Windows Server 2016 and the manually attached Shared VHDX filter. To resolve this issue, disable the Storage QoS filter driver and restart the Hyper-V host:

SC config storqosflt start= disabled  

Cannot configure replication when using New-Volume and differing storage

When using the New-Volume cmdlet along with differing sets of storage on the source and destination server, such as two different SANs or two JBODs with differing disks, you may not be able to subsequently configure replication using New-SRPartnership. The error shown may include:

Data partition sizes are different in those two groups  

Use the New-Partition** cmdlet to create volumes and format them instead of New-Volume, as the latter cmdlet may round the volume size on differing storage arrays. If you have already created an NTFS volume, you can use Resize-Partition to grow or shrink one of the volumes to match the other (this cannot be done with ReFS volumes). If using Diskmgmt or Server Manager, no rounding will occur.

When attempting to use Test-SRTopology, you receive one of the following errors:

ERROR EXAMPLE 1:

WARNING: Invalid value entered for target computer name: sr-srv03. Test-SrTopology cmdlet does not accept IP address as  
input for target computer name parameter. NetBIOS names and fully qualified domain names are acceptable inputs  
WARNING: System.Exception  
WARNING:    at Microsoft.FileServices.SR.Powershell.TestSRTopologyCommand.BeginProcessing()  
Test-SRTopology : Invalid value entered for target computer name: sr-srv03. Test-SrTopology cmdlet does not accept IP  
address as input for target computer name parameter. NetBIOS names and fully qualified domain names are acceptable  
inputs  
At line:1 char:1  
+ Test-SRTopology -SourceComputerName sr-srv01 -SourceVolumeName d: -So ...  
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
    + CategoryInfo          : InvalidArgument: (:) [Test-SRTopology], Exception  
    + FullyQualifiedErrorId : TestSRTopologyFailure,Microsoft.FileServices.SR.Powershell.TestSRTopologyCommand  

ERROR EXAMPLE 2:

WARNING: Invalid value entered for source computer name

ERROR EXAMPLE 3:

The specified volume cannot be found G: cannot be found on computer SRCLUSTERNODE1

This cmdlet has limited error reporting in Windows Server 2016 and will return the same output for many common issues. The error can appear for the following reasons:

  • You are logged on to the source computer as a local user, not a domain user.
  • The destination computer is not running or is not accessible over the network.
  • You specified an incorrect name for the destination computer.
  • You specified an IP address for the destination server.
  • The destination computer firewall is blocking access to PowerShell and/or CIM calls.
  • The destination computer is not running the WMI service.
  • You did not use CREDSSP when running the Test-SRTopology cmdlet remotely from a management computer.
  • The source or destination volume specified are local disks on a cluster node, not clustered disks.

Configuring new Storage Replica partnership returns an error - "Failed to provision partition"

When attempting to create a new replication partnership with New-SRPartnership, you receive the following error:

New-SRPartnership : Unable to create replication group test01, detailed reason: Failed to provision partition ed0dc93f-107c-4ab4-a785-afd687d3e734.
At line: 1 char: 1
+ New-SRPartnership -SourceComputerName srv1 -SourceRGName test01
+ Categorylnfo : ObjectNotFound: (MSFT_WvrAdminTasks : root/ Microsoft/. . s) CNew-SRPartnership], CimException
+ FullyQua1ifiedErrorId : Windows System Error 1168 ,New-SRPartnership

This is caused by selecting a data volume that is on the same partition as the System Drive (i.e. the C: drive with its Windows folder). For instance, on a drive that contains both the C: and D: volumes created from the same partition. This is not supported in Storage Replica; you must pick a different volume to replicate.

Attempting to grow a replicated volume fails due to missing update

When attempting to grow or extend a replicated volume, you receive the following error:

PS C:\> Resize-Partition -DriveLetter d -Size 44GB
Resize-Partition : The operation failed with return code 8
At line:1 char:1
+ Resize-Partition -DriveLetter d -Size 44GB
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : NotSpecified: (StorageWMI:ROOT/Microsoft/.../MSFT_Partition
[Resize-Partition], CimException
+ FullyQualifiedErrorId : StorageWMI 8,Resize-Partition

If using the Disk Management MMC snapin, you receive this error:

Element not found

This occurs even if you correctly enable volume resizing on the source server using Set-SRGroup -Name rg01 -AllowVolumeResize $TRUE.

This issue was fixed in Cumulative Update for Windows 10 Version 1607 and Windows Server 2016: December 9, 2016 (KB3201845).

Attempting to grow a replicated volume fails due to missing step

If you attempt to resize a replicated volume on the source server without setting -AllowResizeVolume $TRUE first, you will receive the following errors:

PS C:\> Resize-Partition -DriveLetter I -Size 8GB
Resize-Partition : Failed
Activity ID: {87aebbd6-4f47-4621-8aa4-5328dfa6c3be}
At line:1 char:1
+ Resize-Partition -DriveLetter I -Size 8GB
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (StorageWMI:ROOT/Microsoft/.../MSFT_Partition) [Resize-Partition], CimException
    + FullyQualifiedErrorId : StorageWMI 4,Resize-Partition

Storage Replica Event log error 10307:

Attempted to resize a partition that is protected by Storage Replica .

DeviceName: \Device\Harddisk1\DR1
PartitionNumber: 7
PartitionId: {b71a79ca-0efe-4f9a-a2b9-3ed4084a1822}

Guidance: To grow a source data partition, set the policy on the replication group containing the data partition.

Set-SRGroup -ComputerName [ComputerName] -Name [ReplicationGroupName] -AllowVolumeResize $true

Before you grow the source data partition, ensure that the destination data partition has enough space to grow to an equal size. Shrinking of data partition protected by Storage Replica is blocked.

Disk Management Snap-in Error:

An unexpected error has occurred 

After resizing the volume, remember to disable resizing with Set-SRGroup -Name rg01 -AllowVolumeResize $FALSE. This parameter prevents admins from attempting to resize volumes prior to ensuring that there is sufficient space on the destination volume, typically because they were unaware of Storage Replica's presence.

Attempting to move a PDR resource between sites on an asynchronous stretch cluster fails

When attempting to move a physical disk resource-attached role - such as a file server for general use - in order to move the associated storage in an asynchronous stretch cluster, you receive an error.

If using the Failover Cluster Manager snap-in:

Error
The operation has failed.
The action 'Move' did not complete.
Error Code: 0x80071398
The operation failed because either the specified cluster node is not the owner of the group, or the node is not a possible owner of the group

If using the Cluster powershell cmdlet:

PS C:\> Move-ClusterGroup -Name sr-fs-006 -Node sr-srv07
Move-ClusterGroup : An error occurred while moving the clustered role 'sr-fs-006'.
The operation failed because either the specified cluster node is not the owner of the group, or the node is not a
possible owner of the group
At line:1 char:1
+ Move-ClusterGroup -Name sr-fs-006 -Node sr-srv07
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : NotSpecified: (:) [Move-ClusterGroup], ClusterCmdletException
+ FullyQualifiedErrorId : Move-ClusterGroup,Microsoft.FailoverClusters.PowerShell.MoveClusterGroupCommand

This occurs due to a by-design behavior in Windows Server 2016. Use Set-SRPartnership to move these PDR disks in an asynchronous stretched cluster.

Attempting to add disks to a two-node asymmetric cluster returns "No disks suitable for cluster disks found"

When attempting to provision a cluster with only two nodes, prior to adding Storage Replica stretch replication, you attempt to add the disks in the second site to the Available Disks. You recieve the following error:

"No disks suitable for cluster disks found. For diagnostic information about disks available to the cluster, use the Validate a Configuration Wizard to run Storage tests." 

This does not occur if you have at least three nodes in the cluster. This issue occurs because of a by-design code change in Windows Server 2016 for asymmetric storage clustering behaviors.

To add the storage, you can run the following command on the node in the second site:

Get-ClusterAvailableDisk -All | Add-ClusterDisk

The SMB Bandwidth limiter fails to throttle Storage Replica bandwidth

When specifying a bandwidth limit to Storage Replica, the limit is ignored and full bandwidth used. For example:

Set-SmbBandwidthLimit -Category StorageReplication -BytesPerSecond 32MB

This issue occurs because of an interoperability issue between Storage Replica and SMB. This issue should be fixed in a later update to Windows Server 2016. There is no workaround using Windows Server, but you can use external QoS network policies to attain the desired effect.

Event 1241 warning repeated during initial sync

When specifying a replication partnership is asynchronous, the source computer repeatedly logs warning event 1241 in the Storage Replica Admin channel. For example:

Log Name:      Microsoft-Windows-StorageReplica/Admin
Source:        Microsoft-Windows-StorageReplica
Date:          3/21/2017 3:10:41 PM
Event ID:      1241
Task Category: (1)
Level:         Warning
Keywords:      (1)
User:          SYSTEM
Computer:      sr-srv05.corp.contoso.com
Description:
The Recovery Point Objective (RPO) of the asynchronous destination is unavailable.

LocalReplicationGroupName: rg01
LocalReplicationGroupId: {e20b6c68-1758-4ce4-bd3b-84a5b5ef2a87}
LocalReplicaName: f:\
LocalPartitionId: {27484a49-0f62-4515-8588-3755a292657f}
ReplicaSetId: {1f6446b5-d5fd-4776-b29b-f235d97d8c63}
RemoteReplicationGroupName: rg02
RemoteReplicationGroupId: {7f18e5ea-53ca-4b50-989c-9ac6afb3cc81}
TargetRPO: 30

Guidance: This is typically due to one of the following reasons: 

The asynchronous destination is currently disconnected. The RPO may become available after the connection is restored.

The asynchronous destination is unable to keep pace with the source such that the most recent destination log record is no longer present in the source log. The destination will start block copying. The RPO should become available after block copying completes.

This is expected behavior during initial sync and can safely be ignored. This behavior may change in a later release. If you see this behavior during ongoing asynchronous replication, investigate the partnership to determine why replication is delayed beyond your configured RPO (30 seconds, by default).

Event 4004 warning repeated after rebooting a replicated node

Under rare and usually unreproducable circumstances, rebooting a server that is in a partnership leads to replication failing and the rebooted node logging warning event 4004 with an access denied error.

Log Name:      Microsoft-Windows-StorageReplica/Admin
Source:        Microsoft-Windows-StorageReplica
Date:          3/21/2017 11:43:25 AM
Event ID:      4004
Task Category: (7)
Level:         Warning
Keywords:      (256)
User:          SYSTEM
Computer:      server.contoso.com
Description:
Failed to establish a connection to a remote computer.

RemoteComputerName: server
LocalReplicationGroupName: rg01
LocalReplicationGroupId: {a386f747-bcae-40ac-9f4b-1942eb4498a0}
RemoteReplicationGroupName: rg02
RemoteReplicationGroupId: {a386f747-bcae-40ac-9f4b-1942eb4498a0}
ReplicaSetId: {00000000-0000-0000-0000-000000000000}
RemoteShareName:{a386f747-bcae-40ac-9f4b-1942eb4498a0}.{00000000-0000-0000-0000-000000000000}
Status: {Access Denied}
A process has requested access to an object, but has not been granted those access rights.

Guidance: Possible causes include network failures, share creation failures for the remote replication group, or firewall settings. Make sure SMB traffic is allowed and there are no connectivity issues between the local computer and the remote computer. You should expect this event when suspending replication or removing a replication partnership.

Note the Status: "{Access Denied}" and the message A process has requested access to an object, but has not been granted those access rights. This is a known issue within Storage Replica. As a workaround, simply restart the Storage Replica service.

We are working on providing an update that permanently resolves this issue. If you are interested in assisting us and you have a Microsoft Premier Support agreement, please email SRFEED@microsoft.com so that we can work with you on filing a backport request.

Error "Failed to bring the resource 'Cluster Disk x' online." with a stretch cluster

When attempting to bring a cluster disk online after a successful failover, where you are attempting to make the original source site primary again, you receive an error in Failover Cluster Manager. For example:

Error
The operation has failed.
Failed to bring the resource 'Cluster Disk 2' online.

Error Code: 0x80071397
The operation failed because either the specified cluster node is not the owner of the resource, or the node is not a possible owner of the resource.

If you attempt to move the disk or CSV manually, you receive an additional error. For example:

Error
The operation has failed.
The action 'Move' did not complete.

Error Code: 0x8007138d
A cluster node is not available for this operation

This issue is caused by one or more uninitialzed disks being attached to one or more cluster nodes. To resolve the issue, initialize all attached storage using DiskMgmt.msc, DISKPART.EXE, or the Initialize-Disk PowerShell cmdlet.

We are working on providing an update that permanently resolves this issue. If you are interested in assisting us and you have a Microsoft Premier Support agreement, please email SRFEED@microsoft.com so that we can work with you on filing a backport request.

GPT error when attempting to create a new SR partnership

When running New-SRPartnership, it fails with error:

Disk layout type for volume \\?\Volume{GUID}\ is not a valid GPT style layout.
New-SRPartnership : Unable to create replication group SRG01, detailed reason: Disk layout type for volume
\\?\Volume{GUID}\ is not a valid GPT style layout.
At line:1 char:1
+ New-SRPartnership -SourceComputerName nodesrc01 -SourceRGName SRG01 ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (MSFT_WvrAdminTasks:root/Microsoft/...T_WvrAdminTasks) [New-SRPartnership]
, CimException
+ FullyQualifiedErrorId : Windows System Error 5078,New-SRPartnership

In the Failover Cluster Manager GUI, there is no option to setup Replication for the disk.

When running Test-SRTopology, it fails with:

WARNING: Object reference not set to an instance of an object.
WARNING: System.NullReferenceException
WARNING:    at Microsoft.FileServices.SR.Powershell.MSFTPartition.GetPartitionInStorageNodeByAccessPath(String AccessPath, String ComputerName, MIObject StorageNode)
   at Microsoft.FileServices.SR.Powershell.Volume.GetVolume(String Path, String ComputerName)
   at Microsoft.FileServices.SR.Powershell.TestSRTopologyCommand.BeginProcessing()
Test-SRTopology : Object reference not set to an instance of an object.
At line:1 char:1
+ Test-SRTopology -SourceComputerName nodesrc01 -SourceVolumeName U: - ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [Test-SRTopology], NullReferenceException
+ FullyQualifiedErrorId : TestSRTopologyFailure,Microsoft.FileServices.SR.Powershell.TestSRTopologyCommand 

This is caused by the cluster functional level still being set to Windows Server 2012 R2 (i.e. FL 8). Storage Replica is supposed to return a specific error here but instead returns an incorrect error mapping.

Run Get-Cluster | fl * on each node.

If ClusterFunctionalLevel = 9, that is the Windows 2016 ClusterFunctionalLevel version needed to implement Storage Replica on this node. If ClusterFunctionalLevel is not 9, the ClusterFunctionalLevel will need to be updated in order to implement Storage Replica on this node.

To resolve the issue, raise the cluster functional level by running the PowerShell cmdlet: Update-ClusterFunctionalLevel https://technet.microsoft.com/itpro/powershell/windows/failoverclusters/update-clusterfunctionallevel

See also

© 2017 Microsoft