

Storage Replica Overview

 

Updated: November 19, 2015

This topic provides an overview of the new Storage Replica feature in Windows Server 2016 Technical Preview.

Goals & supported replication scenarios

This guide outlines how your business can benefit from this new functionality and the different replication scenarios that are supported by Storage Replica. It assumes that you have a previous working knowledge of Windows Server, Failover Clustering, File Servers, and Hyper-V, including basic administration.

Supported Storage Replica scenarios in Windows Server 2016 Technical Preview

Using this guide and Windows Server 2016 Technical Preview, you can deploy storage replication in stretch cluster, cluster-to-cluster, and server-to-server scenarios (see Figures 1-3). To reiterate from the Windows Server 2016 Technical Preview EULA, this feature is provided “AS-IS” and is not supported in production environments.

FIGURE 1: Storage replication in a stretch cluster using Storage Replica

FIGURE 2: Cluster-to-cluster storage replication using Storage Replica

FIGURE 3: Server-to-server storage replication using Storage Replica

Note

You can also configure server-to-self replication, using four separate volumes on one computer. However, this guide does not cover this scenario.

Storage Replica Features

Windows Server 2016 Technical Preview implements the following features in Storage Replica:

| Feature | Details |
| --- | --- |
| Type | Host-based |
| Synchronous | Yes |
| Asynchronous | Yes (server to server only) |
| Storage hardware agnostic | Yes |
| Replication unit | Volume (partition) |
| Windows Server Stretch Cluster creation | Yes |
| Server to server replication | Yes |
| Cluster to cluster replication | Yes |
| Transport | SMB3 |
| Network | TCP/IP or RDMA |
| RDMA | iWARP*, InfiniBand* |
| Replication network port firewall requirements | Single IANA port (TCP 445 or 5445) |
| Multipath/Multichannel | Yes (SMB3) |
| Kerberos support | Yes (SMB3) |
| Over the wire encryption and signing | Yes (SMB3) |
| Per-volume failovers allowed | Yes |
| Management UI in-box | Windows PowerShell, Failover Cluster Manager |

*Subject to further testing. Both InfiniBand and iWARP may require additional long haul equipment and cabling.
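
Because replication flows over SMB3 on a single IANA port, a quick pre-check is to confirm that the port is reachable from the intended source to the intended destination. The server name below is a placeholder:

```
# Sketch only: confirm TCP 445 is reachable from source to destination.
# Substitute 5445 if your deployment listens on that port instead.
Test-NetConnection -ComputerName sr-srv06 -Port 445
```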

Background

High level industry terms

Disaster Recovery (DR) refers to a contingency plan for recovering from site catastrophes so that the business continues to operate. Data DR means maintaining multiple copies of production data in a separate physical location; for example, a stretch cluster, where half the nodes are in one site and half are in another. Disaster Preparedness (DP) refers to a contingency plan for preemptively moving workloads to a different location prior to an oncoming disaster, such as a hurricane.

Service level agreements (SLAs) define the availability of a business’ applications and their tolerance of down time and data loss during planned and unplanned outages. Recovery Time Objective (RTO) defines how long the business can tolerate total inaccessibility of data. Recovery Point Objective (RPO) defines how much data the business can afford to lose.

Synchronous Replication

Synchronous replication guarantees that the application writes data to two locations at once before completion of the IO. This replication is more suitable for mission-critical data, but it requires network and storage investments and carries a risk of degraded application performance. Synchronous replication is suitable for both HA and DR solutions.

When application writes occur on the source data copy, the originating storage does not acknowledge the IO immediately. Instead, those data changes replicate to the remote destination copy and return an acknowledgement. Only then does the application receive the IO acknowledgment. This ensures constant synchronization of the remote site with the source site, in effect extending storage IOs across the network. In the event of a source site failure, applications can failover to the remote site and resume their operations with assurance of zero data loss.

Mode: Synchronous (zero data loss RPO)

Diagram: Storage_SR_SynchronousV2

Steps:

  1. Application writes data

  2. Log data is written and the data is replicated to the remote site

  3. Log data is written at the remote site

  4. Acknowledgement from the remote site

  5. Application write acknowledged

t and t1: Data is flushed to the volume; logs always write through.
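
Assuming this mode is chosen when the partnership is created, a synchronous server-to-server partnership might look like the following sketch. The server names (sr-srv05, sr-srv06), replication group names (rg01, rg02), and drive letters are hypothetical; verify the exact parameter set in your build with Get-Help New-SRPartnership:

```
# Sketch only: create a synchronous partnership between two standalone
# servers, pairing a data volume (d:) with a log volume (e:) on each side.
New-SRPartnership -SourceComputerName sr-srv05 -SourceRGName rg01 `
    -SourceVolumeName d: -SourceLogVolumeName e: `
    -DestinationComputerName sr-srv06 -DestinationRGName rg02 `
    -DestinationVolumeName d: -DestinationLogVolumeName e: `
    -ReplicationMode Synchronous
```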

Asynchronous Replication

In contrast, asynchronous replication means that when the application writes data, that data replicates to the remote site without immediate acknowledgment guarantees. This mode allows faster response times to the application, as well as a DR solution that works over geographic distances.

When the application writes data, the replication engine captures the write and immediately acknowledges it to the application. The captured data then replicates to the remote location. The remote node processes the copy of the data and lazily acknowledges back to the source copy. Since replication performance is no longer in the application IO path, the remote site’s responsiveness and distance are less important factors. There is a risk of data loss if the source is lost while captured data is still in the buffer and has not yet left the source.

With its higher than zero RPO, asynchronous replication is less suitable for HA solutions like Failover Clusters, as they are designed for continuous operation with redundancy and no loss of data.

Mode: Asynchronous (near-zero data loss RPO; depends on multiple factors)

Diagram: Storage_SR_AsynchronousV2

Steps:

  1. Application writes data

  2. Log data written

  3. Application write acknowledged

  4. Data replicated to the remote site

  5. Log data written at the remote site

  6. Acknowledgement from the remote site

t and t1: Data is flushed to the volume; logs always write through.
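
Assuming an existing partnership, switching it to asynchronous mode might look like the following sketch. The computer and replication group names are hypothetical, and the parameter set may differ in the preview, so check Get-Help Set-SRPartnership:

```
# Sketch only: change an existing partnership to asynchronous replication.
Set-SRPartnership -SourceComputerName sr-srv05 -SourceRGName rg01 `
    -DestinationComputerName sr-srv06 -DestinationRGName rg02 `
    -ReplicationMode Asynchronous
```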

Changes and improvements between Technical Preview 1 and Technical Preview 2

  • Cluster to cluster replication. You can now replicate between two separate clusters, including those using Storage Spaces Direct.

  • Updated stretch cluster provisioning. The Failover Cluster Management Storage Replica provisioning wizard experience now includes a more logical flow and improved guidance.

  • Additional Windows PowerShell options. Cmdlets now fully support remoting as well as implementing additional features. A new Test-SRTopology cmdlet now ensures your servers meet the requirements for Storage Replica and assists you in determining log sizes. It outputs an HTML report.

  • Improved provisioning experience. Enabling replication no longer takes the source volume offline during initial provisioning. This means no blocked application IOs during the source server initial configuration phase.

  • Improved removal and re-provisioning experience. Removing replication from a stretch cluster now supports Failover Cluster Manager and no longer requires Windows PowerShell. Removal of replication will not routinely leave orphaned metadata that prevents reconfiguring of replication on the same volumes.

  • Updated diagnostic and logging information. Failover Cluster Manager and Windows PowerShell report more details around replication state. The event log system contains better guidance and messaging.

  • Updated and consistent naming. All management references to Windows Volume Replication and WVR now use the Storage Replica and SR naming.

  • Improved replication performance. Synchronous replication performance, especially for smaller IOs, is better.

  • Server to self. You can configure a server to replicate to itself, using separate source and destination volumes.
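
As an example of the Test-SRTopology cmdlet mentioned above, a hypothetical run against two test servers might look like this. The names, drive letters, and paths are placeholders, and parameter names may differ in the preview, so confirm with Get-Help Test-SRTopology:

```
# Sketch only: validate a prospective server-to-server topology for 30
# minutes and write the resulting HTML report to c:\temp.
Test-SRTopology -SourceComputerName sr-srv05 -SourceVolumeName d: `
    -SourceLogVolumeName e: -DestinationComputerName sr-srv06 `
    -DestinationVolumeName d: -DestinationLogVolumeName e: `
    -DurationInMinutes 30 -ResultPath c:\temp
```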

Changes and improvements between Technical Preview 2 and Technical Preview 3 & 4

  • Failover Cluster Manager updates:

    • The snap-in now contains a storage column showing disk partition format (GPT versus MBR).

    • Failover Cluster Manager now shows all disks in a replication group.

    • You can choose to display ineligible disks and see the reason disks are not eligible for replication.

    • There is a new wizard that allows you to add additional disks to an existing replication group.

    • There is a new wizard page to configure consistency groups (write ordering) during creation of a replication group.

    • A property page now exists for replication on the log disks that allows you to change log sizes.

    • A property page now exists for replication on the data disks that shows if consistency groups are configured.

    • Attempting to configure replication with one or more nodes missing the Storage Replica feature is now prevented.

    • The snap-in is now much more responsive and reliable when managing Storage Replica.

  • Additional Windows PowerShell options.

    • Cmdlets now fully support remoting in all scenarios, including standalone servers.

    • A new Clear-SRMetadata cmdlet allows simplified removal of orphaned partition database entries, logs, and cluster configurations.

    • The Test-SRTopology cmdlet has been significantly revised to include further tests and requirements checks and use much-improved initial sync and log size estimation algorithms.

    • The Export-SRConfiguration cmdlet was added.

    • The new Get-SRAccess, Grant-SRAccess, and Remove-SRAccess cmdlets allow easy configuration of the security needed for cluster-to-cluster replication.

  • Seeding. You can now specify that a destination volume already contains matching or similar blocks to the source volume during setup of a new partnership. Storage Replica then sends only the differing blocks. This happens automatically when recovering from a long term offline destination where the logs have wrapped.

  • Consistency groups. Write ordering (also known as consistency group) support was added for new replication groups, in order to meet the guarantee requirements of products like Microsoft SQL Server when used with multiple replicated volumes.

  • Updated state. The health and status state values shown in Windows PowerShell and Failover Cluster Manager have been improved based on customer feedback, for easier readability and understanding.

  • Progress. The group objects shown by (Get-SRGroup).Replicas now show replication progress in bytes copied (NumOfBytesRecovered, NumOfBytesRemaining) during initial sync or resync.

  • Improved replication performance. Synchronous replication performance is better.

  • Stretch Cluster site awareness. Windows Failover Clusters now implement site awareness and preferred site properties for the nodes. This can be used by Storage Replica and Hyper-V for failover policies.

  • Fixes. Many bug fixes, stability improvements, provisioning, and de-provisioning improvements, and known issues resolved.
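
As a sketch of two of the items above, granting the security needed for cluster-to-cluster replication and then reading sync progress might look like the following. The cluster and server names are hypothetical; verify the parameters with Get-Help Grant-SRAccess:

```
# Sketch only: allow cluster sr-srvclusb to replicate with the cluster
# that node sr-srv01 belongs to. Run the mirror-image command against
# the other cluster as well.
Grant-SRAccess -ComputerName sr-srv01 -Cluster sr-srvclusb

# Read initial sync / resync progress from the replica objects.
(Get-SRGroup).Replicas |
    Select-Object DataVolume, NumOfBytesRecovered, NumOfBytesRemaining
```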

Key Evaluation Points and Behaviors

  • Test only. You cannot deploy Storage Replica in production environments using Windows Server 2016 Technical Preview. This version is only for evaluation purposes in a test lab environment.

  • Performance. The Windows Server 2016 Technical Preview version of Storage Replica has not been fully optimized for performance.

  • Network bandwidth and latency with fastest storage. There are physical limitations around synchronous replication. Because SR implements an IO filtering mechanism using logs and requires network round trips, synchronous replication is likely to make application writes slower. By using low-latency, high-bandwidth networks as well as high-throughput disk subsystems for the logs, you minimize performance overhead.

  • The destination volume is not accessible while replicating. When you configure replication, the destination volume dismounts, making it inaccessible to user writes and invisible in typical interfaces like File Explorer. Block-level replication technologies are incompatible with allowing access to the destination target’s mounted file system in a volume; NTFS and ReFS do not support users writing data to the volume while blocks change underneath them.

  • The Microsoft implementation of asynchronous replication is different from most. Most industry implementations of asynchronous replication rely on snapshot-based replication, where periodic differential transfers move to the other node and merge. SR asynchronous replication operates just like synchronous replication, except that it removes the requirement for a serialized synchronous acknowledgment from the destination. This means that SR theoretically has a lower RPO, as it continuously replicates. However, this also means it relies on internal application consistency guarantees rather than using snapshots to force consistency in application files. SR guarantees crash consistency in all replication modes.

  • Storage Replica is not DFSR. Volume-level block storage replication is not a good candidate for branch office scenarios. Branch office networks tend to be highly latent, highly utilized, and lower bandwidth, which makes synchronous replication impractical. Branch offices often replicate data in a “one-to-many” configuration with read-only targets, such as for software distribution, and SR is not capable of this in its first release. When replicating data from a branch office back to a main office, SR dismounts the destination volume to prevent direct access.

    It is important to note, nevertheless, that many customers use DFSR as a disaster recovery solution even though it is often impractical for that scenario: DFSR cannot replicate open files and is designed to minimize bandwidth usage at the expense of performance, leading to large recovery point deltas. SR may allow you to retire DFSR from some of these disaster recovery duties.

  • Storage Replica is not backup. Some IT environments deploy replication systems as backup solutions, due to their zero data loss options when compared to daily backups. SR replicates all changes to all blocks of data on the volume, regardless of the change type. If a user deletes all data from a volume, SR will replicate the deletion instantly to the other volume, irrevocably removing the data from both servers. Do not use SR as a replacement for a point-in-time backup solution.

  • Storage Replica is not Hyper-V Replica or SQL AlwaysOn Availability Groups. Storage Replica is a general purpose, storage-agnostic engine. By definition, it cannot tailor its behavior as ideally as application-level replication. This may lead to specific feature gaps that encourage you to deploy or remain on specific application replication technologies.

Note

This document contains a list of known issues and expected behaviors as well as a frequently asked questions section.

Storage Replica terminology

This guide frequently uses the following terms:

  • The source is a computer’s volume that allows local writes and replicates outbound. Also known as “primary”.

  • The destination is a computer’s volume that does not allow local writes and replicates inbound. Also known as “secondary”.

  • A replication partnership is the synchronization relationship between a source and destination computer for one or more volumes.

  • A replication group is the organization of volumes and their replication configuration within a partnership, on a per server basis. A group may contain one or more volumes.
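
Assuming a configured system, the partnership and group objects that these terms describe can be inspected with the in-box cmdlets. The exact output shape may vary by build:

```
# Sketch only: list the replication partnership(s) known to this server,
# then the replication groups and their member volumes.
Get-SRPartnership
Get-SRGroup | Select-Object Name, ReplicationMode, Replicas
```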
