This blog post covers the scenarios and motivations that drive the backup of a Replica VM, and provides product guidance for administrators.

Why back up a Replica VM?

Ever since the advent of Hyper-V Replica in Windows Server 2012, customers have been interested in backing up the Replica VM. Traditionally, IT administrators have taken backups of the VM that contains the running workload (the primary VM), and backup products have been built to cater to this need. So when a significant proportion of customers talked about the backup of Replica VMs, we were intrigued. There are a few key scenarios where backup of a Replica VM becomes useful:

Reduce the impact of backup on the running workload: Taking the backup of a VM involves the creation of a snapshot/diff-disk to baseline the changes that need to be backed up. For the duration of the backup job, the workload runs on a diff-disk, and this has an impact on the system. By offloading the backup to the Replica site, the running workload is no longer impacted by the backup operation. Of course, this is applicable only to deployments where the backup copy is stored on the remote site. For example, the daily backup operation might store the data locally for quicker restore times, while the monthly or quarterly backups that are stored remotely for long-term retention can be taken from the Replica VM.

Limited bandwidth between sites: This is typical of Branch Office-Home Office (BO-HO) deployments, where there are multiple smaller remote branch office sites and a larger central Home Office site. The backup data for the branch offices is stored in the home office, and an appropriate amount of bandwidth is provisioned by administrators to transfer the backup data between the two sites. The introduction of disaster recovery using Hyper-V Replica creates another stream of network traffic, and administrators have to re-evaluate their network infrastructure.
In most cases, administrators either could not or were not willing to increase the bandwidth between sites to accommodate both backup and DR traffic. However, they did come to the realization that backup and DR were independently sending copies of the same data over the network, and that this was an area that could be optimized. With Hyper-V Replica creating a VM in the Home Office site, administrators could save on the network transfer by backing up the Replica VM locally rather than backing up the primary VM and sending the data over the network.

Backup of all VMs in the hoster datacenter: Some customers use a hoster's datacenter as the Replica site, with the intention of not building a secondary datacenter of their own. Hosters have SLAs around the protection of all customer VMs in their datacenters, typically a once-a-day backup. The backup of Replica VMs therefore becomes a requirement for the success of their business.

Various customer segments thus found that the backup of a Replica VM has value for their specific scenarios.

Data consistency

A key aspect of the backup operation is the consistency of the backed-up data. Customers have a clear order of preference when it comes to the data consistency of backed-up VMs:

1. Application-consistent backup
2. Crash-consistent backup

This prioritization applied to Replica VMs as well. Conversations with customers indicated that they were comfortable with crash-consistency for a Replica VM if application-consistency was not possible. Of course, anything less than crash-consistency was not acceptable, and customers preferred that backups fail rather than have inconsistent data backed up.

Attempting application-consistency

Typical backup products try to ensure application-consistency of the data being backed up (using the VSS framework), and this works well when the VM is running.
However, the Replica VM is always turned off until a failover is initiated, so VSS cannot guarantee an application-consistent backup for it. Getting an application-consistent backup of a Replica VM is therefore not possible.

Guaranteeing crash-consistency

To ensure that customers backing up Replica VMs always get crash-consistent data, a set of changes was introduced in Windows Server 2012 R2 that fails the backup operation if consistency cannot be guaranteed. The virtual disk could be inconsistent when any one of the conditions below is encountered, and in these cases the backup is expected to fail:

1. HRL logs are being applied to the Replica VM
2. The previous HRL log apply operation was cancelled or interrupted
3. The previous HRL log apply operation failed
4. Replica VM health is Critical
5. The VM is in the Resynchronization Required state or the Resynchronization in progress state
6. Migration of the Replica VM is in progress
7. Initial replication is in progress (between the primary site and secondary site)
8. Failover is in progress

Dealing with failures

These conditions are largely treated as transient error states, and the backup product is expected to retry the backup operation based on its own retry policies. With 30-second replication and apply being supported in Windows Server 2012 R2, the backup operation is expected to collide with HRL log apply more frequently, resulting in condition 1 listed above. A robust retry mechanism is needed to ensure a high backup success rate. If the backup product is unable to retry or cope with failures, an option is to explicitly pause replication before the backup is scheduled to run.

Backing up Replica VMs using DPM

The backup of Replica VMs on Windows Server 2012 R2 hosts using Data Protection Manager 2012 R2 is now supported: “Backup of primary virtual machines is supported. Backup of replica (secondary) virtual machines is supported on Hyper-V servers running Windows Server 2012 R2”.
The matrix below gives the clearest picture about what deployment configurations are supported and which ones are not:

| DPM version | Replica (secondary/tertiary) server host OS: Windows Server 2012 | Replica (secondary/tertiary) server host OS: Windows Server 2012 R2 | Primary server host OS: Windows Server 2012 | Primary server host OS: Windows Server 2012 R2 |
|---|---|---|---|---|
| DPM 2012 | Not supported | Not supported | Supported | Not supported |
| DPM 2012 R2 | Not supported | Supported | Supported | Supported |

DPM 2012 R2 along with Windows Server 2012 R2 provides the right experience for the backup of Replica VMs. It includes the changes to the platform that ensure crash-consistency of backups, and the appropriate retry mechanism to ensure a high success rate. This has been validated with internal tests involving 2 servers with different VM mixes that are backed up by a DPM server (below).

| | Server 1 (Total 44 VMs) | Server 2 (Total 36 VMs) | DPM Server |
|---|---|---|---|
| Primary VMs | 10 | 12 | Not applicable |
| Replica VMs | 4 | 20 | Not applicable |
| Non-replicating VMs | 30 | 4 | Not applicable |
| Host OS | Windows Server 2012 R2 | Windows Server 2012 R2 | Windows Server 2012 R2 |
| Host RAM | 144 GB | 144 GB | 72 GB |
| Network bandwidth available for replication | 10 Gbps | 10 Gbps | 1 Gbps |
| Storage subsystem | 6 TB FC SAN storage | 2.5 TB FC SAN storage | 6 TB Direct attached storage |
| Number of VHDs | 2-4 VHDs based on workload | 2-4 VHDs based on workload | Not applicable |
| VM Workload | Mix of SQL, Exchange, and IOMeter | Mix of SQL, Exchange, and IOMeter | Not applicable |

| Test parameter | Value |
|---|---|
| Total test duration | 48 hours |
| Backup frequency | Every 3 hours |
| Number of backup points | 16 backup points expected per VM at the end of 48 hours |
| Number of VMs backed up by the DPM server | Total: 80 VMs (Server 1: 44 VMs, Server 2: 36 VMs) |
| Replication frequency | 30s for all virtual machines |
| Recovery History of Replica VM | Variable (0-5 recovery points) |

We measured the actual number of backup points created per VM in DPM, and matched that with our expectation of 16 recovery points per VM for the 48 hour test duration.
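
The expected counts follow directly from the test parameters: one backup every 3 hours over 48 hours gives 16 points per VM, multiplied by the number of VMs in each group. The short Python sketch below only reproduces that arithmetic; the function and variable names are hypothetical and chosen for illustration.

```python
# Illustrative arithmetic only: names here are hypothetical, and the
# figures are taken from the test configuration described above.
TEST_DURATION_HOURS = 48
BACKUP_INTERVAL_HOURS = 3


def expected_backup_points(vm_count,
                           duration_hours=TEST_DURATION_HOURS,
                           interval_hours=BACKUP_INTERVAL_HOURS):
    """Expected backup points for a group of VMs over the test run."""
    points_per_vm = duration_hours // interval_hours  # 48 // 3 = 16
    return vm_count * points_per_vm


# VM counts per server and VM type, from the test setup.
vm_counts = {
    "Server 1": {"Primary": 10, "Replica": 4, "Non-replicating": 30},
    "Server 2": {"Primary": 12, "Replica": 20, "Non-replicating": 4},
}

expected = {
    server: {kind: expected_backup_points(count)
             for kind, count in kinds.items()}
    for server, kinds in vm_counts.items()
}

for server, kinds in expected.items():
    for kind, points in kinds.items():
        print(f"{server} / {kind} VMs: {points} expected backup points")
```
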
The results of the test were extremely positive, with a backup job success rate close to 100%. Every group reached 100% of its expected backup points except the Replica VMs on Server 2, which reached 94% (300 of 320).

| VM type | Server 1 (Total 44 VMs): number of VMs (A) | Server 1: expected backup points (A × 16) | Server 1: actual backup points | Server 2 (Total 36 VMs): number of VMs (B) | Server 2: expected backup points (B × 16) | Server 2: actual backup points |
|---|---|---|---|---|---|---|
| Primary VMs | 10 | 160 | 160 (100%) | 12 | 192 | 192 (100%) |
| Replica VMs | 4 | 64 | 64 (100%) | 20 | 320 | 300 (94%) |
| Non-replicating VMs | 30 | 480 | 480 (100%) | 4 | 64 | 64 (100%) |

Takeaways

- Backup of Replica VMs is now a supported scenario.
- Only crash-consistent backup of a Replica VM is guaranteed.
- A robust retry mechanism needs to be configured in the backup product to deal with failures. Alternatively, ensure that replication is paused when the backup is scheduled to run.
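
The retry guidance can be pictured with a small sketch. This is a generic retry-with-backoff loop in Python, not a description of how DPM or any other backup product implements its retry policy. `run_backup`, `TransientBackupError`, and the default retry parameters are all hypothetical names and values chosen for this example; a real product would map its own error codes to "transient" and use its own schedule.

```python
import time


class TransientBackupError(Exception):
    """Hypothetical error type: the backup failed for a retryable reason,
    for example because a log apply was in progress on the Replica VM."""


def backup_with_retry(run_backup, max_attempts=5, initial_delay_s=30.0,
                      backoff=2.0, sleep=time.sleep):
    """Call run_backup() until it succeeds or max_attempts is reached.

    run_backup is a caller-supplied callable (an assumption of this
    sketch) that raises TransientBackupError on a retryable failure.
    The wait between attempts grows by the backoff factor each time.
    """
    delay = initial_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            return run_backup()
        except TransientBackupError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            sleep(delay)
            delay *= backoff
```

A caller would pass in whatever function triggers the backup job, for example `backup_with_retry(lambda: start_job(vm_name))`, where `start_job` is again a placeholder. The `sleep` parameter is injectable so that the loop can be exercised without actually waiting.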