Virtualization

Backup and Disaster Recovery for Server Virtualization

Adam Fazio

 

At a Glance:

  • Considerations for disaster recovery planning
  • High availability solutions for disaster recovery
  • Backing up and restoring with Windows Server Backup

Contents

Disaster Recovery Planning 101
Disaster Recovery and Virtualization
Physical-to-Virtual Conversion
Virtual Machine Snapshots
Backing up Hyper-V
Windows Server Backup
Backing up VMs with WSB
Considerations
Restoring VMs with WSB
Data Protection Manager
Scripted Backups
DiskShadow
Wrap-Up

As server virtualization technology evolves and industry adoption increases, organizations are recognizing benefits reaching far beyond the most popular virtualization justifications: reduced infrastructure costs and increased IT agility. The next frontier is using the virtualization platform as a way to enable or enhance disaster recovery (DR) strategies.

Why is DR preparedness consistently one of the hottest topics facing the IT industry? Studies suggest that companies lose an average of $80,000 to $90,000 per hour of downtime, and that very few companies suffering from a catastrophic data loss survive long-term. This article will provide an introduction to DR using the Microsoft virtualization platform, as well as a deeper look into backup and restore options and considerations for Windows Server 2008 Hyper-V.

Disaster Recovery Planning 101

DR is the process of restoring critical services in the event of an outage and should be a part of every company's business continuity plan, which defines how the company will continue to function during or following such a disaster. These plans are the cornerstone of any DR initiative.

Some vendors claim that their DR automation technology minimizes or eliminates the need for a detailed and well-rehearsed plan. While it's fair to say that automation can improve recovery times and lessen the dependence on human intervention, let's pause for a public service announcement: you cannot successfully mitigate a disaster with technology alone. The people and processes are always as important as the technologies.

In fact, you will find it nearly impossible to select the right technologies without first knowing all the constraints and objectives that arise from the DR planning process. It's outside the scope of this article to define an entire DR plan, but I do want to emphasize those elements necessary to select the right technologies and implementations. So let's quickly outline some critical technology drivers within a DR plan.

Service Definitions and Prioritization: What exactly defines the entire service you are trying to protect, and how critical is it to the organization? Figure 1 shows examples of company services that would likely be included in any DR plan.

Figure 1 Service definitions and prioritization example

Service | Primary Components | Dependencies | Business Use | SLA
Public Web site | Network load balancer, three Web servers, two database servers | DNS, network, firewall, directory, storage area network (SAN) storage | Product purchasing and order tracking; eCommerce; customer support portal; company information | 99.99%
Financial system | Two database servers, one application server | DNS, network | Recording company revenue, as required by laws and regulations | 99.99%
E-mail system | Three e-mail servers, one Web server | DNS, network, firewall, directory | Company communication; customer support | 99.5%

Once you've defined the services, you can begin to identify which systems and dependencies to target for what kind of DR strategies. It could be that after you look at the entire set of services and dependencies, you find you need to consider a few different levels of DR capability, as a single DR solution for all mission-critical services would be too expensive and complex.

Service Level Agreements (SLAs): An SLA is an agreement or contract between the service provider (IT) and the customer (the organization) that defines the availability targets for a given service. SLAs can be very lengthy or quite short; for example, "The e-mail system will be available 99.95 percent of the time during general business hours and 98 percent of the time during non-general business hours, excluding scheduled maintenance windows, measured monthly." SLAs are usually organized into tiers to which IT services are assigned, each measured over a predefined period of time.

Operating Level Agreements (OLAs): OLAs describe the agreements between the different IT groups that support an SLA, including the processes and response times for delivering their services. Suppose you have a mission-critical Web site with an SLA target of 99.99 percent, but a database it depends on for its content has an availability target of only 95 percent. The OLA helps expose such mismatches and align IT teams toward the same goal.
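To see why that mismatch matters, here is a minimal Python sketch of serial availability. It assumes the Web site is down whenever any component it needs is down and that failures are independent, so the availabilities multiply; real dependencies may be redundant or correlated, and the function name is purely illustrative.

```python
from math import prod


def serial_availability(*targets: float) -> float:
    """Availability of a service that needs every listed component to be up.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return prod(targets)


# Suppose the Web tier meets its 99.99% target but the database meets only its 95%.
site = serial_availability(0.9999, 0.95)
print(f"best achievable: {site:.4%}")  # prints "best achievable: 94.9905%"
```

Under these assumptions the site cannot do better than its weakest dependency, which is exactly the gap an OLA is meant to surface.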

Recovery Point and Recovery Time Objectives (RPOs/RTOs): An RTO defines how long a service can be unavailable before there is a break in continuity, while an RPO defines how much data loss the organization considers acceptable. The SLA puts an upper bound on the RTO: a service with an SLA of 99 percent measured monthly can be down for at most 1 percent of an average 730-hour month, or about 7 hours, 18 minutes, so that is its RTO. Combine that with an RPO of, say, 24 hours, which means backups must run at least once a day, and you can now accurately define your backup techniques and schedules.
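The arithmetic above is easy to script when you have many services and tiers. The following is a minimal Python sketch, assuming an average month of 365 × 24 / 12 = 730 hours and assuming the worst-case data loss equals the interval between backups; both function names are hypothetical.

```python
from datetime import timedelta

# Assumption: an "average" month of 365 * 24 / 12 = 730 hours.
HOURS_PER_MONTH = 365 * 24 / 12


def max_downtime(sla_percent: float, period_hours: float = HOURS_PER_MONTH) -> timedelta:
    """Longest total outage an availability SLA allows in one measurement period."""
    return timedelta(hours=period_hours * (100 - sla_percent) / 100)


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """A failure just before the next backup loses one full interval of data."""
    return backup_interval <= rpo


print(max_downtime(99.0))                                 # prints 7:18:00
print(meets_rpo(timedelta(days=1), timedelta(hours=24)))  # prints True
print(meets_rpo(timedelta(days=7), timedelta(hours=24)))  # prints False
```

A weekly full backup alone fails a 24-hour RPO, which is why the RPO, not convenience, should set the schedule.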

Data Retention Policies: An organization's data retention policies specify exactly how long to keep backups and where to store them. They are usually driven by legal and regulatory requirements.

Data Categorization: You should also think about the nature of the data. If you put your data into categories, you can quickly see that not all data requires the same level of DR consideration. For instance, a single database may have different availability requirements than an Active Directory deployment with multiple domain controllers, each containing a replica of the directory. Similarly, file-server data may have very different restore procedures than CRM data.

Disaster Scenarios: It's important to define every scenario you want to plan for, since each one has different restore procedures, business impact, and associated costs. Start by listing all the possible scenarios, and then decide which ones to target when working on DR planning for your environment:

  • Loss of an entire site
  • Loss of a single datacenter
  • Loss of a system (operating system or hardware failure)
  • Loss of data (data deletion or corruption)
  • Loss of a critical dependency

Clearly, recovering from the loss of an entire site involves very different considerations than recovering a single system. You will also want to define recovery thresholds based on your SLAs. For instance, let's say an entire site is offline due to a major ISP network outage. If the affected service's targets are 8 hours for service restoration and 48 hours for data restoration, you might perform service failover procedures to your backup site but skip the data recovery process, since you would expect to fail back to the production site well before the 48-hour data restoration target.
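Recovery thresholds like these can be written down as explicit rules so that nobody has to improvise during an outage. The Python sketch below is one hypothetical way to encode the ISP example; it assumes you can estimate how long the outage will last, and the 12-hour estimate is an illustrative assumption, not part of the scenario above.

```python
from datetime import timedelta


def plan_response(expected_outage: timedelta,
                  service_rto: timedelta,
                  data_rto: timedelta) -> list[str]:
    """Hypothetical rule of thumb: act only on the targets the outage would breach."""
    actions = []
    if expected_outage > service_rto:
        actions.append("fail services over to the recovery site")
    if expected_outage > data_rto:
        actions.append("restore data at the recovery site")
    return actions or ["wait for the production site to return"]


# The ISP outage from the example, assumed to last about 12 hours.
print(plan_response(timedelta(hours=12), timedelta(hours=8), timedelta(hours=48)))
# prints ['fail services over to the recovery site']
```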

Whew! All that work and we haven't even talked about technology yet! The criticality of planning is not to be underestimated. A DR implementation with no documented plan is just a "DR hope."

Disaster Recovery and Virtualization

OK, now that we have the basics of DR planning down, what does virtualization add to the picture? Many companies report restoring virtualized servers in minutes, as opposed to days or weeks for their physical counterparts. Because the entire server operating system is now just a set of files, abstracted from the underlying physical hardware, new doors open when considering recoverability.

A prevalent theory today is that some or all DR goals can be met with High Availability (HA) solutions. The idea behind this is that if you have cluster nodes in separate physical locations with data synchronized between the sites, in the event of a failure the passive node can resume operations and you can recover in near real time.

This is true as far as it goes, but recall the disaster scenarios defined earlier: HA is not a solution for all of them. If data is accidentally deleted or corrupted, for instance, replication faithfully copies the damage to the other site. You need a combination of technologies to prepare for every scenario, and this generally includes some type of regular backup. HA does not protect against all possible outages, and it does not eliminate the need for a backup strategy.
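The deletion case is worth seeing concretely. The toy Python model below (not how any particular replication product works) mirrors every change to a second site, then compares that replica with an ordinary point-in-time backup after an accidental delete; the class and file names are hypothetical.

```python
class MirroredVolume:
    """Toy model of synchronous replication: every change lands on both sites."""

    def __init__(self) -> None:
        self.primary: dict[str, str] = {}
        self.replica: dict[str, str] = {}

    def write(self, name: str, data: str) -> None:
        self.primary[name] = data
        self.replica[name] = data

    def delete(self, name: str) -> None:
        # Replication cannot tell a mistake from an intended change.
        self.primary.pop(name, None)
        self.replica.pop(name, None)


vol = MirroredVolume()
vol.write("orders.mdf", "customer orders")
nightly_backup = dict(vol.primary)      # point-in-time copy taken before the mistake

vol.delete("orders.mdf")                # accidental deletion
print("orders.mdf" in vol.replica)      # prints False: the HA copy is gone too
print("orders.mdf" in nightly_backup)   # prints True: the backup can restore it
```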

HA with Hyper-V requires careful planning of the storage layer, because storage is a critical factor in recoverability. For example, a 2-node Hyper-V cluster with shared storage still has the storage subsystem and its data as a single point of failure, even if the cluster nodes are in separate datacenters.

By contrast, the same 2-node Hyper-V cluster with non-shared storage can survive storage or data loss on either node. This does require replication technologies to keep the storage in sync, and it introduces complexities of its own (see Figure 2).

Figure 2 Multi-site Hyper-V cluster

There are some very interesting developments in the area of data replication and synchronization, but this is not something Microsoft currently provides. The partners showcased on the Windows Server 2008 Multi-Site Clustering page (microsoft.com/windowsserver2008/en/us/clustering-multisite.aspx) are worth a look. Another resource is the Windows Server Catalog (windowsservercatalog.com), which lists storage vendors with replication technologies certified for Windows Server 2008.

As you can see, there are many possible HA and storage configurations to consider. Again, this is why you need to have your business requirements defined and allow those to drive the technical requirements, rather than the other way around.

Physical-to-Virtual Conversion

Virtualization clearly offers some unique recovery agility, but what about physical systems that are not good virtualization candidates? System Center Virtual Machine Manager (SCVMM) can perform physical-to-virtual (P2V) conversions of running Windows servers, producing a bootable Hyper-V virtual machine (VM) that is an exact replica of the physical source server. You can then replicate this VM across campus or across the country, just like its natively virtualized counterparts, and achieve similar recovery times.

This approach differs from traditional bare-metal restore in that your recovery location no longer needs the same number or type of physical systems as your production location. You can therefore over-subscribe your recovery hardware and scale it out as needed, depending on the impact of the disaster.

Although SCVMM does not include a scheduler for P2V conversions, its GUI runs entirely on top of Windows PowerShell, so conversions are easy to script with the New-P2V cmdlet. In fact, every wizard in SCVMM can show the script it uses to execute a job, so you can copy the code from a test P2V in your environment and modify it for future automated use. Figure 3 shows some sample code; run the SCVMM P2V wizard in your environment to get a Windows PowerShell script specific to your servers that you can customize.

Figure 3 Code produced by the SCVMM P2V wizard (tidied for reuse)

# Load the VMM 2008 snap-in (already loaded in the VMM PowerShell console)
Add-PSSnapin Microsoft.SystemCenter.VirtualMachineManager

# Placeholders: replace with the names used in your environment
$VMMServer  = "<VMM SERVER>"
$SourceName = "<SOURCE P2V SERVER>"
$TargetHost = "<TARGET HYPER-V HOST>"
# The wizard generates a new job-group GUID each time it runs
$JobGroupID = "e823f50d-dbc7-4a41-9087-fb01bb44dc26"

$Credential = Get-Credential

# Gather the configuration of the physical source machine
New-MachineConfig -VMMServer $VMMServer -SourceComputerName $SourceName `
    -Credential $Credential -RunAsynchronously

# The wizard repeats these two lookups before every New-P2V call; once is enough
$VMHost = Get-VMHost -VMMServer $VMMServer | where {$_.Name -eq $TargetHost}
$MachineConfig = Get-MachineConfig -VMMServer $VMMServer | where {$_.Name -eq $SourceName}

# 1. Queue the network adapter (NIC ID and static MAC address from the author's example)
New-P2V -VMMServer $VMMServer -VMHost $VMHost -RunAsynchronously -JobGroup $JobGroupID `
    -SourceNetworkConnectionID "00:14:D1:3C:66:2F" -PhysicalAddress "00:14:D1:3C:66:2F" `
    -PhysicalAddressType Static -VirtualNetwork "External" -MachineConfig $MachineConfig

# 2. Queue volume C: as a dynamically expanding VHD on IDE bus 0, LUN 0
New-P2V -VMMServer $VMMServer -VMHost $VMHost -RunAsynchronously -JobGroup $JobGroupID `
    -VolumeDeviceID "C" -Dynamic -IDE -Bus 0 -LUN 0 -MachineConfig $MachineConfig

# 3. Define the VM itself; -Trigger marks the final command, so the job group runs
New-P2V -Credential $Credential -VMMServer $VMMServer -VMHost $VMHost `
    -Path "C:\ProgramData\Microsoft\Windows\Hyper-V" -Owner "DOMAIN\username" `
    -RunAsynchronously -JobGroup $JobGroupID -Trigger -Name $SourceName `
    -MachineConfig $MachineConfig -CPUCount 1 -MemoryMB 512 -RunAsSystem `
    -StartAction NeverAutoTurnOnVM -UseHardwareAssistedVirtualization $false `
    -StopAction SaveVM

Virtual Machine Snapshots

Although not technically a backup, a VM snapshot provides a point in time to which you can revert, using differencing disks and a copy of the VM configuration file. If the disaster involves accidental data deletion inside the VM, a snapshot can serve as a DR feature: rolling the VM back to the snapshot undoes the damage, although it also discards any legitimate changes made since the snapshot was taken. (We'll take a look at Volume Shadow Copy Service, or VSS, snapshots later.)
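The way a differencing disk makes revert possible can be shown with a simplified Python model. This is not Hyper-V's actual VHD/AVHD format and it ignores the configuration-file copy; blocks are just dictionary entries, and the class and data are hypothetical.

```python
class DifferencingDisk:
    """Simplified snapshot model: a frozen parent plus a child that takes all new writes."""

    def __init__(self, parent: dict[int, bytes]) -> None:
        self.parent = parent                # frozen at snapshot time, never modified
        self.child: dict[int, bytes] = {}   # blocks written after the snapshot

    def write(self, block: int, data: bytes) -> None:
        self.child[block] = data

    def read(self, block: int) -> bytes:
        # Newer data in the child wins; otherwise fall back to the parent.
        return self.child.get(block, self.parent.get(block, b""))

    def revert(self) -> None:
        self.child.clear()                  # discard everything written since the snapshot


disk = DifferencingDisk({0: b"boot", 1: b"payroll data"})
disk.write(1, b"")           # accidental deletion inside the guest
disk.write(2, b"new log")    # a legitimate change made after the snapshot
disk.revert()
print(disk.read(1), disk.read(2))   # prints b'payroll data' b''
```

Note that the revert restores the deleted data but also throws away the legitimate new block, which is the trade-off described above.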

Backing up Hyper-V

Host-Based Backups: One exciting benefit of server virtualization is the prospect of no longer having to individually back up the virtualized systems. Now that these systems are simply files living on a host's file system, you can just back up the files and call it a day, right? Not exactly. Because these are live computers consisting of in-memory data, data on disk, system configurations, and open files, there are a few things to consider. So how do we ensure backup data consistency given all these moving parts?

A significant improvement to the Windows Server backup story came with Windows Server 2003 and the advent of VSS, which provides a standard set of extensible APIs that VSS writers (hooks in applications and services that help provide consistent shadow copies) use in order to create backups of open files and applications. With the help of the VSS service, providers, and writers, the backup application can generate a point-in-time copy of a volume very quickly, one that the application is aware of and can process appropriately.
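The coordination VSS performs can be sketched conceptually in Python. This is only a model of the sequence (writers briefly freeze, the provider takes the point-in-time copy, writers thaw), not the real COM-based VSS API; every class and function name here is hypothetical.

```python
class DemoWriter:
    """Hypothetical writer standing in for an application such as Hyper-V or a database."""

    def __init__(self, name: str, log: list[str]) -> None:
        self.name = name
        self.log = log

    def freeze(self) -> None:
        self.log.append(f"{self.name}: flush buffers and hold writes")

    def thaw(self) -> None:
        self.log.append(f"{self.name}: resume writes")


def create_shadow_copy(volume: dict[str, str], writers: list[DemoWriter],
                       log: list[str]) -> dict[str, str]:
    """Conceptual VSS sequence: freeze writers, let the provider snapshot, then thaw."""
    frozen: list[DemoWriter] = []
    try:
        for writer in writers:
            writer.freeze()
            frozen.append(writer)
        log.append("provider: create point-in-time copy")
        return dict(volume)  # stands in for the provider's shadow copy
    finally:
        for writer in reversed(frozen):  # thaw even if something failed
            writer.thaw()


log: list[str] = []
shadow = create_shadow_copy({"web01.vhd": "consistent"}, [DemoWriter("Hyper-V", log)], log)
print("\n".join(log))
# prints:
# Hyper-V: flush buffers and hold writes
# provider: create point-in-time copy
# Hyper-V: resume writes
```

The freeze window is short; the backup application then reads from the shadow copy at its leisure while the live volume keeps changing.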

Hyper-V includes its own VSS writer, which backup vendors can use to build host-based VSS backups of running VMs. If the operating system running within the VM has the Hyper-V Integration Components installed as well as the VSS service (available in Windows XP SP1 and Windows Server 2003 and later), the host-based backup will occur as if it were run inside the guest; the backup will be performed with the VM running and the data will be consistent (see Figure 4).

Figure 4 VSS backup

However, if the guest operating system does not support the Integration Components or VSS, the VM must be put into a saved state while a host-based VSS snapshot is taken of its data files; that snapshot can then be used for point-in-time recovery. Saved-state backups incur some VM downtime (typically limited to 5-10 minutes), and full backup-to-tape procedures then run against the VSS copy of the data.
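When planning a host-based backup window, it helps to know in advance which VMs will pause. The Python sketch below applies the rule just described to a hypothetical inventory; the VM names and the way you collect the two flags are assumptions.

```python
from dataclasses import dataclass


@dataclass
class GuestInfo:
    name: str
    integration_components: bool  # Hyper-V Integration Components installed in the guest
    guest_vss: bool               # guest operating system provides the VSS service


def backup_mode(vm: GuestInfo) -> str:
    """Online backup needs both; otherwise the VM is put into a saved state."""
    return "online" if vm.integration_components and vm.guest_vss else "saved state"


inventory = [
    GuestInfo("web01", True, True),
    GuestInfo("legacy01", False, False),
]
paused = [vm.name for vm in inventory if backup_mode(vm) == "saved state"]
print(paused)   # prints ['legacy01']
```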

Guest-Based Backups: In a physical environment, servers and applications need to be backed up individually, and such backups can certainly continue in a virtualized datacenter. In that case, each VM raises the same considerations as a physical server, such as the network capacity needed for network-based backups and the performance impact on the system during the backup window. With guest-based backups, you can also dedicate a physical NIC in the host to a virtual network that all guests use for backup traffic.
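A quick estimate shows whether a shared backup NIC fits the backup window. The Python sketch below assumes decimal units (1 GB = 8 gigabits) and an efficiency factor of 0.7 for protocol overhead, disk speed, and competing traffic; that factor and the host layout are illustrative assumptions, so measure your own throughput.

```python
def backup_hours(data_gb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Rough time to move data_gb over the backup NIC.

    efficiency is an assumed fraction of line rate actually achieved; replace
    the 0.7 default with a figure measured in your own environment.
    """
    gigabits = data_gb * 8            # decimal units: 1 GB = 8 gigabits
    seconds = gigabits / (link_gbps * efficiency)
    return seconds / 3600


# Hypothetical host: 10 guests with 200 GB each, backed up over one 1-Gbps NIC.
print(f"{backup_hours(10 * 200, 1.0):.1f} hours")   # prints 6.3 hours
```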

Windows Server Backup

Windows Server 2008 includes Windows Server Backup (WSB), which can perform both host- and guest-based backups of your Hyper-V VMs. Because WSB is fully VSS-capable, it can perform host-based backups of your running VMs, which is generally preferable.

If you have VMs without the Integration Components installed, however, VSS will not be used inside the guest, and you have a couple of options from which to choose. You can still use WSB on the host to back up such a VM, in which case the VM's state will be saved and the backup will then capture the VM's virtual disks and configuration files.

However, this may not be desirable for an application such as Exchange, because the application will not be aware that a backup has run, so its logs will not be truncated. Moreover, the VM will incur downtime, the length of which depends on how long the backup takes.

Alternatively, a backup can be run from inside the VM just as if it were a physical machine using either NTBackup or WSB, depending on the VM's OS. Let's see how to use WSB for supported guests that have the Integration Components installed.

Backing up VMs with WSB

Hyper-V does not automatically register its VSS writer for use with WSB. You must manually add the registry key and value shown in Figure 5 before WSB will support a Hyper-V backup. You can add them via the command line, like so:

reg add "HKLM\Software\Microsoft\windows nt\currentversion\WindowsServerBackup\Application Support\{66841CD4-6DED-4F4B-8F17-FD23F8DDC3DE}"
reg add "HKLM\Software\Microsoft\windows nt\currentversion\WindowsServerBackup\Application Support\{66841CD4-6DED-4F4B-8F17-FD23F8DDC3DE}" /v "Application Identifier" /t REG_SZ /d Hyper-v

Figure 5 Key and value for registering the Hyper-V VSS writer

Path | Registry Key or Value | Type
HKLM\Software\Microsoft\windows nt\currentversion\WindowsServerBackup\Application Support\{66841CD4-6DED-4F4B-8F17-FD23F8DDC3DE} | Key | n/a
HKLM\Software\Microsoft\windows nt\currentversion\WindowsServerBackup\Application Support\{66841CD4-6DED-4F4B-8F17-FD23F8DDC3DE}\Application Identifier | Value | REG_SZ (Hyper-V, for example)

This does not require a reboot, as WSB looks for this key and value each time a backup runs. The following command shows whether the entry has been set:

reg query "HKLM\Software\Microsoft\windows nt\currentversion\WindowsServerBackup\Application Support\{66841CD4-6DED-4F4B-8F17-FD23F8DDC3DE}" /s

Here's how to install WSB. Click Start | Server Manager. In the left pane, click Features and then click Add Features in the right pane. On the Select Features page, expand Windows Server Backup Features and select the checkboxes for Windows Server Backup and Command-line Tools. Now follow these steps to configure your backup:

  1. Go to Start | Administrative Tools | Windows Server Backup.
  2. If backing up a remote host, choose Connect to Another Computer and type the name of the Hyper-V host.
  3. Choose either Backup Once or Backup Schedule.
  4. Select the backup configuration: Full server or Custom. If you choose Custom, make sure you include all volumes that contain data related to the VM you are backing up, including VM configuration data, virtual disks, and snapshots.
  5. Choose the location to store the backup.
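  6. Choose either VSS full backup or VSS copy backup. A full backup updates each application's backup history, while a copy backup leaves that history untouched so it does not interfere with another backup product. For host-based backups where no other backups are occurring with the VMs, choose VSS full backup.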
  7. Once you have confirmed the details, select Backup.
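The same choices can also be expressed as a one-time WBAdmin command line, which is handy for scripting. The Python sketch below only assembles the arguments; it assumes the Windows Server 2008 `wbadmin start backup` options -backupTarget, -include, -vssFull, and -quiet, and that omitting -vssFull yields a VSS copy backup. The drive letters are hypothetical.

```python
def wbadmin_backup_args(target: str, volumes: list[str], vss_full: bool) -> list[str]:
    """Build a 'wbadmin start backup' command line mirroring the wizard choices.

    Assumption: without -vssFull, Windows Server 2008 takes a VSS copy backup.
    """
    args = ["wbadmin", "start", "backup",
            f"-backupTarget:{target}",
            "-include:" + ",".join(volumes)]   # every volume holding VM files
    if vss_full:
        args.append("-vssFull")   # use only if nothing else backs up these VMs
    args.append("-quiet")         # run without confirmation prompts
    return args


# Hypothetical layout: VM files on D:, backup written to E:.
print(" ".join(wbadmin_backup_args("E:", ["C:", "D:"], vss_full=True)))
# prints wbadmin start backup -backupTarget:E: -include:C:,D: -vssFull -quiet
```

On the Hyper-V host itself, the returned list could be passed to `subprocess.run(..., check=True)`; the Hyper-V writer registration described earlier still applies.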

Considerations

  • You must back up all volumes related to a VM, including Virtual Hard Disks (VHDs), VM configuration files, and snapshots.
  • If you are creating a Backup Schedule, you must use a dedicated local volume that will be formatted and used exclusively by WSB. In contrast, if you are performing a Backup Once job, you can store the backup on a non-dedicated local volume, a removable device, or a network share.
  • If the Integration Components are not installed within the VM being backed up, WSB will save the state of the running VM in order to ensure backup data consistency.
  • Once completed, the backup set is portable and can be used with any Hyper-V host.
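The first consideration is easy to get wrong when snapshots or configuration files live on a different volume than the VHDs. The Python sketch below is a hypothetical pre-flight check that compares the drive letters of a VM's files with the volumes selected for backup; the paths are invented, and volumes mounted in folders (mount points) are not handled.

```python
from pathlib import PureWindowsPath


def missing_volumes(vm_files: list[str], selected_volumes: list[str]) -> set[str]:
    """Volumes that hold part of a VM but are not in the backup selection."""
    selected = {v.upper().rstrip("\\") for v in selected_volumes}
    needed = {PureWindowsPath(f).drive.upper() for f in vm_files}
    return needed - selected


# Hypothetical VM whose snapshots were moved to a different volume.
vm_files = [
    r"D:\VMs\web01\web01.vhd",
    r"D:\VMs\web01\Virtual Machines\web01.xml",
    r"E:\Snapshots\web01\snapshot.avhd",
]
print(missing_volumes(vm_files, ["C:", "D:"]))   # prints {'E:'}
```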

Restoring VMs with WSB

Although WSB can restore individual files, this feature does not use VSS and therefore may produce an inconsistent restore if the VM was running at the time of backup. To restore VMs that were running when they were backed up, you need to restore the entire volume(s).

To do this, go to Start | Administrative Tools | Windows Server Backup and, in the Actions pane, select Recover. Choose the server from which to recover data (the one where the WSB backup data is located), then choose the date from which to restore the data. Now you can select the recovery type.

Here's where you have to make a decision. If you need the entire VM, including its configuration, snapshots, and virtual disks (in the event of a complete host failure, for example), select Application Restore and then Hyper-V, as shown in Figure 6. In this case, you do not have the option to restore individual files. You will have to restore everything included in that backup set. Note that this will not overwrite existing Hyper-V and VM configuration data that has changed since the backup.

Figure 6 Restoring a Hyper-V backup

If you need only the VHD itself, and the VM's configuration data and snapshots are healthy, you can select Files and folders and pick the individual VHD file you need. Note that this process does not use the VSS writer, so the restored VHD is consistent only if the VM's state was saved before it was backed up.

If you've suffered total system and data loss and need to recover the Hyper-V host itself, including the Windows Server 2008 operating system and all the VMs running therein, you must boot to the Windows Recovery Environment and perform your restore from there. This can be done from the Windows Server 2008 setup disk or a pre-configured disk partition.

Data Protection Manager

We have taken a look at the backup and restore steps and considerations for Hyper-V hosts and guests using the reliable, free, built-in WSB, but WSB is not an enterprise-class data protection solution. Where it leaves off, Data Protection Manager (DPM) 2007 SP1 picks up. Currently scheduled for release in late 2008, DPM SP1 will support Hyper-V and offers some compelling features:

  • Single management console for all Hyper-V hosts and guests.
  • Continuous data protection, which takes VSS-based snapshots as often as every 15 minutes, capturing only the changed blocks in the process.
  • Hyper-V cluster awareness that allows the backup to follow the VM as it moves between cluster nodes.
  • DPM server to DPM server replication.

  • Support for disk and tape media (disk to disk, disk to tape, or disk to disk to tape).
  • Backup and restore capabilities across the whole spectrum of data, which includes Hyper-V hosts and guests; agentless VSS backups of running guests; support for restoring individual VMs; failover cluster data; and best-in-class application-specific features for SQL Server, Exchange, SharePoint, Hyper-V, and Virtual Server.
  • Pre- and post-backup scripting.

If you currently use a third-party backup solution, watch for updates to the application; most vendors are working hard to get Hyper-V host-based solutions to market.

Scripted Backups

WSB includes a command-line interface, WBAdmin.exe, as well as a set of Windows PowerShell cmdlets for scripting. When you use these, the same backup rules apply as outlined previously, including the need to register the Hyper-V VSS writer manually in the registry.

Figure 7 shows some WBAdmin commands. For full WBAdmin documentation, see go.microsoft.com/fwlink/?LinkId=124380. As you can see, there is nothing in WBAdmin to configure the backup policy itself, but there is a Windows PowerShell snap-in to manage these settings. You can test to see if this snap-in is registered with the following command:

Get-PSSnapin -Registered

Figure 7 WBAdmin commands

Command | Description
ENABLE BACKUP | Enables or modifies a scheduled daily backup.
DISABLE BACKUP | Disables running scheduled daily backups.
START BACKUP | Runs a backup.
STOP JOB | Stops the currently running backup or recovery.
GET VERSIONS | Lists details of backups recoverable from a specific location.
GET ITEMS | Lists items contained in the backup.
START RECOVERY | Runs a recovery.
GET STATUS | Reports the status of the currently running job.
GET DISKS | Lists the disks that are currently online.
START SYSTEMSTATERECOVERY | Runs a system state recovery.
START SYSTEMSTATEBACKUP | Runs a system state backup.
DELETE SYSTEMSTATEBACKUP | Deletes system state backup(s).

And you can use the following to load the snap-in, named Windows.ServerBackup:

Add-PSSnapin windows.serverBackup 

Once this is loaded, you have access to the Windows PowerShell cmdlets for WSB, as shown in Figure 8. For a verbose description of each cmdlet, run this command:

Get-Command -PSSnapin windows.serverBackup | select name | get-help -full

Figure 8 Windows Server Backup cmdlets

Cmdlet | Description
Add-WBBackupTarget | Adds a backup target to the backup policy.
Add-WBVolume | Adds a volume to the backup policy.
Get-WBBackupTarget | Gets backup targets from a policy.
Get-WBDisk | Gets all disks.
Get-WBPolicy | Gets current backup policy.
Get-WBSchedule | Gets backup schedule in policy.
Get-WBSummary | Gets backup history and summary.
Get-WBVolume | Gets all volumes.
New-WBBackupTarget | Creates a new backup target.
New-WBPolicy | Creates a new empty policy.
Remove-WBBackupTarget | Removes a backup target from the policy.
Remove-WBPolicy | Deletes the backup policy.
Remove-WBVolume | Removes a volume from the policy.
Set-WBPolicy | Saves the WBPolicy object to create a scheduled backup.
Set-WBSchedule | Sets the schedule to the backup policy.

Windows Server 2008 includes another built-in utility that can also use the Hyper-V VSS writer and adds flexibility to your scripting options. DiskShadow.exe creates a shadow copy and mounts it as a drive, which lets administrators make a more selective backup than is possible with WSB. Keep in mind that DiskShadow does not accept Windows PowerShell pipeline input; for unattended use, you pass it a script file with the /s switch (diskshadow /s <script file>). Such a script might look something like the following:

Delete Shadows Volume C:
Set Context Persistent
Begin Backup 
Writer Verify {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
Add Volume C: ALIAS MyShadow 
Create
End Backup
Expose %MyShadow% X: 
Exit

This script first deletes any existing shadow copies of drive C:, then ensures that the shadow copy will persist after DiskShadow exits. Next it opens a transactional block between Begin Backup and End Backup: if any step in the block fails, the whole process fails. Within this block, DiskShadow verifies that the Hyper-V writer is present and adds drive C: to the set of volumes to be shadow-copied.

The Create command then creates the shadow copy. The shadow copy receives a GUID to identify it, and DiskShadow stores this GUID in an environment variable named MyShadow, the alias assigned by the Add Volume command.

Finally, the shadow copy is exposed as drive X: by means of that environment variable. You can then work with the data on X: (for example, copy the VM files to backup media) and afterward run DiskShadow again with the command Unexpose X: to remove the drive.
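Because DiskShadow takes its commands from a script file, a scheduler or wrapper usually generates that file. The Python sketch below renders the script shown above with the volume, alias, and drive letter as parameters and returns the `diskshadow /s` command line; the function names and the .dsh file name are hypothetical, and any plain text file should work as the script.

```python
import tempfile
from pathlib import Path

HYPERV_WRITER_ID = "{66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}"  # Hyper-V VSS writer, as above


def render_diskshadow_script(volume: str, alias: str, expose_as: str) -> str:
    """Return the DiskShadow commands from the example, parameterized."""
    return "\n".join([
        f"Delete Shadows Volume {volume}",
        "Set Context Persistent",
        "Begin Backup",
        f"Writer Verify {HYPERV_WRITER_ID}",
        f"Add Volume {volume} ALIAS {alias}",
        "Create",
        "End Backup",
        f"Expose %{alias}% {expose_as}",
        "Exit",
    ]) + "\n"


def write_script(text: str) -> list[str]:
    """Save the script to a temporary folder and return the command that runs it."""
    path = Path(tempfile.mkdtemp()) / "hyperv-shadow.dsh"
    path.write_text(text)
    return ["diskshadow", "/s", str(path)]


cmd = write_script(render_diskshadow_script("C:", "MyShadow", "X:"))
```

On the Hyper-V host, `subprocess.run(cmd, check=True)` would execute it; a second generated script containing `Unexpose X:` removes the drive once the copy is finished.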

Note that restoring Hyper-V VMs that have been backed up via DiskShadow is currently a manual process (the VM must be re-created, snapshots are not preserved, and so forth). While this has some obvious disadvantages, the data is protected.

Wrap-Up

DR can be an arduous process that seemingly never ends. But server virtualization brings new possibilities, due to both the technologies and the resulting lower-cost entry point. Microsoft is providing not only virtualization but an entire ecosystem. Together, the server virtualization platform and the System Center family offer more holistic solutions to the increasingly complex challenges organizations face, including DR.

I would like to thank James O'Neill for his contribution to this article.

Adam Fazio joined Microsoft Consulting Services more than 2 years ago and now works within the U.S. Public Sector practice. He has worked in the IT industry for more than 10 years on a variety of infrastructure projects and datacenter operations for organizations ranging from Fortune 100 companies to Internet start-ups. He is currently the technical escalations lead for the global Virtualization Rapid Deployment Program at Microsoft.