For most administrators, the most important benefit of a VSS-based backup solution is that it allows for very rapid restoration of large amounts of data. VSS solutions are most useful for deployments that include large databases that require a restoration time of less than 60 minutes. This requirement is beyond the capabilities of current streaming or tape-based backup solutions. A VSS solution provides the following benefits:
- Faster restore time
- The ability to back up and restore more data in a typical backup window than is possible with a traditional streaming online backup solution
A common misconception about VSS solutions is that they allow for backups to occur almost instantaneously and without an effect on a production server. This may be true from the point of view of an application; however, a VSS backup can require just as much underlying preparation and generate as much load as a streaming backup, especially when you are using clones. Backing up to disk and restoring to disk may give you more throughput and performance than using a tape-based solution. However, this does not change the fact that data must be copied from one location to another, regardless of the backup method chosen. With a VSS solution, this copy process can be optimized and scheduled, but it must still occur, and copying large amounts of data necessarily consumes system resources.
Most production Exchange Server I/O involves many small, random I/O transactions to the databases. During backup and restore, the I/O throughput of your storage subsystem can become a bottleneck that artificially throttles your backup and restore speed. Make sure that you have sufficient throughput and load balancing to guarantee that you can meet your backup and restore needs.
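As a rough sizing aid for the throughput question above, the following minimal sketch estimates the sustained throughput needed to move a given amount of data inside a restore window. The function names and the 500 GB figure are illustrative assumptions and do not come from any Exchange or VSS tool. The 60-minute window echoes the restoration target mentioned earlier.

```python
def required_throughput_mb_per_s(data_gb: float, window_minutes: float) -> float:
    """Sustained throughput (MB/s) needed to copy data_gb within window_minutes."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    megabytes = data_gb * 1024  # 1 GB = 1024 MB
    return megabytes / (window_minutes * 60)


def fits_window(data_gb: float, throughput_mb_per_s: float, window_minutes: float) -> bool:
    """True if the storage subsystem's sustained throughput meets the window."""
    return throughput_mb_per_s >= required_throughput_mb_per_s(data_gb, window_minutes)


# Illustrative numbers only: 500 GB restored within 60 minutes.
print(round(required_throughput_mb_per_s(500, 60), 1))  # 142.2
```

The estimate ignores log replay time and contention from production I/O, so treat the result as a lower bound on what the storage subsystem must sustain.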
Each Exchange Server 2003 storage group consists of up to five databases, transaction log files, and a checkpoint file. VSS considers both the database (*.edb) and streaming (*.stm) files as the database component, whereas the transaction logs (*.log) and checkpoint file (*.chk) are part of the log component.
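The split between the database component and the log component can be expressed as a simple mapping from file extension to component. The sketch below is only an illustration of that grouping; the function names and the example log paths are hypothetical, and it does not call any VSS API.

```python
from pathlib import PureWindowsPath

# Extension-to-component mapping as described in the text.
COMPONENT_BY_EXTENSION = {
    ".edb": "database",
    ".stm": "database",
    ".log": "log",
    ".chk": "log",
}


def vss_component(file_path: str) -> str:
    """Return 'database' or 'log' for a storage group file."""
    extension = PureWindowsPath(file_path).suffix.lower()
    if extension not in COMPONENT_BY_EXTENSION:
        raise ValueError(f"not a storage group file: {file_path}")
    return COMPONENT_BY_EXTENSION[extension]


def group_by_component(paths):
    """Group storage group files by the component they belong to."""
    groups = {"database": [], "log": []}
    for path in paths:
        groups[vss_component(path)].append(path)
    return groups


# Example paths are hypothetical.
files = [r"D:\Databases\Priv1.edb", r"D:\Databases\Priv1.stm",
         r"L:\Logs\E00.log", r"L:\Logs\E00.chk"]
print(group_by_component(files))
```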
If you use VSS for your backup solution, we recommend that you run the Windows Server 2003 operating system with Service Pack 1 (SP1). Contact your storage vendor to determine whether Windows Server 2003 with SP1 is supported. For information about a VSS update package that is available if you cannot upgrade to Windows Server 2003 with SP1, see the Microsoft Knowledge Base article 833167, A Volume Shadow Copy Service (VSS) Update Package Is Available for Windows Server 2003. For a list of additional hotfixes that you must apply if you are not running Windows Server 2003 with SP1, see “Appendix” later in this article.
You must make sure that any potential VSS solution for Exchange Server 2003 falls within the VSS framework and is a supported solution. For information about supported VSS solutions, see the Microsoft Knowledge Base article 822896, Exchange Server 2003 Data Backup and Volume Shadow Copy Services.
Running checksum integrity verification is an I/O-intensive and memory-intensive operation. We recommend that for stand-alone and clustered Exchange servers, you offload this work to a backup server that mounts and runs checksum integrity verification on the read-only shadow copy. Whenever possible, run the checksum integrity verification against shadow copies that are not hosted on the same physical disks as the production LUNs.
VSS Backup Types
You can use a full, copy, differential, or incremental backup type for your entire server or single storage group. For more information about VSS backup types, see Backup Operations.
Full Backup: Use the full backup type for Exchange Server deployments. This backup type performs a backup of all the databases, transaction log files, and checkpoint files in a storage group, and after the backup is complete, truncates the log files.
Log file truncation is the process of deleting transaction log files that are not needed to restore or roll forward the most recent backup. You must verify the checksum integrity of the most recent backup before log file truncation occurs. Truncation removes the log files that are required to roll forward from any backup older than the most recent one. Truncation does not invalidate those older backups, but after truncation you can restore from one of them only to the point in time at which it was taken.
Copy Backup: A copy backup performs the same steps as a full backup, but it does not truncate the transaction log files. You can use a copy backup to create a copy of the database for testing or analysis purposes.
Incremental Backup: You must be running Exchange Server 2003 with Service Pack 1 (SP1) or a later version to use an incremental backup type. The incremental backup backs up the transaction logs to record changes that occurred since the last incremental or full backup, and then truncates the transaction logs. To restore from an incremental backup, you must first restore the last full backup, and then restore all the incremental backups. The incremental backup can give you a shorter backup window, but it can increase the restore time and log replay time.
Differential Backup: A differential backup type requires Exchange Server 2003 with SP1 or later. A differential backup backs up the transaction logs to record changes that occurred since the last full backup, and does not truncate the transaction logs. To restore from a differential backup, you must first restore the last full backup, and then the most current differential backup. The differential backup can give you a shorter backup window, at the expense of capacity and restore time.
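The restore rules for incremental and differential backups can be summarized as a small selection routine. The following is a minimal sketch under stated assumptions: the Backup record and the restore_chain function are hypothetical, copy backups are ignored, and a schedule is assumed to use either incrementals or differentials after a full backup, not both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    sequence: int  # order in which the backup was taken
    kind: str      # "full", "copy", "incremental", or "differential"


def restore_chain(history):
    """Return the backups to restore, in order, to reach the latest backed-up state."""
    ordered = sorted(history, key=lambda b: b.sequence)
    fulls = [b for b in ordered if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup to start from")
    base = fulls[-1]
    later = [b for b in ordered if b.sequence > base.sequence]
    incrementals = [b for b in later if b.kind == "incremental"]
    differentials = [b for b in later if b.kind == "differential"]
    if incrementals and differentials:
        # Simplifying assumption of this sketch, not a rule from the text.
        raise ValueError("mixed incremental/differential schedules are not modeled")
    if incrementals:
        return [base] + incrementals       # full + every incremental since
    if differentials:
        return [base, differentials[-1]]   # full + most recent differential
    return [base]


history = [Backup(1, "full"), Backup(2, "incremental"), Backup(3, "incremental")]
print([b.sequence for b in restore_chain(history)])  # [1, 2, 3]
```

The longer chain for incrementals is the reason the text notes that they can increase restore time and log replay time.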
A shadow backup typically involves the following stages, managed by the requestor and writer:
- Synchronize: Removes the previous shadow copy set from the backup server and synchronizes with the production LUN.
- Fracture: Freezes writes on the source LUNs when the shadow copies are synchronized, fractures the shadow copy synchronization, and resumes writes to the source LUN.
- Transport and Checksum: Transports and exposes shadow copy data and transaction log LUNs to the mount host. Runs checksum integrity verification against the shadow copy set. For more information about checksum integrity verification, see "Exchange Requestors and Checksum Integrity Verification" later in this article.
- Log Truncation: Completes backup by truncating storage group transaction logs on success and flags the full backup as complete.
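The ordering of these stages, and the fact that log truncation happens only after a successful checksum, can be sketched as a small driver. This is an illustration of the sequencing only: run_shadow_backup and its four callable parameters are hypothetical stand-ins for work that a real requestor, writer, and provider perform, and no VSS API is called.

```python
def run_shadow_backup(synchronize, fracture, transport_and_checksum, truncate_logs):
    """Run the stages in order; truncate logs only if the checksum succeeds.

    Each argument is a caller-supplied callable (hypothetical in this sketch).
    transport_and_checksum must return True when verification passes.
    Returns (completed_stage_names, backup_complete).
    """
    completed = []

    synchronize()
    completed.append("synchronize")

    fracture()
    completed.append("fracture")

    if not transport_and_checksum():
        # Verification failed: leave the transaction logs untouched.
        return completed, False
    completed.append("transport_and_checksum")

    truncate_logs()
    completed.append("log_truncation")
    return completed, True


stages, ok = run_shadow_backup(lambda: None, lambda: None, lambda: True, lambda: None)
print(stages, ok)
```

Stopping before truncation on a failed checksum keeps the logs needed to roll forward from the previous verified backup.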
VSS Restore Process
You can restore an entire storage group. If the databases are hosted on separate LUNs (which is not a best practice), you can instead restore one or more individual databases in the storage group.
To restore even a single database, you must first take all databases in the storage group offline. Then, after the restore has finished, automatic database recovery (transaction log file replay) is invoked for the entire storage group by mounting any database in the storage group.
For this automatic recovery to succeed, the following minimum conditions must be met:
- The database file names and logical file paths must be the same as when the backup was done. For example, if the file names were Priv1.edb and Priv1.stm, and the files were stored in the path D:\Databases, the restore location must also be D:\Databases and you must not change the file names.
- The storage group prefix must match the file names of any transaction log files that are to be replayed.
In cases where you are restoring to the original server, these conditions are automatically met unless you have changed database paths since the backup was taken.
Some VSS requestors allow restoration to alternate servers. This might be useful for mounting databases on laboratory servers or for advanced recovery scenarios in which the original server is unavailable. For more information about backing up and restoring Exchange Server 2003, see the Exchange 2003 Disaster Recovery Operations Guide.
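The two conditions for automatic recovery (unchanged database file names and paths, and a storage group prefix that matches the log files to be replayed) lend themselves to a pre-restore check. The sketch below is a hedged illustration: check_restore_conditions is a hypothetical helper, the E00 prefix and log file names in the example are assumed values, and the check compares strings only without inspecting any files.

```python
from pathlib import PureWindowsPath


def check_restore_conditions(backup_db_paths, target_db_paths, log_prefix, log_file_names):
    """Return a list of problems; an empty list means both conditions hold."""
    problems = []

    # Condition 1: database file names and paths are the same as at backup time.
    backed_up = {PureWindowsPath(p) for p in backup_db_paths}
    targets = {PureWindowsPath(p) for p in target_db_paths}
    for path in sorted(backed_up - targets, key=str):
        problems.append(f"backed-up file has no matching restore target: {path}")
    for path in sorted(targets - backed_up, key=str):
        problems.append(f"restore target was not in the backup: {path}")

    # Condition 2: the storage group prefix matches every log file to be replayed.
    for name in log_file_names:
        if not name.lower().startswith(log_prefix.lower()):
            problems.append(f"log file does not match prefix {log_prefix}: {name}")

    return problems


# Example values are illustrative assumptions.
print(check_restore_conditions(
    [r"D:\Databases\Priv1.edb", r"D:\Databases\Priv1.stm"],
    [r"D:\Databases\Priv1.edb", r"D:\Databases\Priv1.stm"],
    "E00",
    ["E00.log", "E0000001.log"],
))  # []
```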
Recovery occurs in one of two ways:
- Roll-forward recovery: A roll-forward recovery is a recovery to the time of failure. A roll-forward recovery can be done if the current log LUN is available. In this case, you can restore the database files from backup, but not the transaction log files, and use the current logs on the server to roll the database forward. Assuming that all log files that were generated since the time of backup are available, no data is lost by restoring from backup.
- Point-in-time recovery: A point-in-time recovery is a recovery only of the data in the last backup. All newer data is lost. When you use a point-in-time recovery, only the transaction log files that are part of the backup set are used. Additional log files generated since the time of backup are not used, and those databases are recovered only to the point of the backup.
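Whether a roll-forward recovery can reach the time of failure depends on having every log generated since the backup. The sketch below models log files as integer generation numbers, which is an abstraction introduced here for illustration (it does not parse real log file names), and reports how far an unbroken run of logs would allow replay to proceed. The function name is hypothetical.

```python
def last_replayable_generation(backup_generation, available_generations):
    """Return the highest log generation reachable without a gap.

    backup_generation: last log generation contained in the backup set.
    available_generations: generations of log files still on the log LUN.
    If the result equals backup_generation, no newer logs can be replayed,
    which corresponds to a point-in-time recovery.
    """
    available = set(available_generations)
    generation = backup_generation
    while generation + 1 in available:
        generation += 1
    return generation


print(last_replayable_generation(10, [11, 12, 13]))  # 13: full roll-forward
print(last_replayable_generation(10, [12, 13]))      # 10: point-in-time only
```

A missing generation in the middle of the run stops replay at the gap, which is why the text conditions "no data is lost" on all log files since the backup being available.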
Exchange Server Clustering
Many enterprise solutions take advantage of Windows Clustering to increase server availability. When you run Windows Server 2003 with SP1 and Exchange Server 2003 in a cluster, a new feature named maintenance mode is available to help with some restoration methodologies. Clustering adds some unique challenges to VSS that you must understand and plan for in order to be successful. Make sure that you are aware of the backup and restore implications of your clustering solution.
During a backup, the checksum integrity verification is run against the shadow copy. Checksum integrity verification is a memory-intensive and disk-intensive operation that most administrators do not want to run on a cluster node hosting a production Exchange Virtual Server. During checksum integrity verification, the LUN is presented as read-only. This can cause problems with the disk signature of the original LUN and cause it to go offline. That is why most cluster solutions implement a backup server that mounts the backed-up LUNs to run the checksum integrity verification.
During a restore, the cluster physical disk resources are monitored with IsAlive and LooksAlive heartbeat requests. Restore solutions that dismount the production LUN and mount the backup LUN might encounter a timing problem: if the cluster service sends these heartbeat requests to the physical disk during the switch between the production and backup LUN, the cluster physical disk resource can fail, causing a cluster failover. Solutions that resynchronize the backup LUN to the production LUN are not at risk for cluster failover.
If you are running Exchange in a clustered environment, and you use a shadow copy backup or restore provider that causes LUNs to become temporarily unavailable to the cluster, we strongly recommend that you use the Microsoft Windows Server 2003 operating systems with Service Pack 1 (SP1) and that the provider takes advantage of the disk resource maintenance mode feature. For more information about the disk resource maintenance mode feature, see Microsoft Knowledge Base Article 903650, Extended Maintenance Mode Functionality for Cluster Physical Disk Resources in Windows Server 2003.
Alternatively, if you cannot run Windows Server 2003 SP1 or your VSS provider does not yet support disk resource maintenance mode, you can reduce, but not eliminate, the possibility of a cluster failover during critical operations by increasing the IsAlive and LooksAlive values for the resource to 5 minutes. Do not leave these values at 5 minutes after the operation completes; revert them to typical values for regular operation. For information about increasing the IsAlive and LooksAlive values, see Frequently Asked Questions.
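Because the raised values must be reverted after the critical operation, it can help to wrap the change so that the original values are restored even if the operation fails. The following is a minimal sketch of that pattern only. The get_intervals and set_intervals callables are hypothetical placeholders for whatever mechanism you use to read and write the resource settings, the use of milliseconds is an assumption, and the starting values in the example are illustrative.

```python
from contextlib import contextmanager

FIVE_MINUTES_MS = 5 * 60 * 1000  # assumes the values are expressed in milliseconds


@contextmanager
def extended_poll_intervals(get_intervals, set_intervals, temporary_ms=FIVE_MINUTES_MS):
    """Temporarily raise the IsAlive/LooksAlive values, then restore the originals.

    get_intervals() returns a dict with "LooksAlive" and "IsAlive" keys;
    set_intervals(d) applies such a dict. Both are caller-supplied (hypothetical).
    """
    original = get_intervals()
    set_intervals({"LooksAlive": temporary_ms, "IsAlive": temporary_ms})
    try:
        yield
    finally:
        # Always revert, even if the backup or restore step raised an error.
        set_intervals(original)


# In-memory stand-in for a disk resource's settings (illustrative values).
settings = {"LooksAlive": 5000, "IsAlive": 60000}

with extended_poll_intervals(lambda: dict(settings), settings.update):
    print(settings)  # values raised for the duration of the operation
print(settings)      # original values restored
```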