Backup and Restore (OLTP)---a Technical Reference Guide for Designing Mission-Critical OLTP Solutions

Article
03/05/2012

Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions.

Backup and restore are the primary mechanisms that customers use for maintaining copies of their databases in case of disaster, for moving data over to another system, and for offline storage/archive of data. Backup and restore strategy discussions usually occur in the context of operations considerations for the database engine and focus on developing a strategy for reducing data loss and maintaining the desired data availability. The backup and restore strategy should conform to the business requirements for data loss (recovery point objective [RPO]) and the ability to access that data if there is a failure (recovery time objective [RTO]).

Common, more advanced approaches include partial backups and piecemeal restore sequences, which allow for partial availability and more specific backup/restore scenarios. Another common practice is to use file-level "backups," which are initiated via replication at a storage-vendor level. This practice is frequently discussed within a high availability and disaster recovery (HA/DR) context, but, in many cases, the actual copies of data and log files at a storage level are used for backup and recovery/restore sequences.

Best Practices

The following section provides some best practice guidance and some pitfalls to avoid. References are provided for further information.

Design Guidelines

One pitfall to watch for is the lack a firm understanding of the business requirements/service level agreements (SLAs) that must be met in terms of downtime; the backup and restore strategy should be planned around these requirements. A backup and recovery strategy that does not meet business requirements can fail if a disaster occurs.
Some recoveries may have dependencies on various vendors’ response time. It is important to ensure that the SLA for each vendor is compatible with the overall recovery strategy.
Placing backup files in a location away from the production databases is critical. Never subject your only backup to the same risks as the production database (for example, backup files residing on same disks as production database).
While there are ways to confirm that the pages that have been backed up are not corrupt (for example, you can use checksum on the backup), the best way to confirm that you have a valid backup is to perform an actual restore of the backup. Ideally, restores should be performed on other systems, and utilizing different hardware. Performing an actual restore can also provide a better understanding of how long a recovery process may take and can help to ensure that all required files are in fact present in the backup. The time for the recovery process involves more than just the restore time; it also includes the time it takes to get the backup files moved over to the system, to execute restore commands, and then to actually perform the restore. The use of backup compression can greatly reduce the time involved in moving files and performing the restore.
In multi-database applications, it is important to have a way to synchronize the restore point across each database. One option is to write a marked transaction to the log of each database before starting the backup process as described in the section "Marked Transactions" in the article BEGIN TRANSACTION (Transact-SQL).1 This allows a point in time restore using the STOPATMARK option as described in the section "Restoring the transaction log to a mark" in the article RESTORE (Transact-SQL).2

These references provide best practices and recommendations for backup compression and tuning:
- The article Tuning the Performance of Backup Compression in SQL Server 20083 provides guidance related to tuning options for backup performance.
- The article Tuning Backup Compression Part 24 describes additional considerations for tuning backup compression, and discusses how backup compression interacts with other important features in Microsoft SQL Server 2008.
While the performance gains from backup compression are important, less obvious is the benefit gained from the improvements in restoration time for compressed backups. Unless there is a critical time window, users are typically not stalled while waiting for backups to complete. Users do, however, usually need to wait for restore operations to complete. Backup compression can significantly reduce this delay.
The use of Transparent Data Encryption (TDE) for a database will negate potential gains from the use of backup compression.
Some other best practices/recommendations on achieving high-performance backups on large-scale systems are described in the following articles:
- The article Backup More Than 1GB per Second Using SQL2008 Backup Compression5 describes how to increase performance for backup compression.
- The article Scheduling Sub-Minute Log Shipping in SQL Server 20086 discusses considerations that you should be aware of when scheduling frequent log shipping jobs.

Case Studies and References

The following references provide additional information.

The white paper Configure disaster recovery across SharePoint farms by using SQL Server log shipping7 discusses how to configure a Microsoft Office SharePoint Server 2007 system (based on SQL Server) for disaster recovery scenarios using SQL Server backup/restore functionality.
The article A Technical Case Study: Fast and Reliable Backup and Restore of Multi-Terabytes Database over the Network8 discusses the challenges of designing robust backup and restore solutions for very large databases.
The case study First American Title Insurance Company9 describes how one of the largest title insurance companies has greatly reduced IO requirements by the use of backup compression.
The case study Large Food Service Company10 describes how the combination of data compression and backup compression has reduced storage requirements.

Questions and Considerations

This section provides questions and issues to consider when working with your customers.

The SQL Server backup/restore functionality needs to be included in discussions about backup storage and performance, recovery procedures, and time to recover. Many of these steps are documented in a "run book," a data center procedures guide, or an application requirements guide. For Tier-1 customers, considerations such as partial availability, size of the backup, and integration with their current backup/restore model (for example, third-party tool or disk or tape storage, and security of backups) are all critical considerations.
A Guide for SQL Server Application Backup Vendors 11 provides useful information about the SQL Writer Service component and its role in the Volume Shadow Copy Service (VSS) snapshot creation and restore process for SQL Server databases.
Discuss the enterprise’s ability to handle data loss and potential downtime. Business users are often unprepared for either but ensuring zero downtime and zero potential data loss is usually viable for only selected workloads. Understanding the requirements can help with the modeling a high-level backup/restore strategy.
Does the customer use a centralized backup system or a centralized backup tool? If so, understand how it works with SQL Server. The customer may have to contact the vendor if they are using a third-party solution. This may create an additional dependency for the deployment plans.
The ability to restore from backups and the time this restore will take are as important, if not more important, than the backup strategy. While restoration from backup is a frequently neglected topic, it is best to consider it early in planning your strategy. Often there are fewer skilled staff available during recovery than during planning.
During recovery, a clear chain of authority for decision making and escalation is important and details of how to contact more experienced staff and of how to contact vendors should be readily available. This includes details such as vendor contract numbers.
The general advice on recovery strategies is that they should be prepared and documented by the most experienced staff but tested by the least experienced staff. The least experienced staff will often be the ones that need to implement the strategy.
Understand HA/DR requirements and match them to the architectures that are discussed in HA/DR technical reference guides. The backup/restore strategy needs to be aligned with the HA/DR strategy. More detail is provided in the best practices sections in the HA/DR guides.
Database backup and restore is important, but it is only one component in the availability of the application to the user. The application may require that other databases are also synchronized for consistency, and this might influence the window of time that is available for a necessary restore process.