Backup and Restore (DW)---a Technical Reference Guide for Designing Mission-Critical DW Solutions
Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions.
Backup and restore are the primary mechanisms that customers use for maintaining copies of their databases in case of disaster, for moving data over to another system, and for offline storage/archive of data. Backup and restore strategy discussions usually occur in the context of operations considerations for the database engine and focus on developing a strategy for reducing data loss and maintaining the desired data availability. The backup and restore strategy should conform to the business requirements for data loss (recovery point objective [RPO]) and the ability to access that data if there is a failure (recovery time objective [RTO]).
Common, more advanced approaches include partial backups and piecemeal restore sequences, which allow for partial availability and more specific backup/restore scenarios. Another common practice is to use file-level "backups," which are initiated via replication at a storage-vendor level. This practice is frequently discussed within a high availability and disaster recovery (HA/DR) context, but, in many cases, the actual copies of data and log files at a storage level are used for backup and recovery/restore sequences.
Although the database (DB) backup/restore mechanism can be the same, data warehouse backup/restore strategies can be significantly different from OLTP systems. Strategies differ because of the volume of data, the window of system availability to perform backups, mission criticality of the data warehouse, developing a key incremental backup strategy (to avoid having to recover 200 TB in one shot), and so on.
The following section provides some best practice guidance and some pitfalls to avoid. References are provided for further information.
Trap: Not having a firm understanding of business requirements/SLA to meet in terms of downtime, and not planning your backup/restore strategy around that. A backup and recovery strategy which does not meet the business requirements is a setup for perceived failure by the business in the case of disaster.
These pointers provide best practices and recommendations on backup compression and tuning:
Some other best practices/recommendations on achieving high performance backups on large scale systems:
Placing backup files in a location away from the production databases is critical. Never subject the only backup to the same potential failure component as could happen to the production database (for example, backup files residing on the same disks as production database).
While there are ways to confirm the pages being backed-up are not corrupt (checksum on backup) the best way to confirm you have a valid backup is to do an actual restore of the backup. Practicing this process and procedure can also give a better understanding of how long a recovery process may take. It isn’t just how long it takes to do the restore, but also the time it takes to get backup files moved over to the system, execute restore commands, and then actually do the restore.
Case Studies and References
The following references provide additional information.
The white paper Configure disaster recovery across SharePoint farms by using SQL Server log shipping6 discusses how to configure a Microsoft Office SharePoint Server 2007 system (based on SQL Server) for disaster recovery scenarios using SQL Server backup/restore functionality.
A Guide for SQL Server Backup Application Vendors
Very Large Data Warehouses Challenge Backup
8: an excellent article on the pitfalls and best practices for managing DW backups and restores.
How to Design a Backup Strategy in SQL Server
9: This is a data warehouse specific guide with some good ideas on handling backups, but also managing log files, archives, and so on.
Questions and Considerations
This section provides questions and issues to consider when working with your customers.
The SQL Server backup/restore functionality needs to be included in discussions about backup storage and performance, recovery procedures, and time to recover. Many of these steps are documented in a "run book," a data center procedures guide, or an application requirements guide. For Tier-1 customers, considerations such as partial availability, size of the backup, and integration with their current backup/restore model (for example, third-party tool or disk or tape storage, and security of backups) are all critical considerations.
Discuss the enterprise’s ability to handle data loss and potential downtime. Business users are often unprepared for either but ensuring zero downtime and zero potential data loss is usually viable for only selected workloads. Understanding the requirements can help with the modeling a high-level backup/restore strategy.
Understand how the application code plans to call the DBMS. Will it use Static or Dynamic SQL? To what extent will it use SP? Will the answer set be very large over slow TP links to the end user? For more discussion, see orthogonal architecture on Application Data Layer.
Understand HA/DR requirements and match them to the architectures that are discussed in HA/DR technical reference guides. The backup/restore strategy needs to be aligned with the HA/DR strategy. More detail is provided in the best practices sections in the HA/DR guides.
Understand if projected volumes can be met simply by scaling up, or if a future desire is to open the system to large number of users, and scale-out will be necessary. If possible, design the system for scale-out.
If an ISV application, ask how the current deployment compares with the earlier ones to get a good handle on hardware requirements and potential operational considerations.
Following are the full URLs for the hyperlinked text.
1 Tuning the Performance of Backup Compression in SQL Server 2008 http://sqlcat.com/technicalnotes/archive/2008/04/21
2 Tuning Backup Compression Part 2 http://sqlcat.com/technicalnotes/archive/2009/02/16/tuning-backup-compression-part-2.aspx
3 A Technical Case Study: Fast and Reliable Backup and Restore of Multi-Terabytes Database over the Network http://sqlcat.com/whitepapers/archive/2009/08/13
4 Backup More Than 1GB per Second Using SQL2008 Backup Compression http://sqlcat.com/msdnmirror/archive/2008/03/02
5 Scheduling Sub-Minute Log Shipping in SQL Server 2008 http://sqlcat.com/technicalnotes/archive/2009/02/24
6 Configure disaster recovery across SharePoint farms by using SQL Server log shipping http://sqlcat.com/whitepapers/archive/2009/06/05
7 A Guide for SQL Server Backup Application Vendors http://technet.microsoft.com/en-us/library/cc966520.aspx
8 Very Large Data Warehouses Challenge Backup http://www.information-management.com/issues/20050601/1028753-1.html
9 How to Design a Backup Strategy in SQL Server http://www.petri.co.il/design-a-backup-strategy-in-sql-server.htm