High Availability and Disaster Recovery (OLTP)---a Technical Reference Guide for Designing Mission-Critical OLTP Solutions

Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions.

Microsoft SQL Server 2008 R2 (as well as SQL Server 2008 and SQL Server 2005) offers a variety of technologies for customers to pick from to meet the high availability and disaster recovery (HA/DR) requirements of their application portfolio. Customers typically adopt at least two of these technologies to accomplish their HA/DR goals; Tier-1 deployments in particular, which are often either mission or business critical, generally require more than one technology.

Examples referenced in the sections that follow show that customers can achieve up to 99.99% or 99.999% availability, even when upgrading to new versions of SQL Server and/or upgrading their hardware. Furthermore, Windows security patching can be completed while meeting availability goals.

Availability goals can be met with a variety of configurations. A popular architecture combines failover clustering for HA and database mirroring for DR. Additionally, log shipping can be added to achieve tertiary site DR protection. (More information about these options can be found later in this document.)

Best Practices

The following resources provide examples of customer scenarios for implementing SQL Server HA/DR, in addition to general SQL Server HA/DR reference material. (Note that the full URLs for the hyperlinked text are provided in the Appendix at the end of this document.)

Case Studies and References

Customers have successfully configured SQL Server to achieve up to 99.999% availability. Examples of successful architectures are described in the following case studies and white papers:

Questions and Considerations

This section provides questions and issues to consider when working with your customers.

  • Understand the business motivations and regulatory requirements that are driving the customer's HA/DR requirements. Understand how your customer categorizes the workload from an HA/DR perspective. There is likely to be an alignment between the needs and categorization.

  • Ask for both the recovery time objective (RTO) and the recovery point objective (RPO) for different workload categories, for both a failure within a data center (local high availability) and a total data center failure (disaster recovery). While RPO and RTO vary for different workloads because of business, cost, or technological considerations, customers may prefer a single technical solution for ease in operations. However, a single technical solution may require trade-offs that need to be discussed with customers so that their expectations are set appropriately.

  • Understand the network infrastructure at the primary and the planned (or proposed) DR site and the communication links between the two—these are important factors in selecting an acceptable architecture. Also consider the planned maintenance operations (regular backup and index maintenance) while selecting the architecture; the constraints for maintenance workloads may be greater than those for regular production workloads because of the time in which the maintenance operations must be completed.

  • Ask if there is an organizational preference for a particular HA/DR technology. Customers may have a preference because of previous experiences, established operational procedures, or simply the desire for uniformity across databases from different vendors. Understand the motives behind a preference: A customers' preference for HA/DR may not be because of the functions and features of the HA/DR technology. For example, a customer may decide to adopt a third-party solution for DR to maintain a single operational procedure. For this reason, using HA/DR technology provided by a SAN vendor (such as EMC SRDF) is a popular approach.

  • Understand the implications of both hardware and software maintenance (including Windows security patching) when discussing the design and adoption of an HA/DR solution. Database mirroring is often adopted to minimize the service disruption that can be caused by maintaining hardware and software.

  • Review the following material for a better understanding of HA/DR options. (This section may be unnecessary if you have extensive prior experience with SQL Server HA/DR options.)

    SQL Server 2008 R2 offers several HA/DR options:

    • Failover clustering 12With failover clustering, the nodes share disks, but only a single node has access to the database at a time. It is possible to install additional SQL Server failover cluster instances across the nodes; however, this configuration cannot be used to re-direct workloads for a single database (for example, separating reads from writes).

    • Database mirroring 13 Database mirroring is used both for high availability and disaster recovery. Database mirroring also provides read access to the data by making it possible to create a database snapshot on the mirror. The read workload can run against the database snapshot. While the creation of database snapshots on a mirror partner has the potential to be useful, very few customers are deploying it for the following reasons:

      The name of the database in the snapshot is different, which requires making a change in order to connect.

      Database mirroring provides a static snapshot of the database. New snapshots have to be created (with potentially new names unless a redirection mechanism is also configured) to get more current data. In other words, accessing current data on the mirror is not practical. (This challenge is addressed in the next release of SQL Server, code named “Denali.”)

      No out of the box solution is provided for refreshing database snapshots, particularly when users are connected to the snapshots. It is possible to automate this process however.

      Many applications are not designed with read-only databases in mind. For example, even when a database snapshot is available, an application might update internal tables when a user connects to the database, even if the user is only performing reporting.

      The key takeaway is that database mirroring offers high availability and disaster recovery, but it is not practical for a readable secondary solution for current data.

    • Log shipping 14 Similar to database mirroring, the log shipping feature can be used to set up procedures that enable read-only access at the backup site so that secondary sites can receive log backups. However, because of latency and deployment challenges, customers may find it impractical to use the secondary sites for real-time reporting.

    • Replication 15 Transactional and peer-to-peer transactional replication are practical and commonly deployed solutions for offloading read workloads. However, they do not ensure zero data loss for high availability and disaster recovery. The challenges for very large deployments include:

      Replication can be complex to monitor and troubleshoot.

      Replication may require schema changes (for example, adding a unique key).

      Replication may have very high transaction throughput.

Appendix

Following are the full URLs for the hyperlinked text.

1 Proven Customer Deployed Architectures & Scenarios for SQL Server HA/DRhttp://ecn.channel9.msdn.com/o9/te/NorthAmerica/2010/pptx/DAT401.pptx

2 Proven SQL Server Architectures for High Availability and Disaster Recoveryhttp://sqlcat.com/whitepapers/archive/2010/06/07/proven-sql-server-architectures-for-high-availability-and-disaster-recovery.aspx

3 SAP with Microsoft SQL Server 2005: Best Practices for High Availability, Maximum Performance, and Scalabilityhttps://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/sap_sql2005_best%20practices.doc

4 Microsoft SQL Server: The Platform for SAPhttp://ecn.channel9.msdn.com/o9/te/NorthAmerica/2010/pptx/DAT314.pptx

5 Failure Is Not an Option: Zero Data Loss and High Availabilityhttp://sqlcat.com/whitepapers/archive/2010/11/03/failure-is-not-an-option-zero-data-loss-and-high-availability.aspx

6 SQL Server 2008: High Availability—Always Onhttps://www.microsoft.com/sqlserver/2008/en/us/high-availability.aspx

7 Selecting a High Availability Solutionhttps://msdn.microsoft.com/en-us/library/bb510414.aspx

8 High Availability and Disaster Recovery at ServiceU: A SQL Server 2008 Technical Case Studyhttp://sqlcat.com/whitepapers/archive/2009/08/04/high-availability-and-disaster-recovery-at-serviceu-a-sql-server-2008-technical-case-study.aspx

9 bwin: Global Online Gaming Company Deploying SQL Server 2008 to support 100 Terabyteshttps://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000001470

10 Using Replication for High Availability and Disaster Recoveryhttp://sqlcat.com/whitepapers/archive/2009/09/23/using-replication-for-high-availability-and-disaster-recovery.aspx

11 SQL Server High Availability and Disaster Recovery for SAP Deployment at QR: A Technical Case Studyhttp://sqlcat.com/whitepapers/archive/2010/09/20/sql-server-high-availability-and-disaster-recovery-for-sap-deployment-at-qr-a-technical-case-study.aspx

12 Getting Started with SQL Server 2008 R2 Failover Clusteringhttps://msdn.microsoft.com/en-us/library/ms189134.aspx

13 Database Mirroring Overviewhttps://msdn.microsoft.com/en-us/library/ms189852.aspx

14 Log Shippinghttps://msdn.microsoft.com/en-us/library/bb895393.aspx

15 Replicationhttps://technet.microsoft.com/en-us/library/ms151215.aspx