Failover Clustering (OLTP)---a Technical Reference Guide for Designing Mission-Critical OLTP Solutions

Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions.

Failover clustering is a popular solution to meet the high availability (HA) needs of mission-critical applications. It has been available for several releases of Windows Server and Microsoft SQL Server, and has been significantly improved in Windows Server 2008 and SQL Server 2008. Failover clustering uses a shared storage model and requires a storage area network (SAN). SQL Server failover clustering works well with other technologies, such as database mirroring and log shipping. It is typically used to provide fault tolerance at the server level within a data center, but can also be extended to a remote data center—a solution referred to as stretch cluster, extended cluster, or geocluster.

Best Practices

The following resources provide examples of customer scenarios for implementing failover clustering, in addition to general failover clustering reference material. (Note that the full URLs for the hyperlinked text are provided in the Appendix at the end of this document.)

  • The white paper SQL Server 2008 Failover Clustering1 covers failover cluster architecture and concepts for Windows Server 2003 and Windows Server 2008, in addition to SQL Server 2008. The white paper also covers installation of a SQL Server 2008 failover cluster, upgrades and updates to SQL Server 2008 failover clustering, and maintenance and administration of SQL Server 2008.

  • The white paper Six Failover Clustering Benefits Realized from Migrating to SQL Server 20082 contains quick tips and benefits of failover clustering in SQL Server 2008. This paper lists the significant and immediate benefits of using SQL Server 2008 failover clustering.

  • Unlike database mirroring, failover clustering supports distributed transactions. For more information, see the section "Microsoft Distributed Transaction Coordinator" in the white paper SQL Server 2008 Failover Clustering.3

  • Failover clustering has been improved in Windows Server 2008 and later versions, as described in the following articles:

    Note in particular though, that while Windows Server 2008 R2 clusters can be created across multiple subnets, SQL Server 2008 R2 requires cluster members to be part of the same subnet. When creating geoclusters, it is common to use virtual subnets to work around this limitation.

  • By using volume mount points, you can mount a target partition onto a folder on another physical disk, as shown in the article How to configure volume mount points on a server cluster in Windows Server 2008.6 You can mitigate the limitation on the number of drive letters provided by the English alphabet through the use of volume mount points.

Case Studies and References

Example architectures are explained in the following case studies:

Questions and Considerations

This section provides questions and issues to consider when working with your customers.

  • Understand prioritized HA/DR requirements for the application.

  • Are your customers comfortable with a shared storage solution?

    • Failover clustering requires a shared storage solution, such as a SAN, and this means working closely with storage administrators.

    • With shared storage, there is only one copy of data. The shared storage system needs to include redundancy to avoid being a single point of failure.

  • What is the recovery point objective (RPO)? Failover clustering is often deployed alongside database mirroring, with clustering used for local HA, and database mirroring used for DR.

    • A zero data loss solution may require augmenting failover clustering with either synchronous database mirroring or synchronous SAN replication but the performance impact of synchronous database mirroring needs to be considered, particularly if there is a significant distance between cluster nodes.

    • To avoid the performance impact, asynchronous database mirroring might be considered but does not provide a zero data loss solution.

  • What is the recovery time objective (RTO)? A database mirroring HA solution may provide a better failover time than a failover clustering solution.

  • Consider a geocluster (or stretch cluster) as a combined HA/DR solution. This solution requires software to enable the cluster and storage-level replication and from the storage vendor.

Appendix

Following are the full URLs for the hyperlinked text.

1 SQL Server 2008 Failover Clusteringhttp://sqlcat.com/whitepapers/archive/2009/07/08/sql-server-2008-failover-clustering.aspx

2 Six Failover Clustering Benefits Realized from Migrating to SQL Server 2008http://sqlcat.com/top10lists/archive/2008/11/20/six-failover-clustering-benefits-realized-from-migrating-to-sql-server-2008.aspx

3 SQL Server 2008 Failover Clusteringhttp://sqlcat.com/whitepapers/archive/2009/07/08/sql-server-2008-failover-clustering.aspx

4 What's New in Failover Clusters in Windows Server 2008https://technet.microsoft.com/en-us/library/cc770625(WS.10).aspx

5 What's New in Failover Clustershttps://technet.microsoft.com/en-us/library/dd443539(WS.10).aspx

6 How to configure volume mount points on a server cluster in Windows Server 2008https://support.microsoft.com/kb/947021

7 High Availability and Disaster Recovery at ServiceU: A SQL Server 2008 Technical Case Studyhttp://sqlcat.com/whitepapers/archive/2009/08/04/high-availability-and-disaster-recovery-at-serviceu-a-sql-server-2008-technical-case-study.aspx

8 SQL Server High Availability and Disaster Recovery for SAP Deployment at QR: A Technical Case Studyhttp://sqlcat.com/whitepapers/archive/2010/09/20/sql-server-high-availability-and-disaster-recovery-for-sap-deployment-at-qr-a-technical-case-study.aspx