Database Availability Group Design Examples
Applies to: Exchange Server 2010 SP3, Exchange Server 2010 SP2
Topic Last Modified: 2010-10-01
The ability of a database availability group (DAG) to contain as many as 16 Mailbox servers, combined with the ability to extend a DAG across multiple physical locations and Active Directory sites, provides a large number of architectural design possibilities for DAGs.
This topic provides design examples for DAGs in a variety of environments:
Two-member DAG, which is suited for small office and branch office deployments
Four-member DAG that provides high availability within a single datacenter by locating all members in the same datacenter
Four-member DAG that provides high availability within a single datacenter, and site resilience for that datacenter, by locating two of the members in the primary datacenter and two of the members in a second datacenter
The design you use for your DAGs and the distribution of mailbox database copies will be based on your organization's service level agreements (SLAs) and the recovery time objective (RTO) and recovery point objective (RPO) for the mailbox service and data as stated in those SLAs.
Looking for management tasks related to high availability and site resilience? See Managing High Availability and Site Resilience.
A two-member DAG is the smallest possible DAG that can provide high availability. Two-member DAGs are best suited for organizations that require some form of high availability for mailbox services and data, but that don't require site resilience. This configuration works especially well in small office and branch office deployments because it enables redundancy for the Client Access, Mailbox, and Hub Transport server roles using only two Exchange servers. The following figure illustrates this configuration.
There are several aspects worth noting about this configuration:
In this design, only the Client Access, Mailbox, and Hub Transport server roles are co-located. Although it's supported to co-locate the Unified Messaging server role, we don't recommend that configuration for performance reasons.
To achieve high availability for Client Access and Hub Transport server roles, some form of load balancing should be used between the clients and those server roles. Because these server roles are co-located with a Mailbox server that's a member of a DAG, Windows Network Load Balancing can't be used (because Network Load Balancing and Windows failover clustering can't be installed on the same server). Instead, a non-Windows Network Load Balancing solution must be used (for example, a hardware load balancer or a third-party software-based load balancer).
As with all DAGs that contain an even number of members, a two-member DAG requires a witness server to maintain quorum. The witness server (not pictured) is a Windows server that isn't and will never be a member of the DAG. For example, smaller organizations that use this configuration may use a file server or a directory server as the witness server. Quorum is maintained as long as more than half of the quorum voters are available and in communication. A two-member DAG with a witness server provides three quorum voters. (Each DAG member and the witness server can vote whenever they are available and in communication.) Therefore, a two-member DAG can survive the failure or outage of a single voter (for example, either of the DAG members, or just the witness server) without an interruption in service. However, the loss of two of the voters (for example, a DAG member and the witness server) will result in a loss of quorum, which will result in an interruption in service.
A four-member DAG in a single datacenter deployment provides greater resilience to failures than a two-member or three-member DAG. Larger DAGs inherently provide greater resilience because they can sustain more failures without an interruption in service. Whereas a two-member or three-member DAG can sustain the loss of only a single voter without losing quorum and compromising service, a four-member DAG, which has five quorum voters (four members plus a witness server), can sustain the loss of two voters without losing quorum and compromising service.
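The quorum arithmetic behind these statements can be sketched in a few lines of Python. This is a simple model, not part of Exchange: a DAG with an even number of members adds the witness server as an extra voter, and quorum holds as long as more than half of the voters remain available.

```python
def quorum_voters(members):
    """Total quorum voters: the DAG members, plus the witness server
    when the member count is even (node and file share majority)."""
    return members + 1 if members % 2 == 0 else members

def tolerable_failures(members):
    """Number of voters that can be lost while a strict majority remains."""
    voters = quorum_voters(members)
    return voters - (voters // 2 + 1)

for m in (2, 3, 4):
    print(m, quorum_voters(m), tolerable_failures(m))
# 2 members -> 3 voters, survives 1 failure
# 3 members -> 3 voters, survives 1 failure
# 4 members -> 5 voters, survives 2 failures
```

These results match the text: a two-member or three-member DAG can sustain the loss of only one voter, while a four-member DAG can sustain the loss of two.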
The following figure illustrates a four-member DAG with all members located in a single datacenter.
Using a four-member DAG, you can create a maximum of four copies of each database. This is a sufficient number of database copies to enable the use of alternate data protection scenarios, such as flexible mailbox protection. Flexible mailbox protection enables you to combine the Microsoft Exchange Server 2010 high availability and Extensible Storage Engine (ESE) resilience features with other built-in protection features, such as lagged mailbox database copies, retention policies, the Recoverable Items folder, and the hold policy, to create a solution that can reduce the need for other forms of protection, such as using Redundant Array of Independent Disks (RAID) or making data backups. For more information about flexible mailbox protection, see Understanding Backup, Restore and Disaster Recovery. For more information about using replication for your backups and using just a bunch of disks (JBOD), see Mailbox Server Storage Design.
A four-member DAG extended across two datacenters provides both high availability and site resilience for the mailbox services and data. This configuration is illustrated in the following figure.
There are several aspects worth noting about this configuration:
The witness server for the DAG should be located in the primary datacenter. Generally, the primary datacenter is the datacenter containing the majority of the user population. Using a witness server in the primary datacenter enables continued functionality for the majority of the user population in the event of a wide area network (WAN) outage. You can use multiple DAGs to eliminate the WAN as a single point of failure and to allow service and data access to remain functional for multiple datacenters in the event of a WAN outage. For more information, see the next example.
There's no direct routing that allows traffic between the replication network on one DAG member and the MAPI network on another DAG member, or between the replication networks of different DAG members. For example, the MAPI network on each DAG member should be blocked from reaching the replication networks on the other DAG members. (In the previous figure, the MAPI network on MBX1A shouldn't have any network connectivity with the replication networks on MBX1B or MBX2B.) You can use router access control lists (ACLs) to block this traffic. In addition, if you're using Dynamic Host Configuration Protocol (DHCP) for the replication network, you can use DHCP to configure static routes for the DAG members.
Because this DAG configuration is intended to provide site resilience, the Time to Live (TTL) value for the Exchange client access namespaces (Microsoft Office Outlook Web App, Autodiscover, Microsoft Exchange ActiveSync, Outlook Anywhere, POP3, IMAP4, SMTP, and the RPC Client Access array) should be set to 5 minutes in both the internal and external DNS zones.
In this example, the Exchange server roles are deployed on dedicated hardware. Because the Client Access and Hub Transport server roles aren't co-located with the Mailbox server in the DAG, Windows Network Load Balancing is used to load balance the Client Access and Hub Transport server roles.
As illustrated in the previous example, using a single four-member DAG extended across two datacenters can provide high availability and site resilience for the mailbox services and data. However, if a WAN outage occurs, only the primary datacenter retains service because it contains the majority of the voters. The DAG members in the datacenter with the minority of voters lose quorum and take their databases offline.
To deploy highly available Mailbox servers in a multiple datacenter environment, where each datacenter is actively serving a local user population, we recommend that you deploy multiple DAGs, where each DAG has a majority of voters in a different datacenter, as illustrated in the following figure.
Because DAG1 and DAG2 contain an even number of members, each uses a witness server. Although multiple DAGs can use the same witness server, in this design each DAG uses a witness server in its own primary datacenter, so that each datacenter's local user population retains service in the event of a WAN outage.
Users located in Portland would have their active mailbox database located on PDXMBX3 and/or PDXMBX4, with passive database copies on REDMBX3 and/or REDMBX4. Similarly, users located in Redmond would have their active mailbox database located on REDMBX1 and/or REDMBX2, with passive database copies on PDXMBX1 and/or PDXMBX2. If all network connectivity is lost between Redmond and Portland, the following occurs:
For DAG1, members REDMBX1 and REDMBX2 would be in the majority and would continue to service users in the Redmond datacenter because they can communicate with DAG1's witness server, HUB1.
For DAG2, members PDXMBX3 and PDXMBX4 would be in the majority and would continue to service users in the Portland datacenter because they can communicate with DAG2's witness server, HUB2.
As previously mentioned, larger DAGs inherently provide greater resilience because they can sustain more failures without an interruption in service. One design strategy that can help increase resilience when dealing with DAG member failures is to leverage the existing Hub Transport servers in the DAG's primary datacenter. This strategy involves adding the Mailbox server role (without any databases or database copies) to the Hub Transport server, and then adding that server to the DAG. In this scenario, the Mailbox server role is being used only for voting and quorum purposes. The more voters in a DAG, the more voter failures the DAG can sustain and still maintain quorum.
For example, consider a four-member DAG extended across two datacenters. The primary datacenter contains two DAG members and the witness server, and a second datacenter contains two DAG members. As illustrated in the following figure, there are five quorum voters. Therefore, this DAG can lose two voters and still maintain quorum. If the DAG loses a third voter, it loses quorum and requires manual administrative intervention to restore service.
Using the same servers in this example, you can add the Mailbox server role to the Hub Transport servers REDHUB1, REDHUB2, and PDXHUB1, and then add these servers to DAG1 (assuming these servers are capable of running Windows failover clustering).
At this point, you don't create any production mailbox databases on these servers. You also don't replicate any database copies to these servers. In this configuration, you can delete the default mailbox database and stop the Microsoft Exchange Information Store service (which can also be optionally disabled).
Note: Although the Microsoft Exchange Information Store service isn't needed for a Mailbox server that doesn't contain a database to participate in quorum voting, the Microsoft Exchange Replication service must be running for the Mailbox server to participate in quorum and DAG functions.
After the Mailbox servers that don't contain databases are added as members of the DAG, they become participants in quorum for the DAG. In this configuration, DAG1 now has seven quorum voters. As a result, it can lose three servers and still maintain quorum.
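The effect of adding the three database-less Mailbox servers can be checked with the same majority rule (a minimal Python sketch; with seven members the DAG has an odd member count, so it uses node majority and all seven members are voters):

```python
def tolerable_failures(voters):
    """Number of voters that can fail while a strict majority remains."""
    return voters - (voters // 2 + 1)

# Original DAG1: 4 members plus the witness server = 5 voters
print(tolerable_failures(5))  # 2
# After adding REDHUB1, REDHUB2, and PDXHUB1 as members: 7 voters
print(tolerable_failures(7))  # 3
```

Adding the three voters raises the number of failures the DAG can sustain from two to three, which is the gain described above.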