Deploying High Availability and Site Resilience

07/23/2014

Applies to: Exchange Server 2010 SP3, Exchange Server 2010 SP2

To create a highly available Mailbox server in previous versions of Exchange, you would install Exchange on a server that was configured as a member of a Microsoft Windows failover cluster. If you wanted a highly available Mailbox server, you had to build and configure the cluster prior to running Exchange Setup. The Exchange Setup program (and other Exchange components, such as the Exchange store and the Microsoft Exchange System Attendant service) was cluster-aware, and therefore behaved differently from when it was run on a stand-alone server. If Exchange was already installed on a stand-alone Windows server, you couldn't configure that server for high availability without first removing Exchange, building a cluster, and then reinstalling Exchange using the cluster-aware version of Setup.

Microsoft Exchange Server 2010 uses the concept known as incremental deployment for both high availability and site resilience. Unlike previous versions, Exchange 2010 no longer uses the cluster resource model for high availability. As a result of this architectural change, there is no longer a cluster-aware version of Setup, and you no longer configure high availability during Setup. Instead, you simply install all Exchange 2010 servers as standalone servers, and then incrementally configure mailbox servers and mailbox databases for high availability and site resilience as needed.

Overview of the Deployment Process

While the actual steps used by each organization may vary slightly, the overall process for deploying Exchange 2010 in a highly available or site resilient configuration is generally the same. After performing the necessary planning and design tasks for building and deploying a database availability group (DAG) and creating mailbox database copies, you would:

Create a DAG. For detailed steps, see Create a Database Availability Group.
If necessary, pre-stage the cluster name object (CNO). Pre-staging the CNO is required when deploying a DAG with Mailbox servers running Windows Server 2012. Pre-staging is also required in environments where computer account creation is restricted or where computer accounts are created in a container other than the default computers container. For detailed steps, see Pre-stage the Cluster Name Object for a Database Availability Group.
Add two or more Mailbox servers to the DAG. For detailed steps, see Manage Database Availability Group Membership.
Configure the DAG properties as needed:
1. Optionally configure DAG encryption and compression, replication port, DAG IP addresses, and other DAG properties. For detailed steps, see Configure Database Availability Group Properties.
2. If the DAG contains three or more Mailbox servers that are deployed in multiple Active Directory sites, Datacenter Activation Coordination (DAC) mode should be enabled. For more information, see Understanding Datacenter Activation Coordination Mode.
3. For detailed steps about how to create a DAG network, see Create a Database Availability Group Network. To manage a DAG network, see Configure Database Availability Group Network Properties.
Add mailbox database copies across Mailbox servers in the DAG. For detailed steps, see Add a Mailbox Database Copy.
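Two of the steps above are conditional: pre-staging the CNO and enabling DAC mode. The following Python sketch encodes only those two rules as this article states them. The `DagPlan` class and both function names are hypothetical planning aids, not Exchange cmdlets or APIs.

```python
from dataclasses import dataclass


@dataclass
class DagPlan:
    """Hypothetical description of a planned DAG (not an Exchange object)."""
    member_os: list      # operating system of each planned DAG member
    member_sites: list   # Active Directory site of each planned DAG member
    computer_account_creation_restricted: bool = False
    uses_non_default_computers_container: bool = False


def cno_prestage_required(plan):
    """Pre-staging is required for Windows Server 2012 members, restricted
    computer-account creation, or a non-default computers container."""
    return (
        "Windows Server 2012" in plan.member_os
        or plan.computer_account_creation_restricted
        or plan.uses_non_default_computers_container
    )


def dac_mode_recommended(plan):
    """DAC mode should be enabled for three or more members in multiple AD sites."""
    return len(plan.member_sites) >= 3 and len(set(plan.member_sites)) > 1


# Contoso's planned DAG from the example deployment in this article.
contoso = DagPlan(
    member_os=["Windows Server 2008 R2 Enterprise"] * 4,
    member_sites=["SITEA", "SITEA", "SITEB", "SITEB"],
)
print(cno_prestage_required(contoso))  # False
print(dac_mode_recommended(contoso))   # True
```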

Example Deployment: Four Member DAG in Two Datacenters

This example details how an organization, Contoso, Ltd., is configuring and deploying a four-member DAG that will be extended across two physical locations: a primary datacenter referred to as Active Directory SITEA and a second datacenter referred to as Active Directory SITEB. SITEA is located in Redmond, Washington, and SITEB is located in Dublin, Ireland.

Base Infrastructure

Each location contains the infrastructure elements that are necessary to operate a messaging infrastructure based on Exchange 2010, namely:

Directory services (either Active Directory or Active Directory Domain Services (AD DS))
Domain Name System (DNS) name resolution
One or more Exchange 2010 Client Access servers
One or more Exchange 2010 Hub Transport servers
One or more Exchange 2010 Mailbox servers

Note

The Client Access, Hub Transport, and Mailbox server roles can be co-located on a single computer. In this example deployment, the server roles are installed on separate computers.

The following figure illustrates the Contoso configuration.

Figure: Database availability group extended across two sites

Except for the Mailbox servers, all of the servers in the Contoso environment are running the Windows Server 2008 R2 Standard operating system. The Mailbox servers, which were planned with DAGs in mind, are running Windows Server 2008 R2 Enterprise.

In addition to the preceding infrastructure components, each location contains other messaging elements, such as Edge Transport servers and Unified Messaging servers.

Network Configuration

As illustrated in the previous figure, the solution involves the use of multiple subnets and multiple networks. Each Mailbox server in the DAG has two network adapters on separate subnets. In each Mailbox server, one network adapter will be used for the MAPI network (192.168.x.x) and one network adapter will be used for the Replication network (10.0.x.x). Only the MAPI network provides connectivity to Active Directory, DNS services, other Exchange servers and clients. The adapter used for the Replication network in each member provides connectivity only to the Replication network adapters in the other members of the DAG.

The settings for each network adapter in each node are detailed in the following table.

Name	IPv4 address	Subnet mask	Default gateway
MBX1A (MAPI)	192.168.1.4	255.255.255.0	192.168.1.1
MBX2A (MAPI)	192.168.1.5	255.255.255.0	192.168.1.1
MBX1B (MAPI)	192.168.2.4	255.255.255.0	192.168.2.1
MBX2B (MAPI)	192.168.2.5	255.255.255.0	192.168.2.1
MBX1A (Replication)	10.0.1.4	255.255.255.0	None
MBX2A (Replication)	10.0.1.5	255.255.255.0	None
MBX1B (Replication)	10.0.2.4	255.255.255.0	None
MBX2B (Replication)	10.0.2.5	255.255.255.0	None
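The MAPI rows of the table can be checked mechanically. Each default gateway should sit inside its adapter's subnet, and the two sites should use distinct MAPI subnets, which is why the DAG receives one IP address per MAPI subnet. The following minimal sketch uses Python's standard `ipaddress` module. The `MAPI_ADAPTERS` list and both helper functions are illustrative and aren't part of any Exchange tooling.

```python
import ipaddress

# MAPI rows from the table: (server, address/subnet mask, default gateway).
MAPI_ADAPTERS = [
    ("MBX1A", "192.168.1.4/255.255.255.0", "192.168.1.1"),
    ("MBX2A", "192.168.1.5/255.255.255.0", "192.168.1.1"),
    ("MBX1B", "192.168.2.4/255.255.255.0", "192.168.2.1"),
    ("MBX2B", "192.168.2.5/255.255.255.0", "192.168.2.1"),
]


def off_subnet_gateways(adapters):
    """Return the servers whose default gateway lies outside the adapter's subnet."""
    bad = []
    for server, interface, gateway in adapters:
        network = ipaddress.ip_interface(interface).network
        if ipaddress.ip_address(gateway) not in network:
            bad.append(server)
    return bad


def mapi_subnets(adapters):
    """Return the distinct MAPI subnets, sorted for stable output."""
    return sorted({ipaddress.ip_interface(i).network for _, i, _ in adapters})


print(off_subnet_gateways(MAPI_ADAPTERS))             # []
print([str(n) for n in mapi_subnets(MAPI_ADAPTERS)])  # ['192.168.1.0/24', '192.168.2.0/24']
```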

As shown in the preceding table, the adapters used for the Replication network don't use default gateways. To provide network connectivity between the Replication network adapters in the two sites, Contoso uses persistent static routes, which it configures by using the Netsh.exe tool. Netsh.exe is a command-line tool for configuring and monitoring Windows-based computers. With Netsh.exe, you can direct the context commands you enter to the appropriate helper, and the helper then carries out the command. A helper is a dynamic-link library (.dll) file that extends the functionality of Netsh.exe by providing configuration, monitoring, and support for one or more services, utilities, or protocols.

To configure routing for the Replication network adapters on MBX1A and MBX2A, the following command was run on each server.

netsh interface ipv4 add route 10.0.2.0/24 <NetworkName> 10.0.1.254

To configure routing for the Replication network adapters on MBX1B and MBX2B, the following command was run on each server.

netsh interface ipv4 add route 10.0.1.0/24 <NetworkName> 10.0.2.254
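The static routes rely on longest-prefix matching. Replication traffic from MBX1A to a SITEB replication address matches the 10.0.2.0/24 route and goes to the next hop 10.0.1.254. Traffic to the other SITEA member stays on the directly connected subnet. The following Python sketch models only the routes for MBX1A's replication adapter. It assumes /24 replication subnets, as the /24 destinations in the netsh commands imply. `next_hop` is a hypothetical helper, not a Windows API.

```python
import ipaddress

# Routes relevant to MBX1A's replication adapter: (destination, next hop).
# None means the destination is on the directly connected subnet.
MBX1A_REPLICATION_ROUTES = [
    (ipaddress.ip_network("10.0.1.0/24"), None),          # connected subnet (assumed /24)
    (ipaddress.ip_network("10.0.2.0/24"), "10.0.1.254"),  # static route added with netsh
]


def next_hop(routes, destination):
    """Pick the most specific matching route (the longest prefix wins)."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if address in net]
    if not matches:
        return None  # no replication route; other adapters are out of scope here
    _, hop = max(matches, key=lambda route: route[0].prefixlen)
    return hop if hop is not None else "on-link"


print(next_hop(MBX1A_REPLICATION_ROUTES, "10.0.2.4"))  # 10.0.1.254
print(next_hop(MBX1A_REPLICATION_ROUTES, "10.0.1.5"))  # on-link
```

The mirror-image command on MBX1B and MBX2B gives SITEB members the same behavior in the opposite direction.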

The following additional network settings have also been configured:

The Register this connection's addresses in DNS check box is selected for each DAG member's MAPI network adapter, and cleared for each Replication network adapter.
At least one DNS server address is configured for each DAG member's MAPI network adapter, and none are configured for the Replication network adapters. For redundancy, Contoso is using multiple DNS server addresses for their MAPI network adapters.
Contoso doesn't use IPv6 and has disabled the protocol on its servers.
Contoso doesn't use Windows Firewall and has turned it off on its servers.
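The DNS-related settings above follow a simple rule per adapter role. The following minimal sketch represents each adapter with a hypothetical `AdapterSettings` record, not a real Windows or Exchange API. The DNS server addresses are placeholders, because this article doesn't list Contoso's actual values.

```python
from dataclasses import dataclass, field


@dataclass
class AdapterSettings:
    """Hypothetical record of one adapter's DNS-related settings."""
    server: str
    role: str                  # "MAPI" or "Replication"
    register_in_dns: bool
    dns_servers: list = field(default_factory=list)


def dns_setting_issues(adapter):
    """Return deviations from the settings described for each adapter role."""
    issues = []
    if adapter.role == "MAPI":
        if not adapter.register_in_dns:
            issues.append("MAPI adapter should register its addresses in DNS")
        if not adapter.dns_servers:
            issues.append("MAPI adapter needs at least one DNS server")
    elif adapter.role == "Replication":
        if adapter.register_in_dns:
            issues.append("Replication adapter should not register in DNS")
        if adapter.dns_servers:
            issues.append("Replication adapter should have no DNS servers")
    else:
        issues.append(f"unknown role {adapter.role!r}")
    return issues


# Placeholder DNS server addresses (hypothetical; not given in the article).
mbx1a_mapi = AdapterSettings("MBX1A", "MAPI", True, ["192.168.1.10", "192.168.1.11"])
mbx1a_repl = AdapterSettings("MBX1A", "Replication", False)
print(dns_setting_issues(mbx1a_mapi), dns_setting_issues(mbx1a_repl))  # [] []
```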

After the network adapters have been configured, Contoso is ready to create a DAG and add the Mailbox servers to the DAG.

Database Availability Group Creation and Configuration

The administrator has decided to create a Windows PowerShell command-line interface script that performs several tasks:

It uses the New-DatabaseAvailabilityGroup cmdlet to create the DAG. Because SITEA is considered to be the primary datacenter, Contoso has chosen to use a witness server in the same datacenter, namely, HUB-A.
It uses the Set-DatabaseAvailabilityGroup cmdlet to preconfigure an alternate witness server and alternate witness directory in case a site switchover is ever necessary.
It uses the Add-DatabaseAvailabilityGroupServer cmdlet to add each of the four Mailbox servers to the DAG.
It uses the Set-DatabaseAvailabilityGroup cmdlet to configure the DAG for DAC mode. For more information about DAC mode, see Understanding Datacenter Activation Coordination Mode.

The following are the commands used in the script:

New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer HUB-A -WitnessDirectory C:\DAGWitness\DAG1.contoso.com -DatabaseAvailabilityGroupIPAddresses 192.168.1.8,192.168.2.8

The preceding command creates a DAG named DAG1, configures HUB-A to act as the witness server, configures a specific witness directory (C:\DAGWitness\DAG1.contoso.com), and configures two IP addresses for the DAG (one for each subnet on the MAPI network).
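You can also check that the DAG has one IP address for each MAPI subnet. In the following sketch, each MAPI subnet must contain exactly one DAG address. No DAG address may reuse an address already assigned to a member's MAPI adapter or gateway in the network table. The sketch uses Python's `ipaddress` module, and `dag_ip_report` is a hypothetical helper.

```python
import ipaddress

MAPI_SUBNETS = [ipaddress.ip_network("192.168.1.0/24"), ipaddress.ip_network("192.168.2.0/24")]
# Addresses already in use on the MAPI network (member adapters and gateways).
IN_USE = {ipaddress.ip_address(a) for a in (
    "192.168.1.4", "192.168.1.5", "192.168.2.4", "192.168.2.5",
    "192.168.1.1", "192.168.2.1",
)}
# Values passed to -DatabaseAvailabilityGroupIPAddresses.
DAG_IPS = [ipaddress.ip_address(a) for a in ("192.168.1.8", "192.168.2.8")]


def dag_ip_report(dag_ips, subnets, in_use):
    """Return (subnets without exactly one DAG IP, DAG IPs that collide with used addresses)."""
    uncovered = [str(net) for net in subnets
                 if sum(ip in net for ip in dag_ips) != 1]
    collisions = [str(ip) for ip in dag_ips if ip in in_use]
    return uncovered, collisions


print(dag_ip_report(DAG_IPS, MAPI_SUBNETS, IN_USE))  # ([], [])
```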

Set-DatabaseAvailabilityGroup -Identity DAG1 -AlternateWitnessDirectory C:\DAGWitness\DAG1.contoso.com -AlternateWitnessServer HUB-B

The preceding command configures DAG1 to use HUB-B as the alternate witness server, with an alternate witness directory on HUB-B that uses the same path that was configured on HUB-A.

Note

Using the same path isn't required; Contoso has chosen to do this to standardize their configuration.

Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MBX1A
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MBX1B
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MBX2A
Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer MBX2B

The preceding commands add each of the Mailbox servers, one at a time, to the DAG. The commands also install the Windows Failover Clustering component on each Mailbox server (if it isn't already installed), create a failover cluster, and join each Mailbox server to the newly created cluster.

Set-DatabaseAvailabilityGroup -Identity DAG1 -DatacenterActivationMode DagOnly

The preceding command enables DAC mode for the DAG.

Mailbox Databases and Mailbox Database Copies

After creating the DAG and adding the Mailbox servers to it, Contoso prepares to create mailbox databases and mailbox database copies. To meet its criteria for failure resistance, Contoso plans to configure each mailbox database with three non-lagged database copies (the active copy and two passive copies) and one lagged database copy. The lagged copy will have a configured log replay delay of three days.

This configuration provides a total of four copies of each database (one active copy, two non-lagged passive copies, and one lagged passive copy). Contoso plans on having one active database per server. With four Mailbox servers each hosting one active database, and three passive copies of each database, the Contoso solution contains 16 total database copies (4 databases × 4 copies).

As shown in the following figure, Contoso is taking a balanced approach to their database layout.

Figure: Database copy layout for Contoso, Ltd

Each Mailbox server hosts an active mailbox database copy, two non-lagged passive database copies, and one lagged passive database copy. The lagged copy of each active mailbox database is hosted on a Mailbox server in the other site.
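The balanced layout can be written down as data and checked. The following table is transcribed from the Add-MailboxDatabaseCopy commands in this section. Each database's active copy is on the server where its commands are run. The `LAYOUT` structure and the checks form an illustrative Python sketch, not Exchange output.

```python
from collections import Counter

SITE = {"MBX1A": "SITEA", "MBX2A": "SITEA", "MBX1B": "SITEB", "MBX2B": "SITEB"}

# database -> {server: copy type}, transcribed from the commands in this section.
LAYOUT = {
    "DB1": {"MBX1A": "active", "MBX2A": "passive", "MBX2B": "passive", "MBX1B": "lagged"},
    "DB2": {"MBX2A": "active", "MBX1A": "passive", "MBX1B": "passive", "MBX2B": "lagged"},
    "DB3": {"MBX1B": "active", "MBX2B": "passive", "MBX2A": "passive", "MBX1A": "lagged"},
    "DB4": {"MBX2B": "active", "MBX1B": "passive", "MBX1A": "passive", "MBX2A": "lagged"},
}


def copies_per_server(layout):
    """Count the copy types hosted by each server."""
    counts = {server: Counter() for server in SITE}
    for copies in layout.values():
        for server, kind in copies.items():
            counts[server][kind] += 1
    return counts


def host_of(copies, kind):
    """Return the server that hosts the (single) copy of the given type."""
    return next(server for server, k in copies.items() if k == kind)


total_copies = sum(len(copies) for copies in LAYOUT.values())
balanced = all(
    c == Counter(active=1, passive=2, lagged=1)
    for c in copies_per_server(LAYOUT).values()
)
lagged_in_other_site = all(
    SITE[host_of(c, "lagged")] != SITE[host_of(c, "active")] for c in LAYOUT.values()
)
print(total_copies, balanced, lagged_in_other_site)  # 16 True True
```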

To create this configuration, the administrator runs several commands.

On MBX1A, run the following commands.

Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX2A
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX2B
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer MBX1B -ReplayLagTime 3.00:00:00 -SeedingPostponed
Suspend-MailboxDatabaseCopy -Identity DB1\MBX1B -SuspendComment "Seed from MBX2B" -Confirm:$False
Update-MailboxDatabaseCopy -Identity DB1\MBX1B -SourceServer MBX2B
Suspend-MailboxDatabaseCopy -Identity DB1\MBX1B -ActivationOnly

On MBX2A, run the following commands.

Add-MailboxDatabaseCopy -Identity DB2 -MailboxServer MBX1A
Add-MailboxDatabaseCopy -Identity DB2 -MailboxServer MBX1B
Add-MailboxDatabaseCopy -Identity DB2 -MailboxServer MBX2B -ReplayLagTime 3.00:00:00 -SeedingPostponed
Suspend-MailboxDatabaseCopy -Identity DB2\MBX2B -SuspendComment "Seed from MBX1B" -Confirm:$False
Update-MailboxDatabaseCopy -Identity DB2\MBX2B -SourceServer MBX1B
Suspend-MailboxDatabaseCopy -Identity DB2\MBX2B -ActivationOnly

On MBX1B, run the following commands.

Add-MailboxDatabaseCopy -Identity DB3 -MailboxServer MBX2B
Add-MailboxDatabaseCopy -Identity DB3 -MailboxServer MBX2A
Add-MailboxDatabaseCopy -Identity DB3 -MailboxServer MBX1A -ReplayLagTime 3.00:00:00 -SeedingPostponed
Suspend-MailboxDatabaseCopy -Identity DB3\MBX1A -SuspendComment "Seed from MBX2A" -Confirm:$False
Update-MailboxDatabaseCopy -Identity DB3\MBX1A -SourceServer MBX2A
Suspend-MailboxDatabaseCopy -Identity DB3\MBX1A -ActivationOnly

On MBX2B, run the following commands.

Add-MailboxDatabaseCopy -Identity DB4 -MailboxServer MBX1B
Add-MailboxDatabaseCopy -Identity DB4 -MailboxServer MBX1A
Add-MailboxDatabaseCopy -Identity DB4 -MailboxServer MBX2A -ReplayLagTime 3.00:00:00 -SeedingPostponed
Suspend-MailboxDatabaseCopy -Identity DB4\MBX2A -SuspendComment "Seed from MBX1A" -Confirm:$False
Update-MailboxDatabaseCopy -Identity DB4\MBX2A -SourceServer MBX1A
Suspend-MailboxDatabaseCopy -Identity DB4\MBX2A -ActivationOnly

In the preceding examples for the Add-MailboxDatabaseCopy cmdlet, the ActivationPreference parameter wasn't specified. The task automatically increments the activation preference number with each copy that's added. The original database always has a preference number of 1. The first copy added with the Add-MailboxDatabaseCopy cmdlet is automatically assigned a preference number of 2. Assuming no copies are removed, the next copy added is automatically assigned a preference number of 3, and so forth. Thus, in the preceding examples, the passive copy in the same datacenter as the active copy has an activation preference number of 2; the non-lagged passive copy in the remote datacenter has an activation preference number of 3, and the lagged passive copy in the remote datacenter has an activation preference number of 4.
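The automatic numbering described above works like a counter that starts at 1 for the original database. The following hypothetical Python model reproduces that documented behavior. It assumes that no copies are removed, and it doesn't call Exchange. For DB1, it yields the preferences described above.

```python
def assign_activation_preferences(original_server, added_in_order):
    """Model: the original database gets preference 1; each copy added with
    Add-MailboxDatabaseCopy (without -ActivationPreference) gets the next number.
    Assumes no copies are removed in between."""
    preferences = {original_server: 1}
    for server in added_in_order:
        if server in preferences:
            raise ValueError(f"{server} already hosts a copy")
        preferences[server] = len(preferences) + 1
    return preferences


# DB1: created on MBX1A, then copies added on MBX2A, MBX2B, and MBX1B (lagged), in that order.
print(assign_activation_preferences("MBX1A", ["MBX2A", "MBX2B", "MBX1B"]))
# {'MBX1A': 1, 'MBX2A': 2, 'MBX2B': 3, 'MBX1B': 4}
```

Because the preference depends only on the order in which copies are added, adding the same-site copy first is what gives it preference 2.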

Although there are two copies of each active database across the WAN in the other location, seeding over the WAN was only performed once. This is because Contoso is leveraging the Exchange 2010 ability to use a passive copy of a database as the source for seeding. Using the Add-MailboxDatabaseCopy cmdlet with the SeedingPostponed parameter prevents the task from automatically seeding the new database copy being created. Then, the administrator can suspend the un-seeded copy, and by using the Update-MailboxDatabaseCopy cmdlet with the SourceServer parameter, the administrator can specify the local copy of the database as the source of the seeding operation. As a result, seeding of the second database copy added to each location happens locally and not over the WAN.

Note

In the preceding example, the non-lagged database copy is seeded over the WAN, and that copy is then used to seed the lagged copy of the database that's in the same datacenter as the non-lagged copy.
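You can count the seeding savings directly. The following Python sketch lists each seeding operation implied by the commands in this section as (database, target, source) and counts the operations that cross sites. It assumes, as the text implies, that copies added without -SeedingPostponed are seeded from the active copy. The names are illustrative only.

```python
SITE = {"MBX1A": "SITEA", "MBX2A": "SITEA", "MBX1B": "SITEB", "MBX2B": "SITEB"}
ACTIVE = {"DB1": "MBX1A", "DB2": "MBX2A", "DB3": "MBX1B", "DB4": "MBX2B"}

# (database, target server, seeding source). The third entry for each database is the
# lagged copy, seeded from the local non-lagged copy via Update-MailboxDatabaseCopy -SourceServer.
SEEDING = [
    ("DB1", "MBX2A", "MBX1A"), ("DB1", "MBX2B", "MBX1A"), ("DB1", "MBX1B", "MBX2B"),
    ("DB2", "MBX1A", "MBX2A"), ("DB2", "MBX1B", "MBX2A"), ("DB2", "MBX2B", "MBX1B"),
    ("DB3", "MBX2B", "MBX1B"), ("DB3", "MBX2A", "MBX1B"), ("DB3", "MBX1A", "MBX2A"),
    ("DB4", "MBX1B", "MBX2B"), ("DB4", "MBX1A", "MBX2B"), ("DB4", "MBX2A", "MBX1A"),
]


def wan_seeds(seeding):
    """Seeding operations whose source and target are in different sites."""
    return [(db, target) for db, target, source in seeding if SITE[target] != SITE[source]]


# Baseline for comparison: every copy seeded from the active copy.
baseline = [(db, target, ACTIVE[db]) for db, target, _ in SEEDING]
print(len(wan_seeds(SEEDING)), len(wan_seeds(baseline)))  # 4 8
```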

Contoso has configured one of the passive copies of each mailbox database as a lagged database copy to provide protection against the extremely rare but catastrophic case of database logical corruption. As a result, the administrator is configuring the lagged copies as blocked for activation by using the Suspend-MailboxDatabaseCopy cmdlet with the ActivationOnly parameter. This ensures that the lagged database copies won't be activated if a database or server failover occurs.
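The lagged copies protect against logical corruption because log replay is delayed. The value passed to -ReplayLagTime (3.00:00:00) uses a days.hours:minutes:seconds format. The following sketch parses that format and applies a deliberately simplified model of a lagged copy, in which a copied log file is replayed only after the lag has elapsed. The function names and log timestamps are hypothetical. The real replay behavior of the Microsoft Exchange Replication service involves more than this model shows.

```python
import re
from datetime import datetime, timedelta


def parse_lag(value):
    """Parse a 'd.hh:mm:ss' string such as '3.00:00:00' into a timedelta."""
    match = re.fullmatch(r"(\d+)\.(\d{2}):(\d{2}):(\d{2})", value)
    if match is None:
        raise ValueError(f"unexpected lag format: {value!r}")
    days, hours, minutes, seconds = (int(part) for part in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def replayable_logs(copied_at, now, lag):
    """Simplified model: logs copied at least `lag` ago may be replayed."""
    return sorted(gen for gen, when in copied_at.items() if now - when >= lag)


lag = parse_lag("3.00:00:00")
now = datetime(2000, 1, 10, 12, 0)
# Hypothetical log generations and the times they were copied to the lagged copy.
copied_at = {
    100: now - timedelta(days=4),
    101: now - timedelta(days=3),
    102: now - timedelta(hours=5),
}
print(lag.days, replayable_logs(copied_at, now, lag))  # 3 [100, 101]
```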

Validating the Solution

After the solution has been deployed and configured, the administrator validates its readiness before moving production mailboxes to the databases in the DAG. The solution should be tested and inspected using several methods, including failure simulations. To validate the solution, the administrator performs the following tasks.

To verify the overall health of the DAG, the administrator runs the Test-ReplicationHealth cmdlet. This cmdlet checks several aspects of the replication and replay status to provide information about each Mailbox server and database copy in the DAG.

To verify replication and replay activity, the administrator runs the Get-MailboxDatabaseCopyStatus cmdlet. This cmdlet can provide real-time status information about a specific mailbox database copy or for all mailbox database copies on a specific server. For more information about monitoring the health and status of replicated databases in a DAG, see Monitoring High Availability and Site Resilience.

To verify switchovers work as expected, the administrator uses the Move-ActiveMailboxDatabase cmdlet to perform a series of database switchovers and server switchovers. When these tasks have completed successfully, the administrator uses the same cmdlet to move the active database copies back to their original locations.

To verify the expected behaviors in various failure scenarios, the administrator performs several tasks that either simulate failures or actually cause failures to occur. For example, the administrator might:

Unplug the power cord on MBX1A, thereby triggering a server failover. The administrator then verifies that DB1 becomes active on another server (preferably MBX2A, based on the activation preference values).
Unplug the network cable for the MAPI network adapter on MBX2A, thereby triggering a server failover. The administrator then verifies that DB2 becomes active on another server (preferably MBX1A, based on the activation preference values).
Take the disk used by the active copy of DB3 offline, thereby triggering a database failover. The administrator then verifies that DB3 becomes active on another server (preferably MBX2B, based on activation preference values).
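The expected outcomes in these tests follow from the activation preferences and the activation block on the lagged copies. The following hypothetical Python model picks the surviving, unblocked copy with the lowest preference number. It is a simplification. Active Manager's actual best copy selection also weighs other criteria, such as copy and replay queue lengths and content index state. The model is only a way to write down the expected result for each test case.

```python
# Activation preferences from the commands in this article (1 = original database).
PREFERENCES = {
    "DB1": {"MBX1A": 1, "MBX2A": 2, "MBX2B": 3, "MBX1B": 4},
    "DB2": {"MBX2A": 1, "MBX1A": 2, "MBX1B": 3, "MBX2B": 4},
    "DB3": {"MBX1B": 1, "MBX2B": 2, "MBX2A": 3, "MBX1A": 4},
    "DB4": {"MBX2B": 1, "MBX1B": 2, "MBX1A": 3, "MBX2A": 4},
}
# Lagged copies suspended with -ActivationOnly.
ACTIVATION_BLOCKED = {("DB1", "MBX1B"), ("DB2", "MBX2B"), ("DB3", "MBX1A"), ("DB4", "MBX2A")}


def failover_target(db, failed_servers=frozenset(), failed_copies=frozenset()):
    """Simplified model: lowest preference number among surviving, unblocked copies."""
    candidates = [
        (preference, server)
        for server, preference in PREFERENCES[db].items()
        if server not in failed_servers
        and (db, server) not in failed_copies
        and (db, server) not in ACTIVATION_BLOCKED
    ]
    return min(candidates)[1] if candidates else None


print(failover_target("DB1", failed_servers={"MBX1A"}))          # MBX2A
print(failover_target("DB2", failed_servers={"MBX2A"}))          # MBX1A
print(failover_target("DB3", failed_copies={("DB3", "MBX1B")}))  # MBX2B
```

If every non-lagged copy of a database is lost, the model returns `None`, matching the intent that blocked lagged copies aren't activated automatically.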

An organization may test other failure scenarios, depending on its business needs. After simulating a single failure (such as pulling the power plug) and verifying the solution's recovery behavior, the administrator may revert the solution to its original configuration. In some cases, the solution may be tested for multiple concurrent failures. Ultimately, your solution test plan determines whether the solution is reverted to its original configuration after each failure simulation.

In addition, an administrator may decide to disconnect the network connection between the two datacenters, thereby simulating a site failure. Performing a datacenter switchover is a much more involved and coordinated process; however, it's a recommended process if the solution being deployed is intended to provide site resilience for the messaging services and data. For details on datacenter switchovers, see Datacenter Switchovers.

Transitioning to Operations

After the solution has been deployed, it can be extended further by using incremental deployment. At this point, management of the solution also transitions to operational processes, in which the following tasks are performed:

Monitor the health and status of DAGs and mailbox database copies. For more information, see Monitoring High Availability and Site Resilience.
Perform database and server switchovers as needed. For detailed steps to perform a database switchover, see Move the Active Mailbox Database. For detailed steps to perform a server switchover, see Perform a Server Switchover. If necessary, initiate a datacenter switchover. For more information about datacenter switchovers, see Datacenter Switchovers.

For more information about managing the solution, see Managing High Availability and Site Resilience.
