Proper planning and preparation involve not only deploying the second datacenter resources, such as live Client Access and Hub Transport servers, but also pre-configuring those resources to minimize the changes required as part of a datacenter switchover operation.
| Note: |
|---|
| Client Access and Hub Transport services are required in the second datacenter even when automatic activation of the mailbox databases in the second datacenter is blocked. These services are necessary in order to perform database switchovers, as well as to perform testing and validation of the services and data in the second datacenter. |
To better understand how the datacenter switchover process works, it's helpful to understand the basic operation of an Exchange 2010 datacenter switchover.
As illustrated in the following figure, a site-resilient deployment consists of a DAG that has members in both datacenters.
Figure: Database availability group with members in two datacenters
When a DAG is extended across multiple datacenters, it should be designed so that either the majority of the DAG members are located in the primary datacenter or, when each datacenter has the same number of members, the primary datacenter hosts the witness server. This design guarantees that service will be provided in the primary datacenter even if network connectivity between the two datacenters fails. However, it also means that when the primary datacenter fails, quorum is lost for the members in the second datacenter.
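The design rule above can be checked with a little vote arithmetic. The following Python sketch is a simplified model, not the actual quorum implementation used by Exchange or Windows failover clustering. It assumes that each DAG member carries one vote, that the witness server contributes one extra vote only when the total member count is even, and that a datacenter keeps quorum only if it holds a strict majority of all votes. The function name and its parameters are hypothetical.

```python
def partition_votes(primary_members: int, secondary_members: int,
                    witness_in_primary: bool = True) -> dict:
    """Report which datacenter keeps quorum if the inter-datacenter link fails.

    Simplified model (assumption): one vote per DAG member, plus one witness
    vote that counts only when the total number of members is even.
    """
    total_members = primary_members + secondary_members
    witness_votes = 1 if total_members % 2 == 0 else 0
    total_votes = total_members + witness_votes
    majority = total_votes // 2 + 1  # strict majority of all votes

    # The witness vote goes to whichever datacenter hosts the witness server.
    primary = primary_members + (witness_votes if witness_in_primary else 0)
    secondary = secondary_members + (0 if witness_in_primary else witness_votes)
    return {
        "majority_needed": majority,
        "primary_has_quorum": primary >= majority,
        "secondary_has_quorum": secondary >= majority,
    }


# Two members per datacenter, witness hosted in the primary datacenter.
print(partition_votes(2, 2, witness_in_primary=True))

# Three members in the primary datacenter, two in the second (odd total).
print(partition_votes(3, 2))
```

In both example layouts the primary datacenter holds the majority after a link failure, and the second datacenter on its own does not. That second result is why losing the primary datacenter leaves the remaining members without quorum.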
Partial datacenter failures are also possible and will happen. The presumption is that if enough functionality is lost in the primary datacenter to preclude effective service and management, a datacenter switchover should be performed to activate the second datacenter. As part of the activation process, the administrator configures the surviving servers that are in a partially operational state to cease service. Activation can then proceed in the second datacenter. This prevents both sets of services from trying to operate at the same time.
Because quorum is lost, the DAG members in the second datacenter cannot automatically come online. Thus, activating the Mailbox servers in the second datacenter also requires a step in which the DAG member servers are forced to create quorum, at which point the servers in the failed datacenter are internally (but only temporarily) removed from the DAG. This provides a stable partial-service solution that can sustain some additional failures and still continue to function.
| Note: |
|---|
| One prerequisite for being able to sustain additional failures is that the DAG has at least four members and that those members are spread between two Active Directory sites (that is, at least two members in each datacenter). |
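To see why the member count matters, the sketch below applies a simplified majority model to the DAG as it exists after the failed datacenter's members have been removed. It is an illustration under stated assumptions, not a description of Exchange internals: it assumes one vote per surviving member and, as an assumption the text above does not spell out, that a witness vote is available in the second datacenter when the surviving member count is even. The function name is hypothetical.

```python
def tolerable_failures(surviving_members: int, witness_available: bool = True) -> int:
    """Number of surviving members that can fail before quorum is lost.

    Assumptions (simplified model): one vote per member; a witness adds one
    vote only when the surviving member count is even and a witness exists.
    """
    if surviving_members < 1:
        raise ValueError("at least one surviving member is required")
    witness_votes = 1 if witness_available and surviving_members % 2 == 0 else 0
    total_votes = surviving_members + witness_votes
    majority = total_votes // 2 + 1
    # Members may fail as long as the votes that remain still form a
    # strict majority of the vote count established after the switchover.
    return max(0, total_votes - majority)


# Four-member DAG with two members per datacenter: two members survive.
print(tolerable_failures(2))

# DAG with only one member in the second datacenter.
print(tolerable_failures(1))
```

Under these assumptions, two surviving members can absorb one further member failure, while a single surviving member cannot absorb any.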
This is the basic process used to re-establish Mailbox role functionality in the second datacenter. Activating the other roles in the second datacenter does not involve explicit actions on the affected servers there. Instead, servers in the second datacenter become the service endpoints for the services normally hosted by the primary datacenter. For example, a user normally hosted in the primary datacenter might use https://mail.contoso.com/owa to connect to Outlook Web App. After the datacenter failure, these service endpoints are moved to the second datacenter as part of the switchover operation: the service endpoints for the primary datacenter are re-targeted at alternate IP addresses for the same services in the second datacenter. This minimizes the number of changes that must be made to configuration information stored in Active Directory during the switchover process. Generally, there are two ways to complete this step:
- Update DNS records; or
- Reconfigure DNS and load balancer(s) to enable and disable alternate IP addresses, thus moving services between datacenters.
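The re-targeting step can be pictured as swapping the address behind each unchanged service name. The Python sketch below is illustrative only. The `ENDPOINTS` table, the `smtp.contoso.com` name, the IP addresses (taken from the reserved documentation ranges), and the `plan_dns_changes` function are all hypothetical. It only builds a list of record changes and does not contact any DNS server or load balancer.

```python
# Hypothetical endpoint table: each service keeps the same hostname, and
# only the IP address behind it differs per datacenter.
# Addresses come from the reserved documentation ranges (RFC 5737).
ENDPOINTS = {
    "mail.contoso.com": {"primary": "192.0.2.10", "secondary": "198.51.100.10"},
    # Hypothetical second service name, added for illustration.
    "smtp.contoso.com": {"primary": "192.0.2.25", "secondary": "198.51.100.25"},
}


def plan_dns_changes(endpoints: dict, target: str) -> list:
    """List the A-record changes needed to point every service at `target`.

    This only builds a change plan; it does not apply anything.
    """
    if target not in ("primary", "secondary"):
        raise ValueError("target must be 'primary' or 'secondary'")
    source = "primary" if target == "secondary" else "secondary"
    changes = []
    for hostname, addresses in sorted(endpoints.items()):
        changes.append({
            "hostname": hostname,
            "old_ip": addresses[source],
            "new_ip": addresses[target],
        })
    return changes


# Plan a switchover to the second datacenter.
for change in plan_dns_changes(ENDPOINTS, target="secondary"):
    print(f'{change["hostname"]}: {change["old_ip"]} -> {change["new_ip"]}')
```

Because a URL such as https://mail.contoso.com/owa stays the same and only the address behind it changes, a plan like this leaves the configuration stored in Active Directory untouched.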
A strategy for testing the solution must be established and factored into the service level agreement (SLA). Periodic validation of the deployment is the only way to guarantee that the deployment does not degrade over time.
Careful completion of these planning steps will directly impact the success of a datacenter switchover. For example, poor namespace design can cause difficulties with certificates, and an incorrect certificate configuration can preclude users from being able to access services.
After the deployment is validated, we recommend that all parts of the configuration that directly affect the success of a datacenter switchover be explicitly documented. In addition, it might be prudent to enhance the change management processes around those segments of the deployment.
For more information about datacenter switchovers, including activating a secondary datacenter, and re-activating a failed (primary) datacenter, see Datacenter Switchovers.