Installing Cluster Continuous Replication on Windows Server 2008

Microsoft Exchange Server 2007 will reach end of support on April 11, 2017. To stay supported, you will need to upgrade. For more information, see Resources to help you upgrade your Office 2007 servers and clients.

 

Applies to: Exchange Server 2007 SP1, Exchange Server 2007 SP2, Exchange Server 2007 SP3

Although the process for deploying cluster continuous replication (CCR) on Windows Server 2008 is similar to deploying CCR on Windows Server 2003, there are some significant differences. Before deploying CCR, we recommend that you thoroughly review Cluster Continuous Replication. In addition, make sure that you meet all of the requirements specified in Planning for Cluster Continuous Replication.

Installation of CCR on Windows Server 2008 occurs in several different phases:

  1. Configuring the hardware, starting with cluster network formation and configuration.

  2. Forming the cluster, beginning with the first node and then the second.

  3. Configuring the cluster networks and tolerance for missed cluster heartbeats.

  4. Configuring and securing the file share witness.

  5. Installing the active and passive Mailbox server roles into the cluster. The clustered mailbox server (CMS) is created during the installation of the active Mailbox server role.

    Note

    We recommend that you complete each phase before you start the next phase. After you complete all phases, we recommend that you verify the CCR solution before putting it into production.

There are some post-installation tasks that must be performed as well:

  • Tuning failover control settings.

  • Tuning the default configuration of the transport dumpster.

  • Verifying the ability to move a CMS between the nodes in the cluster.

  • Enabling multiple networks for continuous replication activity.

Before performing any of the following referenced procedures, you must first make sure that the intended computers have the required operating system components for Windows Server 2008 installed. For detailed steps about how to install Microsoft Exchange prerequisites on Windows Server 2008, see How to Install Exchange 2007 SP1 and SP2 Prerequisites on Windows Server 2008 or Windows Vista.

The following sections explain each of the installation phases in more detail.

Network Formation and Configuration

You must have a sufficient number of IP addresses available when you create CMSs in a two-node CCR environment on Windows Server 2008. Windows Server 2008 failover clustering introduces new networking capabilities that differ significantly from those of legacy clusters. For example, Windows Server 2008 failover clusters introduce support for multiple subnets, Dynamic Host Configuration Protocol (DHCP) Internet Protocol version 4 (IPv4), and IPv6. When running in a Windows Server 2008 failover cluster, Microsoft Exchange Server 2007 Service Pack 1 (SP1) includes support for geographically dispersed clusters for failover across two subnets. This support includes both single copy clusters (SCCs) and Mailbox servers in a CCR environment.

Note

Although DHCP IPv4 is supported in Windows Server 2008 failover clusters, we recommend using static IP addresses in production environments. If DHCP IPv4 is used in a failover cluster, we recommend that you configure the DHCP servers to grant leases of unlimited length.

Beginning with Windows Server 2008 failover clustering, individual cluster nodes can be located on separate, routed networks. This requires that resources that depend on IP Address resources (for example, Network Name resources) implement OR logic, because it is unlikely that every cluster node will have a direct local connection to every network the cluster knows about. OR logic allows IP Address and Network Name resources to come online when services or applications fail over to remote nodes.
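The difference between an AND dependency and an OR dependency can be sketched as follows. This is only an illustration of the logic described above, not the Cluster service's implementation; the function names and IP addresses are hypothetical.

```python
def can_come_online_with_and(ip_resource_online):
    # AND dependency: every IP Address resource must be online.
    return all(ip_resource_online.values())


def can_come_online_with_or(ip_resource_online):
    # OR dependency: a single online IP Address resource is sufficient.
    return any(ip_resource_online.values())


# Hypothetical state after failover to a node in the second subnet:
# only the IP Address resource for that node's subnet can be online.
state = {"192.168.1.50": False, "192.168.2.50": True}

print(can_come_online_with_and(state))  # Network Name would stay offline
print(can_come_online_with_or(state))   # Network Name can come online
```

With an AND dependency, a Network Name resource could never come online on a node that lacks a local connection to one of the subnets, which is why multiple subnet clusters rely on OR logic.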

All online IP addresses associated with a Network Name resource will be dynamically registered in the Domain Name System (DNS) (if configured for dynamic updates) with the list ordered such that those IP Address resources that are online are returned first to the clients. Because cluster nodes can be placed on different, routed networks, and the communication mechanisms have been changed to use reliable session protocols implemented over User Datagram Protocol (UDP) (unicast), the networking requirements for geographically dispersed clusters are no longer applicable. As a result, organizations can deploy a failover cluster across two physical data centers without having to use virtual LAN (VLAN) technology to span the cluster subnets across the two locations.

IP addresses are required for both the public and private networks. Requirements related to private and public addresses are as follows:

  • Private addresses   Each node requires one IP address for each network adapter that is used for the cluster private network. You can use a static IPv4 address or a dynamically assigned IPv6 address. You must use IP addresses that are not on the same subnet or network as one of the public networks. We recommend that you use 10.10.10.10 and 10.10.10.11 with a subnet mask of 255.255.255.0 as the private IP addresses for the nodes.

  • Public addresses   Each node requires one IP address for each network adapter that is used for the cluster public network, sometimes referred to as a mixed network. Additionally, IP addresses are required for the failover cluster and for the CMS so that they can be accessed by clients and administrators. You must use IP addresses that are not on the same subnet or network as one of the private networks. You can use static IPv4 addresses, DHCP IPv4 addresses, or static IPv6 addresses.

    Important

    All network adapters for a cluster network must use the same version of TCP/IP; that is, they must either all use only IPv4 or all use both IPv4 and IPv6.
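The requirement that private and public addresses sit on different subnets can be checked before the cluster is formed. The following is a minimal sketch that uses Python's standard ipaddress module. The private addresses are the recommended values from this topic; the public addresses are hypothetical examples.

```python
import ipaddress


def on_same_network(interface_a, interface_b):
    """Return True if two 'address/mask' strings fall in overlapping subnets."""
    net_a = ipaddress.ip_interface(interface_a).network
    net_b = ipaddress.ip_interface(interface_b).network
    return net_a.overlaps(net_b)


# Recommended private addresses for the two nodes.
private_node1 = "10.10.10.10/255.255.255.0"
private_node2 = "10.10.10.11/255.255.255.0"

# Hypothetical public address for NODE1.
public_node1 = "192.168.1.10/255.255.255.0"

# The two private adapters must share a subnet.
print(on_same_network(private_node1, private_node2))

# A private adapter must not share a subnet with a public adapter.
print(on_same_network(private_node1, public_node1))
```

If the second check returns True for your planned addresses, choose a different private range, as described in the best practices that follow.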

Network Best Practices for Clustered Mailbox Servers

We also recommend that you follow these best practices for your cluster network:

  • Use meaningful names   Building a cluster gives you many opportunities to use meaningful names for cluster nodes, cluster network interfaces, the cluster name, and CMS names. For example, the network used to communicate with other Exchange servers and clients can be called Public. The network used to communicate between the cluster nodes can be called Private. Use names that can be related to each other without having to review a topology map. Another useful convention is to relate the nodes of a cluster to the name of the CMS. For example, use mbx01, mbx01-node1, and mbx01-node2 for the CMS and the two nodes, respectively.

  • Use private IP addresses for the private network interfaces   For a list of IP address ranges and subnet masks that can be used for the private network interfaces on each node, see the following table.

    Address ranges and subnet masks for private network interfaces

    Network / node     IP address range     Subnet mask
    Private / NODE1    10.10.10.10-255      255.255.255.0
    Private / NODE2    10.10.10.11-255      255.255.255.0

Note the following:

  • If your public network uses a 10.x.x.x network and 255.255.255.0 subnet mask, we recommend that you use alternate IP addresses and an alternate subnet mask for the private network.

  • We do not recommend the use of any type of fault-tolerant adapter or teaming for the private network. If you require redundancy for your private network, use multiple network adapters configured only for cluster use. If you do use teaming, verify that your firmware and drivers are at the most current revision. For information about compatibility on a server cluster, contact your network adapter manufacturer. For more information about network adapter teaming in failover cluster deployments, see Microsoft Knowledge Base article 254101, Network adapter teaming and server clustering.

Forming the Failover Cluster

A failover cluster is formed when the first node is added to the cluster. This process gives the cluster a unique network name and a unique network IP address. The network name and IP address, which collectively are the cluster's network identity, move between nodes in the cluster as nodes go online and offline. The cluster's network identity is rarely used in the administration of a CMS.

If you are familiar with deploying failover clusters or Exchange clusters from previous versions, you will find deployment of a cluster for CCR quite different. If you are new to cluster solutions, you will find deployment to be much less complex than typical cluster configurations.

You can build a new cluster using the instructions in How to Create a Windows Server 2008 Failover Cluster for Cluster Continuous Replication.

Adding Additional Nodes

After you install the Cluster service on the first node, you will find that it takes less time to install it on the second node. This is because the Setup program uses the network configuration settings configured on the first node as a basis for configuring the network settings on subsequent nodes. Before you add and configure the second node, you should validate the cluster configuration. You can verify that the Cluster service is running and the cluster is operational by running Cluster.exe group from a command prompt. It should produce output similar to the following.

C:\>cluster group
Listing status for all available resource groups:
Group                Node            Status
-------------------- --------------- ------
Cluster Group        <NODEName>      Online

We also recommend that you review the event logs for errors and warnings that might require attention before proceeding. For detailed steps about how to add the second node to the cluster, see How to Create a Windows Server 2008 Failover Cluster for Cluster Continuous Replication.
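If you want to script the Cluster.exe group check described above, the output can be parsed by using the row of dashes to locate the columns. The following is a minimal sketch that assumes a three-column layout like the sample output; the node name NODE1 and the function names are hypothetical, and real output may be spaced differently.

```python
import re

# Sample text modeled on the output shown above (NODE1 is hypothetical).
ROW = "{:<20} {:<15} {}"
SAMPLE_OUTPUT = "\n".join([
    "Listing status for all available resource groups:",
    "",
    ROW.format("Group", "Node", "Status"),
    ROW.format("-" * 20, "-" * 15, "-" * 6),
    ROW.format("Cluster Group", "NODE1", "Online"),
])


def parse_cluster_groups(text):
    """Return {group name: (node, status)} for each row below the dashes."""
    lines = text.splitlines()
    # The separator row contains only dashes and spaces.
    sep_index = next(
        i for i, line in enumerate(lines)
        if "-" in line and set(line) <= {"-", " "}
    )
    # Each run of dashes marks the start and end of one column.
    spans = [m.span() for m in re.finditer(r"-+", lines[sep_index])]
    groups = {}
    for line in lines[sep_index + 1:]:
        if not line.strip():
            continue
        fields = [line[start:end].strip() for start, end in spans[:-1]]
        # The last column runs to the end of the line.
        fields.append(line[spans[-1][0]:].strip())
        name, node, status = fields
        groups[name] = (node, status)
    return groups


def all_groups_online(groups):
    return all(status == "Online" for _node, status in groups.values())


groups = parse_cluster_groups(SAMPLE_OUTPUT)
print(groups)
print(all_groups_online(groups))
```

Slicing by column position rather than splitting on white space keeps group names that contain spaces, such as Cluster Group, intact.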

Configuring the Cluster Networks

After both nodes have been added to the cluster, the cluster networking components must be configured. Specifically, you must configure networks for cluster and client access, and you must configure tolerance settings for missed cluster heartbeats. We also recommend that you rename the cluster networks with more meaningful names.

The following table details the available options for configuring cluster networks for the cluster heartbeat.

Options for configuring cluster networks

Option: Allow the cluster to use this network (private network)
Description: Select only this option if you want the Cluster service to exclusively use this network for inter-node communication traffic. Clients will not be able to connect to the CMS using this network.

Option: Allow the cluster to use this network and allow clients to connect through this network (mixed network)
Description: Select both of these options if you want the Cluster service to use the network adapter for the cluster inter-node communication and for communication with external clients. The Cluster service will use this network for inter-node communication, and clients will be able to connect to the CMS using this network.

Option: Do not allow the cluster to use this network (unmanaged network)
Description: Select only this option if you do not want the cluster to use the network or the Cluster service to manage it. The Cluster service will not be able to use this network for inter-node communication, and clients will not be able to connect to the CMS using this network.

CMSs deployed in a CCR environment require at least two network cards in each node to be supported. In Exchange 2007 SP1, any network that is managed by the Cluster service and enabled for both cluster use and client connections (that is, configured as a mixed network) can be used for continuous replication functions, including seeding, log shipping, and reseeding. This is accomplished using a new cmdlet in Exchange 2007 SP1 called Enable-ContinuousReplicationHostName.

Note

One option for configuring cluster networks is to create a preliminary network configuration and then run the Validate a Configuration wizard in the Failover Cluster Management tool with only the network tests selected (that is, skip the Inventory, Storage, and System Configuration tests). When only the network tests are run, the process completes quickly. Using the validation report, you can make any corrections needed in the network configuration. After the entire cluster has been configured, we recommend that you rerun the Validate a Configuration wizard and select all tests.

Configuring Tolerance Settings for Missed Cluster Heartbeats

After cluster communications and network priority have been configured, we recommend that you configure specific tolerance settings for missed cluster heartbeats. Doing so configures the Cluster service monitoring of network connectivity between cluster nodes to be tolerant of minor interruptions. This prevents failovers in some cases where the network outage is brief. We recommend that you configure private and mixed cluster networks on all nodes to account for ten missed heartbeats. This setting level corresponds to approximately 12 seconds.

For detailed steps about how to configure the cluster networking components, see How to Configure Cluster Networks for a Failover Cluster.

Configuring TTL Settings for the Clustered Mailbox Server Network Name Resource

There are two deployment scenarios in which exercising outage or recovery options will involve a change of the IP address assigned to the CMS:

  • A CMS is deployed in a multiple subnet environment.

  • A standby cluster is used to recover a failed cluster.

In both scenarios, the name of the CMS does not change, but the IP address assigned to the CMS changes. Clients and other servers that communicate with a CMS that has changed IP addresses will not be able to reestablish communications with the CMS until DNS has been updated with the new IP address, and any local DNS caches have been updated. To minimize the amount of time it takes to have the DNS changes known to clients and other servers, we recommend setting a DNS TTL value of five minutes for the CMS Network Name resource.

Note

In most environments, we recommend setting the DNS TTL value only for the CMS Network Name resource. However, in environments with non-Exchange management tools that connect to the cluster by its name for management purposes, we recommend setting a TTL value of five minutes for the cluster's Network Name resource, as well.

By default, the Cluster service uses a setting of 20 minutes for the DNS TTL value of Network Name resources. Although the DNS management tools can be used to manually adjust the TTL value for the host name directly in the DNS database, the value in the DNS database will be overwritten and set to the Cluster service default of 20 minutes every time the network name registration in DNS is refreshed. A refresh of the network name registration in DNS will occur whenever the CMS is started, moved, or brought back online after a failure or failover.

In Windows Server 2008, a new private property has been added to Network Name resources in failover clusters. The new property is called HostRecordTTL, and you can configure it by using Cluster.exe.

Note

The property is available only on Windows Server 2008 failover clusters. There is no such property in Windows Server 2003 failover clusters. For failover clusters running Windows Server 2003, the Cluster service default of 20 minutes will always apply.

For detailed steps about how to configure the DNS TTL values for the CMS Network Name resource for use in a multiple subnet CMS or standby cluster deployment, see How to Configure the DNS TTL Value for a Clustered Mailbox Server Network Name Resource.
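As a rough illustration of the values involved, the following sketch converts the TTL values discussed above and builds a Cluster.exe command line. It rests on several assumptions that you should verify against the referenced how-to topic: that HostRecordTTL is expressed in seconds, that private properties are set with the res ... /priv syntax, and that the Network Name resource is named as shown (the resource name here is hypothetical).

```python
CLUSTER_DEFAULT_TTL_MINUTES = 20   # Cluster service default described above
RECOMMENDED_TTL_MINUTES = 5        # recommended value for the CMS Network Name


def ttl_seconds(minutes):
    # Assumption: HostRecordTTL takes its value in seconds.
    return minutes * 60


def host_record_ttl_command(resource_name, minutes):
    # Assumption: private properties are set with 'res <name> /priv Prop=Value'.
    return 'cluster.exe res "{}" /priv HostRecordTTL={}'.format(
        resource_name, ttl_seconds(minutes)
    )


print(ttl_seconds(CLUSTER_DEFAULT_TTL_MINUTES))
print(host_record_ttl_command("Network Name (MBX01)", RECOMMENDED_TTL_MINUTES))
```

The sketch only prints the command; it does not run it. Building the string separately makes it easy to review the value before it is applied to a production cluster.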

Configuring the Cluster Quorum

After the cluster networks have been configured, the next step is to configure the failover cluster to use a Node and File Share Majority quorum resource. For detailed steps about how to configure a failover cluster to use the Node and File Share Majority quorum model, see How to Configure the Node and File Share Majority Quorum.

Validating the Failover Cluster

Windows Server 2008 includes a new wizard called the Validate a Configuration wizard that you can use to verify the health and configuration of a failover cluster. We recommend running this wizard prior to installing Exchange 2007 in the cluster. By running this wizard before you install Exchange 2007, you can identify and address configuration problems within the cluster that might prevent Exchange from being installed.

The Validate a Configuration wizard includes four groups of tests that are designed to verify that the cluster meets the requirements necessary to be supported by Microsoft. These are requirements that are in addition to the requirement that the cluster solution carry the Designed for Windows Server 2008 compatibility logo.

The four groups of tests are: Inventory, Network, Storage, and System Configuration. Because CCR does not use shared storage, there is no need to run the Storage group of tests. If you run the Storage group of tests against a failover cluster that does not have any clustered storage resources, such as a failover cluster designed for CCR, the Storage group of tests will fail. All failures of the Storage group of tests can be safely ignored because the lack of shared storage is expected for a failover cluster designed for CCR.

For detailed steps about how to validate the failover cluster, see How to Validate a Windows Server 2008 Failover Cluster Configuration.

Clustered Mailbox Server Installation and Configuration

You can install the Mailbox server role on a cluster by performing a few steps on each node. After the cluster has been formed and validated, and after the cluster has been configured to use the Node and File Share Majority quorum (known in Windows Server 2003 as the Majority Node Set (MNS) quorum with file share witness), you should first install the Mailbox server role on the active node. For detailed steps about how to install the Mailbox server role on the active node, see How to Install the Active Clustered Mailbox Role in a CCR Environment on Windows Server 2008.

After you have installed the Mailbox server role and a CMS on the active node and verified the first storage group's configuration, you should install the Mailbox server role on the passive node. For detailed steps about how to install the Mailbox server role on the passive node, see How to Install the Passive Clustered Mailbox Role in a CCR Environment on Windows Server 2008.

After you install the Mailbox server role, you can optionally tune failover settings. For more information about tuning failover, see How to Tune Mount and Failover Settings for Cluster Continuous Replication.

Post-Setup Tasks

After the Mailbox server role has been installed on both nodes and a CMS has been created, you should perform some post-setup tasks. These tasks include:

  • Enabling multiple networks for continuous replication activity.

  • Tuning failover control settings.

  • Tuning the default configuration of the transport dumpster.

  • Verifying the ability to move a CMS between the nodes in the cluster.

Enabling Multiple Networks for Continuous Replication Activity

In the release to manufacturing (RTM) version of Microsoft Exchange Server 2007, all log file copying and seeding occurs over the public network. In Exchange 2007 SP1, any cluster network configured as a mixed network can be enabled for continuous replication activity. This activity includes storage group seeding and reseeding, and log shipping.

In Exchange 2007 SP1, only cluster networks designated as mixed can be enabled for continuous replication. A mixed network is any cluster network that is configured for both cluster (inter-node communication) and client access traffic. Cluster networks configured for cluster access, but not for client access (sometimes referred to as private networks) cannot be enabled for continuous replication.

Support for log shipping over a mixed network is configured using a new cmdlet called Enable-ContinuousReplicationHostName. Similarly, turning off this feature is accomplished using the Disable-ContinuousReplicationHostName cmdlet. After a CMS exists in a CCR environment, an administrator can run Enable-ContinuousReplicationHostName on both nodes of the cluster and specify two IP addresses and host names. Once the configuration succeeds and the mixed network is confirmed to be operational, the system randomly selects a mixed network for log copying.

For detailed steps about how to enable cluster networks for continuous replication activity, see How to Enable Redundant Cluster Networks for Log Shipping and Seeding on Windows Server 2008.

Note

In addition to the host name, IP address, and cluster group that are created on the failover cluster each time you run the Enable-ContinuousReplicationHostName cmdlet, you are also creating a computer account in the Active Directory domain that contains the CMS. By default in Windows Server 2008, the maximum number of computer accounts that can be added by a user who has not been delegated domain administrator privileges and has not been granted the Create Computer Objects and Delete Computer Objects access control entries (ACEs) is 10. An Exchange administrator who frequently runs the Enable-ContinuousReplicationHostName and Disable-ContinuousReplicationHostName cmdlets and does not have domain administrator privileges or the aforementioned ACEs could reach the 10-account limit quickly. Workarounds for this issue are documented in Knowledge Base article 307532, How to troubleshoot the Cluster service account when it modifies computer objects. Additional information can be found in Knowledge Base article 251335, Domain Users Cannot Join Workstation or Server to a Domain.

Seeding and reseeding in a CCR environment is performed using the Update-StorageGroupCopy cmdlet. In Exchange 2007 SP1, this cmdlet has been extended to include a new parameter called DataHostNames. This parameter is used to specify which network should be used for seeding or reseeding. The value is a multivalued list of two names, each of which is either a fully qualified domain name (FQDN) or a host name. One of these names must identify the passive node.

Tuning Failover Control Settings

CCR includes attributes that let you control the failover behavior of a CMS. You can configure these attributes by using the Set-MailboxServer cmdlet. These attributes are provided so that you can control the following two decision algorithms:

  • Algorithm 1   Algorithm 1 controls whether a database is mounted at failover time. At failover time, if the database is detected to have lost no more than a configured number of logs, it is automatically mounted. The acceptable number of lost logs can be configured using a value called AutoDatabaseMountDial. This parameter, which is represented in Active Directory by an Exchange Server attribute called msExchDataLossForAutoDatabaseMount, has three values: Lossless, Good Availability, and Best Availability. Lossless is zero logs lost; Good Availability is three logs lost; and Best Availability, which is the default, is six logs lost. For detailed steps about how to configure these values, see How to Tune Mount and Failover Settings for Cluster Continuous Replication.

  • Algorithm 2   Algorithm 2 lets you determine if it is more important to be online with old data than to be offline. If the database fails to mount based on algorithm 1, you can specify a waiting period after which the database is mounted anyway. The waiting period is configured by the ForcedDatabaseMountAfter attribute. The value is in units of hours, with a default of unlimited.

    Important

    When the value for ForcedDatabaseMountAfter is reached, the database will be mounted regardless of whether the storage group copy is 1 log behind, 10 logs behind, or 1,000 logs behind, which could result in significant data loss. For this reason, this parameter should not be used if service level agreements (SLAs) guarantee a maximum on the amount of data loss that can be incurred.
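The two decision algorithms can be summarized in a short sketch. This is an illustration of the behavior described above, not Exchange code. It assumes that a database mounts automatically when the number of lost logs does not exceed the dial value, and it represents the unlimited default for ForcedDatabaseMountAfter as None. The function names are hypothetical.

```python
# Lost-log thresholds for each AutoDatabaseMountDial value, as listed above.
AUTO_DATABASE_MOUNT_DIAL = {
    "Lossless": 0,
    "Good Availability": 3,
    "Best Availability": 6,  # default
}


def mounts_automatically(lost_logs, dial="Best Availability"):
    # Algorithm 1 (assumption: the threshold is inclusive).
    return lost_logs <= AUTO_DATABASE_MOUNT_DIAL[dial]


def is_mounted(lost_logs, dial, hours_since_failover, forced_after_hours=None):
    # Algorithm 1 is evaluated first.
    if mounts_automatically(lost_logs, dial):
        return True
    # Algorithm 2: None stands for the default of unlimited (never force).
    if forced_after_hours is None:
        return False
    # Once the wait has elapsed, the database mounts regardless of lost logs.
    return hours_since_failover >= forced_after_hours


print(mounts_automatically(4, "Good Availability"))
print(is_mounted(1000, "Lossless", hours_since_failover=48))
print(is_mounted(1000, "Lossless", hours_since_failover=48, forced_after_hours=24))
```

The last call shows the risk described in the Important note: after the forced mount time passes, the number of lost logs no longer affects the decision.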

Tuning the Transport Dumpster

The transport dumpster is a feature of the Hub Transport server role that must be configured when using local continuous replication (LCR) or CCR, and this feature is used only by LCR and CCR environments. The transport dumpster resubmits recently delivered mail after an unscheduled outage. The transport dumpster should always be turned on when using LCR or CCR. The transport dumpster is enabled organization-wide by setting the amount of storage available per storage group and setting the time to retain mail in the transport dumpster.

The Hub Transport server maintains a queue of mail that was recently delivered to a CMS. In the event of a failover that is not Lossless, CCR automatically requests every Hub Transport server in the site to resubmit mail from the transport dumpster queue. In LCR environments, the request for resubmission is performed as part of the Restore-StorageGroupCopy task. When resubmission occurs, the information store automatically deletes the duplicates and redelivers mail that was lost. You can use the Set-TransportConfig cmdlet to change the default configuration settings for the transport dumpster, which are applied at the storage group level. Alternatively, in Exchange 2007 SP1, you can also use the Exchange Management Console to configure the transport dumpster values.

We recommend that you configure the maximum size per storage group (the MaxDumpsterSizePerStorageGroup parameter) to 1.5 times the size of the largest message that can be sent. For example, if the maximum size for messages is 10 megabytes (MB), you should configure the MaxDumpsterSizePerStorageGroup parameter with a value of 15 MB. For organizations without maximum message sizes, we recommend that you configure the maximum size per storage group with a value of 1.5 times the average size of messages sent in the organization.

We also recommend configuring the maximum retention time per storage group (the MaxDumpsterTime parameter) to a value of 07.00:00:00, which is 7 days. This amount of time is sufficient to allow for an extended outage to occur without loss of e-mail. When using the transport dumpster feature, additional disk space is needed on the Hub Transport server to host the transport dumpster queues. The amount of storage space required is approximately equal to the value of MaxDumpsterSizePerStorageGroup multiplied by the number of storage groups on all Mailbox servers using continuous replication in the Active Directory site containing the Hub Transport server.
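The sizing guidance above reduces to two multiplications, sketched here. The function names and the storage group count of 20 are hypothetical; the 10 MB message size is the example used in this topic.

```python
def recommended_dumpster_size_mb(max_message_size_mb):
    # MaxDumpsterSizePerStorageGroup: 1.5 times the maximum message size.
    return 1.5 * max_message_size_mb


def hub_transport_storage_mb(dumpster_size_mb, storage_group_count):
    # Approximate disk space needed on a Hub Transport server: the per-storage-group
    # size multiplied by the number of storage groups on all Mailbox servers that
    # use continuous replication in the same Active Directory site.
    return dumpster_size_mb * storage_group_count


size_mb = recommended_dumpster_size_mb(10)      # 10 MB maximum message size
total_mb = hub_transport_storage_mb(size_mb, 20)  # hypothetical: 20 storage groups

print(size_mb)
print(total_mb)
```

For organizations without a maximum message size, pass the average message size to the first function instead, as recommended above.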

For detailed steps about how to enable and configure the transport dumpster, see How to Configure the Transport Dumpster.

Verifying the CCR Solution

After you complete the installation of a CCR solution, or after you make significant configuration changes, we recommend that you verify the health and status of the CMS, and that both nodes are correctly configured to support the CMS.

The recommended way to verify the health and status of the CMS is to run the Test-ReplicationHealth, Get-StorageGroupCopyStatus, and Get-ClusteredMailboxServerStatus cmdlets:

  • The Test-ReplicationHealth cmdlet is new in Exchange 2007 SP1. This cmdlet is designed for proactive monitoring of continuous replication and the continuous replication pipeline. The Test-ReplicationHealth cmdlet checks all aspects of replication, cluster services, and storage group replication and replay status to provide a complete overview of the replication system. For more information about the Test-ReplicationHealth cmdlet, see Test-ReplicationHealth.

  • The Get-StorageGroupCopyStatus cmdlet provides current replication status information for each storage group. For detailed steps about how to view the status of storage groups in a CCR environment, see How to View the Status of a Storage Group in a CCR Environment.

  • The Get-ClusteredMailboxServerStatus cmdlet provides basic operational status for the CMS. For detailed steps about how to obtain basic operational status for a CMS, see How to View the Status of a Clustered Mailbox Server.

The recommended way to verify that both nodes are able to bring the CMS online is to use the Move-ClusteredMailboxServer cmdlet to move the CMS to each node. In Exchange 2007 SP1, you can also use the Manage Clustered Mailbox Server wizard in the Exchange Management Console to move a CMS between nodes to verify that both nodes can bring the CMS online.