Installing Cluster Continuous Replication on Windows Server 2003

Applies to: Exchange Server 2007, Exchange Server 2007 SP1, Exchange Server 2007 SP2, Exchange Server 2007 SP3

Before deploying cluster continuous replication (CCR), we recommend that you thoroughly review Cluster Continuous Replication. In addition, make sure that you meet all of the requirements specified in Planning for Cluster Continuous Replication. Installation of a CCR environment on Windows Server 2003 occurs in several different phases:

  1. Setting up the hardware, starting with cluster network formation and configuration.

  2. Forming the cluster, beginning with the first node and then the second.

  3. Configuring and securing the file share witness, and configuring the cluster networks and tolerance for missed cluster heartbeats.

  4. Installing the active and passive Mailbox server roles into the cluster. The clustered mailbox server (CMS) is created during the installation of the active Mailbox server role.

    Note

    We recommend that you complete each phase before you start the next phase. After you complete all phases, we recommend that you verify the CCR solution before putting it into production.

After installation, you should also perform the following tasks for the CMS:

  • Tuning failover control settings.

  • Tuning the default configuration of the transport dumpster.

  • Verifying the ability to move a CMS between the nodes in the cluster.

  • Enabling one or more mixed networks for log shipping and seeding.

The following sections explain each of the installation phases in more detail.

Network Formation and Configuration

You must have a sufficient number of static IP addresses available when you create CMSs in a two-node CCR environment. IP addresses are required for both the public and private networks, and all IP addresses for each cluster network must be on the same subnet. Requirements related to private and public addresses are as follows:

  • Private addresses   Each node requires one static IP address for each network adapter that is used for the cluster private network. You must use static IP addresses that are not on the same subnet or network as one of the public networks. We recommend that you use 10.10.10.10 and 10.10.10.11 with a subnet mask of 255.255.255.0 as the private IP addresses for the nodes.

  • Public addresses   Each node requires one static IP address for each network adapter that is used for the cluster public network. Additionally, static IP addresses are required for the failover cluster and for the CMS so that they can be accessed by clients and administrators. You must use static IP addresses that are not on the same subnet or network as one of the private networks.
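
The address rules above can be checked mechanically before you configure the adapters. The following Python sketch is illustrative only: the function name check_address_plan and the 192.168.1.x public addresses are assumptions made for the example, not values from this topic. It uses the standard ipaddress module to confirm that each network's addresses share one subnet and that the private and public subnets do not overlap.

```python
import ipaddress


def check_address_plan(private_ips, public_ips, private_mask, public_mask):
    """Return a list of problems found in a planned set of static addresses."""
    problems = []
    # Derive the subnet that each address belongs to from its mask.
    private_nets = {
        ipaddress.ip_interface(f"{ip}/{private_mask}").network for ip in private_ips
    }
    public_nets = {
        ipaddress.ip_interface(f"{ip}/{public_mask}").network for ip in public_ips
    }
    if len(private_nets) != 1:
        problems.append("private addresses are not all on one subnet")
    if len(public_nets) != 1:
        problems.append("public addresses are not all on one subnet")
    # Private and public networks must not share a subnet.
    for private_net in sorted(private_nets):
        for public_net in sorted(public_nets):
            if private_net.overlaps(public_net):
                problems.append(
                    f"private subnet {private_net} overlaps public subnet {public_net}"
                )
    return problems


# Private addresses recommended in this topic; the public addresses are
# hypothetical (two node addresses, one for the cluster, one for the CMS).
plan_problems = check_address_plan(
    private_ips=["10.10.10.10", "10.10.10.11"],
    public_ips=["192.168.1.21", "192.168.1.22", "192.168.1.30", "192.168.1.31"],
    private_mask="255.255.255.0",
    public_mask="255.255.255.0",
)
print(plan_problems or "address plan looks consistent")
```

An empty list means the plan satisfies the two rules checked here. The sketch does not verify that the addresses are unused or reachable.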

Network Best Practices for Clustered Mailbox Servers

We also recommend that you follow these best practices for your cluster network:

  • Use meaningful names   Building a cluster gives you many opportunities to use meaningful names for cluster nodes, cluster network interfaces, the cluster name, and CMS names. For example, the network used to communicate with other Exchange servers and clients can be called Public. The network used to communicate between the cluster nodes can be called Private. Use names that can be related to each other without having to review a topology map. Another useful convention is to relate the nodes of a cluster to the name of the CMS. For example, use mbx01, mbx01-node1, and mbx01-node2 for the CMS and the two nodes, respectively.

  • Use private IP addresses for the private network interfaces    For a list of IP address ranges and subnet masks that can be used for the private network interfaces, see the following table.

    Address ranges and subnet masks for private network interfaces

    Network | IP address range | Subnet mask
    Private 1 | 10.10.10.10-255 | 255.255.255.0
    Private 2 | 10.10.10.11-255 | 255.255.255.0

Note the following:

  • If your public network uses a 10.x.x.x network and 255.255.255.0 subnet mask, we recommend that you use alternate private network IP addresses and a subnet mask.

  • We do not recommend the use of any type of fault-tolerant adapter or teaming for the private network. If you require redundancy for your private network, use multiple network adapters set to Internal Communication Only and define their network priority in the cluster configuration. It is important to verify that your firmware and driver are at the most current revision if you use this technology. Contact your network adapter manufacturer for information about compatibility on a server cluster. For more information about network adapter teaming in server cluster deployments, see Microsoft Knowledge Base article 254101, Network adapter teaming and server clustering.

To configure the networks in the cluster for use with a Microsoft Exchange Server 2007 CCR solution, configure the public and private networks by following the steps that are described in How to Configure Network Connections for Cluster Continuous Replication.

Forming the Failover Cluster

A failover cluster is formed when the first node is added to the cluster. This process gives the cluster a unique network name and a unique network IP address. The network name and IP address, which together form the cluster's network identity, move between nodes in the cluster as nodes go online and offline. The cluster's network identity is rarely used in the administration of a CMS.

If you are familiar with deploying failover clusters or Exchange clusters from previous versions, you will find deployment of a cluster for CCR quite different. If you are new to cluster solutions, you will find deployment to be much less complex than typical cluster configurations.

You can build a new cluster using the instructions in How to Create a Windows Server 2003 Failover Cluster for Cluster Continuous Replication. This procedure includes graphical user interface and command-line interface instructions for forming the failover cluster, adding the second node to the failover cluster, and configuring the cluster to use a Majority Node Set (MNS) quorum.

Note

CCR on Windows Server 2003 requires a quorum model called the MNS quorum with file share witness. This quorum model is available in Windows Server 2003 Service Pack 2 (SP2), which is required for Exchange 2007 Service Pack 1 (SP1). To use the MNS quorum with file share witness with the release to manufacturing (RTM) version of Exchange 2007 and Windows Server 2003 SP1, you must install a hotfix on each node prior to deploying CCR. The hotfix is described in Knowledge Base article 921181, An update is available that adds a file share witness feature and a configurable cluster heartbeats feature to Microsoft Windows Server 2003 Service Pack 1-based server clusters. For detailed steps about how to install the hotfix, see How to Install the Majority Node Set File Share Witness Feature.

Post-Installation Configuration of the Failover Cluster

After the failover cluster has been formed with both nodes and configured with an MNS quorum, there are some post-installation tasks that must be performed prior to installing Exchange on either node. You must configure the cluster networks, tolerance for missed cluster heartbeats, and the file share witness component of the MNS quorum.

Configuring the Cluster Networks

After both nodes have been added to the cluster, the cluster networking components need to be configured. Specifically, the cluster networks, cluster network priority, and tolerance settings for missed cluster heartbeats need to be configured. The following table details the available options for configuring cluster networks.

Options for configuring cluster networks

Option | Description
Client access only (public network) | Select this option if you want the Cluster service to use this network adapter only for external communication with other clients. No inter-node communication or cluster database update traffic will take place on this network adapter.
Internal cluster communications only (private network) | Select this option if you want the Cluster service to use this network only for the cluster inter-node communication and cluster database update traffic.
All communications (mixed network) | Select this option if you want the Cluster service to use the network adapter for the cluster inter-node communication and cluster database update traffic, and for communication with external clients. This option is selected by default for all networks.

CMSs deployed in a CCR environment are supported only if each node has at least two network adapters. In a CCR environment, we recommend configuring one network as a private network and configuring the other network as a mixed network. If one network is configured as a private network and the other network is configured as a public network, the private network represents a single point of failure for the CMS.
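
The reasoning behind this recommendation can be expressed as a small check. The Python sketch below is a simplified model, not a tool that ships with Exchange or Windows; the function name review_network_roles and the role labels are assumptions chosen for illustration. It counts how many networks carry inter-node traffic (private and mixed networks) and flags a configuration in which only one does.

```python
INTERNAL_ROLES = {"private", "mixed"}  # roles that carry inter-node traffic


def review_network_roles(roles):
    """Review the roles assigned to a node's cluster networks.

    roles: list of "public", "private", or "mixed", one entry per network.
    """
    findings = []
    if len(roles) < 2:
        findings.append("unsupported: fewer than two networks")
    internal_count = sum(1 for role in roles if role in INTERNAL_ROLES)
    if internal_count == 0:
        findings.append("no network carries inter-node traffic")
    elif internal_count == 1:
        findings.append(
            "single point of failure: one network carries all inter-node traffic"
        )
    return findings


print(review_network_roles(["private", "mixed"]))   # recommended layout
print(review_network_roles(["private", "public"]))  # layout the text warns about
```

With one private and one mixed network, inter-node traffic has two paths, so the check reports nothing. With one private and one public network, only the private network can carry heartbeats, which is the single point of failure described above.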

For detailed steps about how to configure the cluster networking components, see How to Configure the Cluster Networking Components and Priority.

Configuring Tolerance Settings for Missed Cluster Heartbeats

After cluster communications and network priority have been configured, we recommend that you configure specific tolerance settings for missed cluster heartbeats. Doing so configures the Cluster service monitoring of network connectivity between cluster nodes to be tolerant of minor interruptions. This prevents failovers in some cases where the network outage is brief. We recommend that you configure private and mixed cluster networks on both nodes to account for ten missed heartbeats. This setting level corresponds to approximately 12 seconds.
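
The relationship between missed heartbeats and elapsed time is simple arithmetic. The Python sketch below assumes a heartbeat interval of 1,200 milliseconds, which is inferred from the statement that ten missed heartbeats correspond to approximately 12 seconds. The function names are hypothetical, and the comparison is a simplified model of how the Cluster service reacts to an outage.

```python
# Assumption: a heartbeat interval of 1,200 ms, inferred from the text's
# statement that ten missed heartbeats correspond to about 12 seconds.
HEARTBEAT_INTERVAL_MS = 1200


def tolerance_ms(missed_heartbeats, interval_ms=HEARTBEAT_INTERVAL_MS):
    """Approximate outage length covered by the missed-heartbeat setting."""
    return missed_heartbeats * interval_ms


def outage_is_tolerated(outage_ms, missed_heartbeats):
    """True if an outage is shorter than the configured tolerance."""
    return outage_ms < tolerance_ms(missed_heartbeats)


print(tolerance_ms(10) / 1000, "seconds of tolerance")
print(outage_is_tolerated(5000, 10))   # a 5-second interruption
print(outage_is_tolerated(15000, 10))  # a 15-second interruption
```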

For detailed steps about how to configure Cluster service tolerance for missed heartbeats, see How to Configure Tolerance Settings for Missed Cluster Heartbeats.

Configuring the File Share Witness

After the cluster has been formed and configured, the file share witness must be configured. CCR uses the file share witness on a third computer to avoid an occurrence of network partition within the cluster, also known as split brain syndrome. Split brain syndrome in a CCR environment occurs when:

  • All networks designated to carry internal cluster communications fail.

  • Neither node can receive heartbeat signals from the other.

  • Both nodes become the active node by bringing, or attempting to bring, the CMS online.

The file share for the file share witness can be hosted on any server running the Microsoft Windows operating system. However, we recommend that you use a Hub Transport server in the Active Directory directory service site containing the cluster nodes to host it. A Hub Transport server is recommended to ensure that an Exchange administrator has full authority and control over the share. For detailed steps about how to configure the file share for use as the file share witness, see How to Configure the File Share Witness.
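
To see why a witness on a third computer prevents both nodes from becoming active, it can help to think of quorum as vote counting. The Python sketch below is a deliberately simplified model and does not reflect the Cluster service's actual arbitration logic. It assumes one vote per node plus one vote for the file share witness; the function name has_majority is hypothetical.

```python
def has_majority(nodes_in_partition, holds_witness, total_nodes=2):
    """Return True if one side of a partition holds a majority of votes.

    Simplified model: every node has one vote and the file share witness
    supplies one additional vote.
    """
    total_votes = total_nodes + 1
    votes = nodes_in_partition + (1 if holds_witness else 0)
    return votes > total_votes / 2


# Both nodes lose contact with each other; only one side can hold the witness.
node_a_stays_up = has_majority(nodes_in_partition=1, holds_witness=True)
node_b_stays_up = has_majority(nodes_in_partition=1, holds_witness=False)
print(node_a_stays_up, node_b_stays_up)
```

In this model, the isolated node that holds the witness has two of three votes and continues to run, while the other node has one vote and does not, so only one node can bring the CMS online.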

Clustered Mailbox Server Installation and Configuration

You can install the Mailbox server role on a cluster by performing a few steps on each node. After the cluster has been formed and validated and after the cluster has been configured to use the MNS quorum with file share witness, you should first install the Mailbox server role on the active node. Installing the active node is a process that installs the Mailbox server role on the node, and then creates a CMS on the node.

For detailed steps about how to install the Mailbox server role on the active node, see How to Install the Active Clustered Mailbox Role in a CCR Environment on Windows Server 2003.

Note

If you are installing the active node on a computer running Windows Server 2003 that is not located in the same Active Directory site as the domain controller assigned the primary domain controller (PDC) role, you must first create a computer account with the intended name for the CMS. The computer account must be enabled, and the computer object must be available in the local Active Directory site. If a computer account for the CMS does not exist and the PDC is not in the local Active Directory site, Setup will not continue.

After you install the Mailbox server role on the active node, we recommend that you verify that the configuration of the first storage group's database and transaction logs is as you planned. You may need to move them before you proceed with the second node. By default, the initial storage group and database are placed in %ProgramFiles%\Microsoft\Exchange Server\Mailbox\First Storage Group.

For detailed steps about how to configure the first storage group for a cluster, see How to Move a Storage Group and Its Database in a CCR Environment.

After you have installed the Mailbox server role and a CMS on the active node and verified the first storage group's configuration, you should install the Mailbox server role on the passive node. Installing the passive node is a process that installs the Mailbox server role on the node. For detailed steps about how to install the Mailbox server role on the passive node, see How to Install the Passive Clustered Mailbox Role in a CCR Environment on Windows Server 2003.

Post-Setup Tasks

After the Mailbox server role has been installed on both nodes and a CMS has been created, you should perform some post-setup tasks. These tasks include:

  • Tuning failover control settings.

  • Tuning the default configuration of the transport dumpster.

  • Verifying the ability to move a CMS between the nodes in the cluster.

  • Enabling multiple networks for continuous replication activity.

Tuning Failover Control Settings

CCR includes attributes that let you control the failover behavior of a CMS. You can configure these attributes by using the Set-MailboxServer cmdlet. These attributes are provided so that you can control the following two decision algorithms:

  • Algorithm 1   Algorithm 1 controls whether a database is mounted at failover time. At failover time, if the database is detected to have lost no more than a configured number of logs, it is automatically mounted. The acceptable number of lost logs can be configured using a value called AutoDatabaseMountDial. This parameter, which is represented in Active Directory by an Exchange Server attribute called msExchDataLossForAutoDatabaseMount, has three values: Lossless, Good Availability, and Best Availability. Lossless is zero logs lost; Good Availability is three logs lost; and Best Availability, which is the default, is six logs lost. When configuring the system for Good Availability or Best Availability, do not use spaces. For example, use GoodAvailability and BestAvailability.

  • Algorithm 2   Algorithm 2 lets you determine if it is more important to be online with old data than to be offline. If the database fails to mount based on algorithm 1, you can establish the time to do a second check. The time to wait is configured by the ForcedDatabaseMountAfter attribute. The value is in units of hours with a default of unlimited.

    Important

    When the value for ForcedDatabaseMountAfter is reached, the database will be mounted regardless of whether the storage group copy is 1 log behind, 10 logs behind, or 1,000 logs behind, which could result in significant data loss. For this reason, this parameter should not be used if service level agreements (SLAs) guarantee a maximum on the amount of data loss that can be incurred.

For more information about tuning failover, see How to Tune Failover and Mount Settings for Cluster Continuous Replication.
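
The two decision algorithms can be summarized in a few lines of code. The following Python sketch is a simplified model of the behavior described above, not the logic that Exchange actually runs. The function name should_mount and its parameters are hypothetical, and the sketch assumes that each AutoDatabaseMountDial limit is inclusive (for example, Lossless mounts only when zero logs are lost).

```python
# Log-loss limits for each AutoDatabaseMountDial value, as listed above.
MOUNT_DIAL_LOG_LIMITS = {
    "Lossless": 0,
    "GoodAvailability": 3,
    "BestAvailability": 6,
}


def should_mount(logs_lost, dial="BestAvailability",
                 forced_mount_after_hours=None, hours_since_failover=0.0):
    """Decide whether a database would be mounted after a failover."""
    # Algorithm 1: mount automatically if the loss is within the dial limit
    # (assumed inclusive).
    if logs_lost <= MOUNT_DIAL_LOG_LIMITS[dial]:
        return True
    # Algorithm 2: mount anyway once the ForcedDatabaseMountAfter time has
    # passed. None models the default of unlimited (never force the mount).
    if (forced_mount_after_hours is not None
            and hours_since_failover >= forced_mount_after_hours):
        return True
    return False


print(should_mount(logs_lost=2, dial="GoodAvailability"))
print(should_mount(logs_lost=4, dial="GoodAvailability"))
print(should_mount(logs_lost=1000, dial="Lossless",
                   forced_mount_after_hours=2, hours_since_failover=3))
```

The last call illustrates the warning in the Important note: once the forced mount time is reached, the number of lost logs no longer affects the outcome.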

Tuning the Transport Dumpster

The transport dumpster is a feature of the Hub Transport server role that resubmits recently delivered mail after an unscheduled outage. The transport dumpster should always be turned on when using CCR or local continuous replication (LCR). The transport dumpster is enabled organization-wide by setting the amount of storage available per storage group and setting the time to retain mail in the transport dumpster.

The Hub Transport server maintains a queue of mail that was recently delivered to a CMS. In the event of a failover that is not Lossless, CCR automatically requests every Hub Transport server in the site to resubmit mail from the transport dumpster queue. The information store automatically deletes the duplicates and redelivers mail that was lost. You can use the Exchange Management Console or the Set-TransportConfig cmdlet in the Exchange Management Shell to change the default configuration settings for the transport dumpster, which are applied at the storage group level.

We recommend configuring the MaxDumpsterSizePerStorageGroup parameter, which specifies the maximum size of the transport dumpster queue for each storage group, to a size that is 1.5 times the size of the maximum message that can be sent. For example, if the maximum size for messages is 10 megabytes (MB), you should configure the MaxDumpsterSizePerStorageGroup parameter with a value of 15 MB. We also recommend configuring the MaxDumpsterTime parameter, which specifies how long an e-mail message should remain in the transport dumpster queue, to a value of 7.00:00:00, which is seven days. This should be sufficient time to allow for an extended outage to occur without loss of e-mail messages. When using the transport dumpster feature, additional disk space will be needed on the Hub Transport server to host the transport dumpster queues. The amount of storage space required is approximately equal to the value of MaxDumpsterSizePerStorageGroup multiplied by the number of storage groups on all CMSs in a CCR environment and all LCR-enabled storage groups in the Active Directory site containing the Hub Transport server.
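
The sizing guidance above reduces to two multiplications. The Python sketch below works through them; the function names and the figure of 20 storage groups are assumptions made for the example. It also formats a seven-day retention period in the d.hh:mm:ss form shown for MaxDumpsterTime.

```python
from datetime import timedelta


def recommended_dumpster_size_mb(max_message_size_mb):
    """1.5 times the largest message that can be sent."""
    return 1.5 * max_message_size_mb


def hub_transport_disk_mb(dumpster_size_mb, storage_group_count):
    """Approximate disk space needed for the transport dumpster queues."""
    return dumpster_size_mb * storage_group_count


def format_dumpster_time(retention):
    """Format a timedelta as d.hh:mm:ss, the form used for MaxDumpsterTime."""
    hours, remainder = divmod(retention.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{retention.days}.{hours:02d}:{minutes:02d}:{seconds:02d}"


size_mb = recommended_dumpster_size_mb(10)   # 10 MB maximum message size
disk_mb = hub_transport_disk_mb(size_mb, 20)  # hypothetical: 20 storage groups
print(size_mb, disk_mb, format_dumpster_time(timedelta(days=7)))
```

With a 10 MB message limit, the sketch yields the 15 MB value from the example above. The disk figure is an estimate only, because the storage group count must include all CMS storage groups and all LCR-enabled storage groups in the site.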

For detailed steps about how to enable and configure the transport dumpster, see How to Configure the Transport Dumpster.

Verifying the CCR Solution

After you complete the installation of a CCR solution, or after you make significant configuration changes, we recommend that you verify the health and status of the CMS, and that both nodes are correctly configured to support the CMS.

The recommended way to verify the health and status of the CMS is to run the Get-StorageGroupCopyStatus and Get-ClusteredMailboxServerStatus cmdlets.

The recommended way to verify that both nodes are able to bring the CMS online is to use the Move-ClusteredMailboxServer cmdlet to move the CMS to each node.

Enabling Multiple Networks for Continuous Replication Activity

In the RTM version of Exchange 2007, all log file copying and seeding occurs over the public network. In Exchange 2007 SP1, any redundant cluster network configured as a mixed network can be enabled for continuous replication activity. This activity includes storage group seeding and reseeding, and log shipping.

In Exchange 2007 SP1, only cluster networks designated as mixed can be enabled for continuous replication. A mixed network is any cluster network that is configured for both cluster (inter-node communication) and client access traffic. Cluster networks configured for cluster access but not for client access (sometimes referred to as private networks) cannot be enabled for continuous replication.

Support for log shipping over a mixed network is configured using the Enable-ContinuousReplicationHostName cmdlet. To turn off this feature, use the Disable-ContinuousReplicationHostName cmdlet. After a CMS exists in a CCR environment, an administrator can run Enable-ContinuousReplicationHostName on both nodes of the cluster and specify two IP addresses and host names. After the cmdlet has been run successfully and the mixed network is confirmed to be operational, the system randomly selects a mixed network for log copying.
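
The selection rule described above can be pictured as a filter followed by a random choice. The Python sketch below is a conceptual model only and does not represent how the replication components are implemented. The ClusterNetwork class, its fields, and the function pick_replication_network are all hypothetical names introduced for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class ClusterNetwork:
    name: str
    role: str          # "public", "private", or "mixed"
    enabled: bool      # hypothetical flag: enabled for continuous replication
    operational: bool  # hypothetical flag: network confirmed to be working


def pick_replication_network(networks, rng=random):
    """Pick one enabled, operational mixed network at random, or None."""
    candidates = [
        n for n in networks
        if n.role == "mixed" and n.enabled and n.operational
    ]
    return rng.choice(candidates) if candidates else None


networks = [
    ClusterNetwork("Private", "private", enabled=False, operational=True),
    ClusterNetwork("Repl-A", "mixed", enabled=True, operational=True),
    ClusterNetwork("Repl-B", "mixed", enabled=True, operational=False),
]
chosen = pick_replication_network(networks)
print(chosen.name if chosen else "no eligible mixed network")
```

In this example only Repl-A qualifies: the private network is excluded because it is not mixed, and Repl-B is excluded because it is not operational.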

For detailed steps about how to enable cluster networks for continuous replication activity, see How to Enable Redundant Cluster Networks for Log Shipping and Seeding on Windows Server 2003.

Note

In addition to the host name, IP address, and cluster group that is created on the failover cluster, each time you run the Enable-ContinuousReplicationHostName cmdlet, you are also creating a computer account in the Active Directory domain that contains the CMS. By default, in Windows Server 2003, the maximum number of computer accounts that can be added by a user who has not been delegated domain administrator privileges and has not been granted the Create Computer Objects and Delete Computer Objects access control entries (ACEs) is 10. An Exchange administrator who frequently runs the Enable-ContinuousReplicationHostName and Disable-ContinuousReplicationHostName cmdlets and does not have domain administrator privileges or the aforementioned ACEs could reach the 10-account limit quickly. Workarounds for this issue are documented in Knowledge Base article 307532, How to troubleshoot the Cluster service account when it modifies computer objects. Additional information can be found in Knowledge Base article 251335, Domain Users Cannot Join Workstation or Server to a Domain.

Seeding and reseeding in a CCR environment are performed using the Update-StorageGroupCopy cmdlet. In Exchange 2007 SP1, this cmdlet has been extended to include a new parameter called DataHostNames. This parameter is used to specify which network should be used for seeding or reseeding. The value is a multivalued list of two names, each of which is either a fully qualified domain name (FQDN) or a host name. One of these names must identify the passive node.