Understanding Exchange Server 2003 Clustering

Windows Clustering technologies can help you achieve scalability, availability, reliability, and fault tolerance for your Exchange 2003 organization. A cluster consists of individual computers (also called nodes) that function as a single system under the control of the Cluster service. These computers act as network service providers or as reserve computers that assume the responsibilities of failed nodes. Depending on how you configure your cluster, clustering can simplify the process of recovering a single server from a disaster.

Note

The clustering solution described in this topic (Windows Clustering) is not supported on front-end servers. Front-end servers should be stand-alone servers, or should be load balanced using Windows Server 2003 Network Load Balancing (NLB). For information about NLB and front-end and back-end server configurations, see "Ensuring Reliable Access to Exchange Front-End Servers" in System-Level Fault Tolerant Measures.

In a clustering environment, Exchange runs as a virtual server, known as an Exchange Virtual Server (EVS), rather than as a stand-alone server, because any node in a cluster can assume control of a virtual server. If the node running an EVS experiences problems, the EVS goes offline for a brief period until another node takes control of it. All recommendations for Exchange clustering are for active/passive configurations. For information about active/passive and active/active cluster configurations, see "Cluster Configurations" later in this topic.

A recommended configuration for your Exchange 2003 cluster is a four-node cluster composed of three active nodes and one passive node. Each active node contains one EVS. This configuration is cost-effective because it allows you to run three active Exchange servers while maintaining the failover security provided by one passive server.

Recommended configuration of a four-node Exchange cluster


Note

All four nodes of this cluster are running Windows Server 2003, Enterprise Edition and Exchange 2003 Enterprise Edition. For information about the hardware, network, and storage configuration of this example, see "Four-Node Cluster Scenario" in the Exchange Server 2003 Deployment Guide.

This section discusses the following aspects of Exchange 2003 clustering:

  • Windows Clustering

  • Exchange Virtual Servers

  • Cluster groups

  • Quorum disk resource

  • Cluster configurations

  • Example of a two-node cluster topology

  • Windows and Exchange edition requirements

  • Understanding failovers

  • IP addresses and network names

Windows Clustering

To create Exchange 2003 clusters, you must use Windows Clustering. Windows Clustering is a feature of Windows Server 2003, Enterprise Edition and Windows Server 2003, Datacenter Edition. The Windows Cluster service controls all aspects of Windows Clustering. When you run Exchange 2003 Setup on a Windows Server 2003 cluster node, the cluster-aware version of Exchange is automatically installed. Exchange 2003 uses the following Windows Clustering features:

  • Shared nothing architecture   Exchange 2003 back-end clusters require a shared-nothing architecture. In a shared-nothing architecture, although all nodes in the cluster can access shared storage, they cannot access the same disk resource of that shared storage simultaneously. For example, if Node 1 has ownership of a disk resource, no other node in the cluster can access that disk resource until ownership of the disk resource is transferred to it. (A toy model of this ownership rule appears after this list.)

  • Resource DLL   The Cluster service communicates with the resources in a cluster by using resource DLLs. To communicate with the Cluster service, Exchange 2003 provides its own custom resource DLL (Exres.dll), which tailors the communication between the Cluster service and Exchange 2003 so that Exchange can use the full range of Windows Clustering functionality. For information about Exres.dll, see Microsoft Knowledge Base article 810860, "XGEN: Architecture of the Exchange Resource Dynamic Link Library (Exres.dll)."

  • Groups   To contain EVSs in a cluster, Exchange 2003 uses Windows cluster groups. An EVS in a cluster is a Windows cluster group containing cluster resources, such as an IP address and the Exchange 2003 System Attendant.

  • Resources   EVSs include Cluster service resources, such as IP address resources, network name resources, and physical disk resources. EVSs also include their own Exchange-specific resources. After you add the Exchange System Attendant Instance resource (an Exchange-specific resource) to a Windows cluster group, Exchange automatically creates the other essential Exchange-related resources, such as the Exchange HTTP Virtual Server Instance, the Exchange Information Store Instance, and the Exchange MS Search Instance.
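The shared-nothing rule described above can be reduced to a few lines of code. The following is a minimal, hypothetical Python sketch (the class and method names are invented for illustration and are not part of any Windows API); it shows that every node can see the shared storage, but only the current owner of a disk resource may access it:

```python
class SharedStorage:
    """Toy model of shared-nothing disk ownership in a cluster."""

    def __init__(self):
        self.owner = {}  # disk resource name -> owning node name

    def acquire(self, node, disk):
        # Ownership transfers only if no other node currently owns the disk.
        if self.owner.get(disk) in (None, node):
            self.owner[disk] = node
            return True
        return False

    def read(self, node, disk):
        # Every node can see the shared storage, but only the owner
        # may access a given disk resource at any moment.
        if self.owner.get(disk) != node:
            raise PermissionError(f"{node} does not own {disk}")
        return f"{node} read {disk}"

storage = SharedStorage()
storage.acquire("Node1", "Disk E:")
print(storage.read("Node1", "Disk E:"))     # succeeds
print(storage.acquire("Node2", "Disk E:"))  # False: Node1 still owns the disk
```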

Exchange Virtual Servers

To create an Exchange 2003 cluster, you create a Windows Server 2003 cluster group and then add specific resources to that group. Exchange 2003 clusters create logical servers referred to as Exchange Virtual Servers (EVSs). Unlike a stand-alone (non-clustered) Exchange 2003 server, an EVS is a cluster group that can be failed over if the node running the EVS fails. When one cluster node fails, one of the remaining nodes assumes the responsibilities of the failed node's EVS, and clients continue to connect by using the same server name.

An EVS is a cluster group that requires, at a minimum, the following resources:

  • Static IP address.

  • Network name.

  • One or more physical disks for shared storage.

  • An Exchange 2003 System Attendant resource. (The System Attendant resource installs other required Exchange resources.)

The following figure illustrates Exchange 2003 cluster resources and the resource dependencies.

Exchange 2003 cluster resources and dependencies


Note

In Exchange 2003, when you create a new EVS, the IMAP4 and POP3 resources are not automatically created. For more information about IMAP4 and POP3 resources, see "Managing Exchange Clusters," in the Exchange Server 2003 Administration Guide.

Client computers connect to an EVS the same way they connect to a stand-alone Exchange 2003 server. Windows Server 2003 provides the IP Address resource, the Network Name resource, and the disk resources associated with the EVS. Exchange 2003 provides the System Attendant resource and other required resources. When you create the System Attendant resource, all other required and dependent resources are created.

The following table lists the Exchange 2003 cluster resources and their dependencies.

Exchange 2003 cluster resources and dependencies

Resource | Description | Dependency
--- | --- | ---
System Attendant | The fundamental resource that controls the creation and deletion of all the resources in the EVS. | Network Name resource and shared disk resources
Exchange store | Provides mailbox and public folder storage for Exchange. | System Attendant
SMTP | Handles relay and delivery of e-mail messages. | System Attendant
IMAP4 | Optional resource that provides access to e-mail messages for IMAP4 clients. | System Attendant
POP3 | Optional resource that provides access to e-mail messages for POP3 clients. | System Attendant
HTTP | Provides access to Exchange mailboxes and public folders by means of HTTP (for example, Microsoft Office Outlook® Web Access 2003). | System Attendant
Exchange MS Search Instance | Provides content indexing for the EVS. | System Attendant
Message transfer agent (MTA) | There can be only one MTA per cluster. The MTA is created on the first EVS, and all additional EVSs depend on it. The MTA is responsible for communication with X.400 systems and for interoperation with Exchange 5.5. | System Attendant
Routing service | Builds the link state tables. | System Attendant
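Because every Exchange resource in the table depends, directly or indirectly, on the System Attendant, the Cluster service must bring resources online in dependency order. As a minimal sketch of that ordering (the resource names come from the table above; the use of Python's topological sort is purely illustrative, not how the Cluster service is implemented):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Dependencies from the table: resource -> the resources it depends on.
dependencies = {
    "System Attendant": {"Network Name", "Physical Disk"},
    "Exchange store": {"System Attendant"},
    "SMTP": {"System Attendant"},
    "IMAP4": {"System Attendant"},
    "POP3": {"System Attendant"},
    "HTTP": {"System Attendant"},
    "Exchange MS Search Instance": {"System Attendant"},
    "Message transfer agent (MTA)": {"System Attendant"},
    "Routing service": {"System Attendant"},
}

# A topological sort yields an order in which each resource comes online
# only after everything it depends on is already online.
print(list(TopologicalSorter(dependencies).static_order()))
```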

Exchange 2003 clusters do not support the following Windows and Exchange 2003 components:

  • Active Directory Connector (ADC)

  • Exchange 2003 Calendar Connector

  • Exchange Connector for Lotus Notes

  • Exchange Connector for Novell GroupWise

  • Microsoft Exchange Event service

  • Site Replication Service (SRS)

  • Network News Transfer Protocol (NNTP)

    Note

    The NNTP service, a subcomponent of Windows Server 2003 Internet Information Services (IIS), is still a required prerequisite for installing Exchange 2003 in a cluster. However, after you install Exchange 2003 in a cluster, the NNTP service is not functional.

Cluster Groups

When you configure an Exchange cluster, you must create groups to manage both the cluster itself and the EVSs in the cluster; you can then configure each EVS independently. When creating cluster groups, consider the following recommendations:

  • The Microsoft Distributed Transaction Coordinator (MSDTC) resource is required for Exchange Server Setup and Service Pack Setup. On a cluster that is dedicated to Exchange, it is recommended that the MSDTC resource be added to the default Cluster Group. It is further recommended that the 'Affect the Group' option be unchecked for the MSDTC resource. This prevents a failure of the MSDTC resource from affecting the default cluster group.

  • For information about adding the MSDTC resource in Windows Server 2003, see Microsoft Knowledge Base article 301600, "How to Configure Microsoft Distributed Transaction Coordinator on a Windows Server 2003 Cluster." Microsoft Knowledge Base article 301600 includes a reference to article 817064, "How to enable network DTC access in Windows Server 2003." It is an Exchange Server security best practice to not enable network DTC access for an Exchange cluster. If you are configuring the Distributed Transaction Coordinator resource for an Exchange cluster, do not enable network DTC access.

  • To provide fault tolerance for the cluster, do not add any application or cluster resource other than the MSDTC resource to the default cluster group, and do not use the quorum volume for anything other than the cluster quorum and the MSDTC resource.

  • Assign each group its own set of Physical Disk resources. This allows the transaction log files and the database files to fail over to another node simultaneously.

  • Use separate physical disks to store transaction log files and database files. Separate hard disks prevent the failure of a single spindle from affecting more than one group. This recommendation is also relevant for Exchange stand-alone servers.
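Taken together, the disk recommendations above produce a layout like the hypothetical one sketched below (the group names and drive letters are invented for illustration):

```python
# Each EVS group owns its own Physical Disk resources, and within a group
# the transaction log files and database files live on separate spindles.
disk_layout = {
    "EVS1": {"logs": "Disk L:", "databases": "Disk M:"},
    "EVS2": {"logs": "Disk N:", "databases": "Disk O:"},
    "Cluster Group": {"quorum": "Disk Q:"},  # quorum and MSDTC only
}

# No disk appears in more than one group, so a group can fail over to
# another node without touching any other group's storage.
all_disks = [disk for group in disk_layout.values() for disk in group.values()]
assert len(all_disks) == len(set(all_disks))
```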

For more information about storage considerations for server clusters, see "Cluster Storage Solutions" in Planning Considerations for Clustering.

Quorum Disk Resource

The most important resource in the cluster is the quorum disk resource. The quorum disk resource maintains configuration data for the cluster, including the quorum log, the cluster database checkpoint, and the resource checkpoints. The quorum resource also provides persistent physical storage across system failures. If you are running Windows Server 2003, you can select from the following quorum types:

Note

If you are running Windows 2000, you must use the standard quorum.

  • Standard quorum (also known as a single quorum)   With a standard quorum, the quorum disk resource data is hosted on a shared physical disk resource that is accessible by all cluster nodes. Because the cluster configuration data is kept on the quorum disk resource, all cluster nodes must be able to communicate with the node that currently owns that resource.

  • Majority node set quorum   With a majority node set quorum, the quorum data is stored locally on the system disk of each cluster node. The Majority Node Set resource makes sure that the cluster configuration data stored on the majority node set quorum is kept consistent across the disks.

The following figure illustrates a standard quorum disk and a majority node set quorum for a four-node cluster.

A standard quorum and majority node set quorum


When a cluster is created or when network communication between nodes in a cluster fails, the quorum disk resource prevents the nodes from forming multiple clusters. To form a cluster, a node must arbitrate for and gain ownership of the quorum disk resource. For example, if a node cannot detect a cluster during the discovery process, the node attempts to form its own cluster by taking control of the quorum disk resource. However, if the node does not succeed in taking control of the quorum disk resource, it cannot form a cluster.

The quorum disk resource stores the most current version of the cluster configuration database in the form of recovery logs and registry checkpoint files. These files contain cluster configuration and state data for each individual node. When a node joins or forms a cluster, the Cluster service updates the node's individual copy of the configuration database. When a node joins an existing cluster, the Cluster service retrieves the configuration data from the other active nodes.

The Cluster service uses the quorum disk resource recovery logs to:

  • Guarantee that only one set of active, communicating nodes can operate as a cluster.

  • Enable a node to form a cluster only if it can gain control of the quorum disk resource.

  • Allow a node to join or remain in an existing cluster only if it can communicate with the node that controls the quorum disk resource.
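These three rules amount to a simple decision procedure for each node. A minimal sketch, assuming two hypothetical inputs that summarize the node's view of the cluster (neither the function nor its parameters exist in any Windows API):

```python
def quorum_decision(owns_quorum, can_reach_quorum_owner):
    """Toy encoding of the quorum rules for a single node.

    owns_quorum: the node has arbitrated for and won the quorum disk resource.
    can_reach_quorum_owner: the node can communicate with whichever node
    currently owns the quorum disk resource.
    """
    if owns_quorum:
        return "form cluster"           # form a cluster only if you own the quorum
    if can_reach_quorum_owner:
        return "join existing cluster"  # join only if you can reach the owner
    return "cannot participate"         # prevents a second, competing cluster

print(quorum_decision(owns_quorum=False, can_reach_quorum_owner=True))
```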

Note

Create new cluster groups for your EVSs; do not create an EVS in the cluster group that contains the quorum disk resource.

When selecting the type of quorum for your Exchange cluster, consider the advantages and disadvantages of each type. For example, to keep a standard quorum running, you must protect the quorum disk resource located on the shared disk; for this reason, it is recommended that you use a RAID solution for the quorum disk resource. By contrast, to keep a majority node set cluster running, a majority of the nodes must be online. Specifically, the number of nodes that must remain online is:

<Number of nodes configured in the cluster>/2 + 1 (using integer division, so the result of the division rounds down)
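As a worked example, the following Python lines evaluate this majority rule for a few cluster sizes:

```python
def nodes_required(total_nodes):
    # Majority node set: more than half of the configured nodes
    # must stay online for the cluster to keep running.
    return total_nodes // 2 + 1

for n in (2, 3, 4, 8):
    print(f"A {n}-node cluster stays up with {nodes_required(n)} nodes online")
# 2 -> 2, 3 -> 2, 4 -> 3, 8 -> 5
```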

For detailed information about selecting a quorum type for your cluster, see "Choosing a Cluster Model" in the Windows Server 2003 Deployment Kit.

Cluster Configurations

In a cluster, a group of independent nodes works together as a single system. Each cluster node has its own memory, processors, network adapters, and local hard disks for operating system and application files, but all nodes share a common storage medium. A separate private network, used only for cluster communication between the nodes, can connect these servers.

In general, for each cluster node, it is recommended that you use identical hardware (for example, identical processors, identical network interface cards, and the same amount of RAM). This practice helps ensure that users experience a consistent level of performance when accessing their mailboxes on a back-end server, regardless of whether the EVS that is providing access is running on a primary or a stand-by node. For more information about the benefits of using standardized hardware on your servers, see "Standardized Hardware" in Component-Level Fault Tolerant Measures.

Note

Depending on the role of each cluster node, you may consider using different types of hardware (for example, processors, RAM, and hard disks) for the passive nodes of your cluster, such as when an advanced deployment solution uses the passive cluster nodes to perform your backup operations. For information about how you can implement different types of hardware on your cluster nodes, see Messaging Backup and Restore at Microsoft.

The following sections discuss Exchange 2003 cluster configurations, specifically active/passive and active/active configurations. Active/passive clustering is the recommended cluster configuration for Exchange. In an active/passive configuration, no cluster node runs more than one EVS at a time, and the cluster contains more nodes than EVSs.

Note

Before you configure your Exchange 2003 clusters, you must determine the level of availability expected for your users. After you make this determination, configure your hardware in accordance with the Exchange 2003 cluster that best meets your needs.

Active/Passive Clustering

Active/passive clustering is the strongly recommended cluster configuration for Exchange. In active/passive clustering, an Exchange cluster includes up to eight nodes and can host a maximum of seven EVSs. (Each active node runs an EVS.) All active/passive clusters must have one or more passive nodes. A passive node is a server that has Exchange installed and is configured to run an EVS, but remains on stand-by until a failure occurs.

In active/passive clustering, when one of the EVSs experiences a failure (or is taken offline), a passive node in the cluster takes ownership of the EVS that was running on the failed node. Depending on the current load of the failed node, the EVS usually fails over to another node within a few minutes. As a result, the Exchange resources on your cluster are unavailable to users for only a brief period.

In an active/passive cluster, such as the 3-active/1-passive cluster illustrated in the following figure, there are three EVSs: EVS1, EVS2, and EVS3. This configuration can handle a single-node failure. For example, if Node 3 fails, Node 1 still owns EVS1, Node 2 still owns EVS2, and Node 4 takes ownership of EVS3 with all of the storage groups mounted after the failure. However, if a second node fails while Node 3 is still unavailable, the EVS associated with the second failed node remains in a failed state because there is no stand-by node available for failover.

Effect of failures on an active/passive cluster


Active/Active Clustering

Active/active is a strongly discouraged cluster configuration for Exchange. When using an active/active configuration for your Exchange clusters, you are limited to two nodes; in any cluster with more than two nodes, at least one node must be passive. For example, if you add a node to a two-node active/active cluster, Exchange does not allow you to create a third EVS, and after you install the third node, no cluster node can run more than one EVS at a time.

Important

Regardless of which version of Windows you are running, Exchange 2003 and Exchange 2000 do not support active/active clustering with more than two nodes. For more information, see Microsoft Knowledge Base article 329208, "Exchange virtual server limitations on Exchange 2000 clusters and Exchange 2003 clusters that have more than two nodes."

In an active/active cluster, there are only two EVSs: EVS1 and EVS2. This configuration can handle a single-node failure and still maintain 100 percent availability after the failure occurs. For example, if Node 2 fails, Node 1, which currently owns EVS1, also takes ownership of EVS2, with all of the storage groups mounted. However, if Node 1 then fails while Node 2 is still unavailable, the entire cluster is in a failed state because no nodes are available for failover.

Effect of failures on an active/active cluster


If you decide to implement active/active clustering, you must comply with the following requirements:

  • Scalability requirements   To allow for efficient performance after failover, and to help ensure that a single node of the active/active cluster can bring the second EVS online, you should make sure that the number of concurrent MAPI user connections on each active node does not exceed 1,900. In addition, you should make sure that the average CPU utilization by the Microsoft Exchange Information Store (store.exe) on each node does not exceed 40 percent.

    For detailed information about how to size the EVSs running in an active/active cluster, as well as how to monitor an active/active configuration, see "Performance and Scalability Considerations" in Planning Considerations for Clustering.

  • Storage group requirements   As with stand-alone Exchange servers, each Exchange cluster node is limited to four storage groups. In the event of a failover, for a single node of an active/active cluster to mount all of the storage groups within the cluster, you cannot have more than four total storage groups in the entire cluster.

    For more information about this limitation, see "Storage Group Limitations" in Planning Considerations for Clustering.

Because of the scalability limitations of active/active Exchange clusters, deploy active/passive Exchange clusters instead; active/active clusters are not recommended under any circumstances.
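The two scalability limits above (no more than 1,900 concurrent MAPI connections per node, and no more than 40 percent average store.exe CPU utilization) reduce to a simple capacity check. The following is a hypothetical monitoring sketch; only the thresholds come from this text, and the function and its inputs are assumptions rather than real performance counters:

```python
MAX_MAPI_CONNECTIONS = 1900   # per active node, from the guidance above
MAX_STORE_CPU_PERCENT = 40.0  # average store.exe CPU per node

def within_active_active_limits(mapi_connections, store_cpu_percent):
    """Return True if this node could still absorb the other node's EVS."""
    return (mapi_connections <= MAX_MAPI_CONNECTIONS
            and store_cpu_percent <= MAX_STORE_CPU_PERCENT)

# Example: this node is over the CPU budget, so failing the second EVS
# over to it would risk overloading a single node with both EVSs.
print(within_active_active_limits(1500, 47.5))  # False
```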

Example of a Two-Node Cluster Topology

Although a typical cluster topology includes more than two nodes, an easy way to explain the differences between active/passive and active/active clusters is to illustrate a simple two-node cluster topology.

In this example, both cluster nodes are members of the same domain, and both nodes are connected to the public network and a private cluster network. The physical disk resource is the shared disk in the cluster. If only one cluster node owns one EVS, the cluster is active/passive. If both nodes own one or more EVSs, or if either node owns two EVSs, the cluster is active/active.

Example of a two-node Exchange cluster

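The classification rule in this example is mechanical enough to state as code. A minimal sketch (the node-to-EVS mapping is invented sample data, not queried from any API):

```python
def classify_two_node_cluster(evs_by_node):
    """evs_by_node maps a node name to the list of EVSs it owns."""
    active_nodes = [node for node, evss in evs_by_node.items() if evss]
    total_evs = sum(len(evss) for evss in evs_by_node.values())
    if len(active_nodes) == 1 and total_evs == 1:
        return "active/passive"
    # Both nodes own an EVS, or one node owns two EVSs.
    return "active/active"

print(classify_two_node_cluster({"Node1": ["EVS1"], "Node2": []}))        # active/passive
print(classify_two_node_cluster({"Node1": ["EVS1"], "Node2": ["EVS2"]}))  # active/active
```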

Windows and Exchange Edition Requirements

To create Exchange clusters, specific editions of Windows and Exchange are required. The following table lists these requirements.

Windows and Exchange edition requirements

Windows editions | Exchange editions | Cluster nodes available
--- | --- | ---
Windows Server 2003, Enterprise Edition | Exchange Server 2003 Enterprise Edition | Up to eight
Windows Server 2003, Datacenter Edition | Exchange Server 2003 Enterprise Edition | Up to eight
Windows Server 2003 or Windows 2000 Server | Exchange Server 2003 Standard Edition | None
Windows Server 2003, Standard Edition or Windows 2000 Server | Exchange Server 2003 Standard Edition or Exchange Server 2003 Enterprise Edition | None
Windows 2000 Advanced Server | Exchange Server 2003 Enterprise Edition | Up to two
Windows 2000 Datacenter Server | Exchange Server 2003 Enterprise Edition | Up to four

Note

In active/passive clustering, you can have up to eight nodes in a cluster, and it is required that each cluster have one or more passive nodes. In active/active clustering, you can have a maximum of two nodes in a cluster. For more information about the differences between active/active and active/passive clustering, see "Cluster Configurations" earlier in this topic.

Understanding Failovers

As part of your cluster deployment planning process, you should understand how the failover process works. There are two scenarios for failover: planned and unplanned.

In a planned failover:

  1. The Exchange administrator uses the Cluster service to move the EVS to another node.

  2. All EVS resources go offline.

  3. The resources move to the node specified by the Exchange administrator.

  4. All EVS resources go online.

In an unplanned failover:

  1. One (or several) of the EVS resources fails.

  2. During the next IsAlive check, Resource Monitor discovers the resource failure.

  3. The Cluster service automatically takes all dependent resources offline.

  4. If the failed resource is configured to restart (default setting), the Cluster service attempts to restart the failed resource and all its dependent resources.

  5. If the resource fails again:

    • Cluster service tries to restart the resource again.

    -or-

    • If the resource is configured to affect the group (default), and the resource has failed a certain number of times (default=3) within a configured time period (default=900 seconds), the Cluster service takes all resources in the EVS offline.

  6. All resources are failed over (moved) to another cluster node. If specified, this is the next node in the Preferred Owners list. For more information about configuring a preferred owner for a resource, see "Specifying Preferred Owners" in the Exchange Server 2003 Administration Guide.

  7. The Cluster service attempts to bring all resources of the EVS online on the new node.

  8. If the same or another resource fails again on the new node, the Cluster service repeats the previous steps and may need to fail over to yet another node (or back to the original node).

  9. If the EVS keeps failing over, the Cluster service fails over the EVS a maximum number of times (default=10) within a specified time period (default=6 hours). After this time, the EVS stays in a failed state.

  10. If failback is configured (default=turned off), the Cluster service moves the EVS back to the original node when that node becomes available again, either immediately or at a specified time of day, depending on the group configuration.
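The default thresholds in these steps (three resource failures within 900 seconds before the group is taken offline, and at most ten group failovers within six hours) drive a retry loop that is easier to see in code. The following is a simplified, hypothetical Python model of that decision logic, not an implementation of the Cluster service:

```python
from collections import deque

class FailoverPolicy:
    """Simplified model of the default restart and failover thresholds."""

    def __init__(self, max_restarts=3, restart_window=900,
                 max_failovers=10, failover_window=6 * 3600):
        self.max_restarts = max_restarts        # resource failures tolerated per window
        self.restart_window = restart_window    # seconds
        self.max_failovers = max_failovers      # group moves tolerated per window
        self.failover_window = failover_window  # seconds
        self.restart_times = deque()
        self.failover_times = deque()

    @staticmethod
    def _count_recent(times, window, now):
        # Drop events older than the window, then count what remains.
        while times and now - times[0] > window:
            times.popleft()
        return len(times)

    def on_resource_failure(self, now):
        """Decide what happens when an EVS resource fails at time `now`."""
        self.restart_times.append(now)
        if self._count_recent(self.restart_times, self.restart_window, now) <= self.max_restarts:
            return "restart the resource in place"
        # The resource keeps failing: take the EVS offline and move it.
        self.failover_times.append(now)
        if self._count_recent(self.failover_times, self.failover_window, now) <= self.max_failovers:
            return "fail the EVS over to the next preferred node"
        return "leave the EVS in a failed state"

policy = FailoverPolicy()
for t in (0, 100, 200, 300):  # four failures inside one 900-second window
    print(t, policy.on_resource_failure(t))
```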

IP Addresses and Network Names

A typical cluster installation includes a public network that client computers use to connect to EVSs and a private network for cluster node communication. To make sure that you have sufficient static IP addresses available, consider the following requirements:

  • Each cluster node has two static IP addresses (the public and private network connection IP addresses of each node) and one NetBIOS name.

  • The cluster itself has a static IP address and a NetBIOS name.

  • Each EVS has a static IP address and a NetBIOS name.

It is recommended that an <n>-node cluster with <e> EVSs use 2×n + e + 1 IP addresses. The +1 in this equation is the IP address of the default cluster group; it assumes that the quorum disk resource and the MSDTC resource are both located in that group. For more information about these recommendations, see "Cluster Groups" earlier in this topic.

For a two-node cluster, the recommended number of static IP addresses is five plus the number of EVSs. For a four-node cluster, the recommended number is nine plus the number of EVSs.
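A quick worked check of the 2×n + e + 1 recommendation, written as a short Python helper (illustrative only):

```python
def recommended_static_ips(nodes, evs_count):
    # Two addresses per node (public + private), one per EVS,
    # plus one for the default cluster group itself.
    return 2 * nodes + evs_count + 1

for nodes, evs in ((2, 1), (4, 3)):
    print(f"{nodes} nodes, {evs} EVSs -> {recommended_static_ips(nodes, evs)} static IPs")
# 2 nodes, 1 EVS -> 6 (that is, 5 plus the number of EVSs)
# 4 nodes, 3 EVSs -> 12 (that is, 9 plus the number of EVSs)
```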

Important

It is recommended that you use static IP addresses in any cluster deployment, because using Dynamic Host Configuration Protocol (DHCP) can prevent client computers from connecting to the cluster: if the DHCP server fails to renew an IP lease, the entire cluster may fail. It is also recommended that you use a private network for cluster communication. Without one, a public network connection failure on one node prevents the cluster nodes from communicating with each other; as a result, the failure blocks affected resources from failing over and may even cause the entire cluster to fail.

The following figure provides an example of the IP addresses and other components required in a four-node Exchange cluster configuration.

Example of IP addresses in a four-node Exchange cluster
