Geographically Dispersed Clusters in Exchange Server 2003


Topic Last Modified: 2005-09-12

Geographically dispersed clusters provide highly available access to data. Often, the costs of implementing a geographically dispersed clustering solution are steep enough to make customers think twice and look for a different solution that provides the added data and uptime security they want. However, for those who require the additional redundancy of a fault-tolerant installation, geographically dispersed clusters are a great solution.

By answering some frequently asked questions, this article will help you understand the basics of geographically dispersed clusters, including performance and testing recommendations. This article will also discuss Microsoft-supported solutions and provide you with valuable resources where you can learn more about geographically dispersed clusters.

In order to implement a geographically dispersed cluster, you must work with a third-party vendor. As a result, the information in this article may contradict that of your third-party vendor.

Geographically dispersed clusters, also called stretched clusters or extended clusters, are clusters composed of nodes that are placed in different physical sites. Geographically dispersed clusters are designed to provide failover in the event of a site loss due to power issues, natural disasters, or other unforeseen events.

To truly understand how geographically dispersed clustering works with Microsoft® Exchange Server 2003, you must first understand the basic requirements for a geographically dispersed cluster solution. A geographically dispersed cluster is a combination of hardware and software. In other words, a geographically dispersed cluster is a combination of pieces supplied by different vendors. Due to the complex nature of these configurations and the configuration restrictions that are fundamental to Microsoft Windows® Clustering technology, geographically dispersed clusters should be deployed only in conjunction with vendors who provide qualified configurations. To run Exchange Server 2003 in a geographically dispersed cluster solution, the solution must meet the following criteria:

  • The nodes must share a common disk subsystem.

  • The nodes must live on the same local area network (LAN)/subnet.

  • The nodes must have a network heartbeat with low latency, less than 500 ms.

For geographically dispersed clusters, the above criteria are met by replicating data at the disk level and creating a virtual LAN (VLAN) that allows the same subnet to exist over wide area network (WAN) links.
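
The two network criteria above lend themselves to a simple pre-deployment check. The following Python sketch is only an illustration: the function name, node names, addresses, and subnet are hypothetical, and the shared disk criterion is not checked because it depends on the vendor's replication solution.

```python
import ipaddress

# Latency ceiling quoted in the criteria above.
MAX_HEARTBEAT_MS = 500


def check_nodes(node_addresses, subnet, heartbeat_ms):
    """Return a list of problems found; an empty list means both checks pass.

    node_addresses: dict of node name -> IP address string (hypothetical input)
    subnet: the subnet all nodes are expected to share, for example "10.0.0.0/24"
    heartbeat_ms: measured heartbeat latency between the nodes, in milliseconds
    """
    network = ipaddress.ip_network(subnet)
    problems = []
    for name, address in node_addresses.items():
        # Every node must sit on the same subnet, even across the WAN link.
        if ipaddress.ip_address(address) not in network:
            problems.append(f"{name} ({address}) is outside {subnet}")
    # The heartbeat latency must stay below the 500 ms ceiling.
    if heartbeat_ms >= MAX_HEARTBEAT_MS:
        problems.append(
            f"heartbeat latency {heartbeat_ms} ms is not below {MAX_HEARTBEAT_MS} ms"
        )
    return problems
```

For example, check_nodes({"node-a": "10.0.0.11", "node-b": "10.0.1.12"}, "10.0.0.0/24", 40) reports that node-b is outside the subnet, which suggests the VLAN does not yet span both sites.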

Through the use of a majority node set quorum, Windows Clustering allows you to build a cluster without a common shared disk array and a common shared quorum disk. Exchange Server supports this method of clustering. However, the solution must be listed on the Geographically Dispersed Cluster Solution section of the Windows Server Catalog. Additionally, only solutions that use synchronous replication are supported.

There are many types of replication technologies that can support a geographically dispersed cluster solution. All of these technologies use one of two types of replication: synchronous or asynchronous.

In terms of a geographically dispersed Exchange cluster, the most important difference between the two replication types is that synchronous replication is currently supported and asynchronous replication is not. Beyond that, the main differences between the two replication types involve the following:

  • How and when data blocks are written to the local and remote sites

  • How those writes are reported back to the operating system

  • How synchronized the data is on the local and remote disks at any given time

With synchronous replication, if an application performs an operation on a node at one site, that operation will not complete until the change is made at the other sites.

For example, consider the case of synchronous, block-level replication. If an application at Site A writes a block of data to a disk that is mirrored to Site B, then the I/O operation will not complete until the change is made to both the disk on Site A and the disk on Site B. This is due to the method by which the replication software allows the write operation to communicate with the operating system. The replication software does not report back to the operating system that the write has been completed until the write has been committed at both Sites A and B.

With asynchronous replication, if a change is made to the data on Site A, that change will eventually make it to Site B. I say "eventually" because, with asynchronous replication, the operating system and replication software do not wait for an acknowledgement from the remote site that the write has been performed.

Using the same example as above, if an application at Site A writes a block of data to a disk that is mirrored to Site B, then the I/O operation will complete as soon as the change is made to the disk at Site A. In a separate process, the replication software transfers the change to Site B and, eventually, the change is made at Site B.

If a failover occurs at a time when the data does not match due to a write delay, asynchronous replication can lead to an unsuccessful failover. Since the writes at the remote site are not synchronized with the writes at the local site, there is no way to know for sure that the data is consistent between both sites. In terms of a geographically dispersed clustering solution, there is no failover with asynchronous replication. Instead, users must bring a standby server online and mount the replicated data at the failover site.
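
The difference between the two replication types can be illustrated with a toy model. The Python sketch below is not how any vendor's replication software is implemented; the class and function names are hypothetical, and the "disks" are plain dictionaries. It only shows when a write is reported complete and what Site B holds if Site A is lost before queued writes are transferred.

```python
from collections import deque


class MirroredDisk:
    """Toy model of a disk at Site A that is mirrored to a disk at Site B."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.site_a = {}        # blocks committed at the local site
        self.site_b = {}        # blocks committed at the remote site
        self.pending = deque()  # writes queued for Site B (asynchronous only)

    def write(self, block_id, data):
        """Write a block and return once the write is reported complete."""
        self.site_a[block_id] = data
        if self.synchronous:
            # Reported complete only after both sites have committed the block.
            self.site_b[block_id] = data
        else:
            # Reported complete immediately; Site B is updated later.
            self.pending.append((block_id, data))

    def replicate_pending(self, max_writes):
        """Separate process that transfers queued writes to Site B."""
        for _ in range(min(max_writes, len(self.pending))):
            block_id, data = self.pending.popleft()
            self.site_b[block_id] = data

    def sites_match(self):
        return self.site_a == self.site_b


def run(synchronous):
    """Write five blocks, then lose Site A after only three transfers."""
    disk = MirroredDisk(synchronous)
    for i in range(5):
        disk.write(i, f"block-{i}")
    disk.replicate_pending(max_writes=3)  # Site A fails before the queue drains
    return disk.sites_match(), len(disk.site_b)
```

In this model, run(True) leaves both sites with the same five blocks, while run(False) leaves Site B with only three of the five blocks, which is the mismatch that makes a failover unreliable.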

Exchange Server does not support asynchronous replication because the replication software controls the write order and, therefore, the solution provider should support it. Moreover, with asynchronous replication, data corruption is likely to occur because Exchange has write-ordering dependencies.

Inconsistent data between two sites using asynchronous replication is unavoidable. Because of this, Microsoft only supports Exchange Server in a synchronous replication scenario that uses hardware listed on the Geographically Dispersed Cluster Solution section of the Windows Server Catalog.

To learn more about replication methods, check out the following resources:

In a geographically dispersed cluster installation, the added cost of data replication is the main deviation from the standard testing and monitoring needs of an Exchange Server installation. Because of this, you should pay extra attention to disk latency. Information about disk latency testing is already well documented. For detailed information about disk performance settings and testing procedures, see Optimizing Storage for Exchange Server 2003.

Some geographically dispersed clusters require a fiber optic connection between the primary and backup sites, which can impose a constraint on the distance permitted between sites. Other solutions are able to encapsulate the data in an IP packet and can run over an Ethernet or frame connection. No matter what the connection is, you can expect that the greater the distance between sites, the greater the latency will be. In turn, this increased latency equates to a less usable and supportable solution.
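
As a rough illustration of why distance matters, the Python sketch below estimates the floor that signal propagation alone places on a synchronous write. It rests on assumptions that are not part of this article: light travels through optical fiber at roughly 200,000 km per second (about 200 km per millisecond), the local and remote writes happen one after the other, and switching, protocol, and replication software overhead are ignored. Real latency will be higher, so measure it with your vendor's solution.

```python
# Assumed approximate signal speed in optical fiber: about 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def round_trip_ms(distance_km):
    """Minimum round-trip propagation delay between two sites, in milliseconds."""
    return 2 * distance_km / FIBER_KM_PER_MS


def sync_write_latency_ms(local_write_ms, remote_write_ms, distance_km):
    """Estimated latency of one synchronous write under a simple serial model.

    The write is reported complete only after the local commit, the trip to
    the remote site, the remote commit, and the acknowledgement coming back.
    """
    return local_write_ms + round_trip_ms(distance_km) + remote_write_ms
```

Under these assumptions, two sites 100 km apart add at least 1 ms to every synchronous write, so a write that takes 5 ms on each disk takes about 11 ms end to end.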

In terms of a geographically dispersed solution, another cost of greater latency is a reduction in the number of users that can be supported on a single server. As latency increases, the number of supported users per server decreases. As a result, you need more servers to support the environment, which increases the total cost of the Exchange installation.

Most certified solutions support a very specific scenario. One component of this scenario is a maximum supported distance between sites. As the distance increases, the number of available solutions is going to diminish.

Aside from deciding on which solution is best for your Exchange organization, there are many other requirements you should consider when researching a geographically dispersed cluster solution. For example, be sure that you've considered the following:

  • Ask yourself, "Will I need to replicate more than just Exchange data?" For example, will you also need to replicate file servers?

  • The LAN and WAN networks must be fault tolerant enough so that, if the primary site goes down, your client computers can communicate seamlessly with the new site.

  • At the failover location, you must have an adequate number of domain controllers and global catalog servers. These domain controllers and global catalog servers must be able to support the volume of requests required for Exchange servers and client computers to function at acceptable levels. These levels should be defined in a service level agreement (SLA).

  • The hardware in your failover location must be able to support your user requests at an acceptable level. This level should be documented in an SLA.

  • The replication link from the failover site to the primary site must have sufficient bandwidth and be fast enough to support the data replication requirements established by your e-mail volume.

  • It would be beneficial if you had staff that could provide end-to-end support for your solution.

Depending on your specific implementation, there may be more requirements to consider. For example, you may need to account for a Web service hosted on a different server that requires Exchange, support for virtual private network (VPN) remote users, replication of a data archiving solution, and so forth.

After researching all the appropriate parts and service requirements, be sure that you familiarize yourself with the supportability guidelines from Microsoft. The following is a list of system requirements for geographically dispersed Exchange clusters:

Microsoft strongly recommends that customers obtain assurances from their replication solution vendors on the following issues:

  • Is the solution in the category of a geographically dispersed clustering solution?

  • Will the replication solution prevent all possibilities of data loss short of simultaneous outage at all sites?

  • What are the procedures for performing a failover and fail back?

  • Can the replication solution and expected latency handle the planned Exchange user load and provide a quality client experience?

This article provided an overview of geographically dispersed clustering in Exchange Server 2003, but there's a lot of detailed documentation out there that can help you decide if this solution is right for you. Check out the following resources:

