Planning for Cluster Continuous Replication

Microsoft Exchange Server 2007 will reach end of support on April 11, 2017. To stay supported, you will need to upgrade. For more information, see Resources to help you upgrade your Office 2007 servers and clients.

 

Applies to: Exchange Server 2007, Exchange Server 2007 SP1, Exchange Server 2007 SP2, Exchange Server 2007 SP3

Although deploying cluster continuous replication (CCR) is similar to deploying local continuous replication (LCR) or a single copy cluster (SCC), there are important differences that you must consider. CCR has general requirements, as well as hardware, software, networking, and cluster requirements, all of which must be met.

General Requirements for Cluster Continuous Replication

Before you deploy CCR, make sure that the following system-wide requirements are met:

  • A single database per storage group must be used. When a storage group is created in a CCR environment, it can only contain a single database. This approach creates a more manageable Microsoft Exchange storage topology that increases recoverability.

  • Domain Name System (DNS) must be running. Ideally, the DNS server should accept dynamic updates. If the DNS server does not accept dynamic updates, you must create a DNS host (A) record for each clustered mailbox server and one for the cluster itself. Otherwise, Exchange does not function properly. For more information about how to configure DNS for Exchange, see Microsoft Knowledge Base article 322856, How to configure DNS to use with Exchange Server.

  • If your cluster nodes belong to a directory naming service zone that has a different name than the Active Directory directory service domain name that the computer joined, the DNSHostName property does not include the subdomain name by default. In this situation, you may need to change the DNSHostName property to ensure that some services, such as the File Replication Service (FRS), work correctly. For more information, see Knowledge Base article 240942, Active Directory DNSHostName property does not include subdomain.

  • All cluster nodes must be member servers in the same domain. Microsoft Exchange Server 2007 is not supported on nodes that are also Active Directory servers or nodes that are members of different Active Directory domains.

  • The cluster must be formed before installing Exchange 2007. For information about forming a Windows Server 2008 failover cluster, see Installing Cluster Continuous Replication on Windows Server 2008. For information about forming a Windows Server 2003 failover cluster, see Installing Cluster Continuous Replication on Windows Server 2003.

  • The clustered mailbox server (CMS) names must be 15 characters or fewer.

  • The cluster in which Exchange 2007 is installed cannot contain Exchange Server 2003, Exchange 2000 Server, or any cluster-aware version of Microsoft SQL Server. Running Exchange 2007 in a cluster with any of these other applications is not supported. Running Exchange 2007 in a cluster with SQL Server 2005 Express Edition or another database application (such as Microsoft Office Access) is permitted, provided that the database application is not clustered.

  • Before you install Exchange 2007, make sure that the folder into which you install Exchange data is empty.

  • You must install the same version of Exchange 2007 on all nodes in the cluster that are configured as hosts of a clustered mailbox server. In addition, the operating system and the Exchange files must be installed on the same paths and drives for all nodes in the cluster. This requires that all computers have a similar, although not identical, disk configuration.

  • The Cluster service account must be a member of the local administrators group on each node that is capable of hosting a clustered mailbox server.

  • Do not install, create, or move any resources from the default cluster group to the resource group containing the clustered mailbox server. In addition, do not install, create, or move any resources from the group containing the clustered mailbox server to the default cluster group. The default cluster group should contain only the cluster IP Address, Network Name, and quorum resources. Moving or combining resources to or with the default cluster group is not supported.

    Important

    Clusters running previous versions of Exchange require a clustered instance of the Microsoft Distributed Transaction Coordinator (MSDTC). Exchange 2007 removes the requirement for the clustered MSDTC resource. Clustered mailbox servers in a CCR environment do not use and do not need the MSDTC resource installed in the failover cluster. Third-party applications might require an MSDTC resource because of COM+ dependencies. In Windows Server 2003, the MSDTC cluster resource requires the use of shared storage in the cluster. Adding shared storage to a CCR environment is not recommended. Windows Server 2008 provides a local, non-clustered MSDTC instance that removes the requirement for shared storage in a Windows Server 2008 failover cluster. For more information about MSDTC changes in Windows Server 2008, see Windows Server 2008 Help.

Hardware Requirements for Cluster Continuous Replication

For general hardware planning information, see Planning Processor Configurations and Planning Storage Configurations. The hardware requirements specific to CCR environments are as follows:

  • When using a Majority Node Set (MNS) quorum with the file share witness on Windows Server 2003, only two nodes can exist in the cluster. If one node or more than two nodes exist in the cluster, the MNS quorum with file share witness feature cannot be used. Instead, a traditional MNS quorum must be used, which requires three or more nodes in the cluster.

  • When using the Node and File Share Majority quorum on Windows Server 2008, only two nodes can exist in the cluster. If one node or more than two nodes exist in the cluster, the Node and File Share Majority quorum cannot be used. Instead, a Node Majority quorum must be used, which requires three or more nodes in the cluster.

    Note

    We recommend using a two-node failover cluster that uses either the MNS quorum with file share witness or the Node and File Share Majority quorum. This eliminates the need to have a third voter node in the cluster.

  • The servers used must be listed in the Microsoft Windows Server Catalog of Tested Products for the operating system on which they will be installed. However, if shared storage is not used in the cluster, the servers do not need to be listed in the Cluster category.

  • The two servers that host the Mailbox server roles must be comparable, but need not be identical, in:

    • CPU

    • Memory

    • Input/Output (I/O) capability

    • Networking

    • Vendor

    • Available disk storage, which includes space and I/O operations capabilities

Quorum Requirements for Cluster Continuous Replication

Generally, clustered applications are unaware of the type of quorum being used by the cluster on which they are installed. When designing the quorum component for your CCR environment, be aware of the following recommendations and requirements:

  • On Windows Server 2008, a Node and File Share Majority quorum is the strongly recommended quorum type for CCR.

  • On Windows Server 2003, an MNS quorum with file share witness is the strongly recommended quorum type for CCR.

If either of the preceding quorum types is used for CCR, the nodes do not have to be listed in the Microsoft Windows Server Catalog of Tested Products.

If a shared storage quorum is used for CCR, the entire system must be listed in the Microsoft Windows Server Catalog of Tested Products.

In Exchange Server 2007 Service Pack 1 (SP1), Setup blocks two-node cluster configurations if a file share witness or File Share Majority is not configured. This is done because that configuration would not be able to handle losing a node in the cluster (because majority would not be maintained), which results in the cluster going offline.
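The majority rule behind this Setup check can be illustrated with a short Python sketch. It assumes that each node and the file share witness count as one vote and that the cluster stays online only while more than half of all votes are reachable; the function name is hypothetical and is not part of any Exchange or Windows tool.

```python
def has_quorum(total_votes: int, votes_online: int) -> bool:
    """Return True if strictly more than half of all votes are online."""
    return votes_online > total_votes // 2


# Two nodes and no witness: 2 votes in total, 1 left after a node failure.
print(has_quorum(total_votes=2, votes_online=1))  # False: cluster goes offline

# Two nodes plus a file share witness: 3 votes in total, 2 left after a node failure.
print(has_quorum(total_votes=3, votes_online=2))  # True: cluster stays online
```

With only two votes, the loss of either node leaves exactly half of the votes, which is not a majority. The witness adds the third vote that lets the surviving node keep the cluster online.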

Software Requirements for Cluster Continuous Replication

The software requirements for CCR environments are as follows:

  • Both nodes in the cluster must run either the Windows Server 2008 Enterprise operating system or the Windows Server 2003 Enterprise Edition operating system, installed using the same boot and system drive letters on each node. You cannot have a cluster with one node running Windows Server 2008 and the other node running Windows Server 2003. Mixing operating system versions in a failover cluster is not supported.

  • If you are building a CCR environment using the release to manufacturing (RTM) version of Exchange 2007 on Windows Server 2003, both nodes in the failover cluster must have either Windows Server 2003 Service Pack 2 (SP2), or Windows Server 2003 SP1 and the hotfix from Knowledge Base article 921181, An update is available that adds a file share witness feature and a configurable cluster heartbeats feature to Windows Server 2003 Service Pack 1-based server clusters, installed. This hotfix is included in Windows Server 2003 SP2. If you are building a CCR environment using Exchange 2007 SP1 on Windows Server 2003, both nodes in the failover cluster must have Windows Server 2003 SP2 installed.

  • The cluster must either be a three-node cluster with a traditional MNS quorum, or a two-node cluster with an MNS quorum with file share witness. Generally, it is assumed that on Windows Server 2003, a two-node cluster using an MNS quorum with file share witness will be used, and that on Windows Server 2008, a two-node cluster with a Node and File Share Majority quorum will be used.

  • The file share witness for the MNS or the File Share Majority quorum does not need to be on a dedicated computer. It can be on any computer running Windows Server. However, we recommend that you host the file share witness on a Hub Transport server (or other Exchange server) to be under the control of the Exchange administrator.

  • Only the Mailbox server role can be installed in a cluster. No other server roles can be installed on a computer that is part of a failover cluster.
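The Windows Server 2003 service pack rules in the preceding list can be expressed as a small check. The following Python sketch is illustrative only: the function and parameter names are hypothetical, and it covers only the two Exchange 2007 versions (RTM and SP1) whose prerequisites are stated in this topic.

```python
def win2003_node_meets_prereq(exchange_version: str,
                              windows_sp: int,
                              has_kb921181: bool) -> bool:
    """Check one Windows Server 2003 node against the rules stated above.

    exchange_version: "RTM" or "SP1" (other versions are not covered here).
    windows_sp: installed Windows Server 2003 service pack level.
    has_kb921181: True if the hotfix from KB article 921181 is installed.
    """
    if exchange_version == "SP1":
        # Exchange 2007 SP1 requires Windows Server 2003 SP2.
        return windows_sp >= 2
    if exchange_version == "RTM":
        # Exchange 2007 RTM accepts SP2, or SP1 plus the 921181 hotfix.
        return windows_sp >= 2 or (windows_sp == 1 and has_kb921181)
    raise ValueError("only RTM and SP1 are covered by this sketch")


# Both nodes must pass the check; here one SP1 node lacks the hotfix.
nodes = [(2, False), (1, False)]
print(all(win2003_node_meets_prereq("RTM", sp, fix) for sp, fix in nodes))  # False
```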

Network Requirements for Cluster Continuous Replication

It is important that the networks used for client and cluster communications are configured correctly. This section provides links to the procedures that are necessary to verify that your private and public network settings are configured correctly. In addition, you must make sure that the network connection order is configured correctly for the cluster. Consider the following when designing the network infrastructure for your CCR environment:

  • Each node must have at least two network adapters available for Windows Clustering. Clients and other servers only have to be able to access the nodes from one of the two network adapters. The other network adapters are used for intra-cluster communication. The recommended configuration is to have the private network dedicated to internal cluster communications and the public network designated as mixed.

  • The cluster public network should provide connectivity to other Exchange servers and other services, such as Active Directory and DNS. You can prevent this from being a single point of failure by using network adapter teaming or similar technology.

  • A separate cluster private network must be provided. The private network is used for the cluster heartbeat. The private network does not require DNS.

  • Heartbeat requirements may not be the most stringent public network bandwidth and latency requirement for a two datacenter configuration. You must evaluate the total network load, which includes client access, Active Directory, transport, continuous replication, and other application traffic, to determine the necessary network requirements for your environment.

  • We recommend that you use Gigabit Ethernet for CCR environments to minimize reseed time. For more information about why Gigabit Ethernet is recommended, see "Database Size and Cluster Continuous Replication" later in this topic.

  • In Exchange 2007 RTM, a resource group that contains a clustered mailbox server can only have one Network Name resource. Having more than one Network Name resource in a resource group that contains a clustered mailbox server is not supported in Exchange 2007 RTM. However, this limitation does not exist in Exchange 2007 SP1. When the clustered mailbox server has been upgraded to Exchange 2007 SP1, more than one Network Name resource can exist in the resource group that contains the clustered mailbox server.

Network Requirements for Installing CCR on Windows Server 2008

The network requirements for installing CCR on Windows Server 2008 are slightly different from the requirements for installing CCR on Windows Server 2003. As with Windows Server 2003, if you are installing CCR on Windows Server 2008, you must have sufficient IP addresses available for both nodes and for the clustered mailbox server (CMS). However, some additional options are available on Windows Server 2008 that are not available on Windows Server 2003:

  • Cluster nodes can reside on different subnets. In Windows Server 2003, the network interface for each network on each node must be on the same subnet as the corresponding network on the other node. This requirement does not exist in Windows Server 2008. As a result, nodes within a failover cluster can communicate across network routers, and virtual LAN (VLAN) technology does not need to be used to connect the nodes.

  • When using multiple subnets in a CCR environment, DNS replication may affect a client's ability to reconnect to a CMS after a failover or handoff of the CMS between nodes has occurred. Clients and other servers that communicate with a clustered mailbox server that has changed IP addresses will not be able to reestablish communications with the clustered mailbox server until DNS has been updated with the new IP address, and any local DNS caches have been updated. To minimize the amount of time it takes to have the DNS changes known to clients and other servers, we recommend setting a DNS Time to Live (TTL) value of five minutes for the clustered mailbox server's Network Name resource. In most environments, we recommend setting the DNS TTL value only for the CMS Network Name resource. However, in environments with non-Exchange management tools that connect to the cluster by its name for management purposes, we recommend setting a lower TTL value on the cluster's Network Name resource. For detailed steps about how to configure the DNS TTL values for Network Name resources for use in a multiple subnet CMS or standby cluster deployment, see How to Configure the DNS TTL Value for a Clustered Mailbox Server Network Name Resource.

  • In Windows Server 2008 failover clustering, cluster IP Address resources can obtain their addresses from Dynamic Host Configuration Protocol (DHCP) servers, as well as from static entries. If the cluster nodes themselves are configured to obtain their IP addresses from a DHCP server, the default behavior is to obtain an IP address automatically for all cluster IP Address resources. If the cluster node has statically assigned IP addresses, the cluster IP Address resources must be configured with static IP addresses as well. Thus, cluster IP Address resource IP address assignment follows the configuration of the physical node and each specific interface on the node. We do not recommend using DHCP for clustered mailbox servers. We recommend that you consider the following before using DHCP for a CMS:

    • The Cluster service will not bring online a DHCP-enabled IP Address resource if the IP address changes.

    • DHCP servers should be configured to grant an unlimited lease for all DHCP-assigned addresses used by clustered mailbox servers.

  • Windows Server 2008 and its Cluster service also support Internet Protocol version 6 (IPv6). This includes being able to support IPv6 IP Address resources and IPv4 IP Address resources either alone or in combination in a cluster. In addition, failover clusters also support Intra-Site Automatic Tunnel Addressing Protocol (ISATAP), and they support only IPv6 addresses that allow for dynamic registration in DNS (AAAA host records and the IP6.ARPA reverse look-up zone). Using IPv6 addresses and IP address ranges is supported only when Exchange 2007 SP1 is deployed on a computer that is running Windows Server 2008, both IPv6 and IPv4 are enabled on that computer, and the network supports both IP address versions. If Exchange 2007 SP1 is deployed in this configuration, all server roles can send data to and receive data from devices, servers, and clients that use IPv6 addresses. A default installation of Windows Server 2008 enables support for IPv4 and IPv6. If Exchange 2007 SP1 is installed on Windows Server 2003, IPv6 addresses are not supported. For more information about Exchange 2007 SP1 support for IPv6 addresses, see IPv6 Support in Exchange 2007 SP1 and SP2.

Network Requirements for Installing CCR on Windows Server 2003

If you are installing CCR on Windows Server 2003, you must have a sufficient number of static IP addresses available when you create clustered mailbox servers in a two-node CCR environment. An IP address is needed for the cluster and for the clustered mailbox server. In addition, IP addresses are required for both the public and private networks on each node:

  • Private addresses   Each node requires one static IP address for each network adapter that will be used for the cluster private network. You must use static IP addresses that are not on the same subnet or network as one of the public networks. We recommend that you use 10.10.10.10 and 10.10.10.11 with a subnet mask of 255.255.255.0 as the private IP addresses for the two nodes, respectively. If your public network uses a 10.x.x.x network and a 255.255.255.0 subnet mask, we recommend that you use alternate private network IP addresses and an alternate subnet mask. If you configure more than one private network, unique addresses and subnets are required for each private network adapter and network.

  • Public addresses   Each node requires one static IP address for each network adapter that will be used for the cluster public network. Additionally, static IP addresses are required for the server cluster and the clustered mailbox server so that they can be accessed by clients and administrators. You must use static IP addresses that are not on the same subnet or network as one of the private networks.

The private network for all nodes in a cluster must be on the same subnet, but you can use VLAN switches on the interconnects between two nodes. If you use a VLAN, the point-to-point, round-trip latency must be less than 0.5 seconds. In addition, the link between two nodes must appear as a single point-to-point connection from the perspective of the Windows Server 2003 operating system running on the nodes. To avoid single points of failure, use independent VLAN hardware for the different paths between the nodes. The same subnet restriction does not apply to failover clusters running on Windows Server 2008.

The public networks for all nodes in a cluster must be on the same subnet, and they must use a subnet that is different from the subnet being used for the private networks. The same subnet restriction does not apply to failover clusters running on Windows Server 2008.
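The Windows Server 2003 subnet rules described above can be checked with Python's standard ipaddress module. The following is a minimal sketch: the function names are hypothetical, the private addresses are the ones recommended in this topic, and the 192.168.1.x public addresses are assumptions chosen only for illustration.

```python
import ipaddress


def network_of(address: str, mask: str):
    """Return the subnet that an adapter's address and mask belong to."""
    return ipaddress.ip_interface(f"{address}/{mask}").network


def check_node_networks(private, public):
    """Check (address, mask) pairs, one per node, against the rules above.

    Returns a list of problems; an empty list means the layout passes.
    """
    problems = []
    private_nets = {network_of(a, m) for a, m in private}
    public_nets = {network_of(a, m) for a, m in public}
    if len(private_nets) != 1:
        problems.append("private adapters are not on the same subnet")
    if len(public_nets) != 1:
        problems.append("public adapters are not on the same subnet")
    if any(p.overlaps(q) for p in private_nets for q in public_nets):
        problems.append("private and public subnets overlap")
    return problems


private = [("10.10.10.10", "255.255.255.0"), ("10.10.10.11", "255.255.255.0")]
public = [("192.168.1.10", "255.255.255.0"), ("192.168.1.11", "255.255.255.0")]
print(check_node_networks(private, public))  # [] when all three rules hold
```

The same-subnet checks apply only to Windows Server 2003; on Windows Server 2008 the nodes can reside on different subnets.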

The cluster network connection order in Windows must be configured so that the public networks are at the top of the connection order list, and the network priority in the cluster must be configured with the private networks listed at the top of the priority order.

If you are installing CCR on Windows Server 2003 in a multiple datacenter configuration:

  • All networks used for client access must provide adequate bandwidth and sufficiently low latency to enable clients to access the clustered mailbox server from either datacenter.

  • All networks that are used to replicate transaction logs must provide adequate bandwidth and sufficiently low latency to copy the log files in a timely manner, so that whenever possible, there is no backlog of log files.

  • The networks used for the cluster heartbeat must be capable of sending and receiving a heartbeat packet within the required number of configured retries. If you are installing CCR on either SP2 for Windows Server 2003 or SP1 for Windows Server 2003 and the hotfix from Knowledge Base article 921181, An update is available that adds a file share witness feature and a configurable cluster heartbeats feature to Windows Server 2003 Service Pack 1-based server clusters, the lost interface heartbeat retries and lost node heartbeat retries are exposed as cluster configuration properties. If you are installing CCR on Windows Server 2008, this update is not needed. In either case, heartbeats are still sent every 1.2 seconds, but the cluster can be configured so that more misses must occur (whether from dropped packets, excessive latency, interface failure, or node failure) before any recovery action is taken. The property values are in units of missed heartbeats, not elapsed time. The cluster therefore cannot be configured to suspect an interface failure after five seconds, but it can be configured to suspect an interface failure after five misses. Depending on when in the heartbeat period the failure actually occurs, five misses correspond to approximately five to six seconds. Both settings have an allowed minimum of 2 and an allowed maximum of 20 missed heartbeats.
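To see how a missed-heartbeat setting translates into elapsed time, the following Python sketch converts a retry count into an approximate detection window. It assumes a simple model that is not taken from the Cluster service itself: a failure just before a scheduled heartbeat is detected after one interval less than a failure just after one. The function name is hypothetical.

```python
HEARTBEAT_INTERVAL_S = 1.2  # heartbeat period stated in this topic


def detection_window(missed_heartbeats: int,
                     interval: float = HEARTBEAT_INTERVAL_S):
    """Return (shortest, longest) seconds until the Nth miss is observed."""
    if not 2 <= missed_heartbeats <= 20:
        raise ValueError("the allowed range is 2 to 20 missed heartbeats")
    # Failure just before a heartbeat was due: the first miss is immediate.
    shortest = (missed_heartbeats - 1) * interval
    # Failure just after a heartbeat: a full interval passes before each miss.
    longest = missed_heartbeats * interval
    return shortest, longest


low, high = detection_window(5)
print(f"{low:.1f} to {high:.1f} seconds")  # 4.8 to 6.0 seconds
```

Under this model, five misses fall between 4.8 and 6.0 seconds, which is consistent with the "approximately five to six seconds" figure above.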

Optimizing Windows 2003 Networking for CCR

When using CCR on Windows Server 2003, we recommend that you optimize your Windows Server TCP/IP settings for your specific network link's speed and latency. Specifically, you may need to adjust the Transmission Control Protocol (TCP) receive window size and Request for Comments (RFC) 1323 window scaling options on the active and passive nodes. In addition, you may find it beneficial to configure address resolution protocol (ARP) cache expiration settings and to disable the advanced TCP/IP options for the Windows Server 2003 Scalable Networking Pack (SNP) in the Windows registry.

In addition to these recommendations, if your environment includes the use of the IP Security (IPsec) protocol, we recommend that you configure IPsec consistently throughout your CCR environment. Either both nodes should use IPsec or neither node should use IPsec. If only one node is configured to use IPsec, the IPsec Security Association process can cause packet delay or packet loss.

TCP Receive Windows and RFC 1323 Scaling Options

The TCP receive window size is the maximum amount of data (in bytes) that can be received at one time on a connection. The sending computer can send only that maximum amount of data before waiting for an acknowledgment and a TCP window update from the receiving computer. It may be beneficial to adjust this setting to increase throughput during log shipping.

To optimize TCP throughput, the sending computer should transmit enough packets to fill the pipe between the sender and receiver. The capacity of the network pipe is the product of the pipe's bandwidth and its latency (round-trip time). The higher the latency, the greater the capacity of the network pipe, because there is more time to send data between acknowledgments. By increasing the TCP window size, the system can take advantage of the time between acknowledgments by sending more data.
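The pipe capacity described above is commonly called the bandwidth-delay product. The following Python sketch shows the arithmetic; the link speed and round-trip time are assumed values chosen for illustration, and the function names are hypothetical.

```python
def pipe_capacity_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bytes that can be in flight on the link during one round trip."""
    return bandwidth_bits_per_s * rtt_ms // 8000


def max_throughput_bits_per_s(window_bytes: int, rtt_ms: int) -> int:
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8000 // rtt_ms


# Assumed example link: 100 Mbps with a 50 ms round-trip time.
print(pipe_capacity_bytes(100_000_000, 50))   # 625000 bytes to fill the pipe
print(max_throughput_bits_per_s(65_535, 50))  # 10485600, about 10.5 Mbps
```

On this example link, a receive window of 65,535 bytes limits a single connection to roughly a tenth of the available bandwidth, which is why a larger window can improve log shipping throughput on high-latency links.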

The TCP/IP standard allows for a receive window up to 65,535 octets in size, which is the maximum value that can be specified in the 16-bit TCP window size field. To improve performance on high-bandwidth, high-delay networks, Windows Server TCP/IP supports the ability to advertise receive window sizes larger than 65,535 octets, by using scalable windows as described in RFC 1323, TCP Extensions for High Performance. When using window scaling, hosts in a conversation can negotiate a window size that allows multiple large packets, such as those often used in file transfer protocols, to be pending in the receiver's buffers. RFC 1323 details a method for supporting larger receive window sizes by allowing TCP to negotiate a scaling factor for the window size at connection establishment.
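As a rough illustration of the scaling factor, the following Python sketch finds the smallest RFC 1323 shift count that lets the 16-bit window field advertise a given receive window. RFC 1323 limits the shift count to 14. The function name and the 625,000-byte target are assumptions for illustration, and the sketch does not model how Windows chooses its own scale factor.

```python
MAX_UNSCALED_WINDOW = 65_535  # largest value of the 16-bit window field
MAX_SHIFT = 14                # largest shift count allowed by RFC 1323


def scale_shift_for(target_window_bytes: int) -> int:
    """Smallest shift such that 65,535 << shift covers the target window."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < target_window_bytes:
        shift += 1
        if shift > MAX_SHIFT:
            raise ValueError("target exceeds the largest scalable window")
    return shift


print(scale_shift_for(65_535))   # 0: fits without scaling
print(scale_shift_for(625_000))  # 4: 65,535 << 4 = 1,048,560 bytes
```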

You can optimize the TCP receive window size and RFC 1323 window scaling options on a computer running Windows Server 2003 by modifying two registry entries: TCPWindowSize and TCP1323Opts. For more information about these features, see Microsoft Knowledge Base article 224829, Description of Windows 2000 and Windows Server 2003 TCP Features.

We recommend that you use version 13 or later of the Exchange 2007 Mailbox Server Role Storage Requirements Calculator to determine the optimal settings for these registry entries based on your network link and network latency. You can download the calculator from the Exchange Team Blog. The Storage Calculator also includes step-by-step instructions for entering the registry values on your servers.

Note

The content of each blog and its URL are subject to change without notice. The content within each blog is provided "AS IS" with no warranties, and confers no rights. Use of included script samples or code is subject to the terms specified in the Microsoft Terms of Use.

ARP Cache Expiration

The ARP cache is an in-memory table that maps IP addresses to media access control (MAC) addresses. Entries in the ARP cache are referenced each time that an outbound packet is sent to the IP address in the entry. By default, Windows Server 2003 adjusts the size of the ARP cache automatically to meet the needs of the system. If an entry is not used by any outgoing datagram for two minutes, the entry is removed from the ARP cache. Entries that are being referenced are removed from the ARP cache after ten minutes. Entries added manually are not removed from the cache automatically.

Testing by Microsoft's internal IT department showed that the default ARP cache expiration settings resulted in packet loss in CCR and standby continuous replication (SCR) environments. When packet loss occurs, the sending server must transmit the lost data again. In a continuous replication environment, it is important for log files to be copied to the passive node as quickly as possible, and transmitting data again due to lost packets can adversely affect log shipping throughput.

You can modify the ArpCacheMinReferencedLife TCP/IP parameter in the Windows registry to control ARP cache expiration. This parameter determines how long referenced entries must remain in the ARP cache table before they can be deleted. Internally, Microsoft found that the optimal setting for the ArpCacheMinReferencedLife registry value was to use the same value being used for ARP cache expiration by the routers on the network, which was 4 hours.

Before modifying the value for ArpCacheMinReferencedLife in your own environment, we recommend using Microsoft Network Monitor or a similar capture tool to collect and analyze the network traffic on the network interface being used to copy logs from the active node to the passive node. For detailed steps to modify the ArpCacheMinReferencedLife registry value, see Appendix A: TCP/IP Configuration Parameters.

Scalable Networking Pack Advanced TCP/IP Features

The Windows Server 2003 Scalable Networking Pack (SNP) is a separate update for Windows Server 2003 that contains stateful and stateless offloads to accelerate the Windows network stack. The update includes TCP Chimney offload, Receive Side Scaling (RSS), and Network Direct Memory Access (NetDMA).

TCP Chimney is a stateful offload. TCP Chimney offload enables TCP/IP processing to be offloaded to network adapters that can handle the TCP/IP processing in hardware.

RSS and NetDMA are stateless offloads. Where multiple CPUs reside in a single computer, the Windows networking stack limits "receive" protocol processing to a single CPU. RSS resolves this issue by enabling the packets that are received from a network adapter to be balanced across multiple CPUs. NetDMA allows for a Direct Memory Access (DMA) engine on the Peripheral Component Interconnect (PCI) bus. The TCP/IP stack can use the DMA engine to copy data instead of interrupting the CPU to handle the copy operation. A related component, TCPA, is another offload function where a hardware DMA engine on the PCI bus can be used to assist receive processing.

These features can provide network performance benefits in some environments; however, there are some scenarios in which they cannot be used because of the use of other technologies. For example, TCP Chimney offload and NetDMA cannot be used if any of the following technologies are used:

  • Windows Firewall

  • Internet Protocol security (IPsec)

  • Internet Protocol Network Address Translation (IPNAT)

  • Third-party firewalls

  • NDIS 5.1 intermediate drivers

In addition, there are known issues in some environments, including environments with Microsoft Exchange, in which network performance can decrease when using these features. For details on some of these issues, see the Exchange Team blog post, Windows 2003 Scalable Networking pack and its possible effects on Exchange.

Note

The content of each blog and its URL are subject to change without notice. The content within each blog is provided "AS IS" with no warranties, and confers no rights. Use of included script samples or code is subject to the terms specified in the Microsoft Terms of Use.

We recommend that you disable all of these features in CCR environments that run on the Windows Server 2003 operating system, both for the operating system and for each network interface card (NIC) in the system. As noted earlier in this topic, the operating system options are disabled in the Windows registry.

For more information about the SNP, see Knowledge Base article 912222, The Microsoft Windows Server 2003 Scalable Networking Pack release, and the Scalable Networking Web site.

Outlook Behavior After Failover of Clustered Mailbox Server in a Multi-Subnet Failover Cluster

When a move or a failover occurs for a CMS deployed in a geographically dispersed, multiple-subnet failover cluster, the name of the CMS is maintained. However, the IP address assigned to that name is not maintained. The availability of this server to clients and other servers depends on propagation of the new IP address throughout DNS, which may take some time. For this reason, we recommend setting the Time to Live (TTL) value for the CMS DNS host record to 5 minutes (300 seconds). For detailed steps about how to configure the DNS TTL value for the CMS, see How to Configure the DNS TTL Value for a Clustered Mailbox Server Network Name Resource. After configuring the DNS TTL value for the CMS, you must stop and then start the CMS for the change to take effect.

Although internal Microsoft Office Outlook clients do not need new or reconfigured profiles to connect using the new IP address, they will need to wait for their local DNS cache to be cleared so that name resolution of the CMS name will move from its old IP address to its new IP address. After the IP address has been propagated to the appropriate DNS servers, the DNS cache on the Outlook clients can be cleared by running the following command at a command prompt on the client:

ipconfig /flushdns
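To observe when a client starts resolving the CMS name to its new address, an administrator could poll name resolution from the client. The following Python sketch is one hedged way to do that. The function name, the CMS name, and the IP addresses are hypothetical, and the default resolver (socket.gethostbyname) goes through the operating system, so its answers can still reflect the local DNS cache until that cache is flushed or expires.

```python
import socket
import time


def wait_for_address(name, expected_ip, resolve=socket.gethostbyname,
                     attempts=10, delay_s=30.0):
    """Poll until `name` resolves to `expected_ip`.

    Returns the number of the attempt that succeeded, or None if the
    name never resolved to the expected address.
    """
    for attempt in range(1, attempts + 1):
        try:
            current = resolve(name)
        except OSError:  # includes socket.gaierror for failed lookups
            current = None
        if current == expected_ip:
            return attempt
        if attempt < attempts:
            time.sleep(delay_s)
    return None


# Simulated resolver for illustration: two stale answers, then the new address.
answers = iter(["10.0.1.5", "10.0.1.5", "10.0.2.5"])
print(wait_for_address("cms1.contoso.com", "10.0.2.5",
                       resolve=lambda _name: next(answers),
                       attempts=5, delay_s=0))  # 3
```

Passing the resolver as a parameter keeps the example runnable without a network; in practice the default resolver would be used with a delay shorter than the configured TTL.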

The following sections illustrate Outlook's behavior in different configurations.

Stretched CCR on Windows Server 2003 (one subnet)

In this configuration, there is one Network Name resource and one IP address resource on which the Network Name resource is dependent. In DNS, the network name is associated with the IP address. All resources, including the IP address resource, can move between the two nodes in the cluster. From Outlook’s perspective, no IP address change occurs, because the only network change on failover is the association of the IP address with the machine MAC address, which is transparent to clients.

Stretched CCR on Windows Server 2008 (two subnets, assuming IPv4)

In this configuration, there is one Network Name resource and two IP addresses on which the Network Name is dependent, as a logical "OR." In DNS, the network name is associated with the current online IP address. During failover, as the Network Name resource comes online, the Cluster service updates the DNS entry for the Network Name with the second IP address, which corresponds to the other subnet. The record update has to propagate throughout DNS. From Outlook’s perspective, Outlook does not need a new or reconfigured profile, but it does need to wait for its local DNS cache to flush to allow the Network Name to resolve to the other IP address. This can be performed manually on the client by running:

ipconfig /flushdns

Local CCR with SCR in remote site (one or two subnets)

In this configuration, there is one Network Name resource and one IP address resource on which the Network Name is dependent. All resources, including the IP address, can move between the two nodes of the cluster. On a site failover in which the SCR target is activated by running Setup.com /recoverCMS, the CMS is moved to a different site and cluster. When you run this command, you provide the IP address that should be associated with the Network Name in the remote site. Setup creates the Network Name and IP address resources, and the Cluster service updates DNS with the new IP address. The DNS update has to propagate throughout DNS. From Outlook’s perspective, Outlook does not need a new or reconfigured profile, but it does need to wait for its local DNS cache to flush to allow the Network Name to resolve to the other IP address. This can be performed manually on the client by running:

ipconfig /flushdns

Storage Requirements for Cluster Continuous Replication

CCR is designed to eliminate the need for shared storage in a Windows cluster. Shared storage was a requirement of previous versions of Exchange. The only storage requirements for CCR are sufficient performance and capacity from Windows-supported storage.

CCR does not place additional I/O considerations on the storage used by storage groups and databases. When you design your CCR storage solution, we recommend that you follow these best practices:

  • The location of the storage groups and databases must be identical on all cluster nodes.

  • Store the database files and transaction log files on different logical unit numbers (LUNs).

  • Use NTFS file system volume mount points to surface the volumes to the operating system.

  • Use recognizable names that can be directly and obviously tied to the hosted storage group or database. If different volumes are used for logs and databases, the paths should identify the type of data. This approach can help prevent human errors as the number of databases and storage groups increases. If the default installation is performed, the storage group and databases are created under the install location of Exchange 2007.

    Note

    Exchange 2007 does not support placing transaction logs or database files in the root of a volume.

A CCR environment requires storage that provides adequate performance and capacity. Storage with equivalent performance and capacity should be configured on both nodes, using the same location (drive letter and paths) for each storage group and database.
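The layout rules above (identical locations on both nodes, nothing in the root of a volume) can be checked mechanically before deployment. The following Python sketch is illustrative only: the function names and the {storage group: (log folder, database file)} mapping are assumptions made for this example, not an Exchange API. It inspects path strings only, so it detects drive-letter roots such as E:\ but cannot tell whether a folder is the root of a volume surfaced through a mount point.

```python
from pathlib import PureWindowsPath


def file_in_drive_root(file_path):
    """Return True if the file sits directly in the root of a drive."""
    folder = PureWindowsPath(file_path).parent
    return folder == folder.parent


def layout_problems(node_a, node_b):
    """Compare two {storage_group: (log_folder, database_file)} mappings."""
    problems = []
    for name in sorted(set(node_a) | set(node_b)):
        if name not in node_a or name not in node_b:
            problems.append(f"{name}: missing on one node")
            continue
        # PureWindowsPath compares case-insensitively, like Windows paths.
        a = tuple(PureWindowsPath(p) for p in node_a[name])
        b = tuple(PureWindowsPath(p) for p in node_b[name])
        if a != b:
            problems.append(f"{name}: paths differ between nodes")
        log_folder, database_file = a
        if file_in_drive_root(database_file):
            problems.append(f"{name}: database file is in a drive root")
        if log_folder == log_folder.parent:
            problems.append(f"{name}: log folder is a drive root")
    return problems
```

An empty list means that no problems were found by these particular checks. It does not validate performance, capacity, or LUN separation.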

Database Size and Cluster Continuous Replication

The first line of defense against catastrophic storage failure or physical database corruption with CCR is to revert to the passive copy of the data rather than restore from backup. This makes it much less important to have short recovery time objectives (RTOs) based on restoring from archive or tape. Instead of restoring from tape, you activate the passive copy of a database and the data is available to clients in minutes as opposed to hours. In this sense, CCR can be considered a fast recovery mechanism, putting it in the same category as hardware-based snapshots and clones created using the Volume Shadow Copy Service (VSS) in Exchange Server 2003.

It is not uncommon for an administrator to have to perform offline database operations, such as repairs, because of bad backups (for example, a tape is bad or a restore fails). With CCR, this scenario is avoided and there is much less chance of having to run a repair against a database. Although the percentage of situations in which repair is necessary should decrease dramatically, there will still be times when it will be necessary. Be sure to consider your tolerance for worst case downtime when deciding on database size.

CCR enables you to have longer online maintenance windows. Because CCR allows you to make a backup from the passive copy of a storage group, you can extend your online maintenance window on the active cluster node. In many cases, you can double the online maintenance window, which in turn allows you to have larger mailboxes and databases.

Another feature of Exchange 2007, called lost log resilience (LLR), drastically reduces the occurrence of database inconsistency due to lost logs. Generally, the most common reason an administrator repairs a database is to bring it into a consistent state when required logs have been lost or corrupted, thereby preventing the database from mounting. LLR provides resiliency for many of these lost and corrupted log scenarios, enabling a database to be mounted without having to run repair. For more information about LLR, see Lost Log Resilience and Transaction Log Activity in Exchange 2007.

At this point, it might appear as though continuous replication enables you to grow your databases as large as you like without risk. However, that is not the case. Online maintenance that completes in a reasonable amount of time per database is still a limiting factor on database size. But with CCR, the possibility of needing to reseed databases is also a limiting factor. CCR provides database redundancy so that if the active copy of a database is lost or corrupted, recovery can be accomplished quickly by activating the passive copy of the database. CCR provides automatic activation through the process known as failover.

After failover occurs, there remains only one copy of the database—the new active copy. Because the passive copy no longer exists, database resiliency may be compromised. However, you should still have your backup. To enable resiliency again, the lost or corrupted database needs to be removed, and a new passive copy of the database needs to be created and reseeded from the active copy. Depending upon the size of your database, this could take a long time. The worst case scenario is the loss or corruption of all active copies, where all passive copies have to be reseeded. This scenario is one of the reasons why we recommend Gigabit Ethernet for CCR environments.

In a CCR environment, you should expect to see the following rates over Gigabit Ethernet where there are no disk or processor bottlenecks:

  • Single database reseed: approximately 25 megabytes (MB) per second

  • Multiple database reseed (in parallel): approximately 100 MB per second (limited by network bandwidth)
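To get a feel for what these rates mean in practice, the following Python sketch converts a database size into an estimated reseed time. It is back-of-the-envelope arithmetic only: it assumes the approximate rates above are sustained for the whole copy, and it assumes 1 GB = 1,024 MB. The function name reseed_hours is chosen for this example.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes

SINGLE_RESEED_MB_PER_S = 25     # approximate single-database rate from the text
PARALLEL_RESEED_MB_PER_S = 100  # approximate aggregate rate from the text


def reseed_hours(total_size_gb, rate_mb_per_s):
    """Estimated hours to reseed `total_size_gb` at a sustained rate."""
    seconds = total_size_gb * MB_PER_GB / rate_mb_per_s
    return seconds / 3600


# One 200 GB database reseeded on its own.
print(f"{reseed_hours(200, SINGLE_RESEED_MB_PER_S):.1f} h")        # 2.3 h
# Eight 200 GB databases reseeded in parallel (1,600 GB in total).
print(f"{reseed_hours(8 * 200, PARALLEL_RESEED_MB_PER_S):.1f} h")  # 4.6 h
```

Under these assumptions, a single 200 GB database takes a little over two hours to reseed, and reseeding eight such databases in parallel takes about four and a half hours.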

A larger maximum database size is possible when continuous replication is used. We recommend the following maximum database sizes for Exchange 2007:

  • Databases hosted on a Mailbox server without continuous replication: 100 gigabytes (GB)

  • Databases hosted on a Mailbox server with continuous replication and Gigabit Ethernet: 200 GB

    Note

    Large databases may also require newer storage technology for increased bandwidth to accommodate repair scenarios.

    Important

    The true maximum size for your databases should be dictated by the service level agreement (SLA) in place at your organization. Determining the largest size database that can be backed up and restored within the period specified in your organization's SLA is how you determine the maximum size for your databases.
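The SLA-based sizing rule can be sketched as simple arithmetic. In the following Python example, the restore window and restore throughput are hypothetical inputs that you would measure in your own environment; only the 100 GB and 200 GB recommendations come from this topic. The function names are chosen for this example, and 1 GB is assumed to be 1,024 MB.

```python
RECOMMENDED_MAX_GB = {
    False: 100,  # without continuous replication
    True: 200,   # with continuous replication and Gigabit Ethernet
}


def max_restorable_gb(restore_window_hours, restore_mb_per_s, mb_per_gb=1024):
    """Largest database (in GB) that can be restored within the SLA window."""
    return restore_window_hours * 3600 * restore_mb_per_s / mb_per_gb


def planning_limit_gb(restore_window_hours, restore_mb_per_s,
                      continuous_replication):
    """Take the smaller of the SLA-derived size and the recommended maximum."""
    sla_limit = max_restorable_gb(restore_window_hours, restore_mb_per_s)
    return min(sla_limit, RECOMMENDED_MAX_GB[continuous_replication])
```

For instance, with a hypothetical 4-hour restore window and a measured restore rate of 20 MB per second, max_restorable_gb(4, 20) gives 281.25 GB, so the 200 GB recommendation would be the tighter limit in a CCR environment. With a 1-hour window, the SLA-derived value (about 70 GB) would govern instead.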

Active Directory Requirements for Cluster Continuous Replication

CCR has the same Active Directory infrastructure requirements as a stand-alone server, plus additional requirements. In a multiple datacenter solution, both datacenters must have adequate Active Directory infrastructure support because, at any time, either datacenter could be hosting the clustered mailbox server. This capacity needs to be present even if the other datacenter is not available. Additionally, all nodes in the cluster must be in the same domain, and the Cluster service account must have the appropriate permissions.

Note

Mailbox servers in a geographically dispersed cluster require that a single Active Directory site be stretched between the datacenters because all nodes in the cluster must be members of the same site. However, there is no requirement that any other servers in both datacenters be on the same subnet or in the same Active Directory site.

Service Account Requirements for Cluster Continuous Replication

If you are installing CCR on Windows Server 2008, the Cluster service account runs under the LocalSystem (SYSTEM) account.

If you are installing CCR on Windows Server 2003, you must use a domain account for the Cluster service account. All nodes in the cluster must be members of the same domain, and all nodes in the cluster must use the same Cluster service account. The Cluster service account must also be a member of the local administrators group on each node that is capable of hosting a clustered mailbox server.

The Cluster service account is responsible for creating and maintaining the computer account identified by and associated with the failover cluster's Network Name resource when that resource is brought online. To ensure that the Cluster service account has the appropriate permissions, see Knowledge Base article 307532, How to troubleshoot the Cluster service account when it modifies computer objects. Additional information can be found in Knowledge Base article 251335, Domain Users Cannot Join Workstation or Server to a Domain.

Cluster Continuous Replication and Public Folder Databases

CCR and public folder replication are two very different forms of replication built into Exchange. Public folder replication is enabled whenever more than one Mailbox server in the Exchange organization has a public folder database. Because of interoperability limitations between continuous replication and public folder replication, public folder databases should not be hosted in CCR environments when public folder replication is enabled.

The following are the recommended configurations for using public folder databases and CCR in your Exchange organization:

  • If you have a single Mailbox server in your Exchange organization and that Mailbox server is a clustered mailbox server in a CCR environment, the Mailbox server can host a public folder database. In this configuration, there is a single public folder database in the Exchange organization. Thus, public folder replication is disabled. In this scenario, public folder database redundancy is achieved using CCR; CCR maintains two copies of your public folder database.

  • If you have multiple Mailbox servers, you can host a public folder database in a CCR environment provided that there is only one public folder database in the entire Exchange organization. Because there is a single public folder database in the Exchange organization, public folder replication is disabled. In this scenario, public folder database redundancy is also achieved by using CCR.

  • If you are migrating public folder data into a CCR environment, you can use public folder replication to move the contents of a public folder database from a stand-alone Mailbox server or a clustered mailbox server in an SCC to a clustered mailbox server in a CCR environment. After you create the public folder database in a CCR environment, the additional public folder databases should only be present until your public folder data has fully replicated to the CCR environment. When replication has completed successfully, all public folder databases outside of the CCR environment should be removed, and you should not host any other public folder databases in the Exchange organization.

  • If you are migrating public folder data out of a CCR environment, you can use public folder replication to move the contents of a public folder database from a clustered mailbox server in a CCR environment to a stand-alone Mailbox server or a clustered mailbox server in an SCC. After you create the additional public folder database outside of the CCR environment, the public folder database in the CCR environment should only be present until your public folder data has fully replicated to the additional public folder databases. When replication has completed successfully, all public folder databases inside of all CCR environments should be removed and all subsequent public folder databases should not be hosted in storage groups that are enabled for continuous replication.

During any period in which more than one public folder database exists in the Exchange organization and one or more public folder databases are hosted in a CCR environment (such as the migration scenarios described previously), consider the differences in behavior for scheduled (lossless) and unscheduled (lossy) outages:

  • If a successful scheduled lossless outage occurs, the public folder database will come online and public folder replication should continue as expected.

  • If an unscheduled outage occurs, the public folder database will not come online until the original server is available and all logs for the storage group hosting the public folder database are available. If any data is lost as a result of the outage, CCR will not allow the public folder database to come online when public folder replication is enabled. In this event, the original node must be brought online to ensure no data loss, or the public folder database must be re-created on the clustered mailbox server in the CCR environment and its content must be recovered using public folder replication from public folder databases that are outside the CCR environment.

Backup and Restore and Cluster Continuous Replication

Exchange-aware backups are supported for both production and copy storage groups and databases using VSS technology. Streaming backups are only supported from the active node.

Note

A common task during Exchange-aware backups is the truncation of transaction log files after the backup has completed successfully. The replication feature in CCR guarantees that logs that have not been replicated are not deleted. The implication of this behavior is that running backups in a mode that deletes logs may not actually free space if replication is sufficiently far behind in its log copying.
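The effect described in this note can be pictured with a toy model. The following Python sketch is not how Exchange implements truncation, and the real service may apply additional conditions that are not modeled here. It only captures the rule stated above: a log generation is kept unless it is covered by the backup and has already been replicated. The function name and the generation numbers are hypothetical.

```python
def truncatable_generations(on_disk, backed_up_through, replicated_through):
    """Return the log generations that could be deleted after a backup.

    A generation qualifies only if it is covered by the backup AND has
    already been copied to the passive node.
    """
    limit = min(backed_up_through, replicated_through)
    return [gen for gen in on_disk if gen <= limit]
```

For example, with generations 100 through 110 on disk, a backup that covers generations through 108, and replication that has only reached generation 103, the model returns generations 100 through 103. The backup completes, but generations 104 through 108 stay on disk until replication catches up.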

Exchange-aware restores to the active copy can be done using either streaming or VSS backup solutions. Exchange-aware restores are not supported for the passive copy.

Note

Before you perform a restore, you should remove all storage group and database files from the passive storage group copy.

After restoring a database from backup into a storage group in a CCR environment, you must suspend and then resume continuous replication for the storage group using Suspend-StorageGroupCopy and Resume-StorageGroupCopy, respectively. This process is needed to update the Microsoft Exchange Replication Service with the correct log generation information. If continuous replication is not suspended and resumed, the Microsoft Exchange Replication Service will have outdated log generation information and will stop replicating log files.

Online Maintenance Database Checksumming and Database Page Zeroing in Exchange 2007 SP1

Checksumming is the process of checking the integrity of the database. Page scrubbing is the process of zeroing out deleted database pages at the end of a streaming backup. Exchange 2007 RTM checksums an entire database when an online full streaming backup of a database is taken. As mentioned previously, in a continuous replication environment, streaming backups can only be taken against the active copy of a database. You cannot make a streaming backup of a passive copy of a database. VSS can be used to take full snapshots or make full clones of a passive copy, and full snapshots and clones can also be checksummed. But typically, in a continuous replication environment, only one of the database copies (either the active or the passive) can be checksummed without administrator intervention and some downtime. This is because:

  • It is burdensome to make streaming backups of an active copy of a database, and also make VSS backups of the passive copy of the same database.

  • Although VSS can be used for both active and passive database copies, doing so is contrary to the recommendation to offload backup operations from the active copy to the passive copy.

  • Resilience can be temporarily compromised because manually performing integrity checks using Exchange Server Database Utilities (Eseutil) requires the suspension of continuous replication.

To enable page scrubbing and database checksumming on all database copies without experiencing or having to work around the issues described earlier, Exchange 2007 SP1 introduces two new features: Online Maintenance Database Checksumming and Online Maintenance Database Page Zeroing. These features enable an administrator to turn on both background page scrubbing and background checksumming of a database. You can enable each of these features separately or in tandem by manually configuring registry values on the Mailbox server containing the databases to be scanned and then restarting the Microsoft Exchange Information Store service. The registry values are configured at the Microsoft Exchange Information Store level. Thus, after enabling, all databases on the Mailbox server perform the configured background activity. The available registry entries are described later in this topic.

Warning

Incorrectly editing the registry can cause serious problems that may require you to reinstall your operating system. Problems resulting from editing the registry incorrectly might not be resolvable. Before editing the registry, back up any valuable data.

Enable Online Maintenance Database Checksumming

Location: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeIS\ParametersSystem

Name: Online Maintenance Checksum

Type: REG_DWORD

Value: 0x00000001

Enable Online Maintenance Database Page Zeroing

Location: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeIS\ParametersSystem

Name: Zero Database Pages During Checksum

Type: REG_DWORD

Value: 0x00000001

Throttling Online Maintenance Database Checksumming

Location: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSExchangeIS\ParametersSystem

Name: Throttle Checksum

Type: REG_DWORD

Value: 0x00000000 (milliseconds)
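If you prefer to prepare these values as a registration (.reg) file for review before importing, the text can be generated with a short script. The following Python sketch is one possible approach, not a Microsoft-provided tool. The function name render_reg_file and its parameters are assumptions made for this example, and the script only builds text; it does not modify the registry. A value is written only when the corresponding feature is requested, and the warning above about editing the registry still applies to importing the result.

```python
KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
       r"\MSExchangeIS\ParametersSystem")


def render_reg_file(checksum=False, zero_pages=False, throttle_ms=None):
    """Build .reg file text for the online maintenance registry values."""
    values = {}
    if checksum:
        values["Online Maintenance Checksum"] = 1
    if zero_pages:
        values["Zero Database Pages During Checksum"] = 1
    if throttle_ms is not None:
        values["Throttle Checksum"] = throttle_ms  # milliseconds
    lines = ["Windows Registry Editor Version 5.00", "", f"[{KEY}]"]
    for name, value in values.items():
        # REG_DWORD values are written as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\n".join(lines) + "\n"


print(render_reg_file(checksum=True, zero_pages=True))
```

As noted earlier, the Microsoft Exchange Information Store service must be restarted after the values are set, and the settings then apply to all databases on the Mailbox server.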