Deployment Guidelines for Exchange Server Multi-Site Data Replication
Topic Last Modified: 2006-09-01
Replication technology can provide high availability for Microsoft® Exchange Server data. This topic is intended to help you better understand replication storage technology, and how it is used in an Exchange Server environment.
Replication supports high availability by maintaining redundant data at multiple sites, but it does not prevent data corruption. If bad data that corrupts the database is written to the primary storage device, the same bad data is replicated to the remote sites and corrupts the databases there. Therefore, data replication is not a substitute for database maintenance processes, such as database backups that periodically validate database integrity.
In this topic, the type of data that is discussed is the data accessed by running Exchange services, for example, any write I/O request made by an Exchange process. System/OS data replication is not discussed here.
Microsoft has support policies for various types of replication solutions. For details about these support policies, see Microsoft Knowledge Base article 895847 "Multi-site data replication support for Exchange 2003 and Exchange 2000."
The purpose of using data replication is to maintain current replicas of the data at remote sites. Exchange servers can use the replicas at the remote site to provide continuity of e-mail service in the event of a storage or site outage in the primary location. Data replication can be propagated in a synchronous or asynchronous fashion. By definition, when data is replicated synchronously, hosts will only get a write complete response from the storage when the I/O writes are committed in both the local and remote locations. In other words, both the local and the remote storage must implement the change before the write is acknowledged to the host as having succeeded. In asynchronous mode, the host will get a write complete from the storage when the write has committed to the local storage, without having to wait for an acknowledgement from the remote storage that it has also updated the replica.
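The difference between the two acknowledgement models can be sketched in a few lines of Python. This is a minimal illustration, not a model of any particular storage product: the `WriteTimings` fields and the sample values are assumptions chosen for the example, and the sketch assumes that the local and remote commits proceed in parallel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteTimings:
    """Illustrative timings in milliseconds (assumed values, not measurements)."""
    local_commit_ms: float     # time for the primary storage to commit the write
    link_round_trip_ms: float  # time to ship the write and receive the remote acknowledgement
    remote_commit_ms: float    # time for the remote storage to commit the write


def sync_ack_latency(t: WriteTimings) -> float:
    """Synchronous: the host is acknowledged only after both sites have committed.

    Assumes both commits start at the same time, so the slower path decides
    when the write complete response reaches the host.
    """
    remote_path_ms = t.link_round_trip_ms + t.remote_commit_ms
    return max(t.local_commit_ms, remote_path_ms)


def async_ack_latency(t: WriteTimings) -> float:
    """Asynchronous: the host is acknowledged as soon as the local commit completes."""
    return t.local_commit_ms


timings = WriteTimings(local_commit_ms=2.0, link_round_trip_ms=5.0, remote_commit_ms=2.0)
print(sync_ack_latency(timings))   # 7.0
print(async_ack_latency(timings))  # 2.0
```

In this sketch the link round trip appears only in the synchronous result, which is why replication distance matters far more for synchronous replication.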
In general, the write latency seen by the host application is less sensitive to replication distance under asynchronous replication than under synchronous replication. However, you should be aware of the following issues when you deploy asynchronous replication:
Data Loss: Depending on the frequency of data replication, data changes at the remote site may lag behind the changes made at the primary site. If the primary site fails, the remote site replica will not be completely current. Although this delay is configurable on most storage solutions, you should be aware of the potential for data loss due to this behavior.
Data Integrity (Write Order Preservation): Exchange has write order dependencies between the database and its associated transaction logs. Exchange always writes changes to the log files first, before committing those changes to the database files. In synchronous replication mode, the application controls the write ordering. However, in asynchronous mode, the replication solution controls when to replicate the data. If the solution does not maintain the write ordering during replication, it could corrupt the database files and prevent the databases from mounting when a disaster occurs at the primary site.
Performance Impact: Many vendors claim that their asynchronous solutions do not affect storage performance, but in practice, asynchronous replication does degrade performance. Because the degradation depends on how the solution is implemented, no single number describes the performance to expect. Therefore, customers should test the solution thoroughly before deployment to verify that it can provide adequate storage performance for Exchange users.
Some solution providers use various technologies to address the write order preservation issue. To successfully deploy an asynchronously replicated solution, the customer must work with the vendor to ensure their asynchronous technology meets the following requirements:
It can maintain the write-order consistency of all devices in a storage group, including being continuously consistent with each other;
It has been proven to be recoverable, preferably in both a lab and a production environment;
It is being provided by a vendor with a support plan in place for the replicated data.
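The write-order requirement can be illustrated with a small check written in Python. This is a simplified sketch under stated assumptions: each write is reduced to a hypothetical `(target, transaction id)` pair, and the function only tests the log-before-database rule described earlier. It does not represent how any replication product or Exchange itself tracks ordering.

```python
from typing import Iterable, Tuple

# Hypothetical representation of a write: (target, transaction id),
# where target is "log" or "db".
Write = Tuple[str, int]


def preserves_log_before_db(applied: Iterable[Write]) -> bool:
    """Return True if every database write follows the log write for the same transaction."""
    logged = set()
    for target, txn in applied:
        if target == "log":
            logged.add(txn)
        elif target == "db" and txn not in logged:
            # A database change reached the replica before its log record.
            return False
    return True


primary_order = [("log", 1), ("log", 2), ("db", 1), ("db", 2)]
reordered = [("log", 1), ("db", 2), ("log", 2), ("db", 1)]

print(preserves_log_before_db(primary_order))  # True
print(preserves_log_before_db(reordered))      # False
```

Any prefix of `primary_order` also passes the check, which is the property a replica needs if the primary site fails partway through the stream.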
The main concern for synchronous replication in Exchange deployment is related to performance. Tests have shown that the client experience is closely tied to write latency. With a synchronous replication solution, the number of mailboxes that can be hosted per Exchange server is reduced. The performance impact largely depends on the replication distance, replication link bandwidth, and utilization. Synchronous replication can cause as much as a 75 percent reduction in mailboxes/server scalability. You should consider this scalability reduction factor when you are working on your Exchange capacity planning methodology. For more information, see "Deployment Planning for Synchronous Replication" later in this topic.
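The effect of the scalability reduction on capacity planning can be worked through with simple arithmetic, sketched here in Python. The function name, the mailbox counts, and the baseline of 4,000 mailboxes per server are hypothetical values chosen for illustration. Only the 75 percent figure comes from this topic, and it is an upper bound, not a prediction for a given solution.

```python
import math


def servers_required(total_mailboxes: int, baseline_per_server: int, reduction: float) -> int:
    """Estimate the server count after applying a scalability reduction factor.

    reduction is the fraction of per-server mailbox capacity lost to
    synchronous replication (0.75 means a 75 percent reduction).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0.0, 1.0)")
    effective_per_server = baseline_per_server * (1.0 - reduction)
    return math.ceil(total_mailboxes / effective_per_server)


# Hypothetical deployment: 20,000 mailboxes, 4,000 mailboxes per server without replication.
print(servers_required(20000, 4000, 0.0))   # 5
print(servers_required(20000, 4000, 0.75))  # 20
```

The reduction factor for a real deployment should come from testing the specific solution, as described later in this topic.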
Synchronous replication has commonly been considered a solution that ensures no data loss because the replicas are completely synchronized with the primary storage data files. However, contrary to this common belief, there are scenarios in which synchronous replication solutions can lose data. The following example illustrates such a scenario.
Generally, storage data replication solutions handle a replication link failure in one of the following two ways:
Continue to commit write I/O to the primary storage device only, record all the changes made to all the devices which use the replication link in a log file, and store the log in the primary storage.
Fail the write operation, so that the application handles the failure as if the disk has failed.
If the replication solution uses the first handling method, data loss is possible. If a link failure is followed shortly by a primary site failure, data that was committed after the link failure is never replicated and is therefore lost when the primary storage fails. When you design the storage replication solution, keep these failure conditions in mind so that you can build the system to reduce such occurrences. In this example, the customer might consider deploying redundant replication links to reduce the chance of data loss.
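The scenario can be sketched as a small Python simulation. The policy names `continue_locally` and `fail_write`, the function name, and the timestamps are hypothetical labels for the two handling methods listed above. The sketch assumes that the link is not restored before the primary site fails.

```python
from typing import List, Sequence


def acknowledged_writes_lost(
    write_times: Sequence[float],
    link_failed_at: float,
    site_failed_at: float,
    policy: str,
) -> List[float]:
    """Return the acknowledged writes that never reached the remote site.

    policy "continue_locally": writes are committed to primary storage only
    and logged there for later replication.
    policy "fail_write": writes fail once the link is down, so the host never
    receives an acknowledgement for them.
    """
    if policy not in ("continue_locally", "fail_write"):
        raise ValueError(f"unknown policy: {policy}")
    if policy == "fail_write":
        return []
    return [t for t in write_times if link_failed_at <= t < site_failed_at]


writes = [1.0, 2.0, 3.0, 4.0, 5.0]
print(acknowledged_writes_lost(writes, 3.5, 6.0, "continue_locally"))  # [4.0, 5.0]
print(acknowledged_writes_lost(writes, 3.5, 6.0, "fail_write"))        # []
```

The second policy avoids losing acknowledged writes, but the application then sees the link failure as a disk failure.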
Synchronous replication solutions are classified by where the replication occurs: host-based or storage-based. Host-based replication generally uses host-based software (a filter driver) to intercept the I/O stream and manage the replication. Storage-based replication occurs off-host at the storage device level. Both types of solution can be deployed as part of either of the following:
Geographically Dispersed Clustering (Geocluster): In this category, the nodes that belong to the same cluster are placed in different sites. Generally, Exchange servers are actively hosted by the nodes at the primary site. The solution synchronously replicates the Exchange data to the remote site(s). If a disaster occurs at the primary site, the Exchange virtual servers fail over to the passive nodes at the remote site and come online using the replicated Exchange data.
The Microsoft Windows® Server Catalog has a category for geographically dispersed cluster solutions. You can search for geographically dispersed cluster solutions qualified by Windows Hardware Quality Labs (WHQL) at http://go.microsoft.com/fwlink/?LinkId=28572.
Others: This category includes all other types of synchronous replication deployments that do not use geographically dispersed clustering. These solutions rely on some other means of using the replicated Exchange data at the remote site(s) in the event of a primary site failure (for example, a standby solution, or replication coupled with disaster recovery processes).
Microsoft strongly recommends that customers obtain assurances from their replication solution vendors on the following issues:
Is the solution a geographically dispersed clustering solution? If so, is it WHQL-certified? If it is not, is the storage device listed in a solution outlined in the "Cluster Solutions, Geographically Dispersed Clustering Solution" section of the Windows Server Catalog?
Will the replication solution prevent all possibility of data loss short of simultaneous outage at all sites?
What are the procedures for performing a failover and fail back?
Can the replication solution and expected latency handle the planned Exchange user load and provide a quality client experience?
Exchange is a data-centric server application. Replicating Exchange data to a secondary site provides for redundancy in the case of storage-related failures. It is a business decision to determine exactly what kind of data to replicate. You should evaluate your business tolerance for losing the various types of data described here.
The following data must be replicated:
Exchange mailbox database files store message data. Each database consists of two files:
Database file (.edb), which holds messages and MAPI content.
Streaming file (.stm), which holds non-MAPI, native content.
Transaction log files (.log), which record each transaction that is committed to the database.
Checkpoint files (.chk), which contain the information on the entries in the log files which have been written to the disk.
All of these files are vital to providing clients access to their mailbox server and to soft recovery of database changes that are held in memory and would otherwise be lost if an Exchange server’s stores are not shut down cleanly, for example during a power outage. Because these files are critical, this data set must be replicated. The paths of the database files are specified on the database property page, and each database has its own path. The transaction log file path and checkpoint file path are specified on the storage group property page and are shared by all databases in that storage group.
The decision to replicate public folder databases is a more complex decision to make. Whether to do so depends in part on the Exchange topology design of your deployment. Unlike mailbox data, public folder data can be replicated directly by Exchange Server. You can have multiple replicas of public folder stores that replicate changes (content). This data replication is not performed in a synchronous manner.
Geocluster solutions require synchronous replication of the public folders within the cluster. This requirement is necessary for the cluster to come up fully functional in the secondary site. Mailbox databases within the cluster must point to the public folder store (“Default Public Store”) that is also hosted within the cluster, so that clients can log on immediately after the cluster becomes available in the secondary site. To facilitate mailbox logon during a failure condition, the public folders within the geocluster need to host only the hierarchy, not necessarily the full content. Whether to host the full public folder content and replicate it synchronously within the public folders hosted in the geocluster is a business decision. If the public folder data is vital to the core business, meaning that only a minimal amount of data loss is acceptable, you should consider using a geocluster solution instead of the Exchange Server public folder replication mechanism. If you do not need this level of public folder data availability, you can use a non-geocluster synchronous replication solution for mailbox data combined with the replication mechanism found in Exchange public folders.
Simple Mail Transfer Protocol (SMTP) local queue data (Mailroot directory) is temporarily held in the storage device while it is being processed by Exchange Server. This design prevents the data from being lost in case of a server failure. For example, when a destination server is unreachable, the messages that should be routed to that server will be stored on the local server queue directory until they can be delivered. If the disk that stores the queue data fails, all the messages in the queue are lost. Because of the transitory nature of queue data, there is no defined process for backing up the mail queues like there is for backing up Exchange Server databases. Providing fault tolerance and/or high availability solutions for the storage holding this queue information can protect you from potential data loss. It is also recommended that the MTA queue data (MTADATA directory) be replicated in environments where transitory messages cannot be lost due to site failures.
The path for the SMTP Mailroot (including the Queue and Badmail directories per virtual server instance) is specified on the Messages tab of the SMTP Virtual Server property page in Exchange System Manager (ESM) and on the X.400 property page for MTA queue path. You should look at your profile to decide if it is necessary to replicate the Exchange queue data. If you have an existing Exchange topology, you can decide whether you can tolerate the data loss in the local queue. You can measure the expected amount of data in the local queue by using the local queue length in Performance Monitor (Perfmon.msc) or the Queue Viewer in ESM during peak load periods. If replication is required for queue data, it is important to test the performance of message processing in the replication environment, so that the replication latency that is introduced does not create a bottleneck to the transport. You can use the Exchange Server Stress and Performance 2003 tool for testing transport throughput in a synchronous replication environment where the queue data is replicated. You can download the tool from the Exchange Server Stress and Performance 2003 Web site.
Message tracking logs contain the information about all the messages transferred to, from, and within an Exchange server. This data can be important for diagnostic purposes. By default, message tracking is not enabled. However, if this data is important to your business, it could be replicated to prevent loss in the event of a disaster. The path for the message tracking log is specified on the Exchange Server property page in ESM.
Each vendor has its own proprietary implementation of replication mechanisms, which provide different replication options. Discuss the solution details with the specific vendor to determine whether the proposed solution is well suited to your organizational requirements and Service Level Agreement (SLA) for disaster recovery. The following recommendations may apply only to certain replication solutions:
The term “replication point” is defined as the location where replication occurs. Depending upon the solution, this location can be at the host filter driver level or at a disk slice within a storage array.
Configure replication at the logical/mount point volume level.
Even though the data which needs to be replicated is held in the files that are described in the "Exchange Data to Replicate" section of this topic, you must ensure that, at the host level, the replication is configured for the unit of a logical/mount point volume. For example, if the mailbox data path is G:\MDB1\MDB1.EDB, then drive G should be the base unit to perform replication. As a result, all the data on drive G will be replicated. Setting replication to occur at the file or subdirectory level is prone to human error and is not supported by Microsoft.
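As an illustration, the following Python sketch derives the set of drive-letter volumes that would need to be replicated from a list of Exchange file paths. Only `G:\MDB1\MDB1.EDB` comes from the example above; the other paths and the function name are hypothetical. The sketch uses the standard library `ntpath` module so that it handles Windows paths on any platform. It covers drive letters only: a mount point volume cannot be identified from the path text alone and would have to be resolved on the host.

```python
import ntpath
from typing import Iterable, List


def drive_letter_volumes(paths: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated drive letters that contain the given paths."""
    return sorted({ntpath.splitdrive(path)[0].upper() for path in paths})


exchange_paths = [
    r"G:\MDB1\MDB1.EDB",   # database file from the example in this topic
    r"G:\MDB1\MDB1.STM",   # hypothetical streaming file on the same volume
    r"H:\SG1LOGS",         # hypothetical transaction log directory
]
print(drive_letter_volumes(exchange_paths))  # ['G:', 'H:']
```

Both files under `G:\MDB1` map to the single unit `G:`, which matches the recommendation to replicate the whole volume and not individual files or subdirectories.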
Create many replication points.
To reduce the queuing of multiple I/Os that are destined for the same replication point, configure the storage to create as many replication points as possible, and load balance the I/O across them. Depending upon the storage/replication solution, this approach can reduce the overall I/O read/write latency due to reduced I/O queuing.
Keep transaction logs on different logical volumes.
When data is being replicated, each write I/O request is queued at the replication point level. Exchange writes logs in a sequential pattern and, if these I/Os are destined for the same replication point, there is a significant possibility that every I/O will be queued for write. This situation contributes to longer log write response times, which can be a significant negative factor in Exchange performance/scalability. For this reason, Microsoft recommends that you segment transaction logs from different storage groups onto different logical volumes with different replication points.
Use multiple replication links.
You can often improve the performance/scalability of the replication solution by configuring multiple replication links between the primary and secondary sites. This approach can be expensive to implement, and it is not required for Exchange data replication. However, there are deployments which have to implement multiple replication links to achieve the desired performance/scalability for a given Exchange data replication solution. It may also be necessary to load balance the replication points across the available replication links for optimal replication throughput.
Because Exchange has write order dependency between databases and their associated transaction logs, it is important to configure a group of replication points which back the storage group logical volumes (which includes the database logical unit number (LUN) and log LUN) to use the same replication link. This configuration is necessary to preserve the write ordering at the storage group level, which is essential to maintain the data integrity at the remote site in case of failure scenarios, such as a link failure.
Using multiple replication links with multiple replication points can be an effective approach to scaling an Exchange data replication solution. This approach could also reduce the possibility of data loss, which was discussed in the example in the earlier section "Synchronous Replication."
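One way to combine both recommendations, spreading load across links while keeping each storage group on a single link, is sketched below in Python. The storage group names, volume letters, link names, and the round-robin policy are all assumptions made for the example. Real solutions expose this configuration through vendor-specific tools.

```python
from typing import Dict, List


def assign_links(storage_groups: Dict[str, List[str]], links: List[str]) -> Dict[str, str]:
    """Map each volume to a replication link.

    Links are handed out round-robin per storage group, so the database and
    log volumes of one storage group always share the same link.
    """
    if not links:
        raise ValueError("at least one replication link is required")
    assignment: Dict[str, str] = {}
    for index, (_group, volumes) in enumerate(sorted(storage_groups.items())):
        link = links[index % len(links)]
        for volume in volumes:
            assignment[volume] = link
    return assignment


# Hypothetical layout: each storage group has a database volume and a log volume.
groups = {"SG1": ["G:", "H:"], "SG2": ["I:", "J:"], "SG3": ["K:", "L:"]}
print(assign_links(groups, ["link-a", "link-b"]))
# {'G:': 'link-a', 'H:': 'link-a', 'I:': 'link-b', 'J:': 'link-b', 'K:': 'link-a', 'L:': 'link-a'}
```

Balancing by storage group and not by individual volume is the design choice that preserves write ordering between a database and its logs.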
When Exchange is deployed in a synchronous replication environment, a few configuration changes will improve the server performance/scalability. It is important to understand these changes at the planning phase so that they can be implemented during the storage and replication design. Best practice recommendations for configuration are the following:
Create the maximum number of storage groups per Exchange server.
Increasing the number of storage groups in a synchronously replicated Exchange solution can benefit the performance/scalability of the deployment by load balancing log write transactions across multiple logical volumes and, subsequently, multiple replication points. In general, there will be more parallel log writing processes, which can reduce the overall transaction log-write latency (reduced I/O queuing) in a synchronous replication environment. Exchange Server 2003 Enterprise Edition allows four storage groups per Exchange server.
Increase transaction log buffer size.
Exchange log write I/O latency is a significant scalability factor in synchronously replicated Exchange solutions. Log write I/Os are sequential and single threaded, so log I/O latency is likely to be a bottleneck for the system. Log I/Os are written to the log buffers first, and then the buffer is cleared by either a non-lazy commit or a capacity commit. A non-lazy commit means that the log buffer is written to the disk immediately. A capacity commit means that the log buffer is written to the disk when the buffer becomes full.
Increasing the log buffer size reduces the frequency of capacity flushes, increases the log write size, and subsequently reduces the overall log write latency. Reducing the log I/O write latency is a significant way to improve the performance/scalability of the Exchange deployment.
The general recommendation is to increase the buffer size to the maximum of 9,000 if replication runs over Fibre Channel. For low-bandwidth links, such as TCP/IP links, an optimal value for this parameter is harder to determine. If the link becomes saturated by the larger log writes, which slows replication, you should test extensively to determine the log buffer size that minimizes log write latency. To learn how to modify this parameter, see Knowledge Base article 328466, "ESE log buffers that are set too low can cause the Microsoft Exchange Information Store service to stop responding." Also, consult your solution provider about this setting.
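The relationship between buffer size and capacity commits can be shown with a toy Python model. This is not a description of how the Exchange database engine manages its log buffers: sizes are in abstract units, non-lazy commits are ignored, and the function name and sample workload are assumptions made for the example.

```python
from typing import Iterable


def capacity_commits(record_sizes: Iterable[int], buffer_capacity: int) -> int:
    """Count how many times a buffer of the given capacity fills and is written out."""
    if buffer_capacity <= 0:
        raise ValueError("buffer_capacity must be positive")
    commits = 0
    used = 0
    for size in record_sizes:
        used += size
        while used >= buffer_capacity:
            # The buffer became full: write it to disk and keep the remainder.
            commits += 1
            used -= buffer_capacity
    return commits


workload = [4] * 1000  # hypothetical workload: 1,000 log records of 4 units each
print(capacity_commits(workload, 100))   # 40
print(capacity_commits(workload, 1000))  # 4
```

In this model a buffer ten times larger produces one tenth as many writes, each ten times larger, which is the trade-off that can saturate a low-bandwidth link.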
Even if the synchronous replication storage solution follows all the previous recommendations, it may still cause performance problems for Exchange clients if the solution has not been thoroughly tested before deployment. There are no definitive rules as to the negative scalability/performance effects of implementing synchronous replication with Exchange. Each Exchange replication solution has different performance factors, which may include, but are not limited to, the following: distance between sites, replication transport mechanism, number of replication links, number of replication points, number of Exchange storage groups, Exchange database/log configuration settings, storage and replication architecture, and the Exchange client profile. Each solution is unique and requires extensive planning and testing for a successful deployment.
The I/O write latency attributed to synchronous replication solutions is the key factor in limiting Exchange scalability. This increased I/O latency creates a load on the server that can severely affect the Exchange client experience. Specifically, the high write latency causes the RPC latency to increase, which leads to a slower client experience. While synchronous replication provides for high availability of Exchange data, it also incurs a significant I/O performance penalty. This I/O write, and sometimes read, penalty is a critical factor in determining the maximum number of users that can be supported on a given platform.
In the planning phase, take the following steps to validate the design:
See Optimizing Storage for Exchange 2003 to understand how to best design and implement storage for Exchange.
Use Jetstress testing to validate the raw throughput of the storage with synchronous replication configured. To download the Jetstress tool, see the Microsoft Exchange Server Jetstress Tool Web site.
Measure the effect that the increased write latency has on the e-mail client by running an Exchange Server Load Simulator 2003 (LoadSim) test that is tailored to your environment. To download LoadSim, see the Microsoft Exchange Server 2003 Load Simulator (LoadSim) Web site.
Measure the average disk throughput when you run LoadSim. The disk throughput must be equal to or higher than the peak average throughput expected in the production environment which you are simulating (IOPS/Mailbox). For details on how to measure the peak average disk throughput, see Optimizing Storage for Exchange 2003.
Pay close attention to the RPC average latency counter on the server and the client response time after you run LoadSim tests. When you analyze the test results, be aware that all three counters must satisfy the criteria listed below.
RPC Average Latency
This counter shows the average amount of time required to service a single remote procedure call (RPC) request. Increasing either the user load or the replication distance increases the average RPC latency. The average should not exceed 50ms, and the maximum value should not exceed 100ms. If the test results show an average above 50ms, the overall client experience is expected to be sluggish. If the average is less than 50ms but latency occasionally spikes over 100ms, the client experience will be sluggish during the spikes.
Disk Latency Counters
The Microsoft Exchange product team has tested several hardware synchronous replication solutions. The results show a relationship between RPC average latency and disk latencies. As a general guideline, the solution can handle the given load when the average database read latency is under 20ms and the average log read and log write latencies are each under 20ms. The maximum values for these latencies should be kept below 40ms. Above these thresholds, clients will likely experience slow response.
Client Response Time
You can confirm the overall client experience by running lslog.exe on all the client machines. This tool returns the weighted average of the 95th percentile; the value must be less than 1,000ms. lslog.exe is part of the LoadSim tool. LoadSim documentation discusses how to use lslog.exe and interpret the results that it provides.
For more information about performance, see Troubleshooting Exchange Server 2003 Performance.
Test the solution for your mailbox profile against the planned replication distance. Replication link distance has a physical limitation. As the distance increases, client/server response time slows as a result of the rising write latencies incurred by synchronous replication over the distance. Generally, 100 km is considered to be the threshold for synchronous storage replication of Exchange Server data. This threshold value can vary depending on the solution implementation.
Create a backup plan that validates the database integrity on a regular basis. Replication is not a substitute for a backup process.
Make sure that you have a comprehensive disaster recovery plan that has been tested as thoroughly as the replication performance of the solution. There are different methods of recovering Exchange data, servers, and sites. You should implement a disaster recovery plan that satisfies your business requirements and Service Level Agreements to guide you through a fast and efficient recovery phase should disaster occur. Test and validate the plan by simulating multiple types of failure conditions in an Exchange deployment using synchronous replication under heavy load conditions.
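When you analyze LoadSim test runs, the criteria given earlier for RPC average latency, disk latency, and client response time can be encoded as a simple check. The following Python sketch is one possible way to do so. The class, field, and function names and the sample values are hypothetical, and the figures must be collected separately from Performance Monitor and lslog.exe. The thresholds are the ones stated in this topic.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LoadRunResults:
    rpc_avg_ms: float
    rpc_max_ms: float
    # counter name -> (average ms, maximum ms) for database read, log read, and log write
    disk_latency_ms: Dict[str, Tuple[float, float]]
    client_p95_ms: float  # weighted average of the 95th percentile reported by lslog.exe


def failed_criteria(results: LoadRunResults) -> List[str]:
    """Return a description of every criterion that the test run did not meet."""
    failures: List[str] = []
    if results.rpc_avg_ms > 50:
        failures.append("RPC average latency above 50 ms")
    if results.rpc_max_ms > 100:
        failures.append("RPC latency spikes above 100 ms")
    for name, (average, maximum) in results.disk_latency_ms.items():
        if average >= 20:
            failures.append(f"{name} average latency not under 20 ms")
        if maximum >= 40:
            failures.append(f"{name} maximum latency not below 40 ms")
    if results.client_p95_ms >= 1000:
        failures.append("client response time not under 1,000 ms")
    return failures


# Hypothetical test run.
run = LoadRunResults(
    rpc_avg_ms=35.0,
    rpc_max_ms=120.0,
    disk_latency_ms={
        "database read": (12.0, 30.0),
        "log read": (5.0, 15.0),
        "log write": (22.0, 38.0),
    },
    client_p95_ms=800.0,
)
print(failed_criteria(run))
# ['RPC latency spikes above 100 ms', 'log write average latency not under 20 ms']
```

An empty list means that all three counters satisfy the criteria. Any entry indicates that the design should be revisited before deployment.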