System-Level Fault Tolerant Measures

07/25/2014

This section provides system-level considerations and strategies for increasing the fault tolerance of your Exchange 2003 organization. Specifically, system-level refers to your Exchange 2003 infrastructure and the recommended best practices for implementing fault tolerance within that infrastructure.

The following figure illustrates a reliable Exchange 2003 infrastructure and lists the best practices for maintaining a high level of fault tolerance.

System-level fault tolerant measures


Fault Tolerant Infrastructure Measures

This section discusses methods for designing fault tolerance at each level in your Exchange 2003 infrastructure. Specifically, this section provides information about:

Implementing firewalls and perimeter networks
Ensuring reliable access to Active Directory and Domain Name System (DNS)
Ensuring reliable access to Exchange front-end servers
Configuring Exchange protocol virtual servers
Implementing a reliable back-end storage solution
Implementing a server clustering solution
Implementing a monitoring strategy
Implementing a disaster recovery strategy

Implementing Firewalls and Perimeter Networks

It is recommended that your Exchange 2003 topology include a perimeter network and a front-end and back-end server architecture. The following figure illustrates this topology, including the additional security provided by an advanced reverse-proxy server (in this case, Internet Security and Acceleration (ISA) Server 2000 Feature Pack 1).

Note

To increase the performance and scalability of your advanced reverse-proxy server, you can implement Windows Server 2003 Network Load Balancing (NLB) on the servers in your perimeter network. For information about NLB, see "Using Network Load Balancing on Your Front-End Servers" later in this topic.

Exchange 2003 topology using a perimeter network


Deploying ISA Server 2000 Feature Pack 1 in a perimeter network is just one way you can help secure your messaging system. Other methods include using transport-level security such as Internet Protocol security (IPSec) or Secure Sockets Layer (SSL).

Important

Whether or not you decide to implement a topology that includes Exchange 2003 front-end servers, it is recommended that you not allow Internet users to access your back-end servers directly.

For complete information about designing a secure Exchange topology, see "Planning Your Infrastructure" in Planning an Exchange Server 2003 Messaging System.

For information about using ISA Server 2000 with Exchange 2003, see Using ISA Server 2000 with Exchange Server 2003.

Ensuring Reliable Access to Active Directory and Domain Name System

Exchange relies heavily on Active Directory and Domain Name System (DNS). To provide reliable and efficient access to Active Directory and DNS, make sure that your domain controllers, global catalog servers, and DNS servers are well protected from possible failures.

Domain Controllers

A domain controller is a server that hosts a domain database and performs the authentication services that clients require to log on and access Exchange. (Users must be authenticated by either Exchange or Windows.) Exchange 2003 relies on domain controllers for system and server configuration information. In Windows Server 2003, the domain database is part of the Active Directory database. In a Windows Server 2003 forest, Active Directory information is replicated between domain controllers, which also host a copy of the forest configuration and schema containers.

A domain controller can assume numerous roles within an Active Directory infrastructure: a global catalog server, an operations master, or a simple domain controller.

Global Catalog Servers

A global catalog server is a domain controller that hosts the global catalog. A global catalog server is required for logon because it contains information about universal group membership. This membership grants or denies user access to resources. If a global catalog server cannot be contacted, a user's universal group membership cannot be determined and logon access is denied.

Note

Although Windows Server 2003 provides features that do not require a local global catalog server, you still need a local global catalog server for Exchange and Outlook. The global catalog server is critical for Exchange services (including logon, group membership, and Microsoft Exchange Information Store service) and access to the global address list (GAL). Deploying global catalog servers locally to both servers and users allows for more efficient address lookups. Contacting a global catalog server across a slow connection increases network traffic and impairs the user experience.

At least one global catalog server must be installed in each domain that contains Exchange servers.

Domain Controller and Global Catalog Server Best Practices

Because domain controllers contain essential Active Directory information, make sure that the domain controllers in your organization are well protected from possible failures.

The following are best practices for deploying and configuring Active Directory domain controllers and global catalog servers:

Unless it is a requirement for your organization, do not run Exchange 2003 on your domain controllers. For information about the implications of running Exchange on a domain controller, see "Running Exchange 2003 on a Domain Controller" later in this topic.
Place at least two domain controllers in each Active Directory site. If a domain controller is not available within a site, Exchange will look for another domain controller. This is especially important if the other domain controllers in your organization can be accessed only across a WAN. This circumstance could cause performance issues and possibly introduce a single point of failure.
Place at least two global catalog servers in each Active Directory site. If a global catalog server is not available within a site, Exchange will look for another global catalog server. This is especially important if the other global catalog servers in your organization can be accessed only across a WAN. This circumstance could cause performance issues and possibly introduce a single point of failure.

Note

If your performance requirements do not demand the bandwidth of two domain controllers and two global catalog servers per domain, consider configuring all of your domain controllers as global catalog servers. In this scenario, every domain controller will be available to provide global catalog services to your Exchange 2003 organization.
There should generally be a 4:1 ratio of Exchange processors to global catalog server processors, assuming the processors are similar models and speeds. However, higher global catalog server usage, a large Active Directory database, or large distribution lists can necessitate more global catalog servers.
In branch offices that service more than 10 users, one global catalog server must be installed in each location that contains Exchange servers. However, for redundancy purposes, deploying two global catalog servers is ideal. If a physical site does not have two global catalog servers, you can configure existing domain controllers as global catalog servers.
If your architecture includes multiple subnets per site, you can increase availability by ensuring that you have at least one domain controller and one global catalog server per subnet. As a result, even if a router fails, clients on each subnet can still access a domain controller.
Ensure that the server assigned to the infrastructure master role is not a global catalog server. For information about the infrastructure master role, see the topic "Global catalog and infrastructure master" in Windows 2000 Server Help.
Consider monitoring the LDAP latency on all Exchange 2003 domain controllers. For information about monitoring Exchange, see Implementing Software Monitoring and Error-Detection Tools.
Consider increasing the LDAP threads from 20 to 40, depending on your requirements. For information about tuning Exchange, see the Exchange Server 2003 Performance and Scalability Guide.
Ensure that you have a solid backup plan for your domain controllers.
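The 4:1 processor guideline and the two-per-site recommendation above can be turned into a quick sizing estimate. The following Python sketch is illustrative only: it assumes, as the guideline does, that Exchange and global catalog processors are similar models and speeds, and the function names and the processors_per_gc parameter are hypothetical. It does not account for the heavier global catalog usage, large Active Directory databases, or large distribution lists that can require more global catalog servers.

```python
import math


def gc_processors_needed(exchange_processors: int, ratio: int = 4) -> int:
    """Minimum global catalog processors for a site, using the 4:1 rule of thumb."""
    if exchange_processors < 0 or ratio <= 0:
        raise ValueError("processor count must be >= 0 and ratio must be > 0")
    return math.ceil(exchange_processors / ratio)


def gc_servers_needed(exchange_processors: int, processors_per_gc: int,
                      minimum_per_site: int = 2, ratio: int = 4) -> int:
    """Convert the processor estimate into whole servers.

    Never returns fewer than minimum_per_site, reflecting the recommendation
    to place at least two global catalog servers in each site.
    """
    processors = gc_processors_needed(exchange_processors, ratio)
    servers = math.ceil(processors / processors_per_gc)
    return max(servers, minimum_per_site)


# Example: a site with four 4-processor Exchange servers (16 processors).
print(gc_processors_needed(16))                     # 4
print(gc_servers_needed(16, processors_per_gc=2))   # 2
```

Treat the result as a starting point for capacity planning and validate it with monitoring, such as the LDAP latency checks recommended above.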

Running Exchange 2003 on a Domain Controller

As a best practice, you should not run Exchange 2003 on servers that also function as Windows domain controllers. Instead, you should configure Exchange servers and Windows domain controllers separately.

However, if your organization requires that you run Exchange 2003 on a domain controller, consider the following limitations:

If you run Exchange 2003 on a domain controller, it uses only that domain controller. As a result, if the domain controller fails, Exchange cannot fail over to another domain controller.
If your Exchange servers also perform domain controller tasks in addition to serving Exchange client computers, those servers may experience performance degradation during heavy user loads.
If you run Exchange 2003 on a domain controller, your Active Directory and Exchange administrators may experience an overlap of security and disaster recovery responsibilities. For example, because Exchange administrators who can log on to the local server have physical console access to the domain controller, they can potentially elevate their permissions in Active Directory.
Exchange 2003 servers that are also domain controllers cannot be part of a Windows cluster. Specifically, Exchange 2003 does not support clustered Exchange 2003 servers that coexist with Active Directory servers.
If your server is the only domain controller in your messaging system, it must also be a global catalog server.
If you run Exchange 2003 on a domain controller, avoid using the /3GB switch. If you use this switch, the Exchange cache may monopolize system memory. Additionally, because the number of user connections should be low, the /3GB switch should not be required.
Because all services run under LocalSystem, there is a greater risk of exposure if there is a security bug. For example, if Exchange 2003 is running on a domain controller, an Active Directory bug that allows an attacker to access Active Directory would also allow access to Exchange.
A domain controller that is running Exchange 2003 takes a considerable amount of time (approximately 10 minutes or longer) to restart or shut down. This is because services related to Active Directory (for example, Lsass.exe) shut down before Exchange services, causing Exchange services to fail repeatedly while searching for Active Directory services. One solution to this problem is to change the time-out for a failed service. A second solution is to manually stop the Exchange services before you shut down the server.
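The second solution above, stopping Exchange services manually before shutdown, can be scripted. The following Python sketch only builds and optionally runs Windows "net stop" commands in a fixed order. The service short names are assumptions based on common Exchange 2003 installations; verify them on your own server (for example, with "sc query") before relying on this list. The ordering stops the Information Store before the System Attendant because the Information Store depends on it.

```python
import subprocess

# ASSUMPTION: typical Exchange 2003 service short names. Confirm them on
# your server and remove any service that is not installed or enabled.
EXCHANGE_SERVICES_IN_STOP_ORDER = (
    "POP3Svc",        # Microsoft Exchange POP3 (if enabled)
    "IMAP4Svc",       # Microsoft Exchange IMAP4 (if enabled)
    "MSExchangeMTA",  # Microsoft Exchange MTA Stacks
    "MSExchangeIS",   # Microsoft Exchange Information Store
    "MSExchangeSA",   # Microsoft Exchange System Attendant (stopped last)
)


def stop_commands(services):
    """Build one 'net stop' command per service, preserving the given order."""
    return [["net", "stop", name] for name in services]


def stop_exchange_services(services=EXCHANGE_SERVICES_IN_STOP_ORDER, dry_run=True):
    """Stop each service in order; with dry_run=True, only print the commands."""
    for cmd in stop_commands(services):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=False: a service that is already stopped returns a nonzero
            # exit code, and the remaining services should still be attempted.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    stop_exchange_services(dry_run=True)
```

Run it with dry_run=True first to review the commands, then with dry_run=False on the domain controller immediately before a planned restart.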

Domain Name System and Windows Internet Name Service Availability

Similar to domain controller and global catalog server services, Domain Name System (DNS) services are critical to the availability of your Exchange 2003 organization. On a Windows Server 2003 network, users locate resources by using DNS and Windows Internet Name Service (WINS). The failure of a DNS server can prevent users from locating your messaging system.

To ensure that your Exchange 2003 topology includes reliable access to DNS, consider the following:

Ensure that a secondary DNS server exists on the network. If the primary DNS server fails, this secondary server should be able to direct users to the correct servers.
Integrate Windows Server 2003 DNS zones into Active Directory. In this scenario, each domain controller becomes a potential DNS server.
Configure each client computer with at least two DNS addresses.
Ideally, both DNS servers should be in the same site as the client. If the DNS servers are not in the same site as the client, the primary DNS server should be the server that is in the same site as the client.
Ensure that name resolution and DNS functionality are both operating correctly. For more information, see Microsoft Knowledge Base article 322856, "HOW TO: Configure DNS for Use with Exchange Server."
Before deploying Exchange, ensure that DNS is correctly configured at the hub site and at all branches.
Exchange 2003 requires NetBIOS name resolution for full functionality, and WINS is the recommended way to provide it. Although it is possible to run Exchange 2003 without enabling WINS, it is not recommended. Using WINS to resolve NetBIOS names provides availability benefits. (For example, in some configurations, using WINS removes the potential risk of duplicate NetBIOS names causing a name resolution failure.) For more information, see Microsoft Knowledge Base article 837391, "Exchange Server 2003 and Exchange 2000 Server require NetBIOS name resolution for full functionality."
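The client DNS recommendations above (at least two DNS addresses, with a same-site server listed first) amount to a simple ordering rule. The following Python sketch shows that rule; the DnsServer type, the site names, and the IP addresses are hypothetical, and how you push the resulting list to clients (for example, through DHCP options or Group Policy) is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsServer:
    address: str  # hypothetical IP address of the DNS server
    site: str     # Active Directory site the DNS server belongs to


def order_dns_servers(client_site, servers, minimum=2):
    """Return DNS server addresses for a client, same-site servers first.

    Raises ValueError if fewer than `minimum` servers are available, because
    each client should be configured with at least two DNS addresses.
    """
    local = [s.address for s in servers if s.site == client_site]
    remote = [s.address for s in servers if s.site != client_site]
    ordered = local + remote
    if len(ordered) < minimum:
        raise ValueError(f"need at least {minimum} DNS servers, found {len(ordered)}")
    return ordered


servers = [
    DnsServer("10.2.0.10", "Branch"),
    DnsServer("10.1.0.10", "Hub"),
    DnsServer("10.1.0.11", "Hub"),
]
print(order_dns_servers("Hub", servers))  # ['10.1.0.10', '10.1.0.11', '10.2.0.10']
```

A Hub client gets two local DNS servers, which matches the ideal case; a Branch client gets its local server first and a remote server as the fallback.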

For information about deploying DNS and WINS, see "Deploying DNS" and "Deploying WINS" in the Microsoft Windows Server 2003 Deployment Kit.

Ensuring Reliable Access to Exchange Front-End Servers

If your organization has more than one Exchange server, it is recommended that you use Exchange front-end and back-end server architecture. Front-end and back-end architecture provides several client access performance and availability benefits.

Internet clients access their mailboxes through front-end servers. However, in default Exchange 2003 configurations, MAPI clients cannot use front-end servers; instead, these clients access their mailboxes through back-end servers directly.

Note

You can configure Exchange 2003 RPC over HTTP to allow your MAPI clients to access their mailboxes through front-end servers. For information about using RPC over HTTP, see Exchange Server 2003 RPC over HTTP Deployment Scenarios.

When front-end servers handle HTTP, POP3, and IMAP4 traffic, performance increases because the front-end servers offload some processing duties from the back-end servers.

If you plan to support MAPI, HTTP, POP3, or IMAP4, you can use Exchange front-end and back-end server architecture to take advantage of the following benefits:

Front-end servers balance processing tasks among servers. For example, front-end servers perform authentication, encryption, and decryption processes. This improves the performance of your Exchange back-end servers.
Your messaging system security is improved. For more information, see "Security Measures" later in this topic.
To incorporate redundancy and load balancing in your messaging system, you can use Network Load Balancing (NLB) on your Exchange front-end servers.

For information about planning an Exchange 2003 front-end and back-end architecture, see "Planning Your Infrastructure" in Planning an Exchange Server 2003 Messaging System.

For information about deploying front-end and back-end servers, see Using Microsoft Exchange 2000 Front-End Servers. Although that document focuses on Exchange 2000, the content is applicable to Exchange 2003.

To build fault tolerance into your messaging system, consider implementing Exchange front-end servers that use NLB. You should also configure redundant virtual servers on your front-end servers.

Using Network Load Balancing on Your Front-End Servers

Network Load Balancing (NLB) is a Windows Server 2003 service that provides load balancing support for IP-based applications and services that require high scalability and performance. When implemented on your Exchange 2003 front-end servers, NLB can address bottlenecks caused by front-end services.

The following figure illustrates a basic front-end and back-end architecture that includes NLB.

Basic front-end and back-end architecture including Network Load Balancing


An NLB cluster dynamically distributes IP traffic to two or more Exchange front-end servers, transparently distributes client requests among the front-end servers, and allows clients to access their mailboxes using a single server namespace.

NLB clusters are groups of computers that, by sharing the workload, enhance the scalability and performance of the following:

Web servers
Computers running ISA Server (for proxy and firewall servers)
Other applications that receive Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) traffic

NLB cluster nodes usually have identical hardware and software configurations. This helps ensure that your users receive consistent front-end service performance, regardless of the NLB cluster node that provides the service. The nodes in an NLB cluster are all active.

Important

NLB clustering does not provide the failover support that the Windows Cluster service provides. For more information, see the next section, "Network Load Balancing and Scalability."

For more information about NLB, see "Designing Network Load Balancing" and "Deploying Network Load Balancing" in the Microsoft Windows Server 2003 Deployment Kit.

Network Load Balancing and Scalability

With NLB, as demand on your Exchange 2003 front-end servers increases, you can either scale up or scale out. In general, if your primary goal is to provide faster service to your Exchange users, scaling up (for example, adding processors and memory) is a good solution. However, if you want to add some measure of fault tolerance to your front-end services, scaling out (adding servers) is the best solution. With NLB, you can scale out to 32 servers if necessary. Scaling out increases fault tolerance because the more servers you have in your NLB cluster, the fewer users a single server failure affects.
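The scale-out argument above is simple arithmetic: if client load is spread evenly, one failed host affects roughly 1/N of the users. The following Python sketch illustrates this under that even-distribution assumption; the user count is hypothetical, and real NLB distribution depends on your port rules and affinity settings.

```python
MAX_NLB_HOSTS = 32  # maximum NLB cluster size stated in the text


def users_affected_by_one_failure(total_users: int, hosts: int) -> float:
    """Approximate users affected when one host fails, assuming an even load."""
    if not 1 <= hosts <= MAX_NLB_HOSTS:
        raise ValueError(f"an NLB cluster can have 1 to {MAX_NLB_HOSTS} hosts")
    return total_users / hosts


for hosts in (2, 4, 8):
    affected = users_affected_by_one_failure(4000, hosts)
    print(f"{hosts} hosts: about {affected:.0f} of 4000 users affected")
```

Doubling the number of front-end servers halves the share of users exposed to any single server failure, which is why scaling out, rather than scaling up, improves fault tolerance.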

Important

You must closely monitor the servers in your NLB cluster. When one server in an NLB cluster fails, client requests that were configured to be sent to the failed server are not automatically distributed to the other servers in the cluster. Therefore, when one server in your NLB cluster fails, it should immediately be taken out of the cluster to ensure that required services are provided to your users.
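The following Python sketch illustrates why a failed host must be removed from the cluster. It deliberately uses a simplified stand-in for NLB's distribution logic: each client is mapped to a host with a stable hash over the current member list. NLB's actual algorithm and affinity options differ, and the host names and client addresses are hypothetical. The point it shows is the one in the Important note: while a failed host remains a member, the clients mapped to it are not served; after it is removed, the mapping is recomputed over the remaining hosts.

```python
import zlib


def assign_host(client_ip, members):
    """Map a client to a cluster member with a stable hash (illustrative only)."""
    return members[zlib.crc32(client_ip.encode()) % len(members)]


def served_clients(clients, members, failed):
    """Return the clients whose requests land on a working host."""
    return [c for c in clients if assign_host(c, members) not in failed]


clients = [f"192.168.1.{i}" for i in range(1, 31)]
failed = {"FE2"}

# FE2 has failed but is still a cluster member: its share of clients is lost.
still_member = served_clients(clients, ["FE1", "FE2", "FE3"], failed)
# FE2 has been taken out of the cluster: every client reaches a working host.
removed = served_clients(clients, ["FE1", "FE3"], failed)

print(len(still_member), "of", len(clients), "served while FE2 is still a member")
print(len(removed), "of", len(clients), "served after FE2 is removed")
```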

Configuring Exchange Protocol Virtual Servers

When configuring your Exchange 2003 messaging system, use Exchange System Manager to create a protocol virtual server for each protocol that you want to support on a specific front-end server.

To maximize availability and performance of your front-end servers, consider the following recommendations when configuring protocol virtual servers:

When configuring NLB for your Exchange 2003 front-end servers, you should make sure that all protocol virtual servers on your NLB front-end servers are configured with identical settings.

Important

If the protocol virtual servers in your NLB cluster are not identical, your e-mail clients may experience different behavior, depending on the server to which they are routed.
If you are not using NLB on your front-end servers, do not create additional protocol virtual servers on each of your front-end servers. (For example, do not create two identical HTTP protocol virtual servers on the same front-end server.) Additional virtual servers can significantly affect performance and should be created only when default virtual servers cannot be configured adequately.
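Because protocol virtual servers on NLB front-end servers should have identical settings, it can help to compare exported settings across servers. The following Python sketch assumes you have already collected each server's virtual server settings into a dictionary (the collection step is not shown, and the setting names below are hypothetical); it reports every setting whose value differs between servers or is missing on some of them.

```python
def find_setting_drift(settings_by_server):
    """Return {setting: {server: value}} for every setting that is not identical.

    settings_by_server maps a server name to a dict of its virtual server
    settings. A setting missing on a server is reported with the value None.
    """
    all_keys = set()
    for settings in settings_by_server.values():
        all_keys.update(settings)

    drift = {}
    for key in sorted(all_keys):
        values = {server: s.get(key) for server, s in settings_by_server.items()}
        if len(set(values.values())) > 1:
            drift[key] = values
    return drift


settings = {
    "FE1": {"authentication": "Basic", "ssl_required": True},
    "FE2": {"authentication": "Basic", "ssl_required": False},
}
print(find_setting_drift(settings))  # {'ssl_required': {'FE1': True, 'FE2': False}}
```

An empty result means the compared settings match on every node, so clients should see the same behavior regardless of the server to which they are routed.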

For more information about configuring Exchange protocol virtual servers, see the Exchange Server 2003 Administration Guide.

For information about tuning Exchange 2003 front-end servers, see the Exchange Server 2003 Performance and Scalability Guide.

Implementing a Reliable Back-End Storage Solution

A reliable storage strategy is paramount to achieving a fault tolerant messaging system. To implement and configure a reliable storage solution, you should be familiar with the following:

Exchange 2003 database technology
Best practices for configuring and maintaining Exchange data
Advanced storage technologies such as RAID and Storage Area Networks (SANs)
For detailed information about planning and implementing a reliable back-end storage solution, see Planning a Reliable Back-End Storage Solution.

Implementing a Server Clustering Solution

By allowing the failover of resources, server clustering provides fault tolerance for your Exchange 2003 organization. Specifically, server clusters that use the Cluster service maintain data integrity and provide failover support and high availability for mission-critical applications and services on your back-end servers, including databases, messaging systems, and file and print services.

The following figure illustrates an example of a four-node cluster where three nodes are active and one is passive.

Example of a four-node 3 active/1 passive cluster


In server clusters, nodes share access to data. Nodes can be either active or passive, and the configuration of each node depends on the operating mode (active or passive) and how you configure failover. A server that is designated to handle failover must be sized to handle the workload of the failed node.

Note

In Windows Server 2003, Enterprise Edition, and Windows Server 2003, Datacenter Edition, server clusters can contain up to eight nodes. Each node is attached to one or more cluster storage devices, which allow different servers to share the same data. Because nodes in a server cluster share access to data, the type and method of storage in the server cluster is important.
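The sizing rule above (a node that handles failover must be able to take on the workload of the failed node) can be checked mechanically for an active/passive design such as the 3 active/1 passive example. The following Python sketch assumes that workloads and capacities are expressed in one consistent unit (for example, supported users) and that each passive node takes over at most one failed node's workload; both are simplifying assumptions, not Exchange requirements.

```python
MAX_NODES = 8  # server cluster limit for Enterprise and Datacenter Editions


def can_absorb_failures(active_loads, passive_capacities):
    """Check whether the passive nodes can take over the heaviest active workloads.

    Returns False when there are no passive nodes. Pairing the largest
    workloads with the largest passive nodes covers the worst case for as
    many simultaneous failures as there are passive nodes.
    """
    if len(active_loads) + len(passive_capacities) > MAX_NODES:
        raise ValueError(f"a server cluster can contain at most {MAX_NODES} nodes")
    if not passive_capacities:
        return False
    worst = sorted(active_loads, reverse=True)[:len(passive_capacities)]
    spare = sorted(passive_capacities, reverse=True)
    return all(cap >= load for load, cap in zip(worst, spare))


# 3 active/1 passive: the passive node must match the busiest active node.
print(can_absorb_failures([3000, 2500, 2000], [3000]))  # True
print(can_absorb_failures([3000, 2500, 2000], [2500]))  # False
```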

For information about planning Exchange server clusters, see Planning for Exchange Clustering.

Benefits of Clustering

Server clustering provides two main benefits in your organization: failover and scalability.

Failover

Failover is one of the most significant benefits of server clustering. If one server in a cluster stops functioning, the workload of the failed server fails over to another server in the cluster. Failover ensures continuous availability of applications and data. Windows Clustering technologies help guard against three specific failure types:

Application and service failures. These failures affect application software and essential services.
System and hardware failures. These failures affect hardware components such as CPUs, drives, memory, network adapters, and power supplies.
Site failures in multi-site organizations. These failures can be caused by natural disasters, power outages, or connectivity outages. To protect against this type of failure, you must implement an advanced geoclustering solution. For more information, see "Using Multiple Physical Sites" later in this topic.

By helping to guard against these failure types, server clustering provides the following two benefits for your messaging environment:

High availability: The ability to provide end users with dependable access to services while reducing unscheduled outages.
High reliability: The ability to reduce the frequency of system failure.

Scalability

Scalability is another benefit of server clustering. Because you can add nodes to your clusters, Windows server clusters are extremely scalable.

Limitations of Clustering

Rather than providing fault tolerance at the data level, server clustering provides fault tolerance at the application level. When implementing a server clustering solution, you must also implement solid data protection and recovery solutions to protect against viruses, corruption, and other threats to data. Clustering technologies cannot protect against failures caused by viruses, software corruption, or human error.

Clustering vs. Fault Tolerant Hardware

Both clustering and fault tolerant hardware protect your system from component failures (such as CPU, memory, fan, or PCI bus failures). Although you can use clustering and fault tolerant hardware together as an end-to-end solution, be aware that the two methods provide high availability in different ways:

Clustering can provide protection from an application or operating system failure. However, a stand-alone (non-clustered) server using fault tolerant hardware (or a server that uses hot-swappable hardware, which allows a device to be added while the server is running) cannot provide protection from these failure types.
Clustering enables you to perform upgrades or installations on one of the cluster nodes, while maintaining full Exchange service availability for users. With stand-alone (non-clustered) servers, you must often stop Exchange services to perform these upgrades or installations. For specific information about how you can maintain Exchange service availability when performing upgrades or installations, see "Taking Exchange Virtual Servers or Exchange Resources Offline" in the Exchange Server 2003 Administration Guide.

Implementing a Monitoring Strategy

Continuous monitoring of your network, applications, data, and hardware is essential for high availability. Software-monitoring tools and techniques enable you to determine the health of your system and identify potential issues before an error occurs.

To maximize availability, you must consistently manage, monitor, and troubleshoot your servers and applications. If a problem occurs, you must be able to react quickly so you can recover data and make it available as soon as possible. To help you monitor your Exchange 2003 organization, you can use the Exchange 2003 Management Pack for Microsoft Operations Manager.

For complete information about Exchange 2003 Management Pack, Microsoft Operations Manager, and other monitoring tools, see Implementing Software Monitoring and Error-Detection Tools.

Implementing a Disaster Recovery Solution

To increase fault tolerance in your organization, you need to develop and implement a well-planned backup and recovery strategy. If you are prepared, you should be able to recover from most failures.

Additional System-Level Best Practices

After considering measures to increase fault tolerance in your Exchange 2003 infrastructure, consider the following additional system-level best practices:

Safeguarding the physical environment of your servers: Take precautions to ensure that the physical environment is protected.
Security measures: Implement permissions practices, security patching, physical computer security, antivirus protection, and anti-spam solutions.
Message routing: Use fault tolerant network hardware and correctly configure your routing groups and connectors.
Use multiple physical sites: Protect data from site failures by mirroring data to one or more remote sites or implementing geoclustering to allow failover in the event of a site failure.
Operational procedures: Maintain and monitor servers, use standardized procedures, and test your disaster recovery procedures.
Laboratory testing and pilot deployments: Before deploying your messaging system in a production environment, test performance and scalability in laboratory and pilot environments.

Safeguarding the Physical Environment of Your Servers

To maintain the availability of your servers, you should maintain high standards for the environment in which the servers must run. To increase the longevity and reliability of your server hardware, consider the following:

Temperature and humidity: Install mission-critical servers in a room established for that purpose, specifically a room in which you can carefully control temperature and humidity. Computers perform best at approximately 70 degrees Fahrenheit (approximately 21 degrees Celsius). In an office setting, temperature is not usually an issue. However, consider the effects of a long holiday weekend in the summer with the air conditioning turned off.
Dust or contaminants: Where possible, protect servers and other equipment from dust and contaminants and check for dust periodically. Dust and other contaminants can cause components to short-circuit or overheat, which can cause intermittent failures. Whenever a server's case is open, quickly check to determine whether the unit needs cleaning. If so, check all the other units in the area.
Power supplies: As with any disaster-recovery planning, planning for power outages is best done long before you anticipate outages and involves identifying resources that are most critical to the operation of your business. When possible, provide power from at least two circuits to the computer room and divide redundant power supplies between the power sources. Ideally, the circuits should originate from two sources that are external to the building. Be aware of the maximum amount of power a location can provide. It is possible that a location could have so many servers that there is not sufficient power for any additional servers. Consider a backup power supply for use in the event of a power failure in your computer center. It may be necessary to continue providing computer service to other buildings in the area or to areas geographically remote from the computer center. You can use uninterruptible power supply (UPS) units to handle short outages and standby generators to handle longer outages. When reviewing equipment that requires backup power during an outage, include network equipment, such as routers.
Maintenance of cables: To prevent physical damage to cables, make sure the cables are neat and orderly, either with a cable management system or tie wraps. Cables should never be loose in a cabinet, where they can be disconnected by mistake. If possible, make sure that all cables are securely attached at both ends. Also, make sure that pull-out, rack-mounted equipment has enough slack in the cables, and that the cables do not bind and are not pinched or scraped. Set up good pathways for redundant sets of cables. If you use multiple sources of power or network communications, try to route the cables into the cabinets from different points. This way, if one cable is severed, the other can continue to function. Do not plug dual power supplies into the same power strip. If possible, use separate power outlets or UPS units (ideally, connected to separate circuits) to avoid a single point of failure.
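When dual power supplies are split across two circuits, losing one circuit shifts the entire load onto the other, so each circuit must be able to carry every server on its own. The following Python sketch checks that condition and estimates how many more servers fit within the location's power budget. The wattage figures are hypothetical, and the headroom parameter is an assumed safety margin that you should set according to your facility's electrical requirements.

```python
def load_fits_on_one_circuit(server_watts, circuit_capacity_watts, headroom=1.0):
    """True if a single circuit can carry every server after the other circuit fails."""
    return sum(server_watts) <= circuit_capacity_watts * headroom


def spare_server_slots(server_watts, circuit_capacity_watts, new_server_watts, headroom=1.0):
    """How many additional servers of new_server_watts still fit on one circuit."""
    remaining = circuit_capacity_watts * headroom - sum(server_watts)
    return max(0, int(remaining // new_server_watts))


servers = [400] * 10  # ten hypothetical 400-watt servers
print(load_fits_on_one_circuit(servers, 4800))           # True
print(spare_server_slots(servers, 4800, 400))            # 2
print(load_fits_on_one_circuit(servers, 4800, headroom=0.8))  # False
```

The last line shows how a safety margin can turn an apparently adequate circuit into one that can no longer absorb a failover, which is the situation the text warns about when a location runs out of power for additional servers.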

Security Measures

Security is a critical component to achieving a highly available messaging system. Although there are many security measures to consider, the following are some of the more significant:

Permission practices
Security patches
Physical security
Antivirus protection
Anti-spam solutions

For detailed information about these and other security measures, see the Exchange Server 2003 Security Hardening Guide.

Message Routing Considerations

Your routing topology is the basis of your messaging system. As a result, you must plan your routing topology with network, bandwidth, and geographical considerations in mind.

Routing describes how Exchange transfers messages from one server to another. When planning your routing topology, you must understand how messages are transferred within Exchange and then plan a topology for the most efficient transfer of messages. You must also plan the locations of connectors to messaging systems outside your Exchange organization. Careful planning can reduce the volume of network traffic and optimize Exchange and Windows services.

To ensure that your message routing is reliable and available, consider the following high-level recommendations:

Make sure that your physical network has built-in redundancy. For more information, see "Network Hardware" earlier in this topic.
Make sure that you have correctly configured connectors and routing groups. For example, in some scenarios, using Exchange System Manager to configure redundant connector paths can eliminate a single point of failure.
Configure your connectors to ensure there are multiple paths to all bridgehead servers.
If applicable, make sure that your Simple Mail Transfer Protocol (SMTP) gateway servers are redundant. In large data centers, it is generally recommended that you dedicate specific Exchange 2003 servers to handle only inbound and outbound SMTP traffic. These servers are usually called SMTP gateway servers or SMTP hubs. These servers are responsible for moving SMTP e-mail between clients and Exchange 2003 mailbox servers (back-end servers).
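One way to review a routing design for single points of failure is to model routing groups as nodes and connectors as links, then check whether removing any one connector splits the topology. The following Python sketch does this by brute force. The routing group names are hypothetical, and the model simplifies Exchange routing by treating each connector as a single bidirectional link; it does not model multiple bridgehead servers on one connector.

```python
from collections import defaultdict, deque


def connected(nodes, edges):
    """True if every node is reachable from the first one over the given edges."""
    nodes = list(nodes)
    if not nodes:
        return True
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen >= set(nodes)


def single_point_connectors(routing_groups, connectors):
    """Return the connectors whose loss would disconnect some routing group."""
    connectors = list(connectors)
    return [c for i, c in enumerate(connectors)
            if not connected(routing_groups, connectors[:i] + connectors[i + 1:])]


groups = ["Hub", "BranchA", "BranchB"]
star = [("Hub", "BranchA"), ("Hub", "BranchB")]
print(single_point_connectors(groups, star))  # both connectors are single points
ring = star + [("BranchA", "BranchB")]
print(single_point_connectors(groups, ring))  # [] : every group has two paths
```

Adding the BranchA to BranchB connector gives every routing group a second path, which is the kind of redundancy the recommendations above describe.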

For information about planning your routing design and configuration (including recommendations for creating routing groups and connectors), see Planning an Exchange Server 2003 Messaging System.

For information about how to configure message routing, see the Exchange Server 2003 Transport and Routing Guide.

Using Multiple Physical Sites

To improve disaster recovery and increase availability, some organizations use multiple physical sites. Most multi-site designs include a primary site and one or more remote sites that mirror the primary site. The level at which components and data are mirrored between sites depends on the SLA and the business requirements. Another option is to implement geographically dispersed clusters. With geographically dispersed clusters, in the event of a disaster, applications at one site can fail over to another site.

The following sections provide more information about site mirroring and geoclustering.

Site Mirroring

Site mirroring involves using either synchronous or asynchronous replication to mirror data (for example, Exchange 2003 databases and transaction log data) from the primary site to one or more remote sites.

Using site mirroring to provide data redundancy in multiple physical sites

If a complete site failure occurs at the primary site, the amount of time it takes to bring Exchange services online at the mirrored site depends on the complexity of your Exchange organization, the amount of preconfigured standby hardware you have, and your level of administrative support. For example, an organization may be able to follow a preplanned set of disaster recovery procedures and bring their Exchange messaging system online within 24 hours. Although 24 hours may seem like a lot of downtime, you may be able to recover data close to the point of failure. For information about synchronous and asynchronous replication of Exchange data, see "Exchange Data Replication Technologies" in Overview of Storage Technologies.
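The difference between synchronous and asynchronous replication determines how close to the point of failure you can recover. The toy Python model below is a sketch under simplifying assumptions, not a description of any vendor's replication product: with synchronous replication, a write completes only after the remote copy has it; with asynchronous replication, writes complete locally and are shipped in later cycles, so anything written since the last cycle exists only at the primary site. The "log generation" numbers are hypothetical stand-ins for transaction log files.

```python
from dataclasses import dataclass, field


@dataclass
class MirroredLog:
    """Toy model of a primary site copying log generations to a remote site."""

    synchronous: bool
    primary: list = field(default_factory=list)
    remote: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def write(self, generation):
        self.primary.append(generation)
        if self.synchronous:
            # Synchronous: the write is acknowledged only after the remote copy has it.
            self.remote.append(generation)
        else:
            # Asynchronous: the write is acknowledged locally and queued for shipping.
            self.pending.append(generation)

    def ship(self):
        """One asynchronous replication cycle: copy everything queued so far."""
        self.remote.extend(self.pending)
        self.pending.clear()

    def lost_on_site_failure(self):
        """Generations that exist only at the primary site."""
        return [g for g in self.primary if g not in self.remote]


sync = MirroredLog(synchronous=True)
async_ = MirroredLog(synchronous=False)
for gen in range(1, 6):
    sync.write(gen)
    async_.write(gen)
    if gen == 3:
        async_.ship()  # the last replication cycle before the failure

print(sync.lost_on_site_failure())    # []
print(async_.lost_on_site_failure())  # [4, 5]
```

The trade-off is that synchronous replication adds the round trip to the remote site to every write, which is why it is usually limited to shorter distances.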

Geographically Dispersed Clusters

A more advanced way to implement fault tolerance at the site level is to implement geographically dispersed clusters. To deploy geographically dispersed clusters with Windows Server 2003, you use virtual LANs (VLANs) to connect SANs over long distances.

Using geographically dispersed clustering to provide application failover between physical sites

Geographically dispersed cluster configurations can be complex, and the clustered servers must use only components supported by Microsoft. You should deploy geographically dispersed clusters only with vendors who provide qualified configurations.

For more information about geographically dispersed cluster solutions with Exchange 2003, see Planning for Exchange Clustering.

For information about Windows Server 2003 and geographically dispersed clusters, see Geographically Dispersed Clusters in Windows Server 2003.

Operational Best Practices

When operating and administering your Exchange 2003 messaging system, it is important that your IT staff use standard IT best practices. This section provides best practices for maximizing the availability of your applications and computers. (This information applies to both clustered and non-clustered environments.)

Minimize or eliminate support for multiple versions of operating systems, service packs, and out-of-date applications
It is difficult to provide reliable support when multiple combinations of different software and hardware versions are used together in one system (or in systems that interact on the network). Out-of-date software, protocols, and drivers (and associated hardware) are impractical when they do not support new technologies. Set aside resources and time for planning, testing, and installing new operating systems, applications, and hardware. When planning software upgrades, work with users to identify the features they require. Provide training to ease users through software transitions. In your software and support budget, provide funds for upgrading applications and operating systems in the future.

Isolate unreliable applications
An unreliable application is an application that your business cannot do without, but that does not meet appropriate standards for reliability. If you must work with such an application, there are two basic approaches you can take:
- Remove the unreliable applications from the servers that are most critical to your enterprise. If an application is known to be unreliable, take steps to isolate it, and do not run the application on a mission-critical server.
- Provide sufficient monitoring, and use automatic restarting options where appropriate. Sufficient monitoring requires taking snapshots of important system performance measurements at regular intervals. You can set up automatic restarting of an application or service by using the Services snap-in. For more information about Windows services, see "Services overview" in Windows Server 2003 Help.
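The recovery options in the Services snap-in let you choose an action for the first failure, the second failure, and subsequent failures, and a period after which the failure count resets to zero if no further failures occur. The Python class below is only a model of how those settings interact; it is not a Windows API, and the action strings are illustrative labels. It shows one conservative policy: restart an unreliable service twice, then stop restarting it so that an operator investigates instead of the restarts masking the problem.

```python
from datetime import datetime, timedelta


class RecoveryPolicy:
    """Model of a service's recovery settings (not a Windows API).

    Holds an action for the first failure, the second failure, and all
    subsequent failures, plus a failure-free period after which the
    failure count resets to zero.
    """

    def __init__(self, first, second, subsequent, reset_after):
        self.actions = (first, second, subsequent)
        self.reset_after = reset_after
        self.fail_count = 0
        self.last_failure = None

    def on_failure(self, when):
        """Record a failure at `when` and return the action to take."""
        if self.last_failure is not None and when - self.last_failure >= self.reset_after:
            self.fail_count = 0  # stable long enough: start counting again
        self.fail_count += 1
        self.last_failure = when
        return self.actions[min(self.fail_count, 3) - 1]


policy = RecoveryPolicy("restart", "restart", "take no action", timedelta(days=1))
t0 = datetime(2014, 7, 25, 8, 0)
actions = [
    policy.on_failure(t0),
    policy.on_failure(t0 + timedelta(minutes=5)),
    policy.on_failure(t0 + timedelta(minutes=10)),
    policy.on_failure(t0 + timedelta(days=2)),
]
print(actions)  # ['restart', 'restart', 'take no action', 'restart']
```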

Use current, standardized hardware
Incompatible hardware can cause performance problems and data loss. Maintain and follow a hardware standard for new systems, spare parts, and replacement parts.

Plan for future capacity requirements
Capacity planning is critical to the success of highly available systems. To understand how much extra capacity currently exists in the system, study and monitor your system during peak loads.
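One way to express "extra capacity" is as headroom: the fraction of the load a server sustained in lab testing that is still unused at the highest peak observed in production. The Python sketch below assumes you have peak-hour samples of a load counter and a lab-measured capacity in the same units; the numbers are made up. The second function also assumes that load grows linearly with the number of users, which you should confirm in the lab before relying on it.

```python
def headroom(peak_samples, capacity):
    """Fraction of tested capacity still unused at the highest observed peak."""
    return 1 - max(peak_samples) / capacity


def additional_users(current_users, peak_samples, capacity):
    """Users that could be added before reaching capacity.

    Assumes load scales linearly with user count.
    """
    return current_users * capacity // max(peak_samples) - current_users


# Hypothetical peak-hour samples of a load counter, and the capacity the
# same server sustained in lab testing, in the same units.
samples = [120, 340, 410, 395, 450, 380, 300, 420, 440, 460]
capacity = 600
print(round(headroom(samples, capacity), 2))      # 0.23
print(additional_users(2000, samples, capacity))  # 608
```

Using the peak rather than the average matters: the average of these samples is well below the peak, and sizing from it would overstate the spare capacity.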

Maintain an updated list of operational procedures
When the root cause of a system problem is fixed, make sure you remove any outdated procedures from operation and support schedules. For example, when software is replaced or upgraded, certain procedures might become unnecessary or no longer be valid. Pay special attention to procedures that may have become routine. Make sure that all procedures are necessary and not temporary fixes for issues for which the root cause has not been found.

Follow adequate monitoring practices
If you do not adequately monitor your messaging system, you might not identify problems before they become critical and cause system failures. Without monitoring, an application or server failure could be your only notification of a problem.
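A common way to get early warning without reacting to every momentary spike is to alert only when a counter stays above a threshold for several consecutive samples. The Python sketch below shows that rule; the threshold, the sample count, and the idea of sampling a queue-length counter are hypothetical choices for illustration, not recommended values.

```python
from collections import deque


class SustainedThresholdAlert:
    """Alert only when every one of the last N samples exceeds a threshold."""

    def __init__(self, threshold, consecutive):
        self.threshold = threshold
        self.recent = deque(maxlen=consecutive)

    def add(self, value):
        """Record one sample; return True if an alert should fire now."""
        self.recent.append(value)
        full = len(self.recent) == self.recent.maxlen
        return full and all(v > self.threshold for v in self.recent)


# Hypothetical queue-length samples taken at a regular interval.
monitor = SustainedThresholdAlert(threshold=50, consecutive=3)
samples = [5, 40, 60, 70, 80, 10]
alerts = [monitor.add(s) for s in samples]
print(alerts)  # [False, False, False, False, True, False]
```

Requiring a sustained condition trades a slightly later alert for far fewer false alarms, which leaves operators time to analyze the problem before it becomes a failure.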

Determine the nature of the problem before reacting
If the operations staff is not trained and directed to analyze problems carefully before reacting, your personnel can spend large amounts of time responding inappropriately to a problem. They also might not be effectively using monitoring tools in the crucial time between the first signs of a problem and an actual failure.

Treat the root cause of problems instead of treating symptoms
When an unexpected failure occurs, or during short-term preventive maintenance, treating symptoms is an effective way to restore services. However, symptom treatments that are added to standard operating procedures can become unmanageable. Support personnel can be overwhelmed with symptom treatments and might not be able to react correctly to new failures.

Avoid stopping and restarting services and servers to end error conditions
Stopping and restarting a server may be necessary at times. However, if this process temporarily fixes a problem but does not address the root cause, it can create additional problems.

Laboratory Testing and Pilot Deployments

Before you deploy any new solution, whether it is fault-tolerant hardware, network hardware, a software monitoring tool, or a Windows Clustering solution, thoroughly test it in an isolated lab. Then test the solution in a pilot deployment in which only a few users are affected, and make any necessary adjustments to the design. After you are satisfied with the pilot deployment, perform a full-scale deployment in your production environment.

Depending on the number of users in your Exchange organization, you may want to perform your full-scale deployment in stages. After each stage, verify that your system can accommodate the increased processing load from the additional users before deploying the next group of users. For complete information about setting up test and pilot environments, see "Designing a Test Environment" and "Designing a Pilot Project" in the Microsoft Windows Server 2003 Deployment Kit.
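The staged approach amounts to a gate between stages: deploy a group, measure, and continue only if the system still meets its target. The Python sketch below expresses that gate. The group names, the 1,000 ms target, and the `measure` function (a made-up linear model standing in for the measurements you would take after each stage) are all hypothetical.

```python
def staged_rollout(user_groups, measure, max_response_ms):
    """Deploy groups in order, halting at the first stage that misses the target.

    user_groups: list of (name, user_count) in deployment order.
    measure: callable(total_users) -> observed response time in ms after a
             stage (hypothetical; in practice you measure the live system).
    Returns (stages that passed verification, stage that failed or None).
    """
    deployed = 0
    verified = []
    for name, count in user_groups:
        observed = measure(deployed + count)
        if observed > max_response_ms:
            return verified, name  # hold here and investigate before continuing
        deployed += count
        verified.append(name)
    return verified, None


groups = [("Pilot", 50), ("IT", 200), ("Sales", 800), ("Everyone else", 3000)]


def model(users):
    return 200 + users * 0.25  # made-up response-time model


print(staged_rollout(groups, model, max_response_ms=1000))
# (['Pilot', 'IT', 'Sales'], 'Everyone else')
```

In this example the final stage is too large for one server configuration, which is exactly the situation in which you would add capacity or split the stage before continuing.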

Exchange Capacity Planning Tools

To determine how many Exchange servers are required to manage user load, use the following capacity planning tools:

Exchange Server Load Simulator 2003 (LoadSim)
Exchange Server Stress and Performance (ESP) tool
Jetstress

Important

Because some of these tools create accounts that have insecure passwords, these tools are intended for use in test environments, not in production environments.

Exchange Server Load Simulator 2003

With Exchange Server Load Simulator 2003 (LoadSim), you can simulate the load of MAPI clients against Exchange. You simulate the load by running LoadSim tests on client computers. These tests send messaging requests to the Exchange server, causing a load on the server.

Use the output from these tests in the following ways:

To calculate the client computer response time for the server configuration under client load
To estimate the number of users per server
To identify bottlenecks on the server

You can download LoadSim 2003 from the Downloads for Exchange Server 2003 Web site.
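To estimate the number of users per server, you typically run LoadSim at increasing user counts and find the largest count at which response time still meets your target. The Python sketch below assumes you have collected the client response times from each run into a dictionary (the data here is made up). The 95th-percentile measure and the 1,000 ms limit are assumptions for illustration; use the targets from your own SLA. The loop stops at the first failing run so that a noisy pass at a higher load is not trusted.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def max_supported_users(runs, limit_ms=1000, pct=95):
    """runs: dict mapping simulated user count -> list of response times (ms)."""
    best = 0
    for users in sorted(runs):
        if percentile(runs[users], pct) > limit_ms:
            break
        best = users
    return best


# Made-up response times (ms) from three test runs.
runs = {
    1000: [180, 220, 250, 300, 410, 390, 260, 240, 230, 500],
    2000: [300, 450, 520, 610, 700, 820, 640, 580, 560, 900],
    3000: [600, 900, 1100, 1300, 950, 1250, 1400, 1000, 980, 1600],
}
print(max_supported_users(runs))  # 2000
```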

Exchange Server Stress and Performance Tool

The Exchange Server Stress and Performance (ESP) 2003 tool is a highly scalable stress and performance tool for Exchange. It simulates large numbers of client sessions by concurrently accessing one or more protocol services. Scripts control the actions that each simulated user performs. The scripts contain the logic for communicating with the server. Test modules (DLLs) then run these scripts. Test modules connect to a server through Internet protocols, calls to application programming interfaces (APIs), or through interfaces like OLE DB.

ESP is modular and extensible, and it currently provides modules for most Internet protocols as well as OLE DB. These modules include the following:

WebDAV
Internet Message Access Protocol version 4rev1 (IMAP4)
Lightweight Directory Access Protocol (LDAP)
OLE DB
Post Office Protocol version 3 (POP3)
Simple Mail Transfer Protocol (SMTP)

You can download ESP 2003 at https://go.microsoft.com/fwlink/?linkid=27881.
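The architecture described above (a script per simulated user, a module that executes scripts, and many concurrent sessions) can be illustrated with a short Python analogy. This is not ESP: ESP's scripts and DLL test modules use their own formats, and none of the names below belong to it. The stub server stands in for a protocol service, and the script is a hypothetical sequence of POP3 commands.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StubServer:
    """Stand-in for a protocol service; counts the commands it receives."""

    def __init__(self):
        self.lock = threading.Lock()
        self.commands = {}

    def handle(self, command):
        with self.lock:  # many sessions call this concurrently
            self.commands[command] = self.commands.get(command, 0) + 1
        return "OK"


# The "script": ordered actions one simulated user performs (hypothetical).
SCRIPT = ["USER", "PASS", "STAT", "RETR", "QUIT"]


def run_session(server, script):
    """The 'test module': executes one user's script, returns successful steps."""
    return sum(1 for command in script if server.handle(command) == "OK")


def simulate(server, sessions, script, workers=16):
    """Run many sessions concurrently and return the total successful steps."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda _: run_session(server, script), range(sessions)))


server = StubServer()
print(simulate(server, 100, SCRIPT))  # 500
```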

Jetstress

Exchange 2003 is a disk-intensive application. To function correctly, Exchange requires a fast, reliable disk subsystem. Jetstress (Jetstress.exe) is an Exchange tool that helps administrators verify the performance and stability of the disk subsystem prior to deploying Exchange servers in a production environment. For more information about Jetstress and Exchange back-end storage, see Planning a Reliable Back-End Storage Solution.

You can download Jetstress at https://go.microsoft.com/fwlink/?linkid=27883.
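Jetstress shows what a disk subsystem can sustain, but you need a target to judge the result against. The Python sketch below estimates how many disks a database workload needs once RAID overhead is included. Every input is a hypothetical example: measure per-user IOPS and the read/write ratio on your own servers (for example, with Performance Monitor), and use per-disk figures from your storage vendor. The write penalties of 2 for RAID-10 and 4 for RAID-5 are commonly cited rules of thumb.

```python
import math

# Commonly cited write penalties: each host write costs 2 disk I/Os on
# RAID-10 (both mirrors) and 4 on RAID-5 (read data, read parity, write both).
RAID_WRITE_PENALTY = {"RAID-10": 2, "RAID-5": 4}


def required_spindles(users, iops_per_user, read_fraction, raid, iops_per_disk):
    """Estimate the disks needed to serve the database I/O load."""
    host_iops = users * iops_per_user
    reads = host_iops * read_fraction
    writes = host_iops * (1 - read_fraction)
    disk_iops = reads + writes * RAID_WRITE_PENALTY[raid]
    return math.ceil(disk_iops / iops_per_disk)


# Hypothetical inputs: 2,000 users at 0.75 IOPS each, 75% reads,
# and 125 random IOPS per disk.
print(required_spindles(2000, 0.75, 0.75, "RAID-10", 125))  # 15
print(required_spindles(2000, 0.75, 0.75, "RAID-5", 125))   # 21
```

The estimate only tells you what to build and test; running Jetstress against the actual disk configuration is what confirms that it delivers the required throughput with acceptable latency.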
