Ensuring Availability in NLB Solutions
Updated: March 28, 2003
Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2
By including multiple cluster hosts that provide the same applications and services, Network Load Balancing inherently provides fault tolerance. However, to provide a complete high-availability solution, your design must include more than just Network Load Balancing. The network infrastructure and system hardware that are associated with the cluster also affect the availability of the applications and services running on the cluster. In addition, include application-level monitoring, such as the monitoring provided by Microsoft Operations Manager (MOM) or Application Center 2000, to ensure that applications are operating correctly. Ensuring availability is the final task in designing a Network Load Balancing solution, as shown in Figure 8.17.
Figure 8.17 Ensuring Availability in Network Load Balancing Solutions
Include the following items to ensure high availability for clients that access applications and services on the cluster:
Cluster hosts with fault-tolerant hardware
Signed device drivers and software only
Fault-tolerant network infrastructure
Also, you can improve the availability of applications and services by using methods that are specific to the applications and services running on the cluster. For more information about improving the availability of services running on Network Load Balancing, see "Additional Resources for Designing Network Load Balancing" later in this chapter.
After you design the specifications for ensuring availability, document your decisions. For a Word document to assist you in recording your decisions, see "NLB Cluster Host Worksheet" (Sdcnlb_1.doc) on the Windows Server 2003 Deployment Kit companion CD (or see "NLB Cluster Host Worksheet" on the Web at http://www.microsoft.com/reskit).
Including Fault-Tolerant Hardware on Cluster Hosts
The cluster host hardware that you specify in your design can affect the uptime of the applications and services in your solution. Including system hardware with a longer mean time between failures (MTBF) helps reduce the number of cluster host failures that you experience. In addition, including cluster hosts with fault-tolerant hardware can prevent unnecessary outages in your cluster.
For more information about including fault-tolerant hardware in your design, see "Planning for High Availability and Scalability" in this book.
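The relationship between MTBF and uptime can be illustrated with a small calculation. The following is a minimal sketch, not a sizing method from this kit: it assumes the common steady-state approximation availability = MTBF / (MTBF + MTTR), where the mean time to repair (MTTR) is an input that this chapter does not define, and it assumes that cluster hosts fail independently of one another. The function names and the sample figures are hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days; leap years are ignored in this sketch


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one host: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def cluster_availability(host_availability: float, hosts: int) -> float:
    """Probability that at least one of `hosts` hosts is up.

    Assumes independent host failures, which real clusters only approximate
    (shared power, switches, or software defects can fail hosts together).
    """
    if hosts < 1:
        raise ValueError("a cluster needs at least one host")
    return 1.0 - (1.0 - host_availability) ** hosts


def expected_host_failures_per_year(mtbf_hours: float, hosts: int) -> float:
    """Average number of host failures per year across the whole cluster."""
    return hosts * HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    # Hypothetical figures: compare two hardware choices with an 8-hour repair time.
    for mtbf in (10_000, 50_000):
        a = availability(mtbf, mttr_hours=8)
        print(f"MTBF {mtbf} h: host availability {a:.5f}, "
              f"4-host cluster {cluster_availability(a, 4):.12f}, "
              f"failures/year {expected_host_failures_per_year(mtbf, 4):.2f}")
```

Note that "at least one host is up" is a weaker condition than "the cluster can carry the full client load"; a capacity-aware estimate would need the number of hosts required to serve peak traffic.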
Including Signed Device Drivers and Software Only
Another method of improving application and services uptime is to include only signed device drivers and software on the cluster hosts. Drivers and software that are signed have been certified by Microsoft, your organization, or third-party companies that your organization trusts. Because unstable drivers and software can affect cluster uptime, including only signed device drivers and software helps ensure the stability of the cluster.
You can specify Group Policy settings in Active Directory to centrally configure the cluster hosts for the appropriate driver signing settings. When you are unable to specify driver signing by using Active Directory, specify the Local Security policies for each cluster host.
For more information about signed device drivers and software, see "Driver signing for Windows" in Help and Support Center for Windows Server 2003. For more information about specifying Group Policy settings, see "Designing a Group Policy Infrastructure" in Designing a Managed Environment of this kit.
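As a complement to enforcing driver signing through policy, you might audit each cluster host for drivers that are already installed but unsigned. The following is a minimal sketch under stated assumptions: it expects a CSV inventory with DeviceName and IsSigned columns, which is the layout assumed here for the output of driverquery /si /fo csv (verify the column names on your own hosts). The function name and the sample inventory are hypothetical.

```python
import csv
import io
from typing import List


def unsigned_drivers(inventory_csv: str) -> List[str]:
    """Return the DeviceName of every row whose IsSigned value is not TRUE.

    Assumes a header row containing 'DeviceName' and 'IsSigned' columns.
    """
    reader = csv.DictReader(io.StringIO(inventory_csv))
    return [
        row["DeviceName"]
        for row in reader
        if row["IsSigned"].strip().upper() != "TRUE"
    ]


# Hypothetical inventory for one cluster host; the device names are made up.
SAMPLE_INVENTORY = (
    '"DeviceName","InfName","IsSigned","Manufacturer"\n'
    '"Example Network Adapter","oem1.inf","TRUE","Example Corp"\n'
    '"Example RAID Controller","oem2.inf","FALSE","Example Corp"\n'
)

if __name__ == "__main__":
    for name in unsigned_drivers(SAMPLE_INVENTORY):
        print(f"Unsigned driver found: {name}")
```

Running a check like this on every host before it joins the cluster gives you a list of drivers to replace or have signed, rather than relying on the policy alone to block future installations.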
Including a Fault-Tolerant Network Infrastructure
Completing all of the previous steps improves the uptime of the cluster itself, but it does not complete your solution. Even with a highly optimized cluster, failures in the network infrastructure between the clients and the cluster can reduce uptime for applications and services.
To include a fault-tolerant network infrastructure between the clients and the cluster, complete the following steps:
1. Identify the intermediary network segments, routers, and switches between the clients and the cluster.
2. Determine whether any of the intermediary network segments, routers, and switches between the clients and the cluster are potential points of failure that can cause application and services outages.
3. Modify your design to provide a fault-tolerant network infrastructure, based on the information in Table 8.24.
Table 8.24 Providing Network Infrastructure Fault Tolerance Based on Limitations
Potential Failure Points | Include Any of These Fault-Tolerance Solutions
Network connection failure
Redundant network connections to provide fault tolerance in the event that a network connection fails. For example, if you are connected to the Internet by a single T1 connection, a failure of the T1 connection would prevent clients from accessing the cluster. Specify a redundant T1 connection to help prevent this type of failure.
Redundant switches to provide fault tolerance in the event that a switch fails.
Redundant routers and redundant routes to provide fault tolerance in the event that a router fails.
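The first two steps of the procedure above (identifying the intermediary components, then determining which of them are potential points of failure) can be sketched as a small graph check. This is an illustrative sketch, not a tool from this kit: it models each network connection, router, and switch as a node, and it reports every intermediary whose loss alone would cut the clients off from the cluster. The topology names are hypothetical, and the model ignores capacity and correlated failures.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(links: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from (endpoint, endpoint) pairs."""
    graph: Graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph: Graph, start: str, goal: str,
              removed: Optional[str] = None) -> bool:
    """Breadth-first search from start to goal, skipping the removed node."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in graph.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(graph: Graph, clients: str, cluster: str) -> List[str]:
    """Intermediaries whose individual failure disconnects clients from cluster."""
    intermediaries = set(graph) - {clients, cluster}
    return sorted(
        node for node in intermediaries
        if not reachable(graph, clients, cluster, removed=node)
    )


# Hypothetical designs: one path, and a fully duplicated path.
SINGLE_PATH = build_graph([
    ("clients", "t1"), ("t1", "router"),
    ("router", "switch"), ("switch", "cluster"),
])
REDUNDANT = build_graph([
    ("clients", "t1_a"), ("t1_a", "router_a"),
    ("router_a", "switch_a"), ("switch_a", "cluster"),
    ("clients", "t1_b"), ("t1_b", "router_b"),
    ("router_b", "switch_b"), ("switch_b", "cluster"),
])

if __name__ == "__main__":
    print(single_points_of_failure(SINGLE_PATH, "clients", "cluster"))
    print(single_points_of_failure(REDUNDANT, "clients", "cluster"))
```

In the single-path design every intermediary is a point of failure; in the duplicated design none is, which corresponds to specifying the redundant connections, switches, and routers listed in Table 8.24.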