Ensuring Availability in NLB Solutions

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

By including multiple cluster hosts that provide the same applications and services, Network Load Balancing inherently provides fault tolerance. However, to provide a complete high-availability solution, your design must include more than just Network Load Balancing. The network infrastructure and system hardware that are associated with the cluster also affect the availability of the applications and services running on the cluster. In addition, include application-level monitoring, such as the monitoring provided by Microsoft Operations Manager (MOM) or Application Center 2000, to verify that applications are operating correctly. Ensuring availability is the final task in designing a Network Load Balancing solution, as shown in Figure 8.17.

Figure 8.17   Ensuring Availability in Network Load Balancing Solutions


Include the following items to ensure high availability for clients that access applications and services on the cluster:

  • Cluster hosts with fault-tolerant hardware

  • Signed device drivers and software only

  • Fault-tolerant network infrastructure

Also, you can improve the availability of applications and services by using methods that are specific to the applications and services running on the cluster. For more information about improving the availability of services running on Network Load Balancing, see "Additional Resources for Designing Network Load Balancing" later in this chapter.

Note

  • After you design the specifications for ensuring availability, document your decisions. For a Word document to assist you in recording your decisions, see "NLB Cluster Host Worksheet" (Sdcnlb_1.doc) on the Windows Server 2003 Deployment Kit companion CD (or see "NLB Cluster Host Worksheet" on the Web at https://www.microsoft.com/reskit).

Including Fault-Tolerant Hardware on Cluster Hosts

The cluster host hardware that you specify in your design can affect the uptime of the applications and services in your solution. Specifying system hardware with a longer mean time between failures (MTBF) reduces the number of cluster host failures that you can expect. In addition, including cluster hosts with fault-tolerant hardware can prevent unnecessary outages in your cluster.
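As a rough illustration of how MTBF relates to uptime, the following sketch uses the common steady-state approximation availability = MTBF / (MTBF + MTTR). The mean time to repair (MTTR), the function names, and the sample figures are assumptions introduced for this example and do not come from this guide. The second function assumes that hosts fail independently and that any one surviving host can serve clients.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one component: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` identical, independent components
    when any single working copy is enough to keep the service up."""
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    host = availability(mtbf_hours=10_000, mttr_hours=8)
    print(f"one host:  {host:.6f}")
    print(f"two hosts: {redundant_availability(host, 2):.8f}")
```

Under these assumptions, a longer MTBF or a shorter repair time raises the availability of a single host, and each additional host reduces the chance that every host is down at the same time.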

For more information about including fault-tolerant hardware in your design, see "Planning for High Availability and Scalability" in this book.

Including Signed Device Drivers and Software Only

Another method of improving application and services uptime is to include only signed device drivers and software on the cluster hosts. Drivers and software that are signed have been certified by Microsoft, your organization, or third-party companies that your organization trusts. Because unstable drivers and software can affect cluster uptime, including only signed device drivers and software helps ensure the stability of the cluster.

You can specify Group Policy settings in Active Directory to centrally configure the cluster hosts for the appropriate driver signing settings. When you are unable to specify driver signing by using Active Directory, specify the Local Security policies for each cluster host.

For more information about signed device drivers and software, see "Driver signing for Windows" in Help and Support Center for Windows Server 2003. For more information about specifying Group Policy settings, see "Designing a Group Policy Infrastructure" in Designing a Managed Environment of this kit.

Including a Fault-Tolerant Network Infrastructure

Fault-tolerant hardware and signed software on the cluster hosts do not, by themselves, complete your solution. Even with a highly optimized cluster, failures in the network infrastructure between the clients and the cluster can reduce uptime for applications and services.

To include a fault-tolerant network infrastructure between the clients and the cluster, complete the following steps:

  1. Identify the intermediary network segments, routers, and switches between the clients and the cluster.

  2. Determine whether any of the intermediary network segments, routers, or switches between the clients and the cluster are potential points of failure that can cause application and services outages.

  3. Modify your design to provide a fault-tolerant network infrastructure, based on the information in Table 8.24.

    Table 8.24   Providing Network Infrastructure Fault Tolerance Based on Limitations

    Potential Failure Point | Include Any of These Fault-Tolerance Solutions

    Network connection failure | Redundant network connections to provide fault tolerance in the event that a network connection fails. For example, if you are connected to the Internet by a single T1 connection, a failure of the T1 connection would prevent clients from accessing the cluster. Specify a redundant T1 connection to help prevent this type of failure.

    Switch failure | Redundant switches to provide fault tolerance in the event that a switch fails.

    Router failure | Redundant routers and redundant routes to provide fault tolerance in the event that a router fails.
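
Steps 1 and 2 above can be sketched as a small graph check. This is a minimal illustration, not a tool from this kit: the topology, the device names, and the function names are all hypothetical. The network is modeled as an undirected graph, and a device is reported as a potential point of failure when removing it leaves no path between the clients and the cluster.

```python
from collections import deque


def undirected(pairs):
    """Build an adjacency map from (device, device) link pairs."""
    links = {}
    for a, b in pairs:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def reachable(links, start, goal, removed=None):
    """Breadth-first search from start to goal, skipping the removed device."""
    if removed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in links.get(node, ()):
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(links, clients, cluster):
    """Return intermediary devices whose failure cuts clients off from the cluster."""
    devices = set(links) - {clients, cluster}
    return sorted(d for d in devices
                  if not reachable(links, clients, cluster, removed=d))


if __name__ == "__main__":
    # Hypothetical topology: two routers, but only one switch.
    topology = undirected([
        ("clients", "router-a"), ("clients", "router-b"),
        ("router-a", "switch-1"), ("router-b", "switch-1"),
        ("switch-1", "cluster"),
    ])
    print(single_points_of_failure(topology, "clients", "cluster"))
```

In this hypothetical topology the routers are redundant but the single switch is not, so the design would need a second switch, as in the "Switch failure" row of Table 8.24. To check network connections as well as devices, one option is to model each connection as its own node in the graph.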