Improving Availability

Article
10/08/2009

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

Availability refers to how much time the network is operational. Planning well for availability improves both your network’s mean time between failures (MTBF) and its mean time to recovery (MTTR) after a network failure.

To improve availability in your IP network design, you must know your organization’s availability requirements. For some organizations, unanticipated down time is simply an irritating inconvenience. In other environments, unanticipated down time could mean financial disaster, drastic loss of credibility, or, as in health care or law enforcement, a risk to safety.

Figure 1.12 shows the process for improving availability on your network.

Figure 1.12 Improving Availability

Improving Availability

Each method for improving availability places different demands on the design of your network. As the risk of down time to your operation increases, build more redundancy into your design, both in hardware and routing. Similarly, as the consequences of failure increase, make your network more resilient by increasing the amount of stress it can handle before it loses functionality.

Implementing Redundancy

Single points of failure, such as devices, links, and interfaces, can make a network vulnerable. If one such point fails, it isolates users from services and, in the worst case, causes entire sections of the network to fail. For a purely hierarchical network — one based on summarization and controlled access between tiers — every device and link is a point of failure.

Redundancy provides alternative paths around points of failure. In a purely redundant network, each individual device, link, and interface is dispensable. No single device, link, or interface can isolate users or cause the network to fail.

In most production environments, neither a purely hierarchical nor a purely redundant network is practical. You must balance the efficiency of a hierarchical network with the safety net of redundancy.

Implementing Secondary Paths

After deploying multiple devices to eliminate single points of failure, configure secondary paths to take advantage of the multiple devices. A secondary path, or backup path, consists of the interconnecting devices and the links between them that duplicate the devices and links in the primary path. For example, you can configure multiple routers to provide redundancy.

A redundant design uses the secondary path to maintain network connectivity when any of the primary path’s devices or links fails. Be sure to test any secondary paths on a regular basis. Do not assume that they will work. If possible, ensure that the switch from the primary path to the secondary path occurs transparently. For mission-critical applications, automatic failover is mandatory.

Using Load Balancing

In addition to its safety net function, redundancy plays a second valuable role. By properly configuring two or more paths that connect the same source and destination networks, you can significantly improve throughput by providing load balancing. Load balancing evenly divides the flow of traffic among parallel links.

Most routing protocols based on open standards support load balancing across paths that the protocol determines to be equally favorable to the destination. In addition, some vendors’ proprietary routing protocols support load balancing where the costs of the paths (their relative favorability to the destination in terms of shortest distance, number of hops, and other criteria) are not considered equal.

For more information about network load balancing, see "Designing Network Load Balancing" in Planning Server Deployments of this kit.

Improving Availability

Implementing Redundancy

Implementing Secondary Paths

Using Load Balancing

Additional resources