Addressing risks using server clusters

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

When you configure a cluster, identify the possible failures that can interrupt access to resources. A single point of failure is any component in your environment that would block access to data or applications if it failed. Single points of failure can be hardware, software, or external dependencies, such as power supplied by a utility company and dedicated wide area network (WAN) lines.

In general, you provide maximum reliability when you:

  • Minimize the number of single points of failure in your environment.

  • Provide mechanisms that maintain service when a failure occurs.

With Windows Server 2003, Enterprise Edition and Windows Server 2003, Datacenter Edition, you can use server clusters and new administrative procedures to provide increased reliability. However, server clusters are not designed to protect all components of your workflow in all circumstances. For example, clusters are not an alternative to backing up data; clusters protect only the availability of data, not the data itself.

The following table lists common points of failure and describes whether the point of failure can be protected, either by server clusters or by other means.

| Failure point | Server cluster solution | Other solutions |
|---|---|---|
| Network hub | Redundant networks. For more information, see Configuring cluster network hardware. | -- |
| Utility company power | -- | Uninterruptible power supply (UPS). Put each cluster node on a separate circuit. |
| Server connection | Failover | -- |
| Disk | -- | Hardware RAID, to protect the data on the disk. For more information, see Hardware redundant array of independent disks (RAID). |
| Other server hardware, such as CPU or memory | Failover | -- |
| Server software, such as the operating system or specific applications | Failover | -- |
| Wide area network (WAN) links, such as routers and dedicated lines | -- | Redundant links over the WAN, to provide secondary access to remote connections. |
| Dial-up connection | -- | Multiple modems. |
| Client computer within your organization | -- | Configuring multiple clients for the same level of access. If one client fails, you still have access through other clients. |
| Authentication of the cluster service account | -- | Redundant networks. Configure individual nodes as domain controllers. |
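
When you plan a deployment, it can help to keep the table above as a small inventory and list the failure points that the server cluster itself does not cover. The following Python sketch is only an illustration: the `FailurePoint` class, the `needs_other_protection` function, and the shortened labels are hypothetical names introduced here, not part of any Windows Server tool or API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FailurePoint:
    """One row of the failure-point table (hypothetical helper type)."""
    name: str
    cluster_solution: Optional[str]       # None where the table shows "--"
    other_solutions: Tuple[str, ...] = ()  # empty where the table shows "--"


# Rows transcribed from the table above, with labels shortened for readability.
FAILURE_POINTS = [
    FailurePoint("Network hub", "Redundant networks"),
    FailurePoint("Utility company power", None,
                 ("UPS", "Separate circuit for each cluster node")),
    FailurePoint("Server connection", "Failover"),
    FailurePoint("Disk", None, ("Hardware RAID",)),
    FailurePoint("Other server hardware (CPU, memory)", "Failover"),
    FailurePoint("Server software (OS, applications)", "Failover"),
    FailurePoint("WAN links", None, ("Redundant links over the WAN",)),
    FailurePoint("Dial-up connection", None, ("Multiple modems",)),
    FailurePoint("Client computer", None,
                 ("Multiple clients with the same level of access",)),
    FailurePoint("Cluster service account authentication", None,
                 ("Redundant networks", "Nodes configured as domain controllers")),
]


def needs_other_protection(points):
    """Return the failure points that have no server cluster solution."""
    return [p for p in points if p.cluster_solution is None]


if __name__ == "__main__":
    # Print each uncovered failure point with the measures the table suggests.
    for point in needs_other_protection(FAILURE_POINTS):
        print(f"{point.name}: {', '.join(point.other_solutions)}")
```

Running the script prints only the rows that depend on measures outside the cluster, such as UPS, hardware RAID, or redundant WAN links, which is a quick way to see what still needs a plan.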

To further increase the availability of network resources and prevent the loss of data:

  • Consider having replacement disks and controllers available at your site. Always make sure that any spare parts you keep on hand exactly match the original parts, including network and SCSI components. The cost of two spare SCSI controllers can be a small fraction of the cost of having hundreds of clients unable to use data.

  • Consider providing UPS protection for individual computers and the network itself, including hubs, bridges, and routers. Computers running Windows Server 2003, Standard Edition support UPS. Many UPS solutions provide power for 5 to 20 minutes, which is long enough for the operating system to perform an orderly shutdown when power fails.
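
As a rough planning aid for the UPS guidance above, you can compare the runtime a UPS provides with the time a computer needs to shut down cleanly. The sketch below is a minimal illustration in Python. The function name, the two-minute safety margin, and the sample host names and timings are all assumptions made for this example; measure real shutdown times in your own environment.

```python
def ups_covers_shutdown(ups_runtime_min, shutdown_min, safety_margin_min=2.0):
    """Return True if the UPS runtime covers an orderly shutdown plus a margin.

    All values are in minutes. The default margin is an assumption for
    this example, not a recommended value.
    """
    if min(ups_runtime_min, shutdown_min, safety_margin_min) < 0:
        raise ValueError("times must not be negative")
    return ups_runtime_min >= shutdown_min + safety_margin_min


if __name__ == "__main__":
    # Hypothetical hosts: (UPS runtime in minutes, measured shutdown time in minutes)
    hosts = {
        "node-a": (5.0, 4.0),
        "node-b": (20.0, 4.0),
    }
    for name, (runtime, shutdown) in hosts.items():
        status = "OK" if ups_covers_shutdown(runtime, shutdown) else "too short"
        print(f"{name}: {status}")
```

With the sample figures, a 5-minute UPS does not leave the assumed margin for a 4-minute shutdown, while a 20-minute UPS does.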