Removing all Single Points of Failure

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

Q. What other services does the server cluster rely on?

A. The cluster service must be able to authenticate and sign the communications traffic between cluster nodes. It does this through the domain infrastructure, using the cluster service account. In an environment with server clusters installed, you must therefore ensure that the domain infrastructure is highly available; any disruption to it can make the clusters unavailable.

Q. What other services do I need to think about?

A. For applications to remain highly available in a clustered environment, any service that the application requires outside the cluster must also be highly available. Many of these services can be protected against failure through mechanisms such as replication, or by being made cluster-aware themselves. Services to consider include WINS, DNS, DHCP, the domain infrastructure, and firewalls.
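As a rough illustration of auditing these external dependencies, the following Python sketch flags any service that is backed by fewer than two distinct hosts. The function name `find_single_points_of_failure`, the inventory layout, and the `example.com` host names are all hypothetical; they are not part of any Windows Server tool, and a real inventory would have to be gathered from your own environment.

```python
from typing import Dict, List


def find_single_points_of_failure(inventory: Dict[str, List[str]]) -> List[str]:
    """Return the services backed by fewer than two distinct hosts."""
    return sorted(
        service
        for service, hosts in inventory.items()
        if len(set(hosts)) < 2  # duplicates of one host do not count as redundancy
    )


# Hypothetical inventory: service name -> hosts that provide it.
inventory = {
    "DNS": ["dns1.example.com", "dns2.example.com"],
    "WINS": ["wins1.example.com"],
    "DHCP": ["dhcp1.example.com", "dhcp1.example.com"],  # same host listed twice
    "Domain controller": ["dc1.example.com", "dc2.example.com"],
}

print(find_single_points_of_failure(inventory))  # prints ['DHCP', 'WINS']
```

A host count is only a first check: two hosts that replicate nothing between them, or that share a power supply or network switch, are still a single point of failure.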

Q. What other single points of failure should I protect against?

A. Server clusters protect applications against hardware, operating system, and application failures. They do not remove every single point of failure, however, so you should also think about the following types of failure:

  • Disk failures: use RAID or mirroring.

  • Hardware failures: use multiple hot-swap fans in the server, redundant power supplies, and so on.

  • Network failures: use redundant networks that do not have any shared components.

  • Site failures: prepare disaster recovery plans.
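The requirement that redundant networks have no shared components can be checked mechanically once each network path is written down as a set of components. The Python sketch below is one way to do that; the function name `shared_components`, the path names, and the component names are assumptions made for illustration only.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def shared_components(paths: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each pair of network paths to the components they have in common."""
    overlaps = {}
    for name_a, name_b in combinations(sorted(paths), 2):
        common = paths[name_a] & paths[name_b]  # set intersection
        if common:
            overlaps[(name_a, name_b)] = common
    return overlaps


# Hypothetical description of two cluster networks and what each depends on.
paths = {
    "public": {"nic-1", "switch-A", "router-1"},
    "private": {"nic-2", "switch-A"},
}

print(shared_components(paths))  # prints {('private', 'public'): {'switch-A'}}
```

An empty result means that no component listed appears in more than one path. In the example, both networks pass through `switch-A`, so a failure of that switch would take down both of them.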