Motivation (Server Clusters: Majority Node Set Quorum)

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP2

The Microsoft Cluster Server (MSCS) architecture requires a single quorum resource in the cluster, which is used as the tie-breaker to avoid split-brain scenarios. A split-brain scenario happens when all of the network communication links between two or more cluster nodes fail. In these cases, the cluster may be split into two or more partitions [1] that cannot communicate with each other. The cluster service guarantees that, even in these cases, a resource is brought online on only one node. If different partitions of the cluster each brought a given resource online, it would violate the cluster guarantees and potentially cause data corruption. When the cluster is partitioned, the quorum resource is used as an arbiter. The partition that owns [2] the quorum resource is allowed to continue. The other partitions are said to have lost quorum, and the cluster service and any resources hosted on nodes outside the partition that has quorum are terminated.
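The arbitration rule described above can be illustrated with a minimal sketch. It models only the outcome of arbitration (which partition continues), not how MSCS actually establishes ownership of the quorum resource. The names `arbitrate`, `Partition`, and `quorum_owner`, and the node names, are hypothetical and are not part of any MSCS API.

```python
from typing import FrozenSet, List, Tuple

# A partition is modeled as the set of node names that can still
# communicate with each other (hypothetical representation).
Partition = FrozenSet[str]


def arbitrate(
    partitions: List[Partition], quorum_owner: str
) -> Tuple[Partition, List[Partition]]:
    """Return (surviving partition, partitions that lost quorum).

    Assumption: exactly one node owns the quorum resource, so exactly
    one partition contains the owner.
    """
    winners = [p for p in partitions if quorum_owner in p]
    if len(winners) != 1:
        raise ValueError("quorum resource must be owned by exactly one partition")
    survivor = winners[0]
    # Every other partition has lost quorum; its cluster service and
    # hosted resources would be terminated.
    losers = [p for p in partitions if p is not survivor]
    return survivor, losers


# Four nodes split into three partitions; node3 owns the quorum resource.
partitions = [
    frozenset({"node1", "node2"}),
    frozenset({"node3"}),
    frozenset({"node4"}),
]
survivor, losers = arbitrate(partitions, quorum_owner="node3")
print(sorted(survivor))
```

Note that in this sketch the single-node partition continues even though another partition contains more nodes: with a single quorum resource, ownership of that resource decides the outcome, not the number of nodes.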

The quorum resource is a storage-class resource and, in addition to being the arbiter in a split-brain scenario, is used to store the definitive version of the cluster configuration. To ensure that the cluster always has an up-to-date copy of the latest configuration information, the quorum resource must itself be highly available. In Windows NT Server 4.0 and Windows 2000 Server, the quorum device is typically a shared disk or physical disk resource type. [3]

This architecture limits the types of cluster configurations that can be deployed.

In Windows NT 4.0 and Windows 2000, MSCS does not support non-shared disk cluster configurations; that is, there must always be at least one shared disk in the cluster to act as the quorum resource. (Note: In this document, a shared disk is defined as any disk that is physically attached to multiple computers; it does not imply anything about how data is accessed on those disks.) [4]

Although the quorum resource may be a highly available resource (such as a mirrored disk or a RAID set), there is a perception that this architecture has a single point of failure. There is a single resource and if it becomes unavailable, the cluster cannot function.

In Windows Server 2003, the goal is to expand the types of configurations that can be deployed using MSCS. In particular, the following scenarios are addressed:

  • Geographically dispersed clusters: In a cluster that spans multiple sites, the notion of quorum as a single shared-disk resource means that the storage subsystem has to interact with the cluster infrastructure to provide the illusion of a single storage device with very strict semantics.

  • Clusters with no shared disks: There are some specialized configurations that need tightly consistent cluster features without having shared disks.

  • Clusters that host applications that can fail over, but where there is an application-specific way to keep data consistent between nodes. Two possible examples are database log shipping, for keeping database state up to date, and file replication, for relatively static data.

  • Clusters that host applications that have no persistent data, but need to cooperate in a tightly coupled way to provide consistent volatile state.

  • Clusters that want the semantics of a persistent store for applications but do not want the added complexity of providing the same for the quorum device. In such cases, the application data is stored on the shared disk, but the quorum information can be located on a local disk of every node.