What Is a Majority Node Set? (Server Clusters: Majority Node Set Quorum)
Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP2
From a server cluster perspective, a majority node set is a single quorum resource; however, the data is actually stored on multiple disks across the cluster. Each cluster node stores the configuration on a local disk that it can access when it starts up. By default, the location is %systemroot%\cluster\ResourceGUID. It is recommended that users use the default location unless there is an overriding reason not to.
The cluster service ensures that the cluster configuration data stored on the majority node set is kept consistent across each cluster node. This allows cluster topologies as follows:
If the configuration of the cluster changes, that change is replicated across the different disks. The change is only considered to have been committed, that is, made persistent, if that change is made to a majority of nodes that have formed the cluster. Mathematically this equates to:
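(<Number of nodes configured in the cluster>/2) + 1
where the division is rounded down to a whole number. For example, a cluster configured with five nodes needs three nodes for a majority, and a cluster configured with four nodes also needs three.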
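The arithmetic can be illustrated with a short Python sketch. This is only an illustration of the formula above, not code taken from the cluster service; the function name majority is a hypothetical helper, and the division is assumed to round down.

```python
def majority(configured_nodes: int) -> int:
    """Return the number of nodes needed for a majority.

    Hypothetical helper: (configured nodes / 2) + 1, with the
    division rounded down (integer division).
    """
    if configured_nodes < 1:
        raise ValueError("a cluster needs at least one configured node")
    return configured_nodes // 2 + 1


if __name__ == "__main__":
    # For each cluster size, show the majority and how many node
    # failures the cluster can tolerate before losing that majority.
    for n in range(1, 9):
        needed = majority(n)
        print(f"{n} configured nodes: majority = {needed}, "
              f"tolerated failures = {n - needed}")
```

Note that a two-node cluster needs both nodes for a majority, so it cannot tolerate the loss of either node.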
This ensures that a majority of the nodes have an up-to-date copy of the data. It also changes how the cluster service starts up on each node, since the cluster service is set to start automatically on boot. On the first node in the cluster, the cluster service will try to join a cluster by default. Since there is no cluster available at this stage, it will not be able to join and will stop. The default retry period is one minute, so the cluster service will try to restart every minute. The start will succeed once a majority of the cluster nodes are up and running and can talk to each other through either the public or the private network. Network priority is honored, and each node will first try the network that has been flagged as the preferred network for cluster communication. This behavior is the same as in vanilla clusters (clusters that use a device on the shared storage interconnect for quorum).
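As a rough illustration of the startup behavior described above, the following Python sketch computes the earliest point at which a majority of nodes is up. It is a simplified model, not the cluster service's logic: the function name and the boot times are hypothetical, all running nodes are assumed to be able to communicate, and the actual start would follow at the next one-minute retry.

```python
from typing import List, Optional


def minute_majority_is_up(boot_minutes: List[int],
                          configured_nodes: int) -> Optional[int]:
    """Return the minute at which a majority of nodes has booted.

    boot_minutes holds the (hypothetical) boot time of each node that
    eventually comes up. Returns None if too few nodes ever boot.
    """
    needed = configured_nodes // 2 + 1  # majority, division rounded down
    if len(boot_minutes) < needed:
        return None  # a majority is never reached; the service keeps retrying
    # The cluster can form when the 'needed'-th node has come up.
    return sorted(boot_minutes)[needed - 1]


if __name__ == "__main__":
    # Five configured nodes booting at minutes 0, 2, 5, 7, and 9.
    print(minute_majority_is_up([0, 2, 5, 7, 9], 5))
    # Only two of five configured nodes ever boot.
    print(minute_majority_is_up([0, 1], 5))
```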
Once the cluster service has started up cleanly on a majority of nodes, it will bring all the resources online. If at any point the cluster loses a majority of nodes, the cluster service will cleanly shut down on all nodes, which terminates all the resources as well. This is an important difference from the behavior customers are used to in vanilla clusters, where the cluster service keeps running as long as at least one node is up and running and can access the quorum device.
In the case of a failure or a split-brain caused by a failure in communications between cluster nodes, all resources hosted on the partition of a cluster that has lost quorum are terminated to ensure that each resource runs on only one node in the cluster. If a cluster becomes partitioned (that is, there is a communications failure between two sets of nodes in the cluster), any partitions that do not have a majority of the configured nodes (that is, have fewer than (n/2) + 1 nodes) are said to have lost quorum. This causes the cluster service and all resources hosted on the nodes of that partition to be terminated. As a result, a partition that contains a majority of the nodes can safely start any resources that are not already running on that partition, because it must be the only partition in the cluster that is running resources (all other partitions must have lost quorum). The diagram below depicts the behavior one will see. In the diagram, Nodes 4 and 5 have lost communication with Nodes 1, 2, and 3. This will cause the cluster services on Nodes 4 and 5 to terminate, causing the resources they hosted to move to Nodes 1 and 3.
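The partition rule can be sketched in Python as follows. This is a minimal model under stated assumptions: partitions are represented as sets of node numbers, the function names are hypothetical, and the sketch only applies the majority test described above rather than reproducing how the cluster service detects partitions.

```python
from typing import List, Optional, Set


def has_quorum(partition_size: int, configured_nodes: int) -> bool:
    """A partition keeps quorum only if it holds a majority of the
    configured nodes: at least (n/2) + 1, division rounded down."""
    return partition_size >= configured_nodes // 2 + 1


def surviving_partition(partitions: List[Set[int]],
                        configured_nodes: int) -> Optional[Set[int]]:
    """Return the partition that keeps quorum, or None if every
    partition has lost quorum. At most one partition can qualify."""
    for partition in partitions:
        if has_quorum(len(partition), configured_nodes):
            return partition
    return None


if __name__ == "__main__":
    # Five configured nodes; Nodes 4 and 5 lose contact with 1, 2, and 3.
    print(surviving_partition([{1, 2, 3}, {4, 5}], 5))
    # Four configured nodes split evenly: neither side has a majority.
    print(surviving_partition([{1, 2}, {3, 4}], 4))
```

The even split shows why no two partitions can both keep quorum: a majority requires more than half of the configured nodes, so an even split leaves every partition without quorum.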
In a vanilla server cluster, a cluster can continue as long as at least one of the nodes can access and own the quorum disk.
Any node that is a cluster member and can communicate (via heartbeats) with the quorum owner is part of the cluster and can host resources. Any other nodes that are configured to be in the cluster but cannot communicate with the quorum owner are said to have lost quorum, and thus any resources that they are hosting are terminated. When a node starts up, the cluster service can join an existing cluster (if there is one up and running), or create a cluster right away (if it can access the quorum device).
A cluster running with a majority node set quorum resource, on the other hand, will only start up or continue running if a majority of the nodes configured for the cluster are up and running and can all communicate with each other. The failure semantics of the cluster are different from those of a vanilla MSCS cluster. Because of this difference in how the cluster starts up and in how it behaves when nodes fail or become partitioned, care must be taken when deciding whether to host an application on a vanilla server cluster that uses a physical disk as the quorum resource, or on a server cluster that uses a majority node set as the quorum resource.
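The difference between the two quorum models can be summarized in a small Python sketch. Both functions are hypothetical simplifications of the rules described above: the shared-disk model reduces "can access and own the quorum disk" to a single boolean, and the majority node set model assumes that all running nodes can communicate with each other.

```python
def shared_disk_cluster_runs(nodes_up: int,
                             quorum_disk_accessible: bool) -> bool:
    """Vanilla cluster: runs while at least one node is up and can
    access and own the quorum disk."""
    return nodes_up >= 1 and quorum_disk_accessible


def majority_node_set_cluster_runs(nodes_up: int,
                                   configured_nodes: int) -> bool:
    """Majority node set cluster: runs only while a majority of the
    configured nodes are up and communicating."""
    return nodes_up >= configured_nodes // 2 + 1


if __name__ == "__main__":
    configured = 4
    for up in range(configured + 1):
        print(f"{up} of {configured} nodes up: "
              f"shared disk = {shared_disk_cluster_runs(up, True)}, "
              f"majority node set = "
              f"{majority_node_set_cluster_runs(up, configured)}")
```

With four configured nodes, the shared-disk model keeps running down to a single node (provided the quorum disk is accessible), while the majority node set model stops as soon as fewer than three nodes remain.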
The following sections cover how the quorum disk resource can be set up and what happens in the event that quorum is lost.