Other Considerations (Server Clusters: Majority Node Set Quorum)

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP2

There is a single majority node set resource in the cluster, and it is brought online on only one node at any one time, just like any other cluster resource. The majority node set resource is responsible for ensuring that the quorum data is kept consistent on all disks across the cluster. When a cluster is set up to use a majority node set quorum resource, or when a node is added to a cluster that uses a majority node set, a file share is created on that node. As described above, each node in a majority node set cluster has a file share that exports the quorum directory, so that the majority node set resource can write to all members of the majority node set regardless of which node hosts it.

The share is NOT under Cluster service control, since this would lead to a chicken-and-egg situation: the cluster needs the share, and the share needs the cluster. In other words, the share is available and visible when the node is booted, NOT when the Cluster service starts. The share is used in quorum calculations when it is visible, NOT when the node joins the cluster. A node that is configured to be in the cluster therefore contributes to the quorum calculation for the cluster even if the Cluster service on that node is stopped. This is VERY important to consider, and it can affect operational procedures, particularly when changing the number of nodes in a cluster.

Consider the following case of a 4-node cluster:

Node 1 is booted, but the Cluster service is not running on it.

Nodes 2, 3, and 4 are fully operational and running in a cluster.

The administrator then evicts Nodes 2 and 3 from the cluster, so the cluster now contains two nodes (1 and 4). Node 4 still has quorum, since it has its own share and also has access to the share on Node 1, which is configured as part of the cluster but is not running the Cluster service. If the Cluster service on Node 1 starts at this point, it joins the existing cluster (containing Node 4) without problems, and Node 1's local quorum information, stored in the registry, is updated to reflect the new membership. If, on the other hand, the Cluster service on Node 4 is stopped and then the Cluster service on Node 1 is started, Node 1 will NOT be able to start up and form a cluster. This is because Node 1 tries to form a cluster based on the quorum information it has in its registry (NOT the quorum database on the majority node set). It still believes that Nodes 1, 2, 3, and 4 are members, and therefore it will not start until it has quorum based on that four-node membership.

This situation can be corrected by using the force quorum mechanism described above. The force quorum mechanism allows the Cluster service on Node 1 to start up, at which point it refreshes its local registry state with the state from the majority node set.
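The quorum arithmetic behind the four-node case can be sketched in a few lines of Python. This is an illustrative model only, not the Cluster service's actual algorithm: the function names are hypothetical, a majority is taken to mean more than half of the configured members, and the evicted Nodes 2 and 3 are assumed to no longer contribute a share.

```python
def majority_needed(configured_nodes):
    """Smallest member count that is more than half of the configured set."""
    return len(configured_nodes) // 2 + 1


def has_quorum(configured_nodes, visible_shares):
    """True if the visible shares of configured members reach a majority."""
    counted = set(configured_nodes) & set(visible_shares)
    return len(counted) >= majority_needed(configured_nodes)


# A share is visible whenever its node is booted, whether or not the
# Cluster service is running there.
# Assumption: the evicted Nodes 2 and 3 no longer contribute a share.
visible = {1, 4}

# Node 4 knows the membership as it stands after the evictions.
node4_view = {1, 4}
# Node 1 still holds the stale four-node membership in its local registry.
node1_view = {1, 2, 3, 4}

print(has_quorum(node4_view, visible))  # True: 2 of 2 members visible
print(has_quorum(node1_view, visible))  # False: 2 of 4 visible, 3 needed
```

With the stale membership, Node 1 needs three shares but can reach at most two, which is why it cannot form a cluster on its own until its view of the membership is refreshed.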

Using MNS in single node clusters

Using MNS as a quorum device on single node clusters is fully supported. There are some startup implications to consider when choosing this model. As explained before, MNS depends on LanManServer and LanManWorkstation to initialize. These dependencies have not been explicitly created in Service Control Manager. As a result, if either of the two services is not ready when the Cluster service initializes, the Cluster service fails to start and an entry recording the failure is logged in the Event Log. However, the default recovery behavior for the Cluster service is to restart after 1 minute. This means that it tries to restart every minute and eventually succeeds once the dependent services come online. As a result, there can be a lag before the Cluster service and clustered applications come online, even after the operating system has booted successfully. This behavior is slightly different from that of a single node cluster that has a local quorum device or a quorum device located on a shared disk. The lag is particularly noticeable if the cluster is hosting file share resources that contain a large number of shares (2000 or more).
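The effect of the one-minute restart behavior on startup lag can be estimated with a small Python sketch. It is a simplified model under stated assumptions: the first start attempt happens at time zero, later attempts happen at exact multiples of the retry interval, and an attempt succeeds as soon as the dependent services are ready. The function name and parameters are hypothetical.

```python
import math


def first_successful_start(dependencies_ready_at, retry_interval=60):
    """Seconds after the first attempt at which a start attempt succeeds.

    Assumes attempts at 0, retry_interval, 2 * retry_interval, ... and that
    an attempt succeeds once the dependent services are already online.
    """
    if dependencies_ready_at <= 0:
        return 0
    attempts_needed = math.ceil(dependencies_ready_at / retry_interval)
    return attempts_needed * retry_interval


# Dependent services ready 150 seconds after the first attempt:
# the attempts at 0, 60, and 120 seconds fail, and the one at 180 succeeds.
print(first_successful_start(150))  # 180
```

In this model the Cluster service comes online up to one retry interval after its dependencies are ready, in addition to however long the dependencies themselves take to initialize.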

Using MNS as a quorum device in File and Print clusters

The conditions mentioned above also apply to larger clusters (more than two nodes) that use MNS as a quorum device. A cluster that hosts a large number of shares continues to have similar startup and failover issues. Although LanManServer has been optimized to a large extent in Windows Server 2003, it still takes some time (anywhere from two to 16 minutes) to start if a large number of file shares (2000 to 20,000) are hosted in the cluster. The delay is seen on the node that hosts the group containing the file share resource, and if that node is also needed to form a majority, cluster functioning could be temporarily affected.

1 A partition is defined as a set of cluster nodes that can all communicate with each other.

2 How a partition becomes the owner of the quorum resource is beyond the scope of this discussion.

3 This is the only resource type supported as a quorum resource in the product shipped as part of Windows 2000. Other vendors have supplied alternative quorum resource types, but they are still typically associated with disks on a shared storage bus.

4 MSCS does have the notion of local quorum, which can be used as a debugging tool; however, it is not recommended as a production quorum resource (see KB article 245626).