What to Do If You Lose Quorum (Server Clusters: Majority Node Set Quorum)

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP2

There are cases where a cluster must be allowed to continue even if it does not have quorum. Consider the case of a geographically dispersed cluster with four nodes at the "primary" site and three nodes at the "secondary" site. While there are no failures, the cluster is a 7-node cluster where resources can be hosted on any node at either site (depending on business needs). If there is a communications failure between the sites, or if the secondary site is taken offline (or fails), the primary site can continue because it still has quorum (four of the seven nodes). All resources will be re-hosted and brought online at the primary site.
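The arithmetic behind this example can be checked with a short sketch. It assumes the usual majority rule for a majority node set, namely that a partition has quorum only when it holds more than half of the configured nodes. The function name `has_quorum` is ours, chosen for illustration, and is not part of any cluster API.

```python
def has_quorum(reachable_nodes: int, configured_nodes: int) -> bool:
    """Return True if a partition holds a strict majority of the configured nodes."""
    if not 0 <= reachable_nodes <= configured_nodes:
        raise ValueError("reachable_nodes must be between 0 and configured_nodes")
    # A majority is floor(n / 2) + 1 nodes, that is, strictly more than half.
    return reachable_nodes >= configured_nodes // 2 + 1


TOTAL = 7  # four nodes at the primary site plus three at the secondary site
primary_has = has_quorum(4, TOTAL)    # primary site after the sites lose contact
secondary_has = has_quorum(3, TOTAL)  # secondary site after the sites lose contact
print(primary_has, secondary_has)     # prints: True False
```

With seven nodes the majority is four, which is why the primary site can continue on its own and the secondary site cannot.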

In the event of a catastrophic failure of the primary site, however, the secondary site will lose quorum and therefore all resources will be terminated at that site. One of the primary purposes for having a multi-site cluster is to survive a disaster at the primary site; however, the cluster software itself cannot make a determination about the state of the primary site. The cluster software cannot differentiate between a communications failure between the sites and a disaster at the primary site. That must be done by manual intervention. In other words, the secondary site can be forced to continue even though the cluster software believes it does not have quorum. This is known as forcing quorum.

Because this mechanism is effectively breaking the semantics associated with the quorum replica set, it must only be done under controlled conditions. In the example above, if the secondary site and primary site lose communication and an administrator forces quorum at the secondary site, resources will be brought online at BOTH sites, thus allowing the potential for inconsistent data and/or data corruption in the cluster.
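The risk described above can be illustrated with a small model. This is only a sketch under simplifying assumptions: a site is treated as active when it has running nodes and either holds a majority or has had quorum forced by an administrator. The names `active_partitions`, `link_down`, and `primary_lost` are hypothetical and exist only for this example.

```python
def active_partitions(partitions, configured_nodes, forced=()):
    """Return the names of the sites that would bring resources online.

    partitions: mapping of site name -> number of running nodes at that site
    forced:     site names where an administrator has forced quorum
    """
    majority = configured_nodes // 2 + 1
    return sorted(
        site for site, running in partitions.items()
        if running > 0 and (running >= majority or site in forced)
    )


# Communications failure only: both sites are still running.
link_down = {"primary": 4, "secondary": 3}
# Real disaster: every node at the primary site is gone.
primary_lost = {"primary": 0, "secondary": 3}

print(active_partitions(link_down, 7))                            # ['primary']
print(active_partitions(link_down, 7, forced={"secondary"}))      # ['primary', 'secondary']
print(active_partitions(primary_lost, 7, forced={"secondary"}))   # ['secondary']
```

The second case is the dangerous one: two sites are active at once, so the same resources could be online in both places.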

Forcing quorum is a manual process that requires the following steps:

Stop the cluster service ON ALL of the remaining nodes using Cluster Administrator.

The cluster service must be told which nodes should be considered as having quorum. This can be done in one of two ways:

Set up the ForceQuorum registry key ON ALL remaining nodes in the cluster under:

HKLM\SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters\ForceQuorum

This is a REG_SZ key that should be set to a comma-separated list of the names of the nodes that are to have quorum. The key is case insensitive. So, in the above example, if the secondary site contains "Node5", "Node6" and "Node7", then the ForceQuorum registry key should be set to:

"Node5,Node6,Node7"

Note

There should be no spaces in the key (except where there are spaces in the node names themselves).

Once the registry keys are set on all nodes, the cluster service can be started on those nodes.
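Because the node list must be typed identically on every remaining node, it can help to generate the string once rather than by hand. The following is a minimal sketch, not an official tool: `build_force_quorum_value` is a hypothetical helper that only builds the text; it does not touch the registry. Skipping names that differ only by case is our own choice, based on the statement above that the setting is case insensitive.

```python
def build_force_quorum_value(node_names):
    """Join node names into the comma-separated form described above."""
    cleaned = []
    seen = set()
    for name in node_names:
        name = name.strip()  # drop stray spaces around each name
        if not name:
            raise ValueError("empty node name")
        if "," in name:
            raise ValueError(f"node name may not contain a comma: {name!r}")
        key = name.lower()   # compare names without regard to case
        if key in seen:
            continue         # skip duplicates such as NODE5 and Node5
        seen.add(key)
        cleaned.append(name)
    if not cleaned:
        raise ValueError("at least one node name is required")
    return ",".join(cleaned)  # no spaces after the commas


print(build_force_quorum_value(["Node5", " Node6", "Node7 "]))  # Node5,Node6,Node7
```

Spaces inside a node name are preserved; only leading and trailing spaces are removed.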

Set up the cluster service startup parameters ON ALL remaining nodes in the cluster. This is done by opening the Services control panel, selecting the cluster service, and entering the following into the "start parameters" option:

/forcequorum <node list>

In the above example, if the secondary site contains "Node5", "Node6" and "Node7", then the cluster service start parameter should be set to:

/forcequorum Node5,Node6,Node7

The cluster service MUST be started by clicking the START button on the service control panel. Do not click OK or Apply first, because doing so does not preserve the parameters.

Note

Any command-line parameters override the registry setting. However, command-line parameters do NOT persist across a reboot, so setting the registry key is the preferred mechanism for forcing quorum.
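The precedence rule in this note can be sketched as follows. This is a model of the behavior as described here, not of the cluster service's actual implementation. The helpers `start_parameters` and `effective_node_list` are hypothetical, and `None` stands for "not set".

```python
def start_parameters(node_list_value):
    """Format the text to enter in the "start parameters" option."""
    return f"/forcequorum {node_list_value}"


def effective_node_list(command_line_value=None, registry_value=None):
    """Pick the node list that applies under the precedence rule in the note.

    Command-line parameters override the registry setting. After a reboot the
    command-line value is gone, so only the registry value (if any) remains.
    """
    if command_line_value is not None:
        return command_line_value
    return registry_value


print(start_parameters("Node5,Node6,Node7"))                     # /forcequorum Node5,Node6,Node7
print(effective_node_list("Node5,Node6", "Node5,Node6,Node7"))   # Node5,Node6
print(effective_node_list(None, "Node5,Node6,Node7"))            # Node5,Node6,Node7
```

The last call models a node that was started with start parameters and later rebooted: without a registry setting it would return `None`, meaning quorum is no longer forced on that node.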

The cluster service will now start up on those nodes that are considered part of the quorum set and resources will be brought online.

Special care must be taken if and when the primary site comes back, because the nodes there are still configured as part of the cluster. Follow these steps:

Do NOT reboot the cluster nodes at the primary site.

Stop the cluster service ON ALL of the cluster nodes.

Remove the registry key setting or the cluster service startup parameters that were set to force quorum.

Start the cluster service on all of the nodes at the secondary site.

Boot the nodes at the primary site.

Note

The cluster service on all nodes NOT in the force quorum node list must remain stopped until the force quorum information is removed. Failure to do so can lead to data inconsistencies OR data corruption.
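The rule in this note lends itself to a simple pre-start check. The sketch below is illustrative only. `may_start_cluster_service` is a hypothetical helper, the node names come from the example above, and `None` is used to mean that the force quorum information has been removed from every node.

```python
def may_start_cluster_service(node, force_quorum_list):
    """Return True if starting the cluster service on `node` follows the rule.

    force_quorum_list: the comma-separated node list currently configured,
    or None once the force quorum information has been removed everywhere.
    """
    if force_quorum_list is None:
        return True  # force quorum removed: any configured node may start
    forced = {name.strip().lower() for name in force_quorum_list.split(",")}
    # While quorum is forced, only the listed nodes may run the service.
    return node.strip().lower() in forced


print(may_start_cluster_service("Node1", "Node5,Node6,Node7"))  # False
print(may_start_cluster_service("node6", "Node5,Node6,Node7"))  # True
print(may_start_cluster_service("Node1", None))                 # True
```

The comparison ignores case, in line with the earlier statement that the setting is case insensitive.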

While a cluster is running in the force quorum state, it is fully functional. For example, nodes can be added to or removed from the cluster, and new resources, groups, and so on can be defined.