Files on the Cluster Quorum Might be Missing, Inaccessible, or Corrupt

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

To understand this topic in context, see the flowchart in Troubleshooting Quorum Resource Problems.

If you can start the Cluster service only by using the /fixquorum option, but you can then bring the quorum resource online with Cluster Administrator, the configuration files on the quorum resource are probably at fault, as described under "Cause" in this topic. You can examine the files on the quorum resource and replace them if necessary.

Cause

There are several possible causes. Like any files, the cluster configuration files on the quorum resource can become corrupted. The files might also be missing, or it might be impossible to write to them, for example because of incorrect permissions or because the disk is full.

Solution

Use Windows Explorer to examine the files on the quorum resource. The files will be in a folder, almost always called MSCS, and in subfolders of that folder. Check the permissions on the files and the folder, and check whether the quorum disk is full. Then use the following tables to help you determine if any files are missing or corrupt. The procedures later in this topic can help you replace files as appropriate.
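
The disk-space part of this check can also be scripted. The following Python sketch is illustrative only and is not part of the Cluster service tooling. It assumes Python is available on the node, and the function name check_quorum_folder and the 50 MB threshold are hypothetical choices, not documented requirements.

```python
import os
import shutil


def check_quorum_folder(folder, min_free_bytes=50 * 1024 * 1024):
    """Return a list of problems found with the folder's volume and access."""
    if not os.path.isdir(folder):
        return ["folder not found: " + folder]

    problems = []

    # shutil.disk_usage reports total, used, and free bytes for the volume.
    free = shutil.disk_usage(folder).free
    if free < min_free_bytes:
        problems.append("low disk space: %d bytes free" % free)

    # os.access is only a rough hint. On Windows it does not evaluate ACLs,
    # so permissions still need to be reviewed in Windows Explorer.
    if not os.access(folder, os.W_OK):
        problems.append("folder is not writable: " + folder)

    return problems
```

For example, check_quorum_folder(r"Q:\MSCS") returns an empty list when no problem is detected. Q:\MSCS is a placeholder path; substitute the quorum folder on your cluster.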

Files That Should be Present on the Quorum Resource

File Description

Chkxxx.tmp

Copy of the cluster configuration database (also known as a checkpoint file, but not to be confused with registry checkpoints for resources). If the file contains 0 bytes, it is definitely corrupt. Even if it contains more than 0 bytes, it could be corrupt.

If you have the original release of Windows Server 2003, the Cluster service will not start if the Chkxxx.tmp file is missing or corrupt. However, if you have Windows Server 2003 with SP1, in many situations the Cluster service can automatically re-create this file if it is missing or corrupt.

Quolog.log

The quorum log, which records changes to the cluster configuration database, but only those changes that occur while one or more nodes are down. The file exists even when all nodes are functioning, but information is added to it only when a node is offline. Information in the log is carefully marked according to sequence and timing so that it can be used correctly when nodes go down and come back up.

If you have the original release of Windows Server 2003, the Cluster service will not start if Quolog.log is missing or corrupt. However, if you have Windows Server 2003 with SP1, in many situations the Cluster service can automatically re-create this file if it is missing or corrupt.
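
The checks in the preceding table can be sketched in Python as follows. This is a minimal sketch, not a supported tool. The function name check_required_files is hypothetical, and the script reports only what the table states: a missing file, or a Chkxxx.tmp file of 0 bytes. A file that passes these checks can still be corrupt.

```python
import fnmatch
import os


def check_required_files(mscs_folder):
    """Report missing required files and zero-byte Chkxxx.tmp files."""
    findings = []
    names = os.listdir(mscs_folder)

    # The "xxx" in Chkxxx.tmp varies, so match it with a wildcard.
    # Names are lowercased because Windows file names are not case-sensitive.
    chk_files = [n for n in names if fnmatch.fnmatchcase(n.lower(), "chk*.tmp")]
    if not chk_files:
        findings.append("Chkxxx.tmp: missing")
    for name in chk_files:
        if os.path.getsize(os.path.join(mscs_folder, name)) == 0:
            findings.append(name + ": 0 bytes (corrupt)")

    if "quolog.log" not in [n.lower() for n in names]:
        findings.append("Quolog.log: missing")

    return findings
```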

Folders and Files That Might be Present on the Quorum Resource

File Description

Folder called {GUID}

Folder for each resource that is configured with registry checkpoints or cryptographic key checkpoints. The resource GUID is the name of the folder.

{GUID}\*.cpt

Checkpoint files for resources configured with registry checkpoints.

{GUID}\*.cpr

Checkpoint files for resources configured with cryptographic key checkpoints.
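
To see which resources currently have checkpoint files, you can take an inventory of the subfolders. The following Python sketch is one possible approach. The function name inventory_checkpoints is hypothetical, and the script does not validate folder names as GUIDs; it simply counts the *.cpt and *.cpr files in every subfolder it finds.

```python
import os


def inventory_checkpoints(mscs_folder):
    """Map each subfolder name to its counts of .cpt and .cpr files."""
    inventory = {}
    for entry in sorted(os.listdir(mscs_folder)):
        path = os.path.join(mscs_folder, entry)
        if not os.path.isdir(path):
            continue  # Skip files such as Quolog.log in the top-level folder.
        files = [f.lower() for f in os.listdir(path)]
        inventory[entry] = {
            "cpt": sum(f.endswith(".cpt") for f in files),  # registry checkpoints
            "cpr": sum(f.endswith(".cpr") for f in files),  # cryptographic key checkpoints
        }
    return inventory
```

A subfolder that shows zero for both counts, for a resource that you know is configured with checkpoints, is a candidate for the recovery procedures later in this topic.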

Use the following procedures to replace a missing or corrupted file.

To replace a missing or corrupt Chkxxx.tmp file or quorum log or both on the quorum resource

  1. If the Cluster service is running (with the /fixquorum option), stop it by typing:

    net stop clussvc

  2. On a node that was functioning correctly when problems with the quorum resource appeared, restart the Cluster service with the /resetquorumlog option (abbreviated /rq) by typing:

    net start clussvc /rq

    This re-creates the quorum log and the Chkxxx.tmp file based on the cluster configuration information on that node.

    You can accomplish the same action through Computer Management by opening Services and then, for the Cluster service, opening the Properties dialog box. From the General tab, stop the Cluster service if needed, specify /resetquorumlog in Start parameters, and then start the service with that parameter. (The parameter is used only once and does not persist.)

  3. If you cannot use the previous steps or they do not work, and you have a backup containing quorum information (that is, a recent System State backup created on a node while the Cluster service was running), start Backup on a node. Before restoring the System State, in the Confirm Restore dialog box, click Advanced, and then click Restore the Cluster Registry to the quorum disk and all other nodes.

    Important

    Be sure to click this advanced option, or information will not be restored to the quorum resource, only to the node.

  4. If you have resources that are configured to use registry checkpoints or cryptographic key checkpoints and you think that these checkpoint files are also corrupt, see the next procedure.
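
Steps 1 and 2 of the preceding procedure can be sketched in Python as follows. The two commands are the ones shown in those steps. The function name reset_quorum_log and its parameters are hypothetical. By default the function only returns the commands so that you can review them; pass dry_run=False on the chosen node to run them.

```python
import subprocess

STOP_COMMAND = ["net", "stop", "clussvc"]
START_WITH_RESET = ["net", "start", "clussvc", "/rq"]


def reset_quorum_log(service_running, dry_run=True):
    """Return the commands for steps 1 and 2, running them if dry_run is False."""
    commands = []
    if service_running:
        # Step 1: stop the service if it was started with /fixquorum.
        commands.append(STOP_COMMAND)
    # Step 2: start the service with the /resetquorumlog option.
    commands.append(START_WITH_RESET)

    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(command, check=True)
    return commands
```

Keeping the dry run as the default is a deliberate choice in this sketch, because the commands should run only on a node that was functioning correctly when the quorum problems appeared.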

To replace corrupt registry checkpoint files or cryptographic key checkpoint files on the quorum resource

  1. Correct other problems with the quorum resource and start the Cluster service without options.

  2. Run Clusterrecovery.exe, which is described in Configuring a Computer for Troubleshooting the Quorum Resource in a Server Cluster. Use Clusterrecovery to recover the resource checkpoint files, referring to Clusterrecovery Help as needed.

  3. If you cannot use the previous steps or they do not work, and you have a recent backup of your registry checkpoint files or cryptographic key checkpoint files (that is, *.cpt or *.cpr files), restore from backup. To do this:

    1. Correct other problems with the quorum resource and start the Cluster service without options. Identify the resources that are configured to use registry checkpoints or cryptographic key checkpoints and take them offline.

    2. Identify the node that currently owns the quorum resource. On that node, restore the *.cpt or *.cpr files to the correct folder on the quorum resource. Refer to the table earlier in this topic for information about where the files should be restored.
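
If the backup of the checkpoint files is available as a plain folder of files, the copy in the last step can be sketched in Python as follows. This is an assumption about how the backup is stored; if the files are inside a backup set, restore them with your backup program instead. The function name restore_checkpoints is hypothetical, and both folder paths must be supplied by you.

```python
import os
import shutil


def restore_checkpoints(backup_folder, target_folder):
    """Copy *.cpt and *.cpr files from backup_folder into target_folder."""
    restored = []
    for name in sorted(os.listdir(backup_folder)):
        if name.lower().endswith((".cpt", ".cpr")):
            # copy2 preserves file timestamps along with the contents.
            shutil.copy2(
                os.path.join(backup_folder, name),
                os.path.join(target_folder, name),
            )
            restored.append(name)
    return restored
```

Run a copy like this only on the node that owns the quorum resource, with the affected resources offline, and with target_folder set to the resource's folder on the quorum resource.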