Backing up and restoring server clusters

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

 

Performing regular backups of your server cluster is imperative for high availability. This topic explains how you can use the Backup or Restore Wizard to back up cluster nodes, describes ten cluster failure scenarios, and offers data restore solutions for each scenario using the Backup or Restore Wizard and recovery utilities from the Microsoft Windows Server 2003 Resource Kit.

For more information on backup and restore procedures, see Backing up and restoring data.

Backing up cluster data

In a server cluster, four groups of data are critical to the proper operation of the cluster: the disk signatures and partitions of the cluster disks, the cluster quorum data, the data on the cluster disks, and the data on the individual cluster nodes.

  • Cluster disk signatures and partitions

  • Cluster quorum data

  • Data on the cluster disks

  • Data on the individual cluster nodes

Cluster disk signatures and partitions

Before you begin to back up any data on the server cluster nodes, make sure you back up the cluster disk signatures and partitions using Automated System Recovery in the Backup Wizard. You need this backup if you later have to restore the signature of the quorum disk, for example, after a complete system failure in which the signature of the quorum disk has changed since your last backup.

Note

  • By default, Backup Operators do not have the user rights necessary to create an Automated System Recovery (ASR) backup on a cluster node. However, Backup Operators can perform this procedure if that group is added to the security descriptor for the Cluster service. You can do that using Cluster Administrator or cluster.exe. For more information, see Give a user permissions to administer a cluster and Cluster.

For information, see Back up cluster disk signatures and partition layouts.

Cluster quorum data

When you back up data on a server cluster node, make sure you also back up the cluster quorum. The cluster quorum is important because it contains the current cluster configuration, application registry checkpoints, and the cluster recovery log.

You can use the Backup Wizard to back up the cluster quorum data if you perform a System State backup from any node provided the Cluster service is running on that node.
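A System State backup can also be scripted rather than run through the wizard. The following Python sketch only builds the command line; it assumes the `ntbackup backup systemstate /J ... /F ...` command-line syntax of the NTBackup tool included with Windows Server 2003. The function name, job name, and output path are hypothetical, and the sketch does not check that the Cluster service is running on the node, which is required for the quorum data to be included.

```python
import subprocess


def build_system_state_backup(job_name, backup_file):
    """Return the argument list for an NTBackup System State backup.

    job_name and backup_file are illustrative values supplied by the caller.
    """
    if not job_name or not backup_file:
        raise ValueError("job_name and backup_file are required")
    return ["ntbackup", "backup", "systemstate",
            "/J", job_name,      # job name recorded in the backup log
            "/F", backup_file]   # backup file to write to


# Example values only; adjust the job name and path for your environment.
cmd = build_system_state_backup("Node1 system state",
                                r"D:\Backups\node1-systemstate.bkf")
# Show the command instead of running it.
print(subprocess.list2cmdline(cmd))
```

To run the backup on a node, the argument list could be passed to `subprocess.run(cmd, check=True)` from an account that has the required backup rights.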

For information, see Back up the cluster quorum.

Data on the cluster disks

To back up all cluster disks owned by a node, perform a full backup from that node.

You can also back up this data through a network connection to a hidden administrative file share. For example, you might use the New Resource Wizard to create FBackup$, GBackup$, and HBackup$ file shares for the root of drives F, G, and H, respectively. These shares would not appear in the browse list and could be configured to allow access only to members of the Backup Operators group.
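The share naming pattern in this example can be expressed as a small helper. The following Python sketch derives the hidden share name and the UNC path that a backup job would target. The function names and the server name are hypothetical; the sketch only formats strings and does not create or verify any share.

```python
def backup_share_name(drive_letter):
    """Return the hidden share name for a drive, for example 'F' -> 'FBackup$'."""
    letter = drive_letter.strip().rstrip(":\\").upper()
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError("not a drive letter: {!r}".format(drive_letter))
    # The trailing $ keeps the share out of the browse list.
    return letter + "Backup$"


def backup_unc_path(server, drive_letter):
    """Return the UNC path of the hidden backup share on the given server."""
    return "\\\\{}\\{}".format(server, backup_share_name(drive_letter))


# CLUSTERFS is a placeholder for the network name that hosts the shares.
for drive in ("F", "G", "H"):
    print(backup_unc_path("CLUSTERFS", drive))
```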

For information on backing up data on the cluster disks, see Back up data on cluster nodes.

Important

  • If a cluster disk owned by the node being backed up fails over to another node during the backup process, the backup set will not contain a full backup of that disk.

Important

  • You can back up a cluster disk only on a local node. You cannot back up a cluster disk on a remote computer.

Data on the individual cluster nodes

After you back up the cluster quorum disk on one node, it is not necessary to back up the quorum on the remaining cluster nodes. However, you may want to back up the clustering software, cluster administrative software, system state, and application data on the remaining nodes.

Important

  • If you back up the system state for a node, you will also automatically back up the quorum data as long as the Cluster service is running on that node.

For information on backing up data on individual cluster nodes, see Back up data on cluster nodes.

Cluster failure and restore scenarios

This section describes ten failure scenarios that will require restoring your cluster. The type of failure you experience determines the steps you must follow.

  • Scenario 1—Cluster Disk Data Loss

  • Scenario 2—Cluster Quorum Corruption

  • Scenario 3—Cluster Quorum Loses Application Checkpoints

  • Scenario 4—Cluster Disk Corruption or Failure

  • Scenario 5—Cluster Quorum Disk Failure

  • Scenario 6—Single Cluster Node Corruption or Failure

  • Scenario 7—Cluster Quorum Rollback

  • Scenario 8—Complete Cluster Failure

  • Scenario 9—Majority Node Set Cluster Failure

  • Scenario 10—Application Data Loss in a Server Cluster

Scenario 1—Cluster Disk Data Loss

If you have lost files and folders on one of your cluster disks, but not on the disk containing the cluster quorum, you can use the Backup or Restore Wizard to restore that data.

Important

  • You must restore the cluster disk data from the node that owns the cluster disk.

For information, see Restore files from a file or a tape.

Scenario 2—Cluster Quorum Corruption

Symptom: The cluster nodes can boot up, but the Cluster service fails to start because the quorum resource cannot come online.

If this problem results from corrupted files on the quorum disk, try starting the Cluster service by opening a command prompt and typing net start clussvc /resetquorumlog. This creates a new quorum log file, using information stored in the cluster database on the local node. For additional information about recovering from cluster quorum corruption, see Recover from a corrupted quorum log or quorum disk. If the cluster quorum disk needs to be replaced, see Scenario 5, below. For a majority node set cluster, see Scenario 9, below.
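If you script recovery steps, it helps to restrict the start command to the switches this topic actually names. The following Python sketch builds the `net start clussvc` command line and accepts only `/resetquorumlog` (this scenario) and `/fixquorum` (Scenario 5). The function and constant names are hypothetical, and the sketch does not run the command.

```python
# Switches named in this topic; anything else is rejected.
ALLOWED_SWITCHES = {"resetquorumlog", "fixquorum"}


def cluster_start_command(switch=None):
    """Return the argument list that starts the Cluster service."""
    cmd = ["net", "start", "clussvc"]
    if switch is not None:
        name = switch.lstrip("/").lower()
        if name not in ALLOWED_SWITCHES:
            raise ValueError("unsupported switch: {!r}".format(switch))
        cmd.append("/" + name)
    return cmd


# Show the command for this scenario without executing it.
print(" ".join(cluster_start_command("/resetquorumlog")))
```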

Scenario 3—Cluster Quorum Loses Application Checkpoints

Symptom: Some resources fail to come online and the application checkpoints are out of date.

If you have recovered from quorum corruption by creating a new quorum log as described in Scenario 2 above, you may need to restore the matching checkpoints before the quorum resource can come back online.

Using the Microsoft Windows Server 2003 Resource Kit tools
  • Use the ClusterRecovery utility. For more information, see the Microsoft Windows Server 2003 Resource Kit and the Help for the ClusterRecovery utility.
Using Windows Server 2003 family tools

Scenario 4—Cluster Disk Corruption or Failure

Symptom: A cluster disk cannot come online. Resources that depend on that cluster disk will not be able to come online.

First, see if you can run a diagnostic utility from the disk manufacturer to determine the condition of the disk. If the cluster disk is corrupted or the disk hardware fails, you can restore the disk more quickly by using utilities in the Microsoft Windows Server 2003 Resource Kit. If you do not have access to these tools, you can still restore your cluster disk using the backup and restore utilities included with Windows Server 2003 family operating systems.

Using the Microsoft Windows Server 2003 Resource Kit tools
  • Use the ClusterRecovery utility. For more information, see the Microsoft Windows Server 2003 Resource Kit and the Help for the ClusterRecovery utility.

  • Use NTBackup along with the Confdisk utility (from the Microsoft Windows Server 2003 Resource Kit) to restore the data on the cluster disk. For more information, see the Microsoft Windows Server 2003 Resource Kit.

Using Windows Server 2003 tools
  • If necessary, replace the cluster disk. For information, see Install local storage buses and devices.

  • Stop the Cluster service on all nodes of the cluster.

  • Locate the data backup set for the node that owns the cluster disk. Also, locate the Automated System Recovery backup set for that node, if it is available. Perform an Automated System Recovery restore on a node. Use ASR as a last resort in system recovery, only after you have exhausted other options. For more information, see Restore a damaged cluster node using Automated System Recovery.

  • After the restored node comes back online, restart the Cluster service on the remaining nodes.

Scenario 5—Cluster Quorum Disk Failure

Symptom: The cluster nodes can boot up, but the Cluster service fails to start because the quorum resource cannot come online. Entries in the Event Log indicate hardware failures.

First, try starting the Cluster service by opening a command prompt and typing net start clussvc /fixquorum. This starts the Cluster service with all resources offline, including the quorum resource. Then you can try switching to a new quorum resource, with or without using the ClusterRecovery utility in the Windows Server 2003 Resource Kit. For more information, see Fixquorum command.

If the cluster quorum disk (the disk containing the quorum resource) fails, you can replace it more quickly by using utilities in the Microsoft Windows Server 2003 Resource Kit. If you do not have access to these tools, you can still replace your cluster quorum disk using the backup and restore utilities shipped with Windows Server 2003 family operating systems.

Using the Microsoft Windows Server 2003 Resource Kit tools
  • Use NTBackup along with the Confdisk utility (from the Microsoft Windows Server 2003 Resource Kit) to restore the data on the cluster disk.

  • Use the ClusterRecovery utility. For more information, see the Microsoft Windows Server 2003 Resource Kit and the Help for the ClusterRecovery utility.

Using Windows Server 2003 family tools

Scenario 6—Single Cluster Node Corruption or Failure

Symptom: The node cannot join the cluster.

If the Event Log indicates that the cluster database on the local node is corrupted, you can perform a System State restore on that node to replace the local cluster database. For information, see Restore the cluster database on a local node. Alternatively, you can copy the latest checkpoint file (CHKxxx.TMP) from the quorum disk to the %systemroot%\Cluster\ directory, rename it as file CLUSDB, and restart the Cluster service on that node.
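The manual checkpoint copy described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a supported procedure: it treats the "latest" checkpoint as the CHK*.TMP file with the most recent modification time, the function names are hypothetical, and the caller must supply both directories (the checkpoint location on the quorum disk and the %systemroot%\Cluster directory). It does not stop or restart the Cluster service.

```python
import shutil
from pathlib import Path


def latest_checkpoint(quorum_dir):
    """Return the most recently modified CHK*.TMP file in quorum_dir."""
    candidates = [
        p for p in Path(quorum_dir).iterdir()
        if p.is_file()
        and p.name.upper().startswith("CHK")
        and p.name.upper().endswith(".TMP")
    ]
    if not candidates:
        raise FileNotFoundError("no CHK*.TMP file in {}".format(quorum_dir))
    # Assumption: newest modification time means latest checkpoint.
    return max(candidates, key=lambda p: p.stat().st_mtime)


def restore_clusdb_from_checkpoint(quorum_dir, cluster_dir):
    """Copy the latest checkpoint into cluster_dir under the name CLUSDB."""
    source = latest_checkpoint(quorum_dir)
    target = Path(cluster_dir) / "CLUSDB"
    shutil.copy2(source, target)  # overwrites an existing CLUSDB
    return target
```

Because the copy overwrites the existing CLUSDB file, you may want to keep a copy of the old file before calling `restore_clusdb_from_checkpoint`.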

If a single node fails in the cluster due to system disk or other hardware failure, follow these steps to rebuild the node and rejoin the cluster:

Scenario 7—Cluster Quorum Rollback

If recent changes to your cluster have resulted in the cluster not functioning as expected, you can use the Backup or Restore Wizard to roll back your cluster to a previous configuration. For example, if a number of resources have mistakenly been deleted from the cluster configuration, you can roll it back, using a backup that contains those resources.

For information, see Restore the contents of a cluster quorum disk for all nodes in a cluster.

Scenario 8—Complete Cluster Failure

Symptom: None of the nodes can boot up.

If all nodes fail in a cluster and the quorum disk cannot be repaired, follow these steps:

  • Use Automated System Recovery on one node in the original cluster, choosing a node that was backed up recently and that was active in the cluster at the time it was backed up. This restores the disk signatures, the partition layout of the cluster disks (quorum and nonquorum), and the cluster configuration data. Do not start other nodes until the first node is restored. For more information, see Restore a damaged cluster node using Automated System Recovery.

  • Restore other nodes. For more information, see Restore a damaged cluster node using Automated System Recovery.

  • Restore your applications and application data from backup data sets.

Important

Scenario 9—Majority Node Set Cluster Failure

The methods for restoring a majority node set cluster are the same as for restoring other clusters. However, in a majority node set cluster, if some of the nodes fail and the cluster loses quorum, you can force the remaining nodes to form a quorum and restart the cluster. For more information, see Force quorum in a majority node set server cluster.
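To judge whether a majority node set cluster has lost quorum, you can work out the number of nodes that must be running. The following Python sketch assumes the usual majority rule, which this topic does not state explicitly: more than half of the configured nodes must be running. The function names are hypothetical.

```python
def votes_needed(total_nodes):
    """Return the smallest number of running nodes that forms a majority."""
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    return total_nodes // 2 + 1


def has_quorum(total_nodes, running_nodes):
    """Return True if the running nodes form a majority of the configured nodes."""
    if not 0 <= running_nodes <= total_nodes:
        raise ValueError("running_nodes must be between 0 and total_nodes")
    return running_nodes >= votes_needed(total_nodes)


# A five-node cluster with two nodes running has lost quorum under this rule.
print(votes_needed(5), has_quorum(5, 2))
```

Under this rule, a five-node cluster needs three running nodes, so losing three nodes is the point at which forcing quorum on the remaining nodes becomes relevant.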

Note

  • On a majority node set cluster, the cluster database is not stored on a cluster disk central to all nodes, but is instead stored locally on each node at %systemroot%\Cluster\MNS.%ResourceGUID%$\%ResourceGUID%$\MSCS\.

Scenario 10—Application Data Loss in a Server Cluster

When restoring application data in a server cluster, follow the instructions provided in the documentation that shipped with your application.

Important

  • If you are backing up Microsoft Exchange Server, a newer version of NTBackup.exe may be available from the Exchange Server section of the Microsoft Web site. Otherwise, you can use the version of NTBackup.exe that is included with Windows Server 2003 family operating systems.