Guidelines for Backing Up a Windows HPC Server 2008 R2 Cluster

Updated: July 2011

Applies To: Windows HPC Server 2008 R2

This section provides an overview of the guidelines and methods for backing up a Windows HPC Server 2008 R2 cluster. It focuses on backing up the data that is stored in the HPC databases (on the head node or on remote servers that are running at least Microsoft® SQL Server® 2008 SP1) and backing up the cluster configuration settings on the head node. If this data is backed up regularly, it is generally possible to restore a cluster to normal operations with minimal disruption after a hardware or software failure on the head node or on a remote server for the HPC databases. For more information about restoring a Windows HPC Server 2008 R2 cluster in different situations, see Scenarios for Restoring a Windows HPC Server 2008 R2 Cluster in this Back Up and Restore guide.

Note
Often the existing cluster nodes other than the head node can continue to operate after a cluster or head node is restored, or they can be easily redeployed by using the data that is stored in the HPC databases and the cluster configuration settings.

In this section:

  • Important cluster data and configuration settings

  • Backup methods

Important cluster data and configuration settings

The critical data for the cluster is stored in the HPC databases and in configuration files and settings on the head node.

HPC databases

The HPC databases in the following table store data that is critical to the operation of Windows HPC Server 2008 R2 and should be backed up regularly. The HPC databases are much more dynamic than other cluster data: they change continuously as jobs are submitted and run on the cluster.

| Database | Default name | Data |
|---|---|---|
| Cluster management database | HPCManagement | Cluster users, network configuration, nodes, node groups, node templates, operations history, performance counter metric history |
| Job scheduling database | HPCScheduler | Nodes, job templates, job history, scheduler configuration |
| Diagnostics database | HPCDiagnostics | Results of diagnostic tests |
| Reporting database | HPCReporting | Raw and aggregated reporting data |

In Windows HPC Server 2008 R2, the cluster databases can be installed on the head node or on remote servers that are running SQL Server. When the databases are installed on the head node, they are installed by default in the COMPUTECLUSTER instance in SQL Server. The database files are located by default in the %PROGRAMFILES%\Microsoft HPC Pack 2008 R2\SQLDB folder.

Cluster configuration settings

In addition to the data that is stored in the HPC databases, the following table lists important cluster configuration settings and data that are stored on the head node.

Important
This list is not exhaustive, but it indicates settings that are important to restore in many environments. Many settings depend on or affect the data that is stored in the HPC databases. Not all items apply to all clusters.

| Item | Location | Notes |
|---|---|---|
| Store of files for node setup, including operating system images and drivers | REMINST file share, including the REMINST\setup\images and REMINST\setup\drivers folders | |
| Configuration files for service-oriented architecture (SOA) services | HpcServiceRegistration file share | Present if SOA services are installed. The DLLs for the SOA services that are specified in the service registration files are stored separately according to the preferences of the cluster administrator, and they should also be backed up. |
| Output spool share for compute nodes | CcpSpoolShare file share | |
| Results of diagnostic tests | Diagnostics file share | |
| Submission and activation filters | Paths configured either in the job scheduler configuration options, or by job template (in Windows HPC Server 2008 R2 SP2 or later) | Present if installed by the cluster administrator |
| Registry settings | HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\HPC | |
| Custom diagnostic tests | %CCP_HOME%bin\DiagTests folder | Present if installed by the cluster administrator |
| Other customizable files | Folders under %CCP_HOME% | Includes CcpPower.cmd, startnet.cmd, unattend.xml, HpcSession.exe.config |
| Environment variables | On the head node: CCP_DATA, CCP_HOME, CCP_JOBTEMPLATE, CCP_SCHEDULER | |
| Hosts file | %SystemRoot%\System32\drivers\etc\hosts | |
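Several items in the list above (the registry key, the environment variables, and the hosts file) do not live in a file share, so they are easy to miss. The following Python sketch shows one way they might be collected on the head node. It is illustrative only: the function names, the output file name, and the `C:\Windows` fallback are assumptions, and the sketch only builds a `reg export` command line rather than running it.

```python
import json
import os

# Registry key and variable names taken from the settings list above.
HPC_REGISTRY_KEY = r"HKLM\SOFTWARE\Microsoft\HPC"
HPC_ENV_VARS = ["CCP_DATA", "CCP_HOME", "CCP_JOBTEMPLATE", "CCP_SCHEDULER"]


def reg_export_command(dest_file):
    """Build (but do not run) a 'reg export' command for the HPC registry key."""
    # /y overwrites an existing export file without prompting.
    return ["reg", "export", HPC_REGISTRY_KEY, dest_file, "/y"]


def snapshot_env(environ):
    """Return the HPC environment variables; a missing variable maps to None."""
    return {name: environ.get(name) for name in HPC_ENV_VARS}


def hosts_file_path(environ):
    """Return the hosts file path under SystemRoot (fallback is an assumption)."""
    system_root = environ.get("SystemRoot", r"C:\Windows")
    return system_root + r"\System32\drivers\etc\hosts"


if __name__ == "__main__":
    # Hypothetical destination file; adjust to your backup location.
    print(json.dumps(snapshot_env(os.environ), indent=2))
    print(reg_export_command(r"D:\HpcBackup\hpc_registry.reg"))
    print(hosts_file_path(os.environ))
```

The command list could be passed to `subprocess.run` on the head node, and the JSON snapshot stored next to the other backed-up settings so that the variables can be compared after a restore.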

Backup methods

To help recover your cluster if the head node or the HPC databases fail, regularly back up three things: the full system of the head node and any remote database servers; the HPC databases (on the head node or on one or more remote servers); and the cluster configuration settings on the head node. The following table provides guidelines for these three backup types.

| Backup type | Description | Recommended frequency of backups |
|---|---|---|
| Full server | A backup of all volumes so that you can recover the full server, including all the files, data, applications, and the system state. The system state includes the boot file, the COM+ class registration database, and the registry. | Make a full server backup at regular intervals, and before and after you make major configuration changes. In a stable cluster, you can schedule full server backups once a week or even less frequently. |
| Databases | A full backup of each HPC database. Database backups represent the whole database at the time the backup finished. | The HPC databases should be backed up much more often than the full system state backup. Back up all the HPC databases at the same time. Depending on the activity level in your cluster, you might want to back up the databases daily, multiple times per day, or even use a continual backup method. To ensure consistency between the databases, back up all of the databases after making configuration changes such as adding or deleting nodes. |
| Cluster configuration settings | A backup of all the file shares on the head node that contain configuration settings that are critical for the operation of your cluster. | Back up all of the configuration settings at the same time. To ensure consistency between the databases and the configuration settings, back up the configuration settings at the same time that you back up the databases. |
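As an illustration of backing up all the HPC databases together, the following Python sketch generates one T-SQL `BACKUP DATABASE` statement per database, all sharing a single timestamp, and wraps them in one `sqlcmd` invocation. It is a minimal sketch under stated assumptions, not a recommended procedure: the backup folder, the helper names, and the choice of the `INIT` and `CHECKSUM` options are illustrative, and the statements run back to back in one session, which only approximates "at the same time". The database names and the COMPUTECLUSTER instance are the defaults described earlier.

```python
from datetime import datetime

# Default database names listed earlier in this section.
HPC_DATABASES = ["HPCManagement", "HPCScheduler", "HPCDiagnostics", "HPCReporting"]


def backup_statements(backup_dir, stamp):
    """Return one full-backup statement per HPC database, sharing one timestamp."""
    statements = []
    for name in HPC_DATABASES:
        target = f"{backup_dir}\\{name}_{stamp}.bak"
        statements.append(
            f"BACKUP DATABASE [{name}] TO DISK = N'{target}' WITH INIT, CHECKSUM;"
        )
    return statements


def sqlcmd_command(instance, statements):
    """Build (but do not run) a sqlcmd call that executes all statements in order."""
    # -E uses Windows authentication, -b exits with an error code on failure,
    # -Q runs the given query text and then exits.
    return ["sqlcmd", "-S", instance, "-E", "-b", "-Q", " ".join(statements)]


if __name__ == "__main__":
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    stmts = backup_statements(r"D:\HpcBackup", stamp)  # hypothetical folder
    print(sqlcmd_command(r".\COMPUTECLUSTER", stmts))
```

If the databases are on a remote server, the instance argument would name that server instead of the local COMPUTECLUSTER instance. The shared timestamp makes it easy to identify which backup files belong to the same set when you restore.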

To create a backup of your head node, the HPC databases, and the cluster configuration settings, you can choose among several standard Microsoft and non-Microsoft backup and restore solutions, including the following:

| Backup solution | More information |
|---|---|
| Windows Server Backup | Windows Server Backup |
| SQL Server Management Studio Backup | |
| Microsoft System Center Data Protection Manager (DPM) | |
| Non-Microsoft backup and recovery solutions, such as Symantec NetBackup | Consult the documentation of the vendor. |

DISCLAIMER: Reference to any non-Microsoft products is intended solely for informational purposes and does not constitute or imply any endorsement by Microsoft.

Note
If your system is running at least Windows HPC Server 2008 R2 Service Pack 2, you can also use the Export-HpcConfiguration.ps1 and Import-HpcConfiguration.ps1 HPC PowerShell® scripts to back up and restore certain cluster configuration settings on the head node. Some of these cluster configuration settings, such as node templates and job templates, are stored in the HPC databases. These scripts can also be used to migrate critical configuration settings from one cluster to another (for example, to maintain cluster operations in case of a disaster). For more information, see Export and Import Windows HPC Cluster Configuration Settings in this Back Up and Restore guide.

Important
Regardless of which method you choose for backup and restore, to bring the cluster to a consistent state, you need to perform additional steps when you restore the cluster. For an overview of the restore steps in several scenarios, see Scenarios for Restoring a Windows HPC Server 2008 R2 Cluster.

Example: Create a protection group for the cluster in DPM

You can use DPM to create a protection group for the cluster that includes the HPC databases (on the head node or on one or more remote computers) and the configuration settings that are stored in shared folders on the head node. For example, the protection group could include the following sources:

  • The SQL Server instance or instances that host the HPC databases

  • REMINST file share

  • HpcServiceRegistration file share

  • CcpSpoolShare file share

  • Diagnostics file share

The protection group could be expanded to include other data sources depending on the needs of the cluster administrator.
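The file shares in the list above can also be expressed as a small data structure, which is useful for checking that whatever backup tool you use covers every share. The following Python sketch builds one `robocopy` command line per share. It is a hypothetical illustration that does not involve DPM: the head node name, destination folder, function name, and retry settings are assumptions, and the commands are only constructed, not executed.

```python
# Share names taken from the protection group example above.
HPC_SHARES = ["REMINST", "HpcServiceRegistration", "CcpSpoolShare", "Diagnostics"]


def share_copy_commands(head_node, dest_root):
    """Build (but do not run) one robocopy command per head node file share."""
    commands = []
    for share in HPC_SHARES:
        source = f"\\\\{head_node}\\{share}"  # UNC path, e.g. \\HEADNODE\REMINST
        dest = f"{dest_root}\\{share}"
        # /E copies subfolders (including empty ones); /R and /W limit retries
        # and the wait between them so a locked file does not stall the copy.
        commands.append(["robocopy", source, dest, "/E", "/R:2", "/W:5"])
    return commands


if __name__ == "__main__":
    # Hypothetical head node name and destination folder.
    for command in share_copy_commands("HEADNODE", r"D:\HpcBackup\Shares"):
        print(" ".join(command))
```

Keeping the share list in one place means that an additional data source, such as a folder that holds SOA service DLLs, can be added once and picked up by every backup run.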

For information about creating a protection group in DPM, see Creating a Protection Group for File and Application Servers.

Additional references