Running the HPC Pack Head Node in a Failover Cluster
Updated: August 6, 2013
Applies To: Microsoft HPC Pack 2008 R2, Microsoft HPC Pack 2012, Microsoft HPC Pack 2012 R2, Windows HPC Server 2008 R2
This section describes how HPC Pack runs in a failover cluster and what happens during failover.
When one of the servers in a failover cluster within an HPC cluster fails, the services that it supports start running on another server in that failover cluster. Failover proceeds in the following steps:
Detection: A failure is detected.
Failover: The head node fails over to another server in the failover cluster.
Client reconnect: After failover, clients reconnect. For the head node, this means that job scheduler clients reconnect to the HPC Job Scheduler Service on the server that is now the head node. Clients do not need to know which server in the failover cluster is running the service, because the failover cluster presents the service to clients under a single, consistent name. Management clients retry until they can reconnect to the HPC Management Service.
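The client reconnect step can be pictured as a retry loop that always targets the failover cluster's single name rather than a physical server. The following is a minimal sketch, not the HPC Pack client implementation: the `connect` callable, the `reconnect_with_retry` function, and the cluster name are hypothetical, and `ConnectionError` stands in for whatever error a real client sees while failover is in progress.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def reconnect_with_retry(connect: Callable[[str], T], cluster_name: str,
                         max_attempts: Optional[int] = None,
                         delay_s: float = 2.0) -> T:
    """Hypothetical client-side retry loop.

    The client always connects to the same cluster name, so it never needs to
    know which physical server currently runs the head node. With
    max_attempts=None it retries indefinitely, like a management client that
    retries until it can reconnect.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect(cluster_name)
        except ConnectionError as err:
            # The service may still be moving to another server; give up only
            # if a finite attempt limit was requested and has been reached.
            if max_attempts is not None and attempt >= max_attempts:
                raise ConnectionError(
                    f"could not reach {cluster_name} after {attempt} attempts"
                ) from err
            time.sleep(delay_s)
```

Because the loop targets a name rather than a server, the same code works no matter which server in the failover cluster is now the head node; the optional attempt limit is only a convenience for callers that cannot wait forever.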
The servers in a failover cluster monitor one another through periodic network signals called heartbeats. By default, if a server misses five heartbeats, communication with that server is considered to have failed. You can use Failover Cluster Manager to configure the thresholds at which a server is considered to have failed.
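The heartbeat threshold can be illustrated with a small counter per peer server. This is a simplified sketch under stated assumptions, not how the Cluster service is implemented: it assumes one heartbeat is expected per fixed interval, that only consecutive misses count toward the threshold, and that any received heartbeat resets the count. The `HeartbeatMonitor` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical tracker for heartbeats from one peer server.

    Assumption: the peer is declared failed after `threshold` consecutive
    missed heartbeats; a received heartbeat resets the miss counter.
    """
    threshold: int = 5  # default number of missed heartbeats described in the text
    missed: int = 0
    failed: bool = False

    def tick(self, heartbeat_received: bool) -> bool:
        """Record one heartbeat interval; return True once the peer is considered failed."""
        if self.failed:
            return True  # once declared failed, stay failed in this sketch
        if heartbeat_received:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.threshold:
                self.failed = True
        return self.failed
```

For example, the sequence received, missed, missed, received, then five misses in a row is declared failed only on the last interval, because the received heartbeat in the middle resets the count. Raising `threshold` makes detection more tolerant of brief network glitches at the cost of slower failover.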
You can also configure failover and failback settings in Failover Cluster Manager. We recommend that you prevent failback unless you have a specific reason to allow it. Failback returns the head node to its preferred physical server when that server becomes available again, but each failback causes a brief interruption in service. Preventing failback therefore reduces the number of service interruptions.
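The trade-off behind this recommendation can be shown with a toy model that counts how often the head node role moves. This is an illustrative sketch only: the `HeadNodeRole` class, the `simulate` function, and the server names `NODE1` and `NODE2` are hypothetical, and the model assumes that every move of the role causes exactly one brief interruption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HeadNodeRole:
    """Hypothetical model of which server runs the clustered head node."""
    servers: List[str]            # servers in the failover cluster
    preferred_owner: str          # preferred physical server for the head node
    allow_failback: bool = False  # recommended setting: prevent failback
    owner: str = ""
    interruptions: int = 0        # assumption: each move briefly interrupts service

    def __post_init__(self) -> None:
        self.owner = self.owner or self.preferred_owner

    def server_failed(self, server: str) -> None:
        if server == self.owner:
            # Failover is unavoidable: move the role to another server.
            # (This sketch does not track whether that other server is also down.)
            self.owner = next(s for s in self.servers if s != server)
            self.interruptions += 1

    def server_recovered(self, server: str) -> None:
        # Failback is optional: move back to the preferred server only if allowed.
        if self.allow_failback and server == self.preferred_owner and self.owner != server:
            self.owner = server
            self.interruptions += 1


def simulate(allow_failback: bool) -> Tuple[str, int]:
    """Preferred server fails, then comes back; return final owner and interruption count."""
    role = HeadNodeRole(servers=["NODE1", "NODE2"], preferred_owner="NODE1",
                        allow_failback=allow_failback)
    role.server_failed("NODE1")     # head node fails over to NODE2
    role.server_recovered("NODE1")  # NODE1 is available again
    return role.owner, role.interruptions
```

With failback prevented, `simulate(False)` leaves the head node on `NODE2` after a single interruption; with failback allowed, `simulate(True)` returns it to `NODE1` at the cost of a second interruption.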
Failover Clustering also monitors some services, such as the HPC Job Scheduler Service on the head node, to ensure that they are running. For details about which services are monitored, see the table at the end of the topic that matches your configuration:
The head node and one or more WCF broker nodes in failover clusters: See the table at the end of Overview of Microsoft HPC Pack and SOA in Failover Clusters.
Only the head node in a failover cluster: See the table at the end of Overview of Configuring the HPC Pack Head Node for Failover.
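The service monitoring described earlier in this section can be pictured as a periodic health-check pass over the monitored services on the server that currently runs the head node. This is a hedged sketch, not the Failover Clustering implementation: the `monitor_pass` function and its callables are hypothetical, and the policy of trying a limited number of in-place restarts before failing over is an assumption made for illustration. The actual restart and failover policies are configured on the cluster and are not modeled here.

```python
from typing import Callable, Iterable


def monitor_pass(monitored: Iterable[str],
                 is_running: Callable[[str], bool],
                 try_restart: Callable[[str], bool],
                 restart_attempts: int = 1) -> str:
    """Hypothetical single health-check pass over monitored services.

    Returns "healthy" if every monitored service is running (possibly after a
    restart), or "failover" if a service could not be brought back on the
    current server.
    """
    for name in monitored:
        if is_running(name):
            continue
        # Assumed policy: retry in place a limited number of times first.
        # any() stops calling try_restart after the first successful restart.
        if not any(try_restart(name) for _ in range(restart_attempts)):
            return "failover"
    return "healthy"
```

In this model, a stopped HPC Job Scheduler Service that restarts successfully leaves the head node where it is, while a service that cannot be restarted causes the pass to report that the head node should fail over to another server.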