Running the Head Node in a Failover Cluster with Windows HPC Server 2008 R2
Updated: September 23, 2010
Applies To: Windows HPC Server 2008 R2
This section provides information about running Windows HPC Server 2008 R2 in a failover cluster, and it describes the failover process.
The failover process
When a server in the failover cluster that hosts the HPC head node fails, the services that were running on that server start on another server in the same failover cluster. Failover proceeds in the following steps:
Detection: A failure is detected.
Failover: The head node fails over to another server in the failover cluster.
Client reconnect: After the failover, clients reconnect. For the head node, this means that job scheduler clients reconnect to the Job Scheduler on the server that is now running the head node. Which physical server in the failover cluster hosts the service does not matter to clients, because the failover cluster presents the service under one consistent name. Management clients retry until they can reconnect to a management service.
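The client side of the reconnect step can be illustrated with a small retry loop. This is a hypothetical sketch, not the HPC Pack client API: the `connect` callable, the name `HEADNODE-CLUSTER`, and the retry limits are assumptions made for illustration. It shows why clients do not need to track which physical server is active: every attempt targets the same cluster-wide name. The documented behavior is that management clients retry until they reconnect; this sketch caps the number of attempts so that it always terminates.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def reconnect_with_retry(
    connect: Callable[[str], T],
    cluster_name: str,
    max_attempts: int = 10,
    delay_seconds: float = 1.0,
) -> T:
    """Retry connecting to a service by its failover cluster name (hypothetical sketch).

    The client always uses the same name; the failover cluster is responsible
    for routing that name to whichever server currently runs the head node.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return connect(cluster_name)
        except ConnectionError as err:
            # The head node may still be moving to another server; wait and retry.
            last_error = err
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    raise ConnectionError(
        f"could not reach {cluster_name} after {max_attempts} attempts"
    ) from last_error
```

For example, with a simulated connection that fails twice while failover is in progress:

```python
attempts = []

def fake_connect(name):
    attempts.append(name)
    if len(attempts) < 3:
        raise ConnectionError("head node is failing over")
    return f"session to {name}"

print(reconnect_with_retry(fake_connect, "HEADNODE-CLUSTER", delay_seconds=0))
```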
Failure detection in a failover cluster
The servers in a failover cluster monitor one another through periodic network signals, called heartbeats. If a server misses five heartbeats, communication with that server is considered to have failed. You can configure the thresholds at which a server is considered to have failed in the Failover Cluster Manager snap-in.
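The heartbeat rule can be modeled as a counter of missed heartbeats that is compared with a threshold. This is a simplified sketch, not the failover clustering implementation. It assumes that the misses must be consecutive (a received heartbeat resets the count) and uses the threshold of five described above; the class name `HeartbeatMonitor` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical model of heartbeat-based failure detection for one server."""

    threshold: int = 5  # missed heartbeats before the server is considered failed
    missed: int = 0     # consecutive heartbeats missed so far (assumption: consecutive)

    def record(self, heartbeat_received: bool) -> bool:
        """Record one heartbeat interval and return True if the server is considered failed."""
        if heartbeat_received:
            self.missed = 0  # assumption: any received heartbeat resets the count
        else:
            self.missed += 1
        return self.missed >= self.threshold


monitor = HeartbeatMonitor()
# Four misses are tolerated; the fifth consecutive miss marks the server as failed.
results = [monitor.record(received) for received in [True, False, False, False, False, False]]
print(results)
```

Raising `threshold` in this model makes detection slower but less sensitive to brief network delays, which is the same trade-off you make when you change the thresholds for the failover cluster.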
You can also configure failover and failback settings in Failover Cluster Manager. We recommend that you prevent failback unless you have a specific reason to allow it. Failback returns the head node to its preferred physical server when that server becomes available again, but each failback causes a brief interruption in service. Preventing failback therefore reduces the number of service interruptions.
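The effect of the failback setting on service interruptions can be shown with a toy event model. This is a simplified, hypothetical sketch: it counts each move of the head node as one interruption, chooses the replacement server alphabetically, and ignores any timing options for failback. The server names `NODE1` and `NODE2` are placeholders.

```python
from typing import Iterable, List, Tuple


def count_interruptions(
    events: Iterable[Tuple[str, str]],
    servers: List[str],
    preferred: str,
    allow_failback: bool,
) -> int:
    """Count how many times the head node moves between servers (toy model)."""
    up = set(servers)
    owner = preferred
    interruptions = 0
    for action, server in events:
        if action == "fail":
            up.discard(server)
            if server == owner:
                candidates = sorted(up)
                if not candidates:
                    raise RuntimeError("no server is available to run the head node")
                owner = candidates[0]  # failover: move to another running server
                interruptions += 1
        elif action == "recover":
            up.add(server)
            if allow_failback and server == preferred and owner != preferred:
                owner = preferred  # failback: move back to the preferred server
                interruptions += 1
        else:
            raise ValueError(f"unknown event: {action}")
    return interruptions


events = [("fail", "NODE1"), ("recover", "NODE1")]
print(count_interruptions(events, ["NODE1", "NODE2"], "NODE1", allow_failback=True))
print(count_interruptions(events, ["NODE1", "NODE2"], "NODE1", allow_failback=False))
```

With failback allowed, one failure and recovery of the preferred server cause two moves (failover, then failback). With failback prevented, the same events cause only the single failover move.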
Failover clustering also monitors some of the services (for example, the HPC Job Scheduler Service on the head node) to ensure that they are running. For detailed information about which services are monitored, see the table at the end of the topic that matches your configuration:
The head node and one or more WCF broker nodes in failover clusters: See the table at the end of Overview of Windows HPC Server 2008 R2 and SOA in Failover Clusters (http://go.microsoft.com/fwlink/?LinkId=198304).
Only the head node in a failover cluster: See the table at the end of Overview of Configuring the Head Node for Failover with Windows HPC Server 2008 R2 (http://go.microsoft.com/fwlink/?LinkId=198290).
Additional references
Configuring Windows HPC Server 2008 R2 for High Availability with SOA Applications (http://go.microsoft.com/fwlink/?LinkId=198300)
Configuring Windows HPC Server 2008 R2 for High Availability of the Head Node (http://go.microsoft.com/fwlink/?LinkId=198285)