Overview of Windows HPC Server 2008 R2 and SOA in Failover Clusters

Applies To: Windows HPC Server 2008 R2

This guide provides procedures and guidance for deploying Windows HPC Server 2008 R2 in failover clusters where the servers are running Windows Server 2008 R2. The guide describes how you can configure the head node in a failover cluster and then configure Windows Communication Foundation (WCF) broker nodes in separate failover clusters. This topic in the guide provides an overview of the configuration. For a detailed list of requirements for the configuration, see Requirements for Windows HPC Server 2008 R2 in Failover Clusters.

In this section

Overview

Overview of node and storage configuration requirements

Services and resources during failover of the head node

Overview of naming recommendations for WCF broker nodes

Overview

If you want to provide high availability in an HPC cluster that uses applications based on service-oriented architecture (SOA), you can configure your head node in a failover cluster and then configure WCF broker nodes in separate failover clusters. The failover clusters contain servers that work together, so if one server in a failover cluster fails, another server in that cluster automatically begins providing service (in a process known as failover).

Important

The word “cluster” can refer to a head node with compute nodes and WCF broker nodes running software in Windows HPC Server 2008 R2, or to a set of servers running Windows Server 2008 R2 that are using the failover cluster feature. The word “node” can refer to a head node, compute node, or WCF broker node running software in Windows HPC Server 2008 R2, or to one of the servers in a failover cluster. In this document, servers in the context of a failover cluster are usually referred to as “servers,” to distinguish failover cluster nodes from an HPC cluster head node or compute node. Also, the word “cluster” is placed in an appropriate phrase (such as “failover cluster”) or used in context in a sentence to distinguish which type of cluster is being referred to.

Each of the servers in a failover cluster must have access to the failover cluster storage. Figure 1 shows the failover of head node services that can run on either of two servers in a failover cluster:


Figure 1   Failover of head node services in HPC cluster

To provide high availability in an HPC cluster that uses SOA-based applications, you configure the head node in a failover cluster and configure the WCF broker nodes in one or more separate failover clusters. To support the head node, you must also configure SQL Server, either in a SQL Server failover cluster (for higher availability) or on a standalone server. Figure 2 shows a configuration that includes three failover clusters: one runs SQL Server (to support the head node), one runs the head node, and one runs WCF broker nodes. Two of the failover clusters have two nodes each, but the failover cluster that runs the WCF broker nodes contains three nodes: two active and one passive (a passive node is inactive but ready to provide service if needed).


Figure 2   Failover clusters supporting head node, SQL Server, and WCF broker nodes

In Figure 2, the failover cluster storage for the head node includes one disk (LUN) for a clustered file server and one disk that serves as a disk witness. A disk witness is necessary for any failover cluster that has an even number of nodes (this head node failover cluster has two). For the WCF broker nodes, failover cluster storage is needed because these nodes use Message Queuing (also known as MSMQ). One instance of Message Queuing runs on each active WCF broker node, and each instance requires a disk in storage, so the failover cluster for the WCF broker nodes requires two disks in storage. That failover cluster has an odd number of nodes; therefore, it does not include a disk witness in the storage.
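If you want to confirm the quorum configuration and see the disks in cluster storage, you can use the FailoverClusters Windows PowerShell module on one of the servers in the failover cluster. The following is a minimal sketch; HeadNodeCluster is a placeholder for the name of your failover cluster:

    # Load the failover clustering cmdlets (Windows Server 2008 R2)
    Import-Module FailoverClusters

    # Show the quorum model (for example, Node and Disk Majority) and the witness resource
    Get-ClusterQuorum -Cluster HeadNodeCluster

    # List the disk resources in cluster storage and the server that currently owns each
    Get-ClusterResource -Cluster HeadNodeCluster |
        Where-Object { $_.ResourceType.Name -eq "Physical Disk" } |
        Format-Table Name, OwnerNode, State -AutoSize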

For clarity, Figure 2 shows a limited number of servers. Figure 3 shows more servers. As shown in Figure 3, in addition to having a failover cluster for the head node in your HPC cluster and a SQL Server that supports the head node, you can have multiple failover clusters for WCF broker nodes. In the same HPC cluster, you can also have individual WCF broker nodes, if those nodes serve only broker sessions that do not provide durable messages.


Figure 3   Multiple failover clusters and an additional WCF broker node

After you configure a WCF broker node to run in a failover cluster, if you decide to change it back to a broker node running on a standalone server, you must uninstall and then reinstall the broker node.

When the head node and broker nodes are in failover clusters, multiple failover clusters are required. Figure 4 illustrates that when you configure multiple failover clusters, you must limit the exposure of each storage volume or logical unit number (LUN) to the nodes in one failover cluster:


Figure 4   Two failover clusters, each with its own LUNs
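One way to check that a LUN is exposed to only one failover cluster is to list the disks that each failover cluster can see but has not yet added to its storage. This sketch assumes two failover clusters with the placeholder names HeadNodeCluster and Broker01Cluster:

    Import-Module FailoverClusters

    # Disks that are visible to each failover cluster but not yet in cluster storage.
    # If the LUNs are masked correctly, no disk appears in both lists.
    Get-ClusterAvailableDisk -Cluster HeadNodeCluster
    Get-ClusterAvailableDisk -Cluster Broker01Cluster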

For the maximum availability of any server, it is important to follow best practices for server management: for example, carefully manage the physical environment of the servers, test software changes before fully implementing them, and track software updates and configuration changes on all servers in a failover cluster.


Overview of node and storage configuration requirements

The following overview of node and storage configuration requirements provides supporting details for Figures 1, 2, and 3.

Important

For detailed information about hardware, software, and networking requirements for running Windows HPC Server 2008 R2 in a failover cluster, see Requirements for Windows HPC Server 2008 R2 in Failover Clusters.

The following list provides a broad overview of the configuration requirements for deploying Windows HPC Server 2008 R2 in failover clusters where the servers are running Windows Server 2008 R2.

  • If you choose to configure WCF broker nodes in failover clusters, you must configure the head node in a separate failover cluster.

  • As shown in the preceding figures, each failover cluster requires disks (LUNs) in cluster storage, and every node in a given failover cluster must be connected to the same disks as the other nodes in that failover cluster. For each failover cluster, two or three disks in cluster storage are usually required:

    • In the failover cluster that supports the head node, one disk in cluster storage is the disk witness, which is necessary for a failover cluster with an even number of nodes. Another disk is used for the clustered file server that is a part of the head-node configuration.

    • In the failover cluster that supports one or more WCF broker nodes, each active node requires a disk in storage (to support the instance of Message Queuing running on that active node). If the failover cluster that supports a WCF broker node has an even number of failover cluster nodes, it requires an additional disk in cluster storage for the disk witness.

  • Figures 2 and 3 show multiple failover clusters that support WCF broker nodes. The maximum numbers of servers and failover clusters are greater than what the figures show: within an HPC cluster, you can have as many as 8 failover clusters that support WCF broker nodes, and each of these failover clusters can have a minimum of 2 nodes and a maximum of 16 nodes.

  • As shown in Figure 3, you can mix WCF broker nodes that are in failover clusters with individual WCF broker nodes that are not in failover clusters.

  • In HPC clusters where you want to configure high availability for WCF broker nodes, when you choose the network topology, we recommend either Topology 2 or Topology 4 (the topology shown in Figures 1, 2, and 3). In these topologies, there is an enterprise network and at least one other network. Using multiple networks in this way helps avoid single points of failure. For more information about network topologies in Windows HPC Server 2008 R2, see Appendix 1: HPC Cluster Networking (https://go.microsoft.com/fwlink/?LinkId=198313) in the Design and Deployment Guide for Windows HPC Server 2008 R2.

  • For a failover cluster that runs one or more WCF broker nodes, choosing the ratio of active nodes to passive nodes means weighing your hardware budget against the availability and performance requirements of your SOA applications. For example, for higher availability and performance but higher hardware costs, you could have one passive node for every active node in your failover clusters. For lower hardware costs but somewhat reduced availability and performance, you could have a failover cluster with more active nodes, for example, an eight-node cluster with seven active nodes and one passive node, where the failure of more than one node could affect performance and possibly availability. The sketch after this list shows one way to review how the active instances are distributed across the servers.
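To review that distribution, you can list the clustered instances in a broker failover cluster and see which server currently owns each one. This is a minimal sketch that uses the FailoverClusters Windows PowerShell module; Broker01Cluster is a placeholder name:

    # Load the failover clustering cmdlets (Windows Server 2008 R2)
    Import-Module FailoverClusters

    # The servers in the failover cluster and their current states
    Get-ClusterNode -Cluster Broker01Cluster

    # The clustered instances (groups) and the server that owns each one;
    # a server that owns no instances is effectively a passive node
    Get-ClusterGroup -Cluster Broker01Cluster |
        Format-Table Name, OwnerNode, State -AutoSize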


Services and resources during failover of the head node

This section summarizes some of the differences between running the head node for Windows HPC Server 2008 R2 on a single server and running it in a failover cluster.

Important

  • In a failover cluster, the head node cannot also be a compute node or WCF broker node. These options are disabled when the head node is configured in a failover cluster.

  • For connections to a head node that is configured in a failover cluster, do not use the name of a physical server. Use the name that appears in Failover Cluster Manager. To see the name, in the appropriate failover cluster, expand Services and applications, select the clustered instance of the head node, and in the center pane, view the name under Server Name. After the head node is configured in a failover cluster, it is not tied to a single physical server, and it does not have the name of a physical server. See the example after this list.
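For example, when you connect with HPC PowerShell, you specify the clustered name of the head node rather than a physical server name. The following is a minimal sketch, where HeadNodeHA is a placeholder for the name shown under Server Name in Failover Cluster Manager:

    # Load the HPC PowerShell snap-in (installed with Windows HPC Server 2008 R2)
    Add-PSSnapin Microsoft.HPC

    # Connect by using the clustered name, not the name of either physical server
    Get-HpcClusterOverview -Scheduler HeadNodeHA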

The following list summarizes what happens to each service or resource during failover of the head node:

  • HPC SDM Store Service, HPC Job Scheduler Service, HPC Session Service, and HPC Diagnostics Service: These services fail over to the other server in the failover cluster.

  • Four file shares that are used by the head node: Ownership of the file shares fails over to the other server in the failover cluster.

  • DHCP, HPC Management Service, HPC MPI Service, HPC Node Manager Service, HPC Reporting Service, NAT, and WDS: These services start automatically and run on each individual server. The failover cluster does not monitor these services for failure.

  • File sharing for compute nodes: Fails over to the other server in the failover cluster if it is configured through the Failover Cluster Manager snap-in.
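To see which server currently runs the clustered head node services, or to move them to the other server as a planned failover test, you can use the failover clustering cmdlets. This is a minimal sketch; HeadNodeCluster and the group name HpcHeadNode are placeholders, because the actual group name is the one that appears under Services and applications in Failover Cluster Manager:

    Import-Module FailoverClusters

    # Show each clustered service or application and the server that currently owns it
    Get-ClusterGroup -Cluster HeadNodeCluster

    # Move the clustered head node instance to another server (a planned failover)
    Move-ClusterGroup -Cluster HeadNodeCluster -Name "HpcHeadNode"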

Note

The HPC Basic Profile Web Service and the HPC Storage Management Surrogate service are also installed on a head node (whether that head node is in a failover cluster or not). However, these services are not activated by default. For information about uses and requirements for the HPC Basic Profile Web Service, see HPC Server Basic Profile Web Service Operations Guide (https://go.microsoft.com/fwlink/?LinkId=198311).


Overview of naming recommendations for WCF broker nodes

As the previous diagrams in this topic show, a head node in a failover cluster involves only two servers, but a configuration with WCF broker nodes in failover clusters can contain many physical servers, many failover clusters, and many clustered instances of WCF broker nodes (that is, instances of WCF broker nodes that can fail over from one server to another in a particular failover cluster). Because of this, we recommend that you plan your naming scheme carefully before you start to configure WCF broker nodes in failover clusters. You will need the following names for WCF broker nodes that run in a failover cluster:

  • A unique name for each physical server.

  • A unique name for each failover cluster.

  • A unique name for each clustered instance of a WCF broker node running in the failover cluster. Each of these clustered instances runs the HPC Broker service and Message Queuing, and they can fail over (or be moved) from one physical server to another in a particular failover cluster.

The two snap-ins that are mentioned most often in this document, HPC Cluster Manager and Failover Cluster Manager, provide somewhat different information. In HPC Cluster Manager, you see the names of physical servers. In Failover Cluster Manager, you see the names of physical servers (the nodes in the failover cluster), and you also see the names of failover clusters and of the clustered instances within those failover clusters. When you work in HPC Cluster Manager, you need a straightforward way to tell which physical servers belong to a given failover cluster or to a given clustered instance in that failover cluster, which is why a carefully planned naming scheme matters.

The following example shows a naming scheme. In this example, the servers that are together in a failover cluster have names that are identical except for the last character. The example name for the failover cluster is similar, but it drops the last character and ends with Cluster. Similarly, the example names for the clustered instances end with Inst plus a two-digit number. Shorter names would also work, as long as you can readily tell from each name what it identifies and what it is related to.

  • Example names for the physical servers in the failover cluster: BrokerServer01A, BrokerServer01B, BrokerServer01C

  • Example name for the failover cluster: Broker01Cluster

  • Example names for two clustered instances that run in the failover cluster: Broker01Inst01, Broker01Inst02
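With a naming scheme like this, you can readily relate the names in the two snap-ins. The following sketch, which assumes the example names in the preceding list, shows the failover cluster view alongside the HPC view; HeadNodeHA is a placeholder for the clustered head node name:

    Import-Module FailoverClusters
    Add-PSSnapin Microsoft.HPC

    # Failover Cluster Manager's view: the physical servers and the clustered instances
    Get-ClusterNode -Cluster Broker01Cluster    # BrokerServer01A, 01B, and 01C
    Get-ClusterGroup -Cluster Broker01Cluster   # includes Broker01Inst01 and Broker01Inst02

    # HPC Cluster Manager's view: node names as HPC PowerShell reports them
    Get-HpcNode -Scheduler HeadNodeHA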


Additional references

Configuring Windows HPC Server 2008 R2 for High Availability with SOA Applications

Requirements for Windows HPC Server 2008 R2 in Failover Clusters

Running the Head Node in a Failover Cluster with Windows HPC Server 2008 R2