Overview and Requirements for Windows HPC Server 2008 in a Failover Cluster

Applies To: Windows HPC Server 2008

This guide provides procedures and guidance for deploying Windows® HPC Server 2008 in a failover cluster where the servers are running Windows Server® 2008.

Important

Before you begin your deployment, we recommend that you familiarize yourself with the documentation list in the Configuring Failover Clustering in Windows HPC Server 2008 Step-by-Step Guide.

In this section

Overview

System requirements

Hardware requirements for a failover cluster running Windows HPC Server 2008

Software requirements for a failover cluster running Windows HPC Server 2008

Domain account requirements

Network infrastructure requirements

Windows HPC Server 2008 in a failover cluster

Overview

If you want to provide high availability for your job scheduler service, you can configure your head node in a failover cluster. The failover cluster contains two servers that work together, so that if there is a failure of the server that is acting as the head node, the other server in the failover cluster automatically begins acting as the head node (in a process known as failover).

Important

The word “cluster” can refer to a head node and a set of compute nodes running software in Windows HPC Server 2008 or to a set of servers running Windows Server 2008 that are using the failover cluster feature. The word “node” can refer to a head node or a compute node running software in Windows HPC Server 2008, or to one of the servers in a failover cluster. In this document, servers in the context of a failover cluster are usually referred to as “servers,” to distinguish failover cluster nodes from a head node or compute node. Also, the word “cluster” is placed in an appropriate phrase (such as “failover cluster”) or used in context in a sentence to distinguish which type of cluster is being referred to.

When you configure your head node in a failover cluster, each of the servers in the failover cluster must have access to the failover cluster storage, which contains the job scheduler and management databases. The job scheduler service runs on one of the servers in the failover cluster, but it can fail over to the other server in the failover cluster if a problem occurs.
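The pattern is a classic active/passive pair: only one server owns the head node role at a time, and the standby takes over that role, along with access to the shared databases, when the active server fails. The following minimal sketch (illustrative only, not Windows HPC Server code; the class and server names are invented) models that behavior:

```python
# A minimal sketch of the active/passive failover pattern described above:
# two servers share storage, one runs the job scheduler, and the survivor
# takes over when the active server fails. Not actual HPC Server code.

class Server:
    def __init__(self, name):
        self.name = name
        self.healthy = True

class HeadNodeCluster:
    """Models a two-server failover cluster hosting the head node role."""
    def __init__(self, primary, secondary, shared_storage):
        self.servers = [primary, secondary]
        self.shared_storage = shared_storage  # job scheduler + management databases
        self.active = primary                 # server currently acting as head node

    def failover_if_needed(self):
        if not self.active.healthy:
            standby = next(s for s in self.servers if s is not self.active)
            if standby.healthy:
                # The standby attaches to the same shared storage, so the
                # job scheduler resumes with the same database state.
                self.active = standby
                print(f"Failover: {self.active.name} is now the head node")

cluster = HeadNodeCluster(Server("HN-A"), Server("HN-B"),
                          shared_storage={"scheduler_db", "management_db"})
cluster.servers[0].healthy = False   # simulate a failure of the active server
cluster.failover_if_needed()         # -> Failover: HN-B is now the head node
```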

The following diagram shows how a failover cluster can support Windows HPC Server 2008:

[Diagram: Head node in HPC cluster failing over]

System requirements

The system requirements for the configuration in this guide are a combination of the requirements for the following components, which are all required for running Windows HPC Server 2008 in the context of a failover cluster:

  • The failover clustering feature in Windows Server 2008

  • SQL Server 2005 Service Pack 2 installation for failover clustering

  • Windows HPC Server 2008

The following sections provide more detail about these system requirements.

Hardware requirements for a failover cluster running Windows HPC Server 2008

You need the following hardware for a failover cluster running Windows HPC Server 2008:

  • Two servers that are compatible with Windows Server 2008: Although the failover clustering feature in Windows Server 2008 can run on as many as eight servers, Windows HPC Server 2008 supports only a two-server configuration.

    We recommend that you use a set of matching computers that contain the same or similar components.

    The following table lists the hardware requirements for a minimal and a recommended installation of Windows Server 2008 that will run Windows HPC Server 2008.

Configuration | RAM | Processor | Disk Space (System Partition)
Minimum | 512 MB | 1 GHz (full installation) | 8 GB
Recommended | 1 GB (full installation) | 2 GHz (full installation) | 40 GB (full installation)

Important

Microsoft supports a failover cluster solution only if all the hardware components are marked as "Certified for Windows Server 2008." In addition, the complete configuration (servers, network, and storage) must pass all tests in the Validate a Configuration Wizard, which is included in the Failover Cluster Management snap-in. For information about hardware compatibility for Windows Server 2008, see the Windows Server Catalog (https://go.microsoft.com/fwlink/?LinkID=59821). For information about the maximum number of servers that you can have in a failover cluster, see Compare Technical Features and Specifications (https://go.microsoft.com/fwlink/?LinkId=92091).

  • Network adapters and cables (for failover network communication): The network hardware, like other components in the Windows Server 2008 failover cluster solution, must be compatible with Windows Server 2008. If you use iSCSI, your network adapters must be dedicated to either network communication or iSCSI, not both.

    Important

    To avoid single points of failure in the configuration described in this guide, you must use multiple, distinct networks in the network infrastructure that connects the failover cluster servers. This is important because the load on the network that connects the head node to the compute nodes can be high when compute nodes are being deployed or when the application load is high.

    For Windows HPC Server 2008, a public-only topology, also called Topology 5 (Public Only), is not supported for the scenario described in this guide. For more information, see Network infrastructure requirements, later in this section.

    Note that if you connect the failover cluster servers by using a single network, the network will pass the redundancy requirement in the Validate a Configuration Wizard. However, the report from the wizard will include a warning that the network should not have single points of failure.

    For more details about the network configuration that is required for a Windows Server 2008 failover cluster, see “Network infrastructure and domain account requirements for a two-node failover cluster” in the Failover Cluster Step-by-Step Guide: Configuring a Two-Node File Server Failover Cluster at Failover Cluster Step-by-Step Guide: Configuring a Two-Node File Server Failover Cluster (https://go.microsoft.com/fwlink/?LinkId=86167).

  • Device controllers or appropriate adapters for the storage:

    • For Serial Attached SCSI or Fibre Channel: If you are using Serial Attached SCSI or Fibre Channel, all components of the storage stack should be identical across the clustered servers. The multipath I/O (MPIO) software and Device Specific Module (DSM) software components must be identical, and we recommend that the mass-storage device controllers that are attached to cluster storage, that is, the host bus adapter (HBA), HBA drivers, and HBA firmware, be identical. If you use dissimilar HBAs, verify with the storage vendor that you are following their supported or recommended configurations.

      Note

      With Windows Server 2008, you cannot use parallel SCSI to connect the storage to the servers in the failover cluster.

    • For iSCSI: If you are using iSCSI, each server in the failover cluster must have one or more network adapters or host bus adapters that are dedicated to the cluster storage. The network that you use for iSCSI cannot be used for network communication. The network adapters that you use to connect to the iSCSI storage target should be identical, and we recommend that you use Gigabit Ethernet or a faster connection. You cannot use teamed network adapters, because they are not supported with iSCSI. For more information, see iSCSI Cluster Support: FAQ (https://go.microsoft.com/fwlink/?LinkId=61375).

  • Storage: You must use shared storage that is compatible with Windows Server 2008. The storage should contain at least two separate volumes (LUNs) that are configured at the hardware level. One volume functions as the witness disk (described after the following list). The other volume contains the files that are required by the head node that runs in the failover cluster. Storage requirements include the following:

    • To use the native disk support included in failover clustering, use basic disks, not dynamic disks.

    • We recommend that you format the partitions with the NTFS file system (for the witness disk, the partition must be NTFS).

    • For the partition style of the disk, you can use a master boot record (MBR) or a GUID partition table (GPT).

    The witness disk is a disk in the failover clustering storage that is designated to hold a copy of the failover cluster configuration database. (A witness disk is part of some, not all, quorum configurations.) For Windows HPC Server 2008, the quorum configuration is Node and Disk Majority, which is the default for a failover cluster with an even number of servers. Node and Disk Majority means that the servers in the failover cluster and the witness disk each contain copies of the failover cluster configuration, and a failover cluster can function as long as a majority (two out of three) of these copies are available.
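To make the majority arithmetic concrete, the following small sketch (a simplified voting model, not the actual cluster service logic) counts the three copies of the cluster configuration, the two servers plus the witness disk, and shows that the cluster keeps quorum after losing any single copy but not after losing two:

```python
# A simplified voting model of the Node and Disk Majority quorum described
# above. Each server and the witness disk holds one copy of the cluster
# configuration; the cluster keeps running while a majority is available.

def has_quorum(server_a_up, server_b_up, witness_disk_up):
    votes_present = sum([server_a_up, server_b_up, witness_disk_up])
    total_votes = 3                            # two servers + one witness disk
    return votes_present > total_votes // 2    # majority: at least 2 of 3

# The cluster survives the loss of any single vote:
assert has_quorum(True, True, False)       # witness disk lost -> still 2 of 3
assert has_quorum(True, False, True)       # one server lost   -> still 2 of 3
assert not has_quorum(True, False, False)  # two losses -> quorum lost, cluster stops
```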

Software requirements for a failover cluster running Windows HPC Server 2008

To set up the head node in a failover cluster, you must use the Windows Server 2008 Enterprise or Windows Server 2008 Datacenter operating system. In addition, Windows HPC Server 2008 supports only x64-based hardware, so all the servers must run the same x64 version of the operating system. All servers should also have the same security updates and service packs.

Each server in the failover cluster also requires SQL Server 2005 Standard Edition with Service Pack 2 to support the job scheduler and management databases.
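One consequence of hosting these databases in a clustered SQL Server instance is that clients connect to the instance's virtual network name rather than to either physical server, so a failover is transparent to them. The following is a hedged illustration (Python with the third-party pyodbc module; the server and database names are hypothetical examples, not names defined by Windows HPC Server 2008):

```python
# Illustrative only: connect to a clustered SQL Server instance through its
# virtual network name, so the same connection string works after a failover.

import pyodbc  # assumes an ODBC driver for SQL Server is installed on the client

conn = pyodbc.connect(
    "DRIVER={SQL Server};"
    "SERVER=HPCSQLVS;"          # hypothetical SQL Server virtual network name
    "DATABASE=HPCManagement;"   # hypothetical management database name
    "Trusted_Connection=yes;"   # authenticate with the Windows domain account
)
cur = conn.cursor()
# @@SERVERNAME reports which instance is actually serving the request.
print(cur.execute("SELECT @@SERVERNAME").fetchone()[0])
conn.close()
```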

The head node cannot serve as a compute node or Windows Communication Foundation (WCF) broker node.

Domain account requirements

You need the following domain attributes:

  • Domain role: Both servers in the failover cluster must be in the same Active Directory domain. Neither server in the failover cluster should be a domain controller (to maintain security levels for SQL Server 2005).

  • Account for administering the failover cluster: When you first create a failover cluster and when you install SQL Server 2005, you must be logged on to the domain with an account that has administrator rights and permissions on both servers in the failover cluster. The account does not need to be a Domain Admins account, but can be a Domain Users account that is in the Administrators group on both servers. In addition, if the account is not a Domain Admins account, the account (or the group that the account is a member of) must have the Create Computer Objects and Read All Properties permissions in the domain.

  • A domain security group to install SQL Server 2005: On a failover cluster running SQL Server 2005, domain groups that are common to all cluster nodes are used to control access to registry keys, files, SQL Server objects, and other cluster resources. All resource permissions are controlled by domain-level groups that include SQL Server service accounts as members. For more information, see Domain Groups for Clustered Services (https://go.microsoft.com/fwlink/?LinkId=121041).

  • A service account for the SQL Server services to install SQL Server 2005: You can use one account for all of the services. All service accounts for a failover cluster instance must be domain accounts. For more information, see Service Account (https://go.microsoft.com/fwlink/?LinkId=121044).

Network infrastructure requirements

You need the following network infrastructure for the scenario described in this guide:

  • Network settings and IP addresses: SQL Server 2005 setup for failover clustering requires at least one static IP address. After SQL Server 2005 setup is complete, you can define additional IP addresses. For high availability, we recommend following the steps below, in which you choose a static address for the private network to satisfy the requirements of SQL Server 2005 setup (a sanity-check sketch follows this list). After setup completes, you can change the IP address configuration, for example, by changing the static address to an address provided by Dynamic Host Configuration Protocol (DHCP) or by adding IP addresses for the enterprise network.

  • See also Hardware requirements for a failover cluster running Windows HPC Server 2008, earlier in this section.
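As a concrete illustration of planning the static address, the following sketch (the subnet, DHCP scope, and candidate address are invented example values, not values from this guide) checks that a planned static address lies on the private network and outside the DHCP scope:

```python
# Sanity-check a candidate static IP address for the private network before
# running SQL Server 2005 setup. All addresses below are example values.

import ipaddress

private_subnet = ipaddress.ip_network("10.0.0.0/24")       # example private network
dhcp_scope = (ipaddress.ip_address("10.0.0.100"),
              ipaddress.ip_address("10.0.0.200"))          # example DHCP range
static_ip = ipaddress.ip_address("10.0.0.10")              # candidate static address

assert static_ip in private_subnet, "static address must be on the private network"
assert not (dhcp_scope[0] <= static_ip <= dhcp_scope[1]), \
    "static address must fall outside the DHCP scope to avoid conflicts"
print(f"{static_ip} is a usable static address on {private_subnet}")
```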

Windows HPC Server 2008 supports multiple network topologies, which are designed to meet a wide range of user needs and performance, scaling, and access requirements. The topologies are distinguished by how many networks the cluster is connected to, and in what manner. When used in a failover cluster, Windows HPC Server 2008 supports the first four topologies defined in the following table. It does not support a public-only topology (Topology 5).

Topology | Description
1 | Compute nodes isolated on private network
2 | Compute nodes on public and private networks
3 | Compute nodes isolated on private and MPI networks
4 | Compute nodes on public, private, and MPI networks
5 (not supported in a failover cluster) | Compute nodes on public network only
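A deployment script could encode the preceding table as data and reject the unsupported topology up front. The following sketch is illustrative only; Windows HPC Server 2008 does not expose such a function itself:

```python
# A minimal lookup mirroring the topology table above, rejecting Topology 5
# before the head node is configured for failover.

TOPOLOGIES = {
    1: "Compute nodes isolated on private network",
    2: "Compute nodes on public and private networks",
    3: "Compute nodes isolated on private and MPI networks",
    4: "Compute nodes on public, private, and MPI networks",
    5: "Compute nodes on public network only",
}

def check_topology_for_failover(topology):
    if topology == 5:
        raise ValueError("Topology 5 (Public Only) is not supported "
                         "when the head node runs in a failover cluster")
    return TOPOLOGIES[topology]

print(check_topology_for_failover(3))   # supported
# check_topology_for_failover(5)        # raises ValueError
```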

Note

Later sections in this guide will call out specific configuration requirements for high availability when each of the preceding topologies is chosen. For more information about network configuration and choosing a network topology, consult the Windows HPC Server 2008 Getting Started Guide (https://go.microsoft.com/fwlink/?LinkId=121228).

Windows HPC Server 2008 in a failover cluster

This section summarizes some of the differences between running the head node for Windows HPC Server 2008 on a single server and running it in a failover cluster.

  • In a failover cluster, the head node cannot also be a compute node or WCF broker node. These options are disabled when the head node is configured in a failover cluster.

  • Because the head node cannot serve as a compute node, it cannot be brought online. Node state for the head node remains offline after the head node is installed in the failover cluster.

Services and resources during failover

The following table summarizes what happens to each service or resource during failover:

Service or Resource | What Happens During Failover
Job scheduler | Fails over to the other server in the failover cluster
Management service | Fails over to the other server in the failover cluster
CCS managed file shares (InstallShare, spoolDir) | Ownership fails over to the other server in the failover cluster
Management database | Stored on a disk in cluster storage, which means it is accessible to either server in the failover cluster as needed
Job scheduler database | Ownership fails over to the other server in the failover cluster
NAT | Is replicated through the Network Configuration Wizard for Topologies 1 and 3
WDS | Is replicated
Reporting service | Does not fail over (failover is unsupported)
DHCP | Is replicated (scope partitioned through the Network Configuration Wizard)
File sharing for compute nodes | If configured through the Failover Cluster Management snap-in, fails over to the other server in the failover cluster
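For reference, the same information can be encoded as data, for example by a script that documents or audits the expected failover behavior. The sketch below restates the table; the short behavior labels are this sketch's own summary, not terms defined by Windows HPC Server 2008:

```python
# A compact restatement of the failover-behavior table above as data.

FAILOVER_BEHAVIOR = {
    "Job scheduler":                  "fails over to the other server",
    "Management service":             "fails over to the other server",
    "CCS managed file shares":        "ownership fails over",
    "Management database":            "on shared cluster storage",
    "Job scheduler database":         "ownership fails over",
    "NAT":                            "replicated (Topologies 1 and 3)",
    "WDS":                            "replicated",
    "Reporting service":              "does not fail over",
    "DHCP":                           "replicated (partitioned scope)",
    "File sharing for compute nodes": "fails over if configured in the snap-in",
}

for service, behavior in sorted(FAILOVER_BEHAVIOR.items()):
    print(f"{service}: {behavior}")
```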