Appendix 1: HPC Cluster Networking

Windows HPC Server 2008 supports five cluster topologies designed to meet a wide range of user needs and performance, scalability, manageability, and access requirements. These topologies are distinguished by how the compute nodes in the cluster are connected to each other and to the enterprise network. Depending on the network topology that you choose for your cluster, certain network services, such as Dynamic Host Configuration Protocol (DHCP) and network address translation (NAT), can be provided by the head node to the compute nodes.

You must choose the network topology that you will use for your cluster well in advance of setting up an HPC cluster.

This section includes the following topics:

  • HPC cluster networks
  • Supported HPC cluster network topologies
  • HPC network services
  • Windows Firewall configuration

HPC cluster networks

The following table lists and describes the networks to which an HPC cluster can be connected.

Network Name Description

Enterprise network

An organizational network to which the head node, and optionally the compute nodes, are connected. The enterprise network is often the network that most users in an organization log on to when doing their work. All intra-cluster management and deployment traffic is carried on the enterprise network unless a private network (and optionally, an application network) also connects the cluster nodes.

Private network

A dedicated network that carries intra-cluster communication between nodes. This network carries management, deployment, and application traffic if no application network exists.

Application network

A dedicated network, preferably with high bandwidth and low latency. These characteristics are important so that this network can perform latency-sensitive tasks, such as carrying parallel Message Passing Interface (MPI) application communication between compute nodes.

Supported HPC cluster network topologies

There are five cluster topologies supported by Windows HPC Server 2008:

  • Topology 1: Compute Nodes Isolated on a Private Network
  • Topology 2: All Nodes on Enterprise and Private Networks
  • Topology 3: Compute Nodes Isolated on Private and Application Networks
  • Topology 4: All Nodes on Enterprise, Private, and Application Networks
  • Topology 5: All Nodes on an Enterprise Network
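The connectivity that the following sections describe for each topology can be summarized in a small data structure. The Python sketch below is illustrative only: `Topology` and `TOPOLOGIES` are hypothetical names, not part of Windows HPC Server 2008, and the adapter counts assume one network adapter per connected network, as the component tables state.

```python
from dataclasses import dataclass

ENTERPRISE, PRIVATE, APPLICATION = "enterprise", "private", "application"


@dataclass(frozen=True)
class Topology:
    """Networks that the head node and the compute nodes are connected to."""
    number: int
    head_node_networks: tuple
    compute_node_networks: tuple

    @property
    def head_node_adapters(self):
        # Assumption: one adapter per connected network.
        return len(self.head_node_networks)

    @property
    def compute_node_adapters(self):
        return len(self.compute_node_networks)


# Hypothetical lookup table built from the topology descriptions in this section.
TOPOLOGIES = {
    1: Topology(1, (ENTERPRISE, PRIVATE), (PRIVATE,)),
    2: Topology(2, (ENTERPRISE, PRIVATE), (ENTERPRISE, PRIVATE)),
    3: Topology(3, (ENTERPRISE, PRIVATE, APPLICATION), (PRIVATE, APPLICATION)),
    4: Topology(4, (ENTERPRISE, PRIVATE, APPLICATION),
                (ENTERPRISE, PRIVATE, APPLICATION)),
    5: Topology(5, (ENTERPRISE,), (ENTERPRISE,)),
}

for topology in TOPOLOGIES.values():
    print(topology.number,
          topology.head_node_adapters,
          topology.compute_node_adapters)
```

A table like this makes it easy to check, for example, how many adapters each compute node needs before hardware is ordered.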

Topology 1: Compute nodes isolated on a private network

The following image illustrates how the head node and the compute nodes are connected to the cluster networks in this topology:

CCS Cluster Topology 1

The following table lists and describes details about the different components in this topology:

Component Description

Network adapters

  • The head node has two network adapters.
  • Each compute node has one network adapter.
  • The head node is connected to both an enterprise network and to a private network.
  • The compute nodes are connected only to the private network.

Traffic

  • The private network carries all communication between the head node and the compute nodes, including deployment, management, and application traffic (for example, MPI communication).

Network services

  • By default, NAT is enabled on the private network in this topology to provide the compute nodes with address translation and access to services and resources on the enterprise network.
  • DHCP is enabled by default on the private network to assign IP addresses to compute nodes.
  • If a DHCP server is already installed on the private network, then both NAT and DHCP will be disabled by default.

Security

  • The default configuration on the cluster has the firewall turned ON for the enterprise network and turned OFF for the private network.

Considerations when selecting this topology

  • Cluster performance is more consistent because intra-cluster communication is routed onto the private network.
  • Network traffic between compute nodes and resources on the enterprise network (such as databases and file servers) passes through the head node. Depending on the amount of traffic, this might reduce cluster performance.
  • Compute nodes are not directly accessible by users on the enterprise network. Take this into account if you plan to develop and debug parallel applications on the cluster.

Topology 2: All nodes on enterprise and private networks

The following image illustrates how the head node and the compute nodes are connected to the cluster networks in this topology:

CCS Cluster Topology 2

The following table lists and describes details about the different components in this topology:

Component Description

Network adapters

  • The head node has two network adapters.
  • Each compute node has two network adapters.
  • All nodes in the cluster are connected to both the enterprise network and a dedicated private cluster network.

Traffic

  • Communication between nodes, including deployment, management, and application traffic, is carried on the private network in this topology.
  • Traffic from the enterprise network can be routed directly to a compute node.

Network services

  • The default configuration for this topology has DHCP enabled on the private network, to provide IP addresses to the compute nodes.
  • NAT is not required in this topology because the compute nodes are connected to the enterprise network, so this option is disabled by default.

Security

  • The default configuration on the cluster has the firewall turned ON for the enterprise network and turned OFF for the private network.

Considerations when selecting this topology

  • This topology offers more consistent cluster performance because intra-cluster communication is routed onto a private network.
  • This topology is well suited for developing and debugging applications because all compute nodes are connected to the enterprise network.
  • This topology provides easy access to compute nodes by users on the enterprise network.
  • This topology provides faster access to enterprise network resources by the compute nodes.

Topology 3: Compute nodes isolated on private and application networks

The following image illustrates how the head node and the compute nodes are connected to the cluster networks in this topology:

CCS Cluster Topology 3

The following table lists and describes details about the different components in this topology:

Component Description

Network adapters

  • The head node has three network adapters: one for the enterprise network, one for the private network, and a high-speed adapter that is connected to the application network.
  • Each compute node has two network adapters, one for the private network and another for the application network.

Traffic

  • The private network carries deployment and management communication between the head node and the compute nodes.
  • Jobs running on the cluster use the high-performance application network for cross-node communication.

Network services

  • The default configuration for this topology has both DHCP and NAT enabled for the private network, to provide IP addressing and address translation for compute nodes. DHCP is enabled by default on the application network, but not NAT.
  • If a DHCP server is already installed on the private network, then both NAT and DHCP will be disabled by default.

Security

  • The default configuration on the cluster has the firewall turned ON for the enterprise network and turned OFF on the private and application networks.

Considerations when selecting this topology

  • This topology offers more consistent cluster performance because intra-cluster communication is routed onto the private and application networks.
  • Compute nodes are not directly accessible by users on the enterprise network in this topology.

Topology 4: All nodes on enterprise, private, and application networks

The following image illustrates how the head node and the compute nodes are connected to the cluster networks in this topology:

CCS Cluster Topology 4

The following table lists and describes details about the different components in this topology:

Component Description

Network adapters

  • The head node has three network adapters.
  • All compute nodes have three network adapters.
  • On each node, one adapter connects to the enterprise network, one to the private network, and a high-speed adapter connects to the high-performance application network.

Traffic

  • The private cluster network carries only deployment and management traffic.
  • The application network carries latency-sensitive traffic, such as MPI communication between nodes.
  • Network traffic from the enterprise network reaches the compute nodes directly.

Network services

  • The default configuration for this topology has DHCP enabled for the private and application networks to provide IP addresses to the compute nodes on both networks.
  • NAT is disabled for the private and application networks because the compute nodes are connected to the enterprise network.

Security

  • The default configuration on the cluster has the firewall turned ON for the enterprise network and turned OFF on the private and application networks.

Considerations when selecting this topology

  • This topology offers more consistent cluster performance because intra-cluster communication is routed onto the private and application networks.
  • This topology is well suited for developing and debugging applications because all cluster nodes are connected to the enterprise network.
  • This topology provides easy access to compute nodes by users on the enterprise network.
  • This topology provides faster access to enterprise network resources by the compute nodes.

Topology 5: All nodes on an enterprise network

The following image illustrates how the head node and the compute nodes are connected to the cluster networks in this topology:

CCS Cluster Topology 5

The following table lists and describes details about the different components in this topology:

Component Description

Network adapters

  • The head node has one network adapter.
  • All compute nodes have one network adapter.
  • All nodes are on the enterprise network.

Traffic

  • All traffic, including intra-cluster, application, and enterprise traffic, is carried over the enterprise network. This maximizes access to the compute nodes by users and developers on the enterprise network.

Network services

  • This topology does not require NAT or DHCP because the compute nodes are connected to the enterprise network.

Security

  • The default configuration on the cluster has the firewall turned ON for the enterprise network.

Considerations when selecting this topology

  • This topology offers easy access to compute nodes by users on the enterprise network.
  • Compute nodes have faster access to resources on the enterprise network.
  • This topology, like topologies 2 and 4, is well suited for developing and debugging applications because all cluster nodes are connected to the enterprise network.
  • Because all nodes are connected only to the enterprise network, you cannot use Windows Deployment Services to deploy compute node images using the new deployment tools in Windows HPC Server 2008.
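The five topologies differ along three questions: whether a private network exists, whether an application network exists, and whether the compute nodes are connected to the enterprise network. The following Python sketch maps those answers to a topology number. The function name and parameters are hypothetical and exist only for illustration; the sketch is a planning aid and not a product feature.

```python
def suggest_topology(has_private, has_application, compute_on_enterprise):
    """Return the topology number (1-5) that matches the given connectivity.

    Raises ValueError for combinations that none of the five
    supported topologies covers.
    """
    if not has_private:
        # Without a private network, only topology 5 applies, and it has
        # no application network and puts every node on the enterprise network.
        if has_application or not compute_on_enterprise:
            raise ValueError("no supported topology matches this combination")
        return 5
    if has_application:
        return 4 if compute_on_enterprise else 3
    return 2 if compute_on_enterprise else 1


# Example: compute nodes isolated on private and application networks.
print(suggest_topology(has_private=True,
                       has_application=True,
                       compute_on_enterprise=False))
```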

HPC network services

Depending on the network topology that you have chosen for your HPC cluster, the following network services can be provided by the head node to the compute nodes connected to the different cluster networks:

  • Network Address Translation (NAT)
  • Dynamic Host Configuration Protocol (DHCP) server

This section describes these HPC network services.

Network address translation (NAT)

Network address translation (NAT) provides a method for translating Internet Protocol version 4 (IPv4) addresses of computers on one network into IPv4 addresses of computers on a different network.

Enabling NAT on the head node enables compute nodes on the private or application networks to access resources on the enterprise network. You do not need to enable NAT if you have another server providing NAT or routing services on the private or application networks. Also, you do not need NAT if all nodes are connected to the enterprise network.
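The conditions in the preceding paragraph reduce to a simple rule. The Python sketch below expresses it; the function and parameter names are hypothetical and chosen only for this illustration.

```python
def head_node_nat_needed(compute_nodes_on_enterprise, other_nat_or_router_present):
    """Return True if NAT on the head node is needed for enterprise access.

    compute_nodes_on_enterprise: all nodes are connected to the enterprise network.
    other_nat_or_router_present: another server already provides NAT or routing
        on the private or application networks.
    """
    if compute_nodes_on_enterprise:
        return False  # Nodes reach enterprise resources directly.
    if other_nat_or_router_present:
        return False  # Another server already provides NAT or routing.
    return True


# Example: isolated compute nodes and no other NAT or routing server.
print(head_node_nat_needed(False, False))
```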

DHCP server

A DHCP server assigns IP addresses to network clients. Depending on the detected configuration of your HPC cluster and the network topology that you choose for your cluster, the compute nodes receive IP addresses from one of three sources: the head node running DHCP, a dedicated DHCP server on the private network, or a DHCP server on the enterprise network.
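The default DHCP and NAT settings that the topology tables list can be collected in one lookup table. The Python sketch below is illustrative only: `DEFAULT_SERVICES` and `default_services` are hypothetical names. It applies the "existing DHCP server on the private network" rule only to topologies 1 and 3, because those are the only topologies for which this document states that rule.

```python
import copy

# Services that the head node provides by default, per topology and network.
# Topology 5 needs neither service, so its entry is empty.
DEFAULT_SERVICES = {
    1: {"private": {"dhcp": True, "nat": True}},
    2: {"private": {"dhcp": True, "nat": False}},
    3: {"private": {"dhcp": True, "nat": True},
        "application": {"dhcp": True, "nat": False}},
    4: {"private": {"dhcp": True, "nat": False},
        "application": {"dhcp": True, "nat": False}},
    5: {},
}


def default_services(topology, dhcp_server_on_private=False):
    """Return the default head node services for a topology (1-5)."""
    services = copy.deepcopy(DEFAULT_SERVICES[topology])
    # Stated for topologies 1 and 3 only: an existing DHCP server on the
    # private network disables both NAT and DHCP there by default.
    # The application network entry is left unchanged (assumption).
    if dhcp_server_on_private and topology in (1, 3):
        services["private"] = {"dhcp": False, "nat": False}
    return services


print(default_services(3))
print(default_services(1, dhcp_server_on_private=True))
```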

Windows Firewall configuration

Windows HPC Server 2008 opens firewall ports on the head node and compute nodes to enable internal services to run. By default, Windows Firewall is enabled only on the enterprise network, and disabled on the private and application networks to provide the best performance and manageability experience.

Important

If you have applications that require access to the head node or to the cluster nodes on specific ports, you will have to manually open those ports in Windows Firewall.
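One way to open a port manually is the `netsh advfirewall firewall add rule` command. The Python sketch below only builds the argument list for that command and does not run it. The helper name, the rule name, and port 5000 are hypothetical, and the sketch assumes that `netsh advfirewall` is available on the node and that the command is run with administrative rights.

```python
def netsh_open_tcp_port(rule_name, port):
    """Build a netsh command that allows inbound TCP traffic on one port."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={rule_name}",
        "dir=in",
        "action=allow",
        "protocol=TCP",
        f"localport={port}",
    ]


# Hypothetical application that listens on TCP port 5000.
command = netsh_open_tcp_port("MyHpcApp", 5000)
print(" ".join(command))
# To apply the rule on a node, the list could be passed to subprocess.run(command).
```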

Firewall ports required by Windows HPC Server 2008

The following table lists all the ports that are opened by Windows HPC Server 2008 for communication between cluster services on the head node and the compute nodes.

Port Number (TCP) Required By

5969

Required by the client tools on the enterprise network to connect to the HPC Job Scheduler Service on the head node.

9892, 9893

Used by the HPC Management Service on the compute nodes to communicate with the HPC System Definition Model (SDM) Service on the head node.

5970

Used for communication between the HPC Management Service on the compute nodes and the HPC Job Scheduler Service on the head node.

9794

Used for communication between ExecutionClient.exe on the compute nodes and the HPC Management Service on the head node. ExecutionClient.exe is used during the deployment process of a compute node. It performs tasks such as imaging the computer, installing all the necessary HPC components, and joining the computer to the domain.

9087, 9088, 9089

Used for communication between the client application on the enterprise network and the services provided by the Windows Communication Foundation (WCF) broker node.

1856

Used by the HPC Job Scheduler Service on the head node to communicate with the HPC Node Manager Service on the compute nodes.

8677

Used for communication between the HPC MPI Service on the head node and the HPC MPI Service on the compute nodes.

6729

Used for management services traffic coming from the compute nodes to the head node or WCF broker node.

5800

Used for communication between the HPC command-line tools on the enterprise network and the HPC Job Scheduler Service on the head node.

5801

Used by the remote node service on the enterprise network to enumerate nodes in a node group, or to bring a node online or take it offline.

5999

Used by HPC Cluster Manager on the enterprise network to communicate with the HPC Job Scheduler Service on the head node.

443

Used by the clients on the enterprise network to connect to the HPC Basic Profile Web Service on the head node.
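When troubleshooting connectivity, it can help to test whether a given port accepts TCP connections. The Python sketch below records the ports from the table and checks a list of ports on one host. The names `HPC_TCP_PORTS` and `check_tcp_ports` and the host name in the comment are hypothetical. Note that not every port listens on the head node: port 1856, for example, is used by the HPC Node Manager Service on the compute nodes.

```python
import socket

# TCP ports from the table above, with a short note on who uses each one.
HPC_TCP_PORTS = {
    5969: "client tools to HPC Job Scheduler Service",
    9892: "HPC Management Service to HPC SDM Service",
    9893: "HPC Management Service to HPC SDM Service",
    5970: "HPC Management Service to HPC Job Scheduler Service",
    9794: "ExecutionClient.exe to HPC Management Service",
    9087: "client application to WCF broker node",
    9088: "client application to WCF broker node",
    9089: "client application to WCF broker node",
    1856: "HPC Job Scheduler Service to HPC Node Manager Service",
    8677: "HPC MPI Service (head node and compute nodes)",
    6729: "management services traffic from compute nodes",
    5800: "HPC command-line tools to HPC Job Scheduler Service",
    5801: "remote node service",
    5999: "HPC Cluster Manager to HPC Job Scheduler Service",
    443: "clients to HPC Basic Profile Web Service",
}


def check_tcp_ports(host, ports, timeout=2.0):
    """Return {port: True/False} for whether a TCP connection succeeded."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused, timed out, or unreachable.
            results[port] = False
    return results


# Example (hypothetical host name):
# check_tcp_ports("headnode.example.local", [5969, 5800, 5999])
```

A failed connection does not prove that the firewall is blocking the port; the service might not be running, or it might listen on a different node.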