Security Tip of the Month – December 2008
by Richard Carpenter, Principal Consultant, Microsoft
Services, Southern California and Sanjay Pandit, Senior Consultant, Microsoft
Services, Southern California
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Windows® HPC Server 2008 is a cost-effective, high-performance
computing (HPC) solution. One of its primary advantages is that Windows HPC
Server 2008 can be deployed, managed, and extended using familiar Windows tools
and technologies. This also means that the foundation for securing Windows HPC
Server 2008 is the same as that for securing Windows Server® 2008, and
administrators can draw on resources such as the Windows Server 2008
Security Guide.
In addition to the broad guidance offered by the Windows
Server 2008 Security Guide, there are elements of security unique to HPC, and
the cluster architecture can be a significant contributor to its overall
security. Consider the two types of computers in a compute cluster. The head
node is the controlling server: it performs all security checks and
orchestrates the operation of the compute nodes. The second type of server is
the compute node; this is where work is actually performed. The head node can
also act as a compute node, but this is usually only the case for small
clusters of fewer than 10 nodes.
Along with compute resources, there is also networking. A compute
cluster uses three types of networks. The first is the enterprise
network, typically the corporate local area network (LAN), which allows
users who are not directly connected to the compute cluster to access the head
node and, potentially, the compute nodes. The second is the private network,
which provides a dedicated connection between the head node and the compute
nodes. For small clusters, the enterprise network can sometimes serve as the
private network, but as the compute cluster grows, sharing the network in this
way can burden the corporate LAN and expose the compute nodes to unwanted
access. The third is a dedicated network (preferably with high bandwidth and
low latency) that carries parallel Message Passing Interface (MPI) application
communication between cluster nodes.
Compute cluster security rests on a combination of two elements: user
credentials and network configuration.
All user access to the compute cluster is managed through Active Directory®
groups, which grant access rights to the different resources of the
compute cluster. Compute cluster configurations use two
groups: administrators and users. Administrators
have the power to manage the configuration of the cluster (e.g., add or remove
nodes, set user permissions, and perform other administrative functions). Users
are allowed to run jobs on the compute cluster; depending on their rights,
users may have access to some or all of the resources in the cluster.
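
The two-group model described above can be sketched in a few lines of Python. This is only an illustration: the group names, the action names, and the `is_allowed` helper are hypothetical and are not part of Windows HPC Server 2008 or Active Directory. The sketch also assumes that administrators may submit jobs, which the text does not state.

```python
# Hypothetical action names, used only for illustration.
ADMIN_ACTIONS = {"add_node", "remove_node", "set_user_permissions"}
USER_ACTIONS = {"submit_job"}

# Hypothetical group names mapped to the actions each group grants.
GROUP_RIGHTS = {
    # Assumption: administrators can also run jobs.
    "cluster-administrators": ADMIN_ACTIONS | USER_ACTIONS,
    "cluster-users": USER_ACTIONS,
}


def is_allowed(member_of, action):
    """Return True if any group the account belongs to grants the action."""
    return any(action in GROUP_RIGHTS.get(group, set()) for group in member_of)


print(is_allowed(["cluster-users"], "submit_job"))           # True
print(is_allowed(["cluster-users"], "remove_node"))          # False
print(is_allowed(["cluster-administrators"], "remove_node")) # True
```

An account that belongs to neither group is denied every action, which mirrors the idea that all access flows through group membership.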
Compute clusters often use multiple network connections to
achieve the performance requirements of the jobs they are performing. With
compute clusters, network latency can be a major factor in the success or
failure of a compute job, so many compute clusters will have dedicated low-latency
network fabric that connects all of the compute nodes and the head node
together. In addition to the low-latency network needed for computation,
compute clusters will often have one or more additional network
connections, depending on the types of jobs being run on the compute nodes.
These additional network connections often carry data and control information
to the compute nodes, and results and status reports from the compute nodes
back to the head node. We can also leverage
these multiple networks to maintain and reinforce the security of the
cluster.
There are three basic networking configurations for an HPC
cluster. The first is a two-network configuration for small clusters, in which
the enterprise network performs double duty as the private network. The second
network is the data network, where the low-latency MPI protocol is used to
communicate between the compute nodes.
In the second configuration, all three network types are on
three separate network interfaces. The enterprise network provides access to
the larger corporate network, where raw data and results can be stored. The private
network is where command/control traffic is kept, which guarantees that the
head node can reach each compute node without interference from data traffic. The
data network allows the compute nodes to communicate with each other without
being impacted by the traffic over the enterprise or private networks.
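
One way to compare the two configurations described so far is to write down which network carries which kind of traffic. The sketch below does that with plain dictionaries. The traffic labels and the `shared_networks` helper are hypothetical names chosen for illustration, and the two-network mapping assumes, as noted earlier for small clusters, that the enterprise network also carries the head-node-to-compute-node traffic.

```python
# Traffic classes taken from the text; the labels themselves are illustrative.
TWO_NETWORK = {
    "user_access": "enterprise",
    "command_control": "enterprise",  # assumption: enterprise doubles as private
    "mpi": "data",
}

THREE_NETWORK = {
    "user_access": "enterprise",
    "command_control": "private",
    "mpi": "data",
}


def shared_networks(config):
    """Return the networks that carry more than one traffic class."""
    by_network = {}
    for traffic, network in config.items():
        by_network.setdefault(network, []).append(traffic)
    return {net: sorted(kinds) for net, kinds in by_network.items() if len(kinds) > 1}


print(shared_networks(TWO_NETWORK))
# {'enterprise': ['command_control', 'user_access']}
print(shared_networks(THREE_NETWORK))
# {}
```

An empty result for the three-network layout reflects the point made above: each traffic class has its own interface, so none competes with another.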
A variation of the second configuration maintains the three
network connections to the head node, but each of the compute nodes only gets
two connections: the private network and the data network. The reason for doing
this is to improve security by reducing the attack surface of the compute
cluster. If a user wants to run a job that requires large amounts of data, that
data must be moved to the head node where the compute nodes will have access to
the data through the private network. This also means that any resulting data
will need to be copied off the head node before the end of the job to prevent
loss of data when the next job is run. To increase the security of this
configuration, many organizations turn on the Windows Firewall or install a
third-party firewall to block all inbound traffic except remote desktop
connections. End users must then log on remotely to the compute cluster head
node to transfer data to or from the enterprise network. From the remote
desktop session, users copy data from the enterprise network to temporary
storage on the head node before initiating the job that will pull data from the
head node.
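
The effect of this variation on the attack surface can be illustrated with a small model. The sketch below is not a firewall configuration and does not call any Windows API; the `Node` class, the node names, and both helper functions are hypothetical. The only external fact it relies on is that Remote Desktop listens on TCP port 3389 by default.

```python
from dataclasses import dataclass

RDP_PORT = 3389  # default TCP port for Remote Desktop


@dataclass(frozen=True)
class Node:
    name: str
    networks: frozenset  # networks on which this node has an interface


def enterprise_facing(nodes):
    """Names of nodes that have an interface on the enterprise network."""
    return sorted(n.name for n in nodes if "enterprise" in n.networks)


def inbound_allowed(port, allowed_ports=frozenset({RDP_PORT})):
    """Default-deny inbound policy: only explicitly allowed ports pass."""
    return port in allowed_ports


ALL = frozenset({"enterprise", "private", "data"})
INTERNAL = frozenset({"private", "data"})

# Hypothetical three-node cluster in each layout.
three_network = [Node("head", ALL), Node("cn1", ALL), Node("cn2", ALL)]
variation = [Node("head", ALL), Node("cn1", INTERNAL), Node("cn2", INTERNAL)]

print(enterprise_facing(three_network))  # ['cn1', 'cn2', 'head']
print(enterprise_facing(variation))      # ['head']
print(inbound_allowed(3389))             # True
print(inbound_allowed(445))              # False
```

In the variation, only the head node is addressable from the enterprise network, and with a default-deny inbound policy the only port it accepts there is the one used for remote desktop.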
Through adoption of these strategies and deployment of an
appropriate HPC architecture, an attacker’s ability to access the compute
cluster and push malicious data to it will be significantly reduced. Combined
with general Windows Server security best practices, we can establish network
configurations that help ensure that the HPC platform and its data are more secure.