High Performance Computing
An Introduction to Windows Compute Cluster Server
John Kelbley and Doug Lindsey
At a Glance:
- Using clusters for solving complex problems
- General requirements for computational clusters
- Setting up a Windows-based compute cluster
- Running commands remotely
High Performance Computing (HPC) refers to a branch of applied computing that focuses primarily on solving computationally intensive problems. Years ago, HPC (then more commonly referred to as "supercomputing") was dominated by large, specialized (and expensive) systems found primarily in research centers. As the computing power of small systems has increased, however, the cost/performance ratio has shifted, and computational workloads have moved to PC-class systems.
Many computationally intensive problems can be solved by completing calculations in parallel; that is, a particular calculation or process may not depend on the output of another to complete. In such cases, large problems can take advantage of multiple smaller systems (nodes) grouped into computational clusters. Here are just a few examples of the kinds of applications that take advantage of computational clusters:
- Financial models—an algorithm or formula is run thousands of times, each time with different inputs.
- Engineering—simulating effects on individual parts, applying textures to models.
- Computer animation—applying texture and lighting effects to each frame of a movie.
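The "financial models" pattern above, the same calculation run many times with independent inputs, is the easiest kind of work to parallelize. As an illustration only (this is not WCCS code), the shape of such a workload can be sketched on a single machine with Python's multiprocessing pool standing in for a set of compute nodes; the model function here is a made-up toy:

```python
# Illustrative sketch: each worker evaluates one scenario independently,
# the same "embarrassingly parallel" shape a financial model takes on a
# real compute cluster. The pool of 4 workers stands in for 4 nodes.
from multiprocessing import Pool

def price_scenario(rate):
    """Toy model: future value of 100 invested for 10 periods at `rate`."""
    return 100 * (1 + rate) ** 10

if __name__ == "__main__":
    rates = [0.01 * i for i in range(1, 101)]  # 100 independent inputs
    with Pool(4) as pool:                      # 4 "nodes"
        values = pool.map(price_scenario, rates)
    print(round(sum(values) / len(values), 2))
```

Because no scenario depends on any other, the work divides cleanly across however many processors (or nodes) are available.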
Computational clusters provide an economical way to solve complex problems in a short amount of time. As you'll see here, Microsoft provides key compute cluster functionality and support with Windows® Compute Cluster Server 2003.
It is important to understand that a computational cluster is different from an availability (failover) cluster. You might already be familiar with availability clusters for Exchange, SQL Server®, or other applications in Windows Server® 2003. They typically leverage shared storage to maximize application uptime. The intention of computational clusters is not to resume the work of another system in the event of an outage but to allow all nodes to work in a coordinated fashion.
General Requirements for a Computational Cluster
Modern HPC systems share some key elements that facilitate the processing of workloads. For starters, you need more than one system. You also need a scheduler to coordinate work assignments across the nodes. The scheduler runs on the node in charge (the head node) and identifies available resources, assigning and distributing tasks and tracking the overall status of jobs. It is the coordinator of resources within the cluster of systems as well as the point where users and administrators submit jobs for processing.
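To make the scheduler's role concrete, here is a deliberately simplified sketch (again, not WCCS code; all names are invented for illustration) of the core idea: the head node tracks each node's free capacity and places queued tasks accordingly. The real WCCS Job Scheduler also handles priorities, job lifecycles, and failures.

```python
# Toy head-node scheduler: greedily assign queued tasks to whichever
# node currently has the most free CPUs. Purely illustrative.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cpus: int
    assigned: list = field(default_factory=list)

    def free(self):
        return self.cpus - len(self.assigned)

def schedule(tasks, nodes):
    """Place each task on the least-loaded node; stop when the cluster is full."""
    placements = {}
    for task in tasks:
        best = max(nodes, key=Node.free)
        if best.free() <= 0:
            break  # no capacity left; remaining tasks stay queued
        best.assigned.append(task)
        placements[task] = best.name
    return placements
```

A real scheduler is far more sophisticated, but the essential job is the same: know what resources exist, know what work is queued, and match the two.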
Clusters require a means of communicating among nodes. Depending on the type of work, nodes may need high-speed, low-latency interconnects to pass messages among one another to coordinate processing. At a minimum, each compute node and the head node must be connected to a common network.
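The message-passing idea itself can be illustrated with two local processes exchanging data over a pipe, a stand-in for two nodes on a cluster interconnect (this is a conceptual sketch, not how MPI is actually invoked):

```python
# Illustration of message passing between two "nodes": here they are just
# two local processes connected by a pipe rather than a cluster interconnect.
from multiprocessing import Process, Pipe

def worker(conn):
    chunk = conn.recv()       # receive a chunk of work from the coordinator
    conn.send(sum(chunk))     # send the partial result back
    conn.close()

if __name__ == "__main__":
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    parent.send([1, 2, 3, 4])  # coordinator hands out work
    print(parent.recv())       # prints 10
    p.join()
```

On a real cluster, latency and bandwidth of the interconnect carrying these messages can dominate overall performance, which is why tightly coupled workloads call for high-speed interconnects.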
Large numbers of interconnected systems working together to solve complex problems also need a fair amount of care and feeding. Processes and tools geared toward provisioning, monitoring, managing, and maintaining dozens to hundreds or thousands of systems are required to ensure a consistent and stable environment.
Lastly, and most importantly, applications that can exploit the parallel-processing capabilities of a computational cluster must be available. Without applications that are able to break up jobs so they can be processed on multiple computers, or development tools that include "parallel debugger" support, a cluster would do little more than heat up your computer room.
Windows Server-Based Clustering
In 2006, Microsoft introduced Windows Server 2003 Compute Cluster Edition (CCE) and Windows Compute Cluster Server 2003 (WCCS) to meet the requirements for a broad range of HPC applications. CCE and WCCS are based on the same Windows Server 2003 you are already familiar with. CCE is a version of Windows Server 2003 licensed for use with HPC applications. WCCS is identical to CCE with the addition of the Microsoft® Compute Cluster Pack (CCP). This means that you can deploy, manage, monitor, and maintain a Windows Server-based cluster using the same tools you already use to manage existing Windows Server 2003 systems.
One key point to note is that CCE and WCCS are x64 only—there is no 32-bit (x86) version of these products. The hardware requirements for CCE and WCCS are identical to those for Windows Server 2003 Standard x64 Edition. In addition to support for high-performance hardware (64-bit architecture), the products include Remote Direct Memory Access (RDMA) support for high-performance interconnects (Gigabit Ethernet, InfiniBand, Myrinet, and others).
Compute Cluster Pack
As mentioned earlier, computational clusters have some basic requirements that can be met through the installation of the Compute Cluster Pack. CCP is a standalone installation package that includes:
- Integrated Job Scheduler
- Message Passing Interface (MPI) support based on the industry-standard MPICH2 implementation
- Cluster Resource Management and User Tools
The CCP is the key differentiator between CCE and WCCS. You might be wondering why, if computational clusters need these components, they are included only in WCCS and not in CCE. The answer is that some HPC solutions leverage different job schedulers or specialized MPIs (or do not require MPI support) and simply require a great platform on which to run—namely, Windows Server 2003. Because the CCP is a standalone package, you can install it on other x64 versions of Windows Server 2003 (CCE, Standard, Standard R2, Enterprise, and Enterprise R2).
Nuts and Bolts
WCCS takes advantage of key Windows fundamentals to simplify the management and operation of Windows-based clusters, including Active Directory® and Remote Installation Services (RIS). Compute Cluster Server uses Active Directory to manage security transparently. With Active Directory, a user can submit a job to the head node—using a single set of credentials—that can run across hundreds of server nodes. When jobs are executed on one or more compute nodes, they run in the context of the user credentials provided at job submission, and those credentials are securely cached thereafter. WCCS relies on Active Directory in order to provide this "single point of logon" functionality. An additional benefit of having Active Directory present in the environment is that server and configuration policy can be centrally administered via Group Policy.
If your IT organization already has Active Directory deployed, you can save significant time and administrative effort by building your cluster in the existing domain. This is the recommended scenario.
If you need to deploy a compute cluster into an environment where Active Directory does not exist, a recommended practice is to deploy one or more dedicated domain controllers to host Active Directory for the cluster. It is not uncommon to make the head node an Active Directory domain controller, since all nodes already have network connectivity to the system, regardless of network topology. However, making the head node a domain controller is not recommended for larger-scale clusters due to increased load on the head node. Common practices for Active Directory deployment and management should be followed (installation of redundant domain controllers, adequate backups, good security practices, and so forth).
Remote Installation Service
WCCS provides an integrated front end for the Microsoft Remote Installation Service image deployment platform. RIS is used to deploy operating system images from the head node to each of the compute nodes, meaning that you can use RIS to install new cluster nodes rapidly. You don't have to do it this way—RIS is integrated into WCCS as a convenience. Other common Windows Server deployment technologies may also be used, such as Windows Server 2003 Automated Deployment Services (ADS) or a manual installation of Windows Server 2003.
If you want to use RIS, before installing CCP on your head node, make sure you have at least two logical disks defined. RIS requires a disk that is separate from the operating system for the storage of server images. Your RIS partition should have enough free disk space to house one or more full copies of a Windows Server image.
Newer server hardware, particularly networking and storage gear, may require Plug and Play drivers that are not contained in the default Windows Server 2003 images. In this case, you will need to add those drivers to the image manually. The procedure for doing this is documented online at support.microsoft.com/kb/254078.
Installing Your Cluster
The first step is to configure the brains of your cluster—the head node. Begin by installing one of the x64 versions of Windows Server 2003 noted earlier. During the operating system installation, you choose whether to join the server to an existing domain (recommended) or to install Active Directory on the server (not recommended for large-scale clusters).
After installing the operating system, make sure you have also downloaded and installed all recommended fixes from Microsoft Update. If you plan to use RIS for compute-node imaging, you then need to use the Computer Management | Disk Management administrative tool to ensure that you have at least two logical disks defined. In the interest of space, we won't cover RIS-based installation in this article.
Once you have successfully logged into and updated your server, you will run the CCP setup program. CCP setup will install, or guide you through downloading and installing, the prerequisite files and updates that the cluster software depends on.
The CCP setup program does a great job of assessing your system's readiness for installation. It identifies which components are required, and installs them as part of the process, as shown in Figure 1.
Figure 1 Components required for CCP installation
The To Do List
When the CCP setup is complete, it will launch the administrator console, with the focus on the To Do List. As Figure 2 shows, the To Do List contains several panes, each addressing a different area and listing key tasks to complete the cluster configuration.
Figure 2 To Do List shows what you need to do to configure your cluster
The To Do List lets you easily implement the cluster architecture you've laid out, including selecting and configuring the network topology, node installation process, and user management model. Tasks for each of the panes are listed on the right, and each task launches a wizard. Complete each task in sequence for a fully configured head node that is ready to deploy RIS images to a known set of compute nodes and that can be administered and accessed by a defined set of users.
Networking Topology WCCS supports the five most common network topologies used in HPC. At a minimum, all compute nodes and the head node must share a common network. The topologies include support for multiple types of interconnects among the nodes and have different benefits and costs.
Your network topology will depend on the performance, security, and deployment requirements for your cluster. For instance, your application might require a high-speed interconnect for message passing that you do not want exposed to your corporate network. Maybe you would like to take advantage of the integrated, RIS-based, automated deployment capabilities. Perhaps the hardware you have chosen can only accommodate a single NIC, or perhaps you want the head node to run Internet Connection Sharing (ICS) to manage name resolution and addressing for the compute nodes. The "Configure Cluster Network Topology" wizard invoked from the To Do List will present you with the five topologies typically supported by common HPC implementations as well as help you configure your network connections, including Windows Firewall configurations on the head node and compute nodes. Figure 3 shows one of the supported network topologies.
Figure 3 One of the supported WCCS network topologies
Remote Installation Service As noted previously, RIS enables the automated deployment of cluster nodes.
Node Management Here you specify the machine names of the servers that will be compute nodes in this cluster. When CCP setup later runs on one of those nodes, you will specify the head node of the cluster that the node is to join.
User Management Here you can specify which Active Directory user accounts or groups will be designated as system administrators and which will be designated as users of the cluster.
Deployment of compute nodes can be completed automatically via the RIS administration utility or using other supported deployment methods. As with the head node, the CCP must be installed on each compute node. As Figure 4 shows, CCP setup configuration options are much simpler for a compute node and consist of the following:
Figure 4 Compute Cluster Pack setup
- Specifying that the server should be a compute node (and not a head node).
- Specifying the name of the head node of the cluster that the server should join.
- Specifying whether or not to install the administrative and user tools on the compute node.
Once your nodes are in communication with your head node, most management and administration tasks for all systems can be accomplished via the administrator console, shown in Figure 5. The console provides a central view of the entire cluster, including a list of all associated nodes, access to key administrative actions, and other details. The left-most pane provides high-level navigation through the console, including access to the "Cluster Administrator," which, when highlighted, provides a summary screen showing cluster status and job statistics. Remote desktop and system monitor sessions can also be reached from here, but most of your time administering the cluster will be spent using "Node Management."
Figure 5 Administrator console provides a view of the entire cluster
You should explore the administrator console on your own, as it provides access to key tools and functions required to centrally administer Windows and cluster nodes. Clicking once on a cluster node in the top-center pane will populate several of the other windows, and you can begin exploring node-specific functions.
The functions supported by the right-click menus allow you to easily administer compute nodes centrally from Node Management. You can use Node Management to pause and resume nodes; approve nodes into the cluster or remove them from it; launch remote desktop, system monitor, or event viewer sessions; or eject the CD tray (which can be quite helpful when you're trying to physically identify a single machine in a large cluster).
The single coolest feature in this list is "Run Command"—that is, the ability to run any arbitrary command remotely, just as if you were sitting at a command prompt on the target machine. WCCS includes a command-line version of the Run Command feature called Clusrun.exe, and "ClusRun" is used to describe both the GUI and command-line versions.
The utility of ClusRun cannot be overemphasized—particularly on a large cluster. It is a huge timesaver for running repetitive command-line tasks and can often eliminate the need to write more complex administrative scripts.
To use this feature, select the machines against which you want to run a command, right-click, and select Run Command. The dialog box in Figure 6 is displayed. Now just type in the command you want, click Run, and wait for the output to be displayed in the Result window. You will be prompted for credentials the first time you use ClusRun, but you can opt to have those credentials cached for subsequent reuse.
Figure 6 You can run a command on multiple nodes
A common way to patch clusters or install programs that include automated setup routines is to place them on a file share and then use ClusRun to force all of the compute nodes to invoke the command. Rebooting all compute nodes in the entire cluster can be done through a single command:
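One plausible form of that reboot command is shown below; the /all switch and the standard Windows shutdown.exe flags are typical usage, but run clusrun /? on your own cluster to confirm the exact syntax it accepts:

```
clusrun /all shutdown /r /f /t 5
```

Here /r requests a restart, /f forces running applications to close, and /t 5 gives each node a five-second delay before rebooting.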
In May 2007, the HPC team released the Compute Cluster Pack Tool Pack (you can get it online at windowshpc.net/resources/Pages/default.aspx). The tool pack includes support for Windows PowerShell™, an MPIPingPong tool for diagnosing connectivity health, and a very simple but useful graphical cluster monitor.
As shown in Figure 7, Simple Cluster Monitor can show you on one screen each of the nodes in the cluster, how many cores each node has, the utilization of each CPU (in bright green), memory utilization (yellow), percent disk time (red), and network bandwidth utilization (orange). It's a great at-a-glance tool admins can run to see how the cluster is doing.
Figure 7 Monitoring several nodes with Simple Cluster Monitor
The cluster monitor also includes "remoteable" functionality. The first instance of the cluster monitor must run in a console session on the head node; once it is running there, you can launch instances of the cluster monitor from a workstation, point them at the head node, and get the same display.
What Are You Waiting For?
You are now armed with an understanding of HPC and Windows Compute Cluster Server 2003. You know how to install the CCP, you have seen the great all-in-one console, and you now have cool new commands and graphical tools with which to dazzle your peers and users. You'll find sources for further information on High Performance Computing in the "HPC Resources" sidebar.
John Kelbley is a Technical Product Manager for Microsoft on the Global Solutions Technology Team, based in the Northeastern U.S. John can be reached at Johnkel@microsoft.com.
Doug Lindsey is a Program Manager on the Microsoft HPC Team. He is also an administrator of production compute clusters (including #116 on the November '07 list at www.top500.org). Doug can be reached at Dougli@microsoft.com.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.