Using the Compute Cluster Administrator

Applies To: Windows Compute Cluster Server 2003

The Compute Cluster Administrator

The Compute Cluster Administrator is a Microsoft Management Console (MMC) 3.0 snap-in: Computeclusteradmin.msc. This file is installed during Compute Cluster Pack Setup on the head node and when the Compute Cluster Server 2003 client utilities are installed. When this file is installed on a remote workstation, an administrator can manage a cluster by specifying the head node of that cluster.
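The same head-node-oriented model applies to the command-line utilities installed with the Compute Cluster Pack. As a rough illustration, the following Python sketch shells out to the cluscfg tool to query a cluster's configuration from a remote workstation; the /scheduler: parameter naming the head node is an assumption, so verify the exact syntax with cluscfg /? on your installation.

    # Minimal sketch: query a cluster's configuration by naming its head
    # node. Assumes the CCS client utilities are on PATH and that cluscfg
    # accepts a /scheduler: parameter (an assumption -- see "cluscfg /?").
    import subprocess

    HEAD_NODE = "HEADNODE01"  # hypothetical head node name

    def view_cluster_config(head_node):
        """Return the raw output of 'cluscfg view' for the given head node."""
        result = subprocess.run(
            ["cluscfg", "view", f"/scheduler:{head_node}"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    if __name__ == "__main__":
        print(view_cluster_config(HEAD_NODE))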

Note

Using the Compute Cluster Administrator snap-in to connect directly to a cluster compute node is not supported; all management of a cluster is performed through the head node. Completely managing the cluster from a compute node or other remote workstation is also not supported, because not all tasks that can be performed from the head node can be performed remotely.

The Compute Cluster Administrator has five major pages, each accessed from the navigation pane at the left side of the console. These pages are:

  • Start Page

  • To Do List

  • Node Management

  • Remote Desktop Sessions

  • System Monitor

Start Page

This page is displayed when the Compute Cluster Administrator starts. It provides an overview of cluster and job status and gives the administrator access to the Compute Cluster Job Manager. The information displayed on the Start Page is refreshed every 5 seconds; a sketch showing how a script can gather the same counts appears after the lists below.

The compute node information displayed on this page includes:

  • Number of nodes pending approval

  • Number of nodes ready and accepting jobs

  • Number of paused nodes

  • Number of unreachable nodes

  • Total number of nodes in cluster

  • Number of processors in use

  • Number of idle processors

  • Total number of processors available

The job information displayed on this page includes:

  • Number of jobs running

  • Number of pending jobs

  • Number of failed jobs

  • Number of cancelled jobs

  • Number of completed jobs

  • Total number of jobs
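Together, these node and job counts amount to a status poll against the head node. As a purely illustrative sketch, the script below tallies status words in the output of the node and job command-line tools; the assumption that node list and job list print one line per item with a recognizable status column may not hold on your version, so treat the parsing as a placeholder.

    # Illustrative sketch: tally node and job states the way the Start
    # Page does. Assumes the CCS CLI tools "node" and "job" are on PATH
    # and that "node list"/"job list" print one line per item containing
    # a status word -- the real output format may differ.
    import subprocess
    from collections import Counter

    def tally(command, statuses):
        """Count output lines that mention each status word."""
        output = subprocess.run(command, capture_output=True, text=True,
                                check=True).stdout
        counts = Counter()
        for line in output.splitlines():
            for status in statuses:
                if status.lower() in line.lower():
                    counts[status] += 1
        return counts

    if __name__ == "__main__":
        print(tally(["node", "list"],
                    ["Pending", "Ready", "Paused", "Unreachable"]))
        print(tally(["job", "list"],
                    ["Queued", "Running", "Failed", "Cancelled", "Finished"]))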

On the Action pane of the Start Page, you can choose another cluster to manage or customize the items displayed by MMC or the Compute Cluster Administrator snap-in.

To Do List

In contrast to the Start Page, which is used for monitoring nodes and jobs, the purpose of the To Do List is to configure and administer the compute cluster. The To Do List displays four tiles in the MMC display pane. Each tile displays a description of the tasks you can accomplish with its wizards, notification messages about the configuration state or status of its subject area, and links to the appropriate wizards.

The tiles are:

Networking: This tile displays notification messages about the chosen network topology; configuration details, such as whether Internet Connection Sharing (ICS) is enabled on the head node and the IP addresses of the public, private, and MPI network interfaces (if installed on the head node); and the status of the Windows Firewall.

You can configure the compute cluster topology using the Configure Network Topology Wizard. With this tool, you define the public (cluster-external), private (cluster-internal), and MPI network interfaces of the head node.

You enable and manage the Windows Firewall using the Manage Windows Firewall Settings Wizard.
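The firewall state this tile reports can also be inspected read-only from a command prompt. The sketch below uses netsh firewall show state, the Windows Server 2003 SP1-era context for the Windows Firewall; it is only a status check, not a replacement for the wizard.

    # Minimal sketch: read the Windows Firewall operational state on the
    # local node using the Server 2003-era "netsh firewall" context.
    import subprocess

    def firewall_state():
        result = subprocess.run(["netsh", "firewall", "show", "state"],
                                capture_output=True, text=True, check=True)
        return result.stdout

    if __name__ == "__main__":
        print(firewall_state())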

Remote Installation Services (RIS): This tile displays notification messages about the number of RIS images available and other operational messages. You can install and uninstall RIS using the appropriate wizard displayed on this tile, and create, modify, and delete RIS images with the Manage Images Wizard.

Node Management: This tile displays notification messages about the compute nodes. The cluster administrator uses the Add Nodes Wizard to add compute nodes to the cluster and the Remove Nodes Wizard to remove them from the cluster.

You use the Add Nodes Wizard to add nodes to the cluster using either the Automated Addition deployment method or the Manual deployment method. The Automated Addition method requires that RIS be installed and that a private intracluster network exist.

Cluster Security: This tile provides notifications about cluster user and administrator groups. Users in the cluster user group have permission to submit jobs to the cluster. Members of the cluster administrator group can manage jobs and administer the cluster. The administrator uses the Manage Users Wizard to create, manage and delete users from these groups.
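The memberships that the Manage Users Wizard maintains can also be listed with standard Windows tooling. In the sketch below, the group names "Cluster Users" and "Cluster Administrators" are assumptions; substitute the group names your installation actually uses.

    # Sketch: list members of the cluster security groups on the head
    # node. The group names below are assumptions -- replace them with
    # the names used by your installation.
    import subprocess

    for group in ("Cluster Users", "Cluster Administrators"):  # assumed names
        result = subprocess.run(["net", "localgroup", group],
                                capture_output=True, text=True)
        print(result.stdout or result.stderr)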

Node Management

The Node Management page displays information about head and compute nodes in columnar format. The Actions pane allows you to start the Add Nodes Wizard and the Compute Cluster Job Manager. You can also configure Remote Desktop settings by providing a default user name, domain, and password that remote desktop sessions use to connect to multiple nodes, so that you do not have to supply credentials for each node individually.

By default, machine name, node status, number of jobs, and CPU information are displayed. The full list of node information that an administrator can display is as follows:

  • Machine name (default)

  • Node status (default)

  • Number of jobs (default)

  • Number of jobs running

  • Number of CPUs (default)

  • Number of CPUs in use (default)

  • Operating system version

  • Disk size

  • Total memory

  • Public network IP address

  • Private network IP address (if applicable)

  • MPI network IP address (if applicable)

Tasks

The cluster administrator can select one or more nodes and perform the following actions (a scripted sketch for pausing and resuming nodes follows this list):

  • Approve node for inclusion into the cluster

  • Delete (remove) node from the cluster

  • Execute a command on the selected node(s)

  • Identify node(s) by ejecting the CD drive tray, allowing an administrator to visually locate the machine

  • Open a System Monitor session on selected node(s)

  • Pause node

  • Restart node

  • Resume a paused node

  • Start a Remote Desktop Connection to each selected node

  • Open Event Viewer on one or more compute nodes to review application or system logs
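Several of these tasks have counterparts in the node command-line tool installed with the Compute Cluster Pack. The pause and resume verbs and the bare node-name argument in the sketch below are assumptions based on the tool's documented purpose; confirm the actual syntax with node /?.

    # Sketch: pause a set of compute nodes for maintenance, then resume
    # them. The "node pause"/"node resume" verbs and bare node-name
    # argument are assumptions -- check "node /?" for the real syntax.
    import subprocess

    NODES = ["NODE01", "NODE02"]  # hypothetical node names

    def set_node_state(action, nodes):
        for name in nodes:
            subprocess.run(["node", action, name], check=True)

    if __name__ == "__main__":
        set_node_state("pause", NODES)   # paused nodes accept no new jobs
        # ... perform maintenance here ...
        set_node_state("resume", NODES)  # return nodes to the Ready state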

Remote Desktop Sessions

The cluster administrator uses this page to create and close remote desktop sessions to one or more nodes.

Each active remote desktop session is added to the list of active sessions in the navigation pane, beneath the Remote Desktop Sessions item, allowing an administrator to move easily from session to session.

The administrator can connect to a node by providing a user name and password, or can supply a default set of credentials on the Node Management page to be used for remote sessions to any node.

Note

These credentials will be saved to the Stored User Names and Passwords file on the node when you terminate the session.

The cluster administrator can also change the default resolution settings for all remote desktop sessions.
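Individual sessions can also be opened outside the snap-in with the standard mstsc client, which accepts /v: for the target machine and /w: and /h: for the window size. The node names below are hypothetical; credentials are entered interactively in each window.

    # Sketch: open a Remote Desktop window to each node with the standard
    # mstsc client (/v: target, /w: and /h: resolution).
    import subprocess

    NODES = ["NODE01", "NODE02"]  # hypothetical node names
    WIDTH, HEIGHT = 1024, 768     # shared resolution for all sessions

    for name in NODES:
        subprocess.Popen(["mstsc", f"/v:{name}", f"/w:{WIDTH}", f"/h:{HEIGHT}"])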

System Monitor

To open a System Monitor session for one or more nodes, select the nodes on the Node Management page, right-click, and then select Start Perfmon Task. Each session can display the objects and counters listed below; a command-line sketch for sampling these counters follows the table.

Object: Processor

  • % Processor Time (_Total)

Object: Compute Nodes

  • Number of jobs running on node

  • Number of tasks running on node

  • Number of total processors on node

  • Number of processors in use on node

  • Number of processors idle on node

  • Number of serial tasks running on node

  • Number of parallel tasks running on node
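These counters can also be sampled from the command line with typeperf, which ships with Windows Server 2003. The Processor path below is the standard one; the Compute Nodes path is an assumption, so confirm the exact counter names with typeperf -q on a node.

    # Sketch: sample System Monitor counters from a node with typeperf.
    # "-s" names the target computer and "-sc 5" takes five samples. The
    # Processor path is standard; the "Compute Nodes" path is an assumed
    # name -- list real counters with "typeperf -q".
    import subprocess

    NODE = "NODE01"  # hypothetical node name
    COUNTERS = [
        r"\Processor(_Total)\% Processor Time",           # standard counter
        r"\Compute Nodes\Number of jobs running on node", # assumed path
    ]

    subprocess.run(["typeperf", *COUNTERS, "-s", NODE, "-sc", "5"], check=True)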

Each System Monitor session is displayed in the left-hand navigation pane beneath System Monitor. The administrator can change the properties of any session by clicking the Properties task in the Action pane.

Remote command execution

Using the Compute Cluster Administrator snap-in, a cluster administrator can issue a command-line action on one or more selected compute nodes. The command is run immediately through the Job Scheduler on a priority basis, and can be any valid command line. The credentials of the user submitting the command are used to execute it.

A cluster administrator can use remote command execution to run scripts and programs on compute nodes. It is especially useful for running the same command across multiple nodes, and commands can be run even on paused nodes. For additional information, see Running Commands on Nodes.
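The clusrun tool installed with the Compute Cluster Pack is the command-line counterpart of this feature. The sketch below assumes clusrun accepts a /nodes: parameter with a comma-separated node list; verify the parameters with clusrun /?.

    # Sketch: run one command on several compute nodes with clusrun.
    # Assumes /nodes: takes a comma-separated node list (an assumption --
    # see "clusrun /?").
    import subprocess

    NODES = ["NODE01", "NODE02"]  # hypothetical node names

    def run_on_nodes(nodes, command):
        result = subprocess.run(
            ["clusrun", f"/nodes:{','.join(nodes)}", command],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    if __name__ == "__main__":
        print(run_on_nodes(NODES, "ver"))  # e.g., report each node's OS version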