Understanding Node Metrics and Properties in HPC Cluster Manager
Applies To: Windows HPC Server 2008 R2
This topic describes the node properties and metrics that are available in HPC Cluster Manager to help you monitor your cluster. The node list and heat map view in HPC Cluster Manager can be modified to display various node metrics and properties. The heat map view only displays metrics. For information about creating custom node views, see Understanding Node List, Heat Map, and Custom Tab Views. For information about adding more metrics, see Customize Metrics Collection in Windows HPC Server.
In this topic:
- Alphabetical list of node properties and metrics
- Node properties and metrics by conceptual categories
- Additional considerations
- Additional references
The following table describes the available values for node properties and metrics in HPC Cluster Manager.
| Note |
|---|
| In the “Property or metric” column, the names of metrics and of node properties that reflect node status are denoted by bold font. |
| Property or metric | Description | Category |
|---|---|---|
| Affinity | Displays the affinity setting for this node. This value is set by the cluster administrator. | Cores/memory/disk |
| Application IP | The IP address for the network adapter that is bound to the Application network. | Network |
| Application Link Speed | The link speed for the network adapter that is bound to the Application network. | Network |
| Application Link State | The link state for the network adapter that is bound to the Application network. If your cluster topology does not include an Application network, or if the node is not connected to this network, the value appears as Disconnected. Possible values are Connected and Disconnected. This value is periodically updated by the HPC management service during the discovery operation. | Network |
| Application Network Direct | Whether or not a Network Direct provider is installed for the Application network. Possible values are True and False. This value is periodically updated by the HPC management service. | Network |
| Available Physical Memory (MBytes) | The amount of physical memory available to processes running on the computer, in megabytes. AvailableMBytes is calculated by adding the amount of space on the Zeroed, Free, and Standby memory lists. Free memory is ready for use; Zeroed memory consists of pages filled with zeros to prevent later processes from seeing data used by a previous process; Standby memory is memory removed from a process's working set (its physical memory) en route to disk but still available to be recalled. This counter displays the last observed value only; it is not an average. | Cores/memory/disk |
| Boot Information | Information related to booting over the network from an iSCSI server. This specifies how the head node should respond to a PXE request from the node. | Deployment |
| Context Switches / second | The combined rate at which all processors on the computer are switched from one thread to another. Context switches occur when a running thread voluntarily relinquishes the processor, is preempted by a higher priority ready thread, or switches between user-mode and privileged (kernel) mode to use an Executive or subsystem service. | Cores/memory/disk |
| Cores | The number of physical cores on the computer. This value is periodically updated by the HPC management service during the discovery operation. | Cores/memory/disk |
| Cores In Use | The number of physical cores that are currently allocated to jobs. | Cores/memory/disk |
| CPU Usage (%) | User and system time for all physical cores on the node, divided by the sampling interval times the total number of physical cores on the node. | Cores/memory/disk |
| Description | A description for the node. This value is set by the cluster administrator. | Deployment |
| Disk Queue Length | An indication of the number of transactions that are waiting to be processed. This counter provides a primary measure of disk congestion. The queue length reflects not only the number of transactions, but also the length and frequency of each transaction. | Cores/memory/disk |
| Disk Throughput (Bytes/sec) | An indication of the rate at which data is being transferred. Describes the throughput of the disk subsystem. | Cores/memory/disk |
| DNS Name | The fully qualified DNS name for the node, including the DNS suffix. For example, “myNode.myDomain.com”. | Network |
| Domain Name | The domain name specifications for the node. | Network |
| Durable Queues Total Bytes | Total number of bytes of Message Queuing messages on the broker node. The broker node stores messages using Microsoft Message Queuing (MSMQ) when SOA clients create sessions on the cluster using the Durable Session APIs. Responses that are stored by the broker can be retrieved by the client at any time, even after an intentional or unintentional disconnect. Messages are deleted when SOA clients retrieve their responses and close the session, or when the job history retention period is reached (by default, this is set to three days). By default, the MSMQ storage limit is 8 GB. When the MSMQ quota is reached, durable sessions stop working. | SOA |
| Durable Queues Total Messages | Total number of Message Queuing messages on the broker node. | SOA |
| Durable Requests Queue | Total number of requests stored in local Message Queuing. | SOA |
| Durable Responses Queue | Total number of responses stored in local Message Queuing. | SOA |
| Enterprise IP | The IP address for the network adapter that is bound to the Enterprise network. | Network |
| Enterprise Link Speed | The link speed for the network adapter that is bound to the Enterprise network. | Network |
| Enterprise Link State | The link state for the network adapter that is bound to the Enterprise network. If the node is not connected to this network, the value appears as Disconnected. Possible values are Connected and Disconnected. This value is periodically updated by the HPC management service during the discovery operation. | Network |
| Enterprise NetworkDirect | Whether or not a Network Direct provider is installed for the Enterprise network. Possible values are True and False. This value is periodically updated by the HPC management service. | Network |
| Free Disk Space (%) | The percentage of the total usable space on the local disk that is free. | Cores/memory/disk |
| Groups | The node groups to which the node belongs. Membership in the default node groups is determined at deployment or by changing the node role. Membership in custom node groups is determined by the cluster administrator. | Status/workload |
| HPC SOA Calculations/Sec | The current rate of calculation calls from the broker node. This is a sliding average of the past N seconds. This value can be significantly higher than the number of cores due to caching on the service host. The HPC SOA metrics, along with the memory and CPU metrics, can help you determine how to scale your broker nodes. For example, when the SOA throughput, memory, and CPU usage are high on your broker nodes, add more brokers. When these metrics are low, convert some brokers to compute nodes. For more information, see Multiple roles and broker scaling. | SOA |
| HPC SOA Faults/Sec | The number of faulted calls on the node per second. | SOA |
| HPC SOA Requests/Sec | The number of requests to the broker node per second. | SOA |
| HPC SOA Responses/Sec | The number of responses on the broker node. This is a sliding average of the past N seconds. | SOA |
| Idle | Whether or not the workstation node is idle. | Status/workload |
| Install Path | The path where the Microsoft HPC Pack software is installed. This value is not listed for Azure Nodes. | Deployment |
| Installed Service Roles | The HPC node roles that are installed on the node. Node roles that are installed can be enabled or disabled by changing the node role (enabled roles are listed in the Node Role property). The set of roles that can be installed depends on whether the node is a dedicated on-premises node, an Azure Node, or a workstation node. For more information, see Understanding Node Roles in Windows HPC Server 2008 R2. | Deployment |
| Location | The primary, secondary, and tertiary location details for the node. For example, data center, server rack, chassis. This property value can be specified by the cluster administrator. | Deployment |
| LUN Mapping | A GUID that identifies the iSCSI boot node. | Deployment |
| Machine Guid | The SMBIOS GUID of the node. | Deployment |
| Management Ip Address | The out-of-band management IP address for the node that you can use for scriptable power control tools such as Intelligent Platform Management Interface (IPMI) scripts. For example, this can be set to the IP address for the Baseboard Management Controller (BMC) of the compute node. For more information, see Scriptable Power Control Tools. This property value can be set by the cluster administrator. | Deployment |
| Memory | The amount of memory installed on the node. | Cores/memory/disk |
| Memory Paging (Hard Faults/second) | The number of hard page faults per second. A hard fault occurs when part of a program is no longer in main memory because it has been swapped out to the paging file, so the system must read it back from the hard disk. Frequent hard faults cause slowdowns and increased hard disk activity. Excessive hard faults can lead to hard disk thrashing (a program stops responding while the hard drive continues to run for an extended period). | Cores/memory/disk |
| Name | The name of the node, including the domain. For example, DOMAIN\nodename. For Azure Nodes, this name is AZURE\nodename. | Deployment |
| NetBoot MAC Address | The MAC address of the network adapter that is bound to the Private network. This is the network that is used when deploying an operating system image to the node (PXE boot). | Deployment |
| Network Usage (Bytes/second) | An indication of the total network throughput for all networks on a node. This does not include Network Direct traffic, because Network Direct bypasses TCP/IP. | Network |
| Node Health | The overall indication of node health. Indicates whether there are any warnings or errors that the HPC services are aware of on that node, whether the node is performing an operation that was initiated by the cluster administrator, or whether the node has not been added to the cluster. For information about node health values, see Understanding Node States, Health, and Operations. | Status/workload |
| Node Name | The name of the node. For nodes that are deployed from bare metal, this name is automatically assigned according to the node naming series that the cluster administrator defines in the node template. For Azure Nodes, the name starts with “AzureCN-” followed by a number. For example, AzureCN-0001. | Deployment |
| Node Role | The node roles that are enabled for the node. Dedicated, on-premises nodes can have more than one role enabled, depending on what roles are installed (installed roles are listed in the Installed Service Roles property). The head node role is not displayed in this property. For more information, see Understanding Node Roles in Windows HPC Server 2008 R2. | Status/workload |
| Node State | The node’s deployment state, or whether or not an administrator wants the node to be available as a resource for cluster jobs (Online or Offline). For information about node state values, see Understanding Node States, Health, and Operations. | Status/workload |
| Node Template | The name of the node template that was used to deploy the node or to join the node to the cluster. | Deployment |
| OS Architecture | The operating system architecture on the node. | Deployment |
| OS Version | The operating system version on the node. | Deployment |
| Primary HeadNode | For a head node that is configured as a failover cluster, the active head node has a value of True for this property, and the passive head node has a value of False. | Status/workload |
| Private IP | The IP address for the network adapter that is bound to the Private network. | Network |
| Private Link Speed | The link speed for the network adapter that is bound to the Private network. | Network |
| Private Link State | The link state for the network adapter that is bound to the Private network. If your cluster topology does not include a Private network, or if the node is not connected to this network, the value appears as Disconnected. Possible values are Connected and Disconnected. This value is periodically updated by the HPC management service during the discovery operation. | Network |
| Private NetworkDirect | Whether or not a Network Direct provider is installed for the Private network. Possible values are True and False. This value is periodically updated by the HPC management service. | Network |
| Processors | Name and properties of the processors that are installed on the node. | Cores/memory/disk |
| Product Key | The Windows product key that will be used to activate the operating system on the node. This property value can be specified by the cluster administrator. | Deployment |
| Progress | The most recent deployment log entry during deployment or provisioning operations. You can sort by this column to help monitor deployment progress. | Deployment |
| Provisioned | Whether or not Microsoft HPC Pack is installed on the node. Possible values are True and False. | Deployment |
| Running Jobs | The number of jobs that are currently using this node. | Status/workload |
| Running Tasks | The number of tasks, subtasks, or task processes (such as an MPI rank) that are currently using this node. The number can be higher than the number of physical cores or sockets if the subscribed cores or sockets properties are set on the node. | Status/workload |
| Service Health | The overall indication of the health of the HPC services. Indicates whether or not there are any warnings or errors that the HPC services are aware of on that node. | Status/workload |
| Size | The size of the Azure Node instance. The size determines the number of CPU cores, memory capacity, and disk space as defined by Windows Azure. This value is specified by the cluster administrator when adding Azure Nodes to the cluster. | Azure |
| Sockets | The number of physical sockets on the node. | Cores/memory/disk |
| Subscribed Cores | The number of logical cores that the HPC Job Scheduler Service will use when it is allocating tasks to the node. It can be larger or smaller than the number of physical cores. Note: The “cores in use” metric reflects how many physical cores are in use. The “running tasks” metric can help you monitor how many subscribed cores are in use. This value is set by the cluster administrator. For more information, see Over-subscribe or under-subscribe core or socket counts on cluster nodes. | Cores/memory/disk |
| Subscribed Sockets | The number of logical sockets that the HPC Job Scheduler Service will use when it is allocating tasks to the node. It can be larger or smaller than the number of physical sockets. This value is set by the cluster administrator. For more information, see Over-subscribe or under-subscribe core or socket counts on cluster nodes. | Cores/memory/disk |
| System Calls / second | The number of calls made to system components (kernel-mode services). This indicates how busy the system is servicing applications and services. Comparing this value with Interrupts/Sec can indicate whether processor issues are hardware related or software related. | Cores/memory/disk |
| UnattendSetup | Whether or not setup.exe ran with the -unattend flag. | Deployment |
| Version | The version number of Microsoft HPC Pack that is installed on the node. | Deployment |
| Windows Azure Node Address | The IP address of the Azure Node. This value is assigned by Windows Azure. For a list of the public IP ranges, see the posted IP Ranges. | Azure |
| Windows Azure Service Name | The public name of the hosted service (in the Windows Azure subscription) in which this Azure Node is deployed. This value is defined by the cluster administrator in the node template. | Azure |
| Windows Azure Storage Service Name | The public name of the storage account (in the Windows Azure subscription) that is associated with the Azure Node. This value is defined by the cluster administrator in the node template. | Azure |
| Windows Azure Subscription ID | The unique ID for the Windows Azure subscription account associated with the Azure Node. This value is defined by the cluster administrator in the node template. | Azure |
The following lists group the properties and metrics by functional categories so that you can quickly identify what values are available for different aspects of the cluster. These lists can help you select which values to display in custom node views to help monitor different aspects of cluster performance. In the following lists, the names of metrics and of node properties that reflect node status are denoted by bold font.
Cores/memory/disk
- Processors
- Cores
- Sockets
- Cores In Use
- CPU Usage (%)
- Context Switches / second
- System Calls / second
- Affinity
- Subscribed Cores
- Subscribed Sockets
- Memory
- Available Physical Memory (MBytes)
- Memory Paging (Hard Faults/second)
- Free Disk Space (%)
- Disk Queue Length
- Disk Throughput (Bytes/sec)
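Two of the metrics in this category are defined by simple arithmetic in the table: CPU Usage (%) is user plus system time divided by the sampling interval times the number of physical cores, and Available Physical Memory (MBytes) is the sum of the Zeroed, Free, and Standby lists. The following Python sketch illustrates that arithmetic only. The function names and sample numbers are hypothetical and are not part of HPC Cluster Manager or any HPC Pack API.

```python
def cpu_usage_percent(user_seconds, system_seconds, interval_seconds, physical_cores):
    """CPU Usage (%): busy time across all cores / (interval * core count).

    user_seconds and system_seconds are assumed to be summed over all
    physical cores for one sampling interval (hypothetical inputs).
    """
    if interval_seconds <= 0 or physical_cores <= 0:
        raise ValueError("interval and core count must be positive")
    busy_seconds = user_seconds + system_seconds
    return 100.0 * busy_seconds / (interval_seconds * physical_cores)


def available_mbytes(zeroed_mb, free_mb, standby_mb):
    """Available Physical Memory (MBytes): Zeroed + Free + Standby lists."""
    return zeroed_mb + free_mb + standby_mb


# Hypothetical sample: 8 cores, 10-second interval,
# 30 s of user time and 10 s of system time summed over all cores.
print(cpu_usage_percent(30.0, 10.0, 10.0, 8))  # 50.0
print(available_mbytes(512, 1024, 2048))       # 3584
```

Because the memory counter reports only the last observed value, a monitoring script would likely need to sample it repeatedly to see a trend.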
Status/workload
- Node State
- Node Health
- Node Role
- Groups
- Primary HeadNode
- Service Health
- Idle
- Running Jobs
- Running Tasks
SOA
- Durable Queues Total Bytes
- Durable Queues Total Messages
- Durable Requests Queue
- Durable Responses Queue
- HPC SOA Calculations/Sec
- HPC SOA Faults/Sec
- HPC SOA Requests/Sec
- HPC SOA Responses/Sec
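The table suggests using the SOA metrics together with CPU and memory usage to decide when to add brokers or convert brokers to compute nodes, and it notes that durable sessions stop working when the MSMQ quota (8 GB by default) is reached. The following Python sketch shows one way such checks might be expressed. All thresholds, function names, and sample values are assumptions chosen for illustration; the source does not define numeric thresholds, and 8 GB is interpreted here as 8 × 1024³ bytes.

```python
# Default MSMQ storage limit cited in the table, interpreted as binary gigabytes
# (an assumption; adjust if your quota is configured differently).
MSMQ_DEFAULT_QUOTA_BYTES = 8 * 1024**3


def msmq_quota_used(durable_queues_total_bytes, quota_bytes=MSMQ_DEFAULT_QUOTA_BYTES):
    """Fraction of the MSMQ quota consumed by Durable Queues Total Bytes."""
    return durable_queues_total_bytes / quota_bytes


def broker_scaling_hint(requests_per_sec, cpu_percent, memory_used_percent,
                        high=(500.0, 80.0, 80.0), low=(50.0, 20.0, 20.0)):
    """Return a scaling hint for broker nodes.

    The 'high' and 'low' tuples are hypothetical thresholds for
    (requests/sec, CPU %, memory used %); tune them for your cluster.
    """
    metrics = (requests_per_sec, cpu_percent, memory_used_percent)
    if all(m >= h for m, h in zip(metrics, high)):
        return "add brokers"
    if all(m <= l for m, l in zip(metrics, low)):
        return "convert some brokers to compute nodes"
    return "no change"


print(msmq_quota_used(4 * 1024**3))           # 0.5
print(broker_scaling_hint(900.0, 92.0, 85.0)) # add brokers
print(broker_scaling_hint(10.0, 5.0, 15.0))   # convert some brokers to compute nodes
```

The hint requires all three metrics to agree, which mirrors the wording in the table; a mixed picture (for example, high throughput with low CPU usage) returns "no change".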
Network
- DNS Name
- Domain Name
- Enterprise IP
- Enterprise Link Speed
- Enterprise Link State
- Enterprise NetworkDirect
- Private IP
- Private Link Speed
- Private Link State
- Private NetworkDirect
- Application IP
- Application Link Speed
- Application Link State
- Application Network Direct
- Network Usage (Bytes/second)
Deployment
- Name
- Node Name
- Node Template
- Description
- Location
- Machine Guid
- NetBoot MAC Address
- Boot Information
- Install Path
- Version
- Installed Service Roles
- OS Architecture
- OS Version
- Product Key
- Management Ip Address
- LUN Mapping
- Provisioned
- UnattendSetup
- Progress
Azure
- Size
- Windows Azure Node Address
- Windows Azure Service Name
- Windows Azure Storage Service Name
- Windows Azure Subscription ID
SP1 additions
The following properties or metrics were added in Service Pack 1 of Microsoft HPC Pack 2008 R2. These changes are related to the ability to add Windows Azure nodes to the cluster. For more information, see Deploying Windows Azure Nodes in Windows HPC Server 2008 R2.
- Size
- Windows Azure Node Address
- Windows Azure Service Name
- Windows Azure Storage Service Name
- Windows Azure Subscription ID
SP2 additions
The following properties or metrics were added in Service Pack 2 of Microsoft HPC Pack 2008 R2. These changes are related to the ability to oversubscribe and undersubscribe nodes. For more information, see Over-subscribe or under-subscribe core or socket counts on cluster nodes.
- Affinity
- Subscribed Cores
- Subscribed Sockets
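As the table notes, Subscribed Cores can be larger or smaller than the number of physical cores, and Running Tasks can help you monitor how many subscribed cores are in use. The following Python sketch classifies a node from those values. The function names are hypothetical, and the load estimate assumes that each running task occupies exactly one subscribed core, which is not true for every job.

```python
def subscription_status(physical_cores, subscribed_cores):
    """Classify a node by comparing Subscribed Cores with physical Cores."""
    if subscribed_cores > physical_cores:
        return "over-subscribed"
    if subscribed_cores < physical_cores:
        return "under-subscribed"
    return "matched"


def subscribed_load(running_tasks, subscribed_cores):
    """Rough fraction of subscribed cores in use.

    Assumes one subscribed core per running task (a simplification).
    """
    if subscribed_cores <= 0:
        raise ValueError("subscribed_cores must be positive")
    return running_tasks / subscribed_cores


# Hypothetical node: 8 physical cores, 16 subscribed cores, 12 running tasks.
print(subscription_status(8, 16))  # over-subscribed
print(subscribed_load(12, 16))     # 0.75
```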

