Understanding Diagnostic Tests

Applies To: Windows HPC Server 2008

Windows HPC Server 2008 provides a set of commonly used diagnostic tests. You can use these tests to help verify deployment, troubleshoot failures, and detect performance degradation.

Diagnostic tests are conceptually grouped by suite. The following table describes the tests in each suite. For more information about the diagnostic tests, see the Operations section (https://go.microsoft.com/fwlink/?LinkId=120726) of the Windows HPC Server 2008 Technical Library.

Suite | Test | Description

Scheduler

Job Submission Test

Submits a simple job to the HPC Job Scheduler Service using the clusrun command. This test verifies that the HPC Job Scheduler Service can accept and run a job on a set of user-specified compute nodes.

Services

All Services Running

Verifies that Windows HPC Server 2008 services are running on the selected nodes. Expected services are determined by the role of the target node (head node, compute node, or WCF broker node).

Connectivity

DNS Name Resolution

Verifies Domain Name System (DNS) name resolution between user-selected compute nodes and reports mismatches between node pairs. During the test, each node attempts to resolve the name of every other node in the cluster by using DNS and compares the result with the HPC Management Service records. An agent running on the tested node updates these records dynamically, so the test compares the address recorded in DNS with the actual IP address of the node.
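The comparison that this test describes can be sketched in Python as follows. This is an illustration only, not the implementation of the diagnostic test: the recorded dictionary stands in for the HPC Management Service records, and the find_dns_mismatches function and its resolve parameter are hypothetical names introduced for the example.

```python
import socket
from typing import Callable, Dict, List, Tuple


def find_dns_mismatches(
    recorded: Dict[str, str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[Tuple[str, str, str]]:
    """Return (node, recorded_address, resolved_address) for each mismatch.

    `recorded` maps node names to the addresses on record (an assumed
    stand-in for the HPC Management Service records). `resolve` looks a
    name up in DNS; a failed lookup is reported as "<unresolved>".
    """
    mismatches = []
    for node, expected in sorted(recorded.items()):
        try:
            actual = resolve(node)
        except OSError:  # socket.gaierror is a subclass of OSError
            actual = "<unresolved>"
        if actual != expected:
            mismatches.append((node, expected, actual))
    return mismatches
```

Passing the resolver as a parameter keeps the sketch testable without a live DNS server. By default it uses socket.gethostbyname, which returns an IPv4 address for the given host name.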

Domain Connectivity

Verifies connectivity between a node and each domain controller. The test uses a simple Lightweight Directory Access Protocol (LDAP) query to look up the Active Directory RootDSE object.

Internode Connectivity

Verifies network connectivity between compute nodes by performing a ping test between each node and all other nodes in the selected group.

System Configuration

Application Configurations Report

Reports on the application configuration of the selected nodes.

Firewall Configurations Report

Reports on the firewall configuration of the selected nodes.

Installed Software Updates Report

Reports on the updates (patches) that have been installed on each selected node. This test can take a long time.

Network Configurations Report

Reports on the network configuration of the selected nodes.

Pending Software Updates

Provides an overall list of updates that are available for all the nodes as well as a list of updates that are available for each node. This test reports on the updates (patches) identified as critical by Windows Server Update Services (WSUS) or Microsoft Update (MU).

This test fails if the WinHTTP proxy is not set on the compute node. To check whether the compute nodes have a proxy server set, run the netsh winhttp show proxy command.
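The proxy check can also be scripted. The following Python sketch runs the netsh winhttp show proxy command on the local computer and inspects its output. It assumes English-language output in which an unset proxy is reported with the words "Direct access"; the function names are hypothetical and the sketch is not part of Windows HPC Server 2008.

```python
import subprocess


def show_winhttp_proxy() -> str:
    """Run `netsh winhttp show proxy` locally and return its text output."""
    completed = subprocess.run(
        ["netsh", "winhttp", "show", "proxy"],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


def winhttp_proxy_is_set(netsh_output: str) -> bool:
    """Return True if the output reports a configured proxy server.

    Assumption: when no proxy is set, English-language netsh output
    contains the phrase "Direct access".
    """
    return "direct access" not in netsh_output.lower()
```

On a compute node, calling winhttp_proxy_is_set(show_winhttp_proxy()) returns False when no proxy server is configured, which is the condition under which the Pending Software Updates test fails.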

Service Configurations Report

Reports on the services configured on each selected node.

Software Updates Required

Compares the updates that are installed on the compute node against the updates specified in the node template. The report indicates if any compute nodes failed to meet the required update level that is specified in the template.

SOA

SOA Model Latency

Verifies network connectivity and measures network latency over HTTP and NetTCP on user-selected nodes or node groups. This test reports any nodes where a Service-Oriented Architecture (SOA) session did not start successfully on either the HTTP or the NetTCP binding. This test also sorts nodes into three latency response categories: Less than 5 milliseconds, Between 5 and 10 milliseconds, and Greater than 10 milliseconds.
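The three latency response categories can be illustrated with a short Python sketch. The function names are hypothetical, and the handling of the boundary values is an assumption: the description does not say which category a latency of exactly 5 or exactly 10 milliseconds falls into, so the sketch places both in the middle category.

```python
from typing import Dict, List


def categorize_latency(latency_ms: float) -> str:
    """Map a latency in milliseconds to one of the three categories.

    Assumption: exactly 5 ms and exactly 10 ms count as "between".
    """
    if latency_ms < 5:
        return "Less than 5 milliseconds"
    if latency_ms <= 10:
        return "Between 5 and 10 milliseconds"
    return "Greater than 10 milliseconds"


def group_nodes_by_latency(latencies_ms: Dict[str, float]) -> Dict[str, List[str]]:
    """Group node names by latency category, in node-name order."""
    groups: Dict[str, List[str]] = {}
    for node, latency in sorted(latencies_ms.items()):
        groups.setdefault(categorize_latency(latency), []).append(node)
    return groups
```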

SOA Service Configurations Report

Reports on the SOA service configuration of the selected nodes. This test displays the service name, location of the service assembly, service and contract type, architecture (x86 or x64), and environment variables. A service is displayed as being installed on all the compute nodes if the service registration file is installed on a file share designated by the CCP_SERVICEREGISTRATION_PATH environment variable, and the file share is readable by everyone.

Performance

MPI Ping Pong: Lightweight Throughput

Provides a fast measure of the network throughput between each node and two of its “neighbors”. Unlike latency measurements, throughput measurements highly stress a cluster’s network switches. This test reports average throughput, standard deviation, best link (the node pair with the highest measured throughput and the throughput value), worst link (the node pair with the lowest measured throughput and the throughput value), variability rating (a qualitative indication of consistency of throughput across the entire cluster), and histogram data (the number of network links measured in each of several throughput ranges). The throughput for a given pair is calculated as the average data transfer rate over 16 iterations, in megabytes per second.

MPI Ping Pong: Quick Check

Provides a fast measure of the network latency between each pair of nodes in the cluster. This test reports average latency, standard deviation, best link (the node pair with the lowest measured latency and the latency value), worst link (the node pair with the highest measured latency and the latency value), variability rating (a qualitative indication of consistency of latency across the entire cluster), and histogram data (the number of network links measured in each of several latency ranges). The latency for a given pair is calculated as the average, over 1024 iterations, of one-half the round-trip time, in microseconds.

If highly accurate measurements are required, you can use the command-line version of MPI Ping Pong (mpipingpong.exe) provided with Windows HPC Server 2008 to make latency measurements on each link serially. For more information, see the Windows HPC Server 2008 Command Reference (https://go.microsoft.com/fwlink/?LinkId=120724).
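The latency statistics described for the MPI Ping Pong: Quick Check test can be sketched in Python as follows. This is an illustration of the arithmetic only and does not reproduce mpipingpong.exe. The function names are hypothetical, the use of the population standard deviation is an assumption, and the histogram bin width is supplied by the caller because the description does not state the ranges that the test uses.

```python
import statistics
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Link = Tuple[str, str]  # a pair of node names


def link_latency_us(round_trip_times_us: List[float]) -> float:
    """Latency for one link: the average of one-half the round-trip time."""
    return statistics.mean(rtt / 2 for rtt in round_trip_times_us)


def summarize_links(latency_by_link: Dict[Link, float]) -> dict:
    """Average, standard deviation, and best and worst link by latency."""
    values = list(latency_by_link.values())
    best = min(latency_by_link, key=latency_by_link.get)   # lowest latency
    worst = max(latency_by_link, key=latency_by_link.get)  # highest latency
    return {
        "average": statistics.mean(values),
        # Assumption: population standard deviation over all measured links.
        "std_dev": statistics.pstdev(values),
        "best_link": (best, latency_by_link[best]),
        "worst_link": (worst, latency_by_link[worst]),
    }


def histogram(values: Iterable[float], bin_width: int) -> Dict[int, int]:
    """Count values per bin, keyed by the lower edge of each bin."""
    return dict(Counter(int(v // bin_width) * bin_width for v in values))
```

The same summary applies to the Lightweight Throughput test if the per-link values are throughputs, except that the best link is then the one with the highest value and the worst link the one with the lowest.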

Additional references