New Feature Evaluation Guide for Windows HPC Server 2008 R2
Updated: May 17, 2011
Applies To: Windows HPC Server 2008 R2
This guide provides scenarios and steps to try new features in Windows® HPC Server 2008 R2. You can download the Windows HPC Server 2008 R2 Suite Evaluation on the Microsoft download center (http://go.microsoft.com/fwlink/?LinkId=198810).
Important
Read the Release Notes for Windows HPC Server 2008 R2 before following the steps in this guide.
This guide includes the following scenarios:
- Use the patching wizard to add software updates to a node template
- Use workstations to run cluster jobs
- Create customizable dashboards that allow you to monitor nodes at a glance
- Save a command or script as a diagnostic test in HPC Cluster Manager
- Optimize job scheduling for SOA jobs and interactive workloads
- Manage SOA service configuration settings from a single location
- Enable and collect trace logs to troubleshoot SOA sessions
- Provide accurate job prioritization for your cluster
- Check for license availability before a job is started
- Stop a running job or task immediately
- Exclude particular nodes from running tasks in your job
- Receive notification when your job is done
- Provision or clean up the nodes that are allocated to your job
- Provide custom job progress information
- Allow canceled tasks time to save state information or clean up before exiting
The scenarios in this section help you try new management features in Windows HPC Server 2008 R2.
Use the patching wizard to add software updates to a node template
Scenario
You have deployed cluster nodes, and now you want to use the node templates to manage and apply software updates (patches) to the nodes.
Goal
Use the Add Software Updates Wizard to add an Apply Updates task to a node template.
Requirements
Steps
The Maintenance phase of a node template can include an Apply Updates task, with settings that you configure for which updates to apply. When you run the Maintain action on nodes, the Apply Updates task downloads updates to the compute nodes from the Microsoft Update website or the WSUS server in your enterprise, and then installs the updates.
The following procedure describes how to add the Apply Updates task to a node template. The node template must have already been used to deploy one or more nodes.
Expected results
An Apply Updates task is added to the Maintenance phase of the selected node template.
Related Resources
For more information about applying updates by using an enterprise WSUS server or by using a node template, see the Best Practices topic in the updating nodes step-by-step guide (http://go.microsoft.com/fwlink/?LinkId=194794).
Use workstations to run cluster jobs
Scenario
You have powerful workstation computers that are not utilized overnight and on weekends. You want to harvest this processing power to run cluster jobs.
Goal
Add workstation nodes to your HPC cluster and define a weekly availability policy to control when these nodes are brought online.
Requirements
Steps
Expected results
Workstation nodes come online and go offline according to the configured availability policy.
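A weekly availability policy is essentially a set of time windows for each day of the week. The following minimal Python sketch models one, assuming a hypothetical policy in which workstations are available from 19:00 to 07:00 on weekdays and all weekend. The real policy is defined in HPC Cluster Manager; the function names here are illustrative only.

```python
from datetime import datetime

MINUTES_PER_DAY = 24 * 60


def build_example_policy():
    """Hypothetical policy: weekday (0=Monday .. 6=Sunday) -> list of
    (start_minute, end_minute) windows during which workstation nodes
    are available to run cluster jobs."""
    weeknights = [(0, 7 * 60), (19 * 60, MINUTES_PER_DAY)]  # until 07:00 and from 19:00
    all_day = [(0, MINUTES_PER_DAY)]
    return {day: (weeknights if day < 5 else all_day) for day in range(7)}


def node_should_be_online(policy, when):
    """Return True if `when` falls inside one of that day's availability windows."""
    minute = when.hour * 60 + when.minute
    return any(start <= minute < end for start, end in policy.get(when.weekday(), []))


if __name__ == "__main__":
    policy = build_example_policy()
    for stamp in (datetime(2011, 5, 17, 10, 0), datetime(2011, 5, 17, 22, 0)):
        state = "online" if node_should_be_online(policy, stamp) else "offline"
        print(stamp.strftime("%a %H:%M"), state)
```

Using minutes since midnight keeps each window a simple half-open range, so a window ending at midnight is written as 24 * 60 rather than 23:59.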
Related Resources
Adding Workstation Nodes in Windows HPC Server 2008 R2 Step-by-Step Guide
Create customizable dashboards that allow you to monitor nodes at a glance
Scenario
When administering clusters of up to 1000 nodes, you need the ability to create customizable dashboards that allow you to monitor several node metrics for the entire cluster at a glance. To more easily identify outliers and bottlenecks and quickly switch between views, you can create multiple node list or heat map tabs that focus on sets of information such as:
Goal
Create one or more new tabs in Node Management.
Requirements
Steps
To change the settings on a tab, right-click the tab name, and then click Customize Tab. If you are creating a Heat Map tab, you can customize the following display options:
Expected results
Related Resources
HPC R2 Demo: New heat map and location-based node management features – video (7 min.)
Save a command or script as a diagnostic test in HPC Cluster Manager
Scenario
When managing your cluster, there are some commands or scripts that you run regularly to check the status of your nodes. You would like to be able to run your own tests and the built-in tests from a single location.
Goal
Save the
Requirements
Steps
Step 1: Define the test
Step 2: Add the test to the cluster
Step 3: Run the test and view results
Expected results
The results from the test should look similar to this:
NODE 1 --> Finished
------------------------------------
Total # of free bytes       : 33324670976
Total # of bytes            : 41910938752
Total # of avail free bytes : 33324670976
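As a hypothetical stand-in for the command or script defined in Step 1, the following Python script reports disk space in the same "Total # of ..." style as the sample output. It relies only on the standard library's shutil.disk_usage, and it omits the "avail free bytes" line because disk_usage does not report that figure separately on every platform. The script name, argument handling, and layout are assumptions for illustration.

```python
import shutil
import sys


def format_disk_report(total_bytes, free_bytes):
    """Format disk figures in the 'Total # of ...' style of the sample output."""
    return "\n".join([
        f"Total # of free bytes : {free_bytes}",
        f"Total # of bytes      : {total_bytes}",
    ])


def main():
    # Check the path given on the command line, or the current directory.
    path = sys.argv[1] if len(sys.argv) > 1 else "."
    usage = shutil.disk_usage(path)
    print(format_disk_report(usage.total, usage.free))


if __name__ == "__main__":
    main()
```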
Related Resources
The scenarios in this section help you try new SOA scheduling and runtime features in Windows HPC Server 2008 R2.
Optimize job scheduling for SOA jobs and interactive workloads
Scenario
Your cluster runs mostly interactive workloads, such as service-oriented architecture (SOA) jobs. One or two large jobs may be taking up most of the cluster, but there are many other interactive jobs that need to run. You want as many jobs to start as possible, rather than having most of the resources allocated to the top of the job queue. To optimize job scheduling for interactive workloads, you can change the scheduling mode from Queued to Balanced. In Balanced mode, the scheduler attempts to start all incoming jobs as soon as possible at their minimum resource requirements. After all the jobs in the queue have their minimum resources, additional cluster resources are allocated to jobs based on their load and priority. Resource allocation is periodically rebalanced to fill idle resources and accommodate new jobs.
Goal
Change the scheduling mode from Queued to Balanced.
Requirements
Steps
After you have set Balanced mode, you can adjust how additional resources are allocated with the PriorityBias setting, and how often the scheduler rebalances with the ReBalancingInterval setting. PriorityBias controls how additional resources are allocated to jobs. In Balanced mode, "additional resources" refers to cluster resources above the total minimum resources for all running jobs. Tasks that are running on additional resources can be canceled with immediate preemption to accommodate new jobs or to converge on the desired allocation pattern. You can choose from the following three options:
ReBalancingInterval represents the time, in seconds, between scheduler rebalancing passes. You can use one of the following methods to change PriorityBias and ReBalancingInterval:
Expected results
Jobs are started as soon as possible at their minimum resource requirements. After all jobs in the queue have started, all remaining resources in the cluster are allocated to jobs based on their priority and workload. As new jobs start, cluster resources are reallocated in proportion to each job's priority.
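The allocation behavior described above can be approximated with a small model. The following Python sketch is a simplified illustration, not the HPC Job Scheduler's actual algorithm: it starts jobs at their minimum in priority order, then hands out the remaining cores one at a time to the running job whose extra cores are smallest relative to its priority. The Job fields and the priority + 1 weighting are assumptions made for this example, and PriorityBias and ReBalancingInterval are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int   # 0 (Lowest) .. 4000 (Highest)
    min_cores: int
    max_cores: int
    allocated: int = 0


def balanced_allocate(jobs, total_cores):
    """Return {job name: cores} under a simplified Balanced-mode model."""
    for job in jobs:
        job.allocated = 0
    free = total_cores

    # Pass 1: start as many jobs as possible at their minimum requirement.
    # sorted() is stable, so jobs with equal priority keep submit (list) order.
    started = []
    for job in sorted(jobs, key=lambda j: -j.priority):
        if job.min_cores <= free:
            job.allocated = job.min_cores
            free -= job.min_cores
            started.append(job)

    # Pass 2: spread the remaining "additional resources" in proportion to
    # priority. The weight is priority + 1 so Lowest (0) jobs still get a share.
    while free > 0:
        growable = [j for j in started if j.allocated < j.max_cores]
        if not growable:
            break  # every started job is at its maximum; leave cores idle
        job = min(growable, key=lambda j: (j.allocated - j.min_cores) / (j.priority + 1))
        job.allocated += 1
        free -= 1

    return {job.name: job.allocated for job in jobs}


if __name__ == "__main__":
    jobs = [Job("A", 3000, 1, 100), Job("B", 1000, 1, 100)]
    print(balanced_allocate(jobs, 10))
```

With two equal-priority jobs on 10 cores, each ends up with 5; a job whose minimum does not fit stays at 0 cores and waits, which mirrors "start as many jobs as possible" rather than reserving resources for the top of the queue.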
Related Resources
Queued mode is priority-based, first-come, first-served scheduling, as in Windows HPC Server 2008. For more information, see Understanding Job Scheduling Policies (http://go.microsoft.com/fwlink/?LinkId=177866).
Manage SOA service configuration settings from a single location
Scenario
You have multiple SOA services installed to a central location on your cluster, and you want the ability to see all of the deployed services, change settings to help diagnose and troubleshoot specific services, and modify the service configuration files from a centralized location. In HPC Cluster Manager, in Configuration, the Services view lets you:
Goal
Add a service on the cluster and manage the service configuration settings from HPC Cluster Manager.
Requirements
Steps
Expected results
Related Resources
Enable and collect trace logs to troubleshoot SOA sessions
Scenario
You have a development cluster and you are testing SOA clients and services. Your service DLL includes code to generate trace information.
Goal
Enable tracing on the head node and collect the trace logs from each node that was used during the session.
Requirements
Steps
When you enable tracing in the service configuration file, the trace information is logged to a file on the compute nodes. The log files trace steps from the service call and the intermediate results on the cluster. You can collect and remove traces by using the Job Management view or the HPC PowerShell cmdlets. You can view the trace log files with the WCF Service Trace Viewer (SvcTraceViewer.exe).
Expected results
You can easily enable service tracing and retrieve the trace logs.
Related Resources
The scenarios in this section help you try new job scheduling and runtime features in Windows HPC Server 2008 R2.
Provide accurate job prioritization for your cluster
Scenario
Your cluster serves many departments and user groups, and you need accurate job prioritization to meet business needs. Each department has a prioritized list of jobs, and you want the jobs from each department to run in the requested order. Occasionally, you need to make adjustments to the order of the job queue based on particular circumstances or needs. Priority and submit time help determine when a job will run and how many resources it will get. When multiple jobs are submitted with the same priority level, the job scheduler attempts to start the jobs in each priority level on a first-come, first-served basis. To ensure that business needs have a stronger impact on the order of the job queue than submit time, you ask cluster users to specify a granular priority level for each job.
Goal
Users submit jobs with numerical priority levels. When necessary, manually adjust priority levels on submitted jobs.
Requirements
Steps
In HPC Pack 2008 R2, the job priority can have a value between 0 and 4000. Users can specify priority in terms of a priority band, a priority number, or a combination of the two. The priority bands and their corresponding numerical values are as follows:
The numerical priority can have a value between 0 (Lowest) and 4000 (Highest). If you enter a value numerically, it is displayed as the corresponding priority band, or as a band plus an offset. For example, if you specify a value of 2500, the priority is displayed as Normal+500.
Monitor and adjust the job queue
Cluster administrators and job owners can modify the Priority job property for any active job (Queued or Running).
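The mapping between numerical priorities and their displayed form can be sketched in Python. The values for Lowest (0), Normal (2000), and Highest (4000) follow from the text above; BelowNormal (1000) and AboveNormal (3000) are assumed here to follow the same 1000-point spacing, and the band spellings and function names are illustrative.

```python
# Band base values, highest first. 0, 2000, and 4000 come from the text; the
# 1000 and 3000 values assume evenly spaced bands.
BANDS = [
    (4000, "Highest"),
    (3000, "AboveNormal"),
    (2000, "Normal"),
    (1000, "BelowNormal"),
    (0, "Lowest"),
]
BAND_VALUES = {name: base for base, name in BANDS}


def priority_to_display(value):
    """Convert a numerical priority (0-4000) to 'Band' or 'Band+offset'."""
    if not 0 <= value <= 4000:
        raise ValueError("priority must be between 0 and 4000")
    for base, name in BANDS:
        if value >= base:
            offset = value - base
            return name if offset == 0 else f"{name}+{offset}"


def display_to_priority(text):
    """Convert 'Band' or 'Band+offset' back to a numerical priority."""
    name, _, offset = text.partition("+")
    value = BAND_VALUES[name.strip()] + (int(offset) if offset else 0)
    if not 0 <= value <= 4000:
        raise ValueError("priority must be between 0 and 4000")
    return value


if __name__ == "__main__":
    print(priority_to_display(2500))
```

Because every job in the same band shares a base value, the offset is what lets each department order its own jobs within a band without changing the band itself.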
Expected results
Related Resources
Check for license availability before a job is started
Scenario
Your cluster runs several applications that use licenses that are shared on a licensing server. You want to:
The HPC Job Scheduler Service can run a custom activation filter on queued jobs that are about to start. A job activation filter is a custom application that you can write to provide additional checks and controls, such as checking for license availability. Depending on the return value from your filter, the HPC Job Scheduler Service takes the appropriate action on the job. The HPC 2008 R2 SDK samples include an example of an activation filter that checks for license availability against a FlexLM license file.
Goal
Build and try the activation filter sample that is included in the HPC 2008 R2 SDK samples (HPC2008R2.SampleCode.zip). The sample is a Visual Studio 2008 project named FlexLM.sln that is in the HPC2008R2.SampleCode\Scheduler\Activation Filter folder.
Requirements
The following must be installed on the head node:
Steps
FlexLM.sln includes a sample activation filter that checks for license availability, and the FlexLM.exe.config file that you can use to specify the location of the FlexLM utilities and license file. In the FlexLM project properties, there are custom pre-build event commands and post-build event commands. The pre-build commands create the files needed to create event log entries. The post-build commands unregister any old version of the FlexLM activation filter and then register the new version so that it can create events and Event Viewer can display them. The commands assume that Visual Studio is creating the files in c:\Program Files\Microsoft HPC Pack 2008 R2\Bin\. The cluscfg command tells the HPC Job Scheduler Service to use the new activation filter. The following steps describe how to configure and build the solution:
To test the filter, submit jobs that require licenses to the cluster.
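At its core, a license-checking activation filter compares the licenses a job needs with the licenses that are currently free. The following Python sketch shows only that comparison; it is not the logic of the SDK's FlexLM sample. It assumes feature lines in the `lmutil lmstat` form shown in the comment, uses hypothetical feature names, and returns True or False rather than an exit code, because the exit codes that the HPC Job Scheduler Service expects are not assumed here.

```python
import re

# Assumed shape of a feature line in `lmutil lmstat` output, for example:
#   Users of solver:  (Total of 10 licenses issued;  Total of 3 licenses in use)
FEATURE_LINE = re.compile(
    r"Users of (?P<feature>[^:]+):\s+"
    r"\(Total of (?P<issued>\d+) licenses? issued;\s+"
    r"Total of (?P<in_use>\d+) licenses? in use\)"
)


def free_licenses(lmstat_output):
    """Return {feature: licenses not currently in use}."""
    return {
        m.group("feature").strip(): int(m.group("issued")) - int(m.group("in_use"))
        for m in FEATURE_LINE.finditer(lmstat_output)
    }


def licenses_available(lmstat_output, required):
    """True if every feature in `required` ({feature: count}) has enough free licenses."""
    free = free_licenses(lmstat_output)
    return all(free.get(feature, 0) >= count for feature, count in required.items())
```

A feature that does not appear in the output is treated as having no free licenses, so a misspelled or unserved feature keeps the job from starting instead of letting it fail later at run time.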
Expected results
The following list describes the supported exit codes for an activation filter and the corresponding job scheduler action:
Related Resources
None.
Stop a running job or task immediately
Scenario
You want to stop a running job or task immediately. In Windows HPC Server 2008 R2, a cluster administrator defines a Task Cancelation Grace Period that gives canceled tasks time to save state information and clean up before exiting. To use the grace period, the application must process the cancellation notification that it receives. You can force cancel a job or task to skip grace periods and Node Release tasks.
Goal
Force cancel a job or task.
Requirements
Steps
Expected results
Force canceling a task: The task stops immediately and does not use the Task Cancelation Grace Period.
Force canceling a job: The job stops immediately. The tasks in the job do not use the Task Cancelation Grace Period, and the Node Release task does not run.
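The difference between a normal cancel and a force cancel can be summarized as a small decision table. The following Python sketch encodes the behavior described above; CancelPlan and plan_cancel are hypothetical names, and the 15-second default mirrors the default Task Cancelation Grace Period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CancelPlan:
    grace_period_seconds: int  # time canceled tasks get to exit cleanly
    run_node_release: bool     # whether the job's Node Release task runs


def plan_cancel(target, force, grace_period_seconds=15, has_node_release=False):
    """Describe what happens when a job or task is canceled."""
    if target not in ("job", "task"):
        raise ValueError("target must be 'job' or 'task'")
    # Force cancel skips the grace period for the affected tasks.
    grace = 0 if force else grace_period_seconds
    # Node Release tasks belong to the job, and force cancel skips them.
    release = target == "job" and has_node_release and not force
    return CancelPlan(grace, release)


if __name__ == "__main__":
    print(plan_cancel("job", force=True, has_node_release=True))
```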
Related Resources
Exclude particular nodes from running tasks in your job
Scenario
You notice that one particular node keeps failing tasks in your job. You want the job scheduler to stop scheduling your tasks on that node. In Windows HPC Server 2008 R2, you can specify a list of nodes to exclude from your job.
Goal
Add one or more nodes to the Excluded Nodes job property. As a cluster administrator, see all excluded nodes on the cluster.
Requirements
Steps
Defining excluded nodes for a job
For any active job, you can add or remove nodes in the Excluded Nodes job property, or clear the list. The following lists the commands to modify and view the Excluded Nodes list by using HPC PowerShell or a command prompt. In HPC PowerShell, use the following cmdlets:
At a command prompt, use the following commands:
Monitoring excluded nodes on the cluster
To see all excluded nodes on a cluster, use the Get-HpcJob PowerShell cmdlet. The following example shows how to list all of the excluded nodes for jobs that were submitted today. The script also lists the job template that was used for the job that excluded the node. In the following cmdlet, <today's date> is specified in a date format such as mm/dd/yyyy:
If the cluster administrator detects and resolves the issue on one or more nodes, the administrator can remove the fixed nodes from any node exclusion list in which they appear. The following cmdlet gets all active jobs and removes the fixed nodes from the node exclusion lists (this has no effect on jobs that do not list the specified nodes):
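Separately from the cmdlets themselves, the bookkeeping that these two administrative tasks perform can be sketched in Python. This sketch works on a hypothetical JobRecord class rather than on HPC PowerShell objects, so it illustrates the logic only: list excluded nodes for jobs submitted on a given day, and remove fixed nodes from the exclusion lists of active (Queued or Running) jobs. Node names and templates are made up.

```python
from dataclasses import dataclass, field
from datetime import date

ACTIVE_STATES = {"Queued", "Running"}


@dataclass
class JobRecord:
    """Hypothetical stand-in for the job properties used here."""
    job_id: int
    template: str
    submit_date: date
    state: str
    excluded_nodes: set = field(default_factory=set)


def excluded_nodes_report(jobs, day):
    """Return sorted (node, job_id, template) rows for jobs submitted on `day`."""
    return sorted(
        (node, job.job_id, job.template)
        for job in jobs
        if job.submit_date == day
        for node in job.excluded_nodes
    )


def remove_fixed_nodes(jobs, fixed_nodes):
    """Remove fixed nodes from active jobs' exclusion lists; return changed job IDs.

    Jobs that do not list any of the fixed nodes are left untouched."""
    changed = []
    for job in jobs:
        if job.state in ACTIVE_STATES and job.excluded_nodes & fixed_nodes:
            job.excluded_nodes -= fixed_nodes
            changed.append(job.job_id)
    return changed


if __name__ == "__main__":
    jobs = [JobRecord(1, "Default", date(2011, 5, 17), "Running", {"NODE07"})]
    print(excluded_nodes_report(jobs, date(2011, 5, 17)))
```

Grouping the report by node makes a repeatedly excluded node stand out, which is usually the signal that it needs attention before it is removed from the lists.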
Expected results
Related Resources
Receive notification when your job is done
Scenario
You submitted a long-running job to the cluster and would like to be notified when the job is done.
Goal
Enable email notification on the cluster and submit a job that requests notification on job completion.
Requirements
Steps
Enable email notification on the cluster:
Submit a job that requests notification on completion:
Expected results
If notification is selected for a specific job, and email notification is enabled on the cluster, job owners receive the requested email messages at the email account that is associated with their domain credentials.
Related Resources
None.
Provision or clean up the nodes that are allocated to your job
Scenario
You want to perform some basic provisioning of the nodes that are allocated to your job. For example, you may want to copy files or verify the running environment before your primary tasks run. To prepare the nodes that are allocated to your job, you can add a Node Preparation task to your job. After your tasks complete, you need to collect data or log files from the nodes that were allocated to your job or return the nodes to their pre-job state. To clean up nodes after running your primary tasks, you can add a Node Release task to your job.
Goal
Submit a job with Node Preparation and Node Release tasks.
Requirements
Steps
For detailed step-by-step instructions, see Submitting a Job with Node Preparation and Node Release Tasks in Windows HPC Server 2008 R2 Step-by-Step Guide.
Now try to cancel a running job that includes a Node Release task.
Expected results
Related Resources
Provide custom job progress information
Scenario
Many of the applications that you run on your cluster run for a long time, and they consist of many internal stages. To better monitor job progress, you want to be able to see information about the percentage of completion or about the internal state of the application (such as data file loaded, running simulation, or writing data). You can include commands in your application or script files to set and maintain custom job progress information with the Progress and Progress Message job properties.
Goal
Set and maintain values for job Progress and Progress Message from an application or script.
Requirements
Steps
Include commands to set Progress and Progress Message in your scripts or applications. For example, if your application includes a loop that performs some work, you can update the progress properties at each iteration. To set the Progress and Progress Message properties in a batch (.bat) file, an HPC PowerShell script (.ps1), or an application, you can use the %CCP_JOBID% environment variable to get the job ID of the current job, as follows:
You can use one of the following methods to see the progress information for a running job:
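Whichever tool writes the values, the application first has to work out what to report at each iteration. The following Python sketch reads the job ID from the CCP_JOBID environment variable and computes a Progress percentage and message after each stage; `report` is a hypothetical callback that stands in for whatever command or cmdlet actually sets the Progress and Progress Message properties.

```python
import os


def run_stages(stages, do_stage, report):
    """Run each stage, then report (job_id, percent, message) after it finishes.

    `do_stage` performs the real work; `report` is a hypothetical callback that
    would update the job's Progress and Progress Message properties."""
    job_id = os.environ.get("CCP_JOBID")  # None when run outside a cluster job
    total = len(stages)
    for done, stage in enumerate(stages, start=1):
        do_stage(stage)
        report(job_id, done * 100 // total, f"Finished: {stage}")


if __name__ == "__main__":
    run_stages(
        ["load data file", "run simulation", "write data"],
        do_stage=lambda stage: None,  # placeholder for real work
        report=lambda job_id, pct, msg: print(job_id, pct, msg),
    )
```

Reporting after a stage completes, rather than before it starts, means the percentage never overstates finished work; integer division keeps the value a whole number.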
Expected results
Related Resources
Allow canceled tasks time to save state information or clean up before exiting
Scenario
When a running task is stopped during execution, you want to allow time for the application to save state information, write a log message, create or delete files, or for services to finish computation of their current service call. You can configure the amount of time, in seconds, to allow applications to exit gracefully by setting the Task Cancelation Grace Period cluster property. The default Task Cancelation Grace Period is 15 seconds.
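To benefit from the grace period, an application needs a way to notice the cancellation and then finish quickly. The following minimal Python sketch assumes that some handler in the application (not shown, because how the notification reaches the process is outside this example) sets a threading.Event when the task is canceled. The loop checks that event between work items and saves a checkpoint before returning; the checkpoint step should finish well within the configured grace period.

```python
import threading


def process(items, work, save_checkpoint, stop):
    """Process items until finished or until `stop` is set.

    Returns the number of items completed. On cancellation, `save_checkpoint`
    receives the index of the first unprocessed item so a later run can resume."""
    for index, item in enumerate(items):
        if stop.is_set():
            save_checkpoint(index)  # must finish inside the grace period
            return index
        work(item)
    return len(items)


if __name__ == "__main__":
    stop = threading.Event()
    # Simulate a cancellation arriving while item 2 is being processed.
    completed = process(
        [1, 2, 3, 4],
        work=lambda item: stop.set() if item == 2 else None,
        save_checkpoint=lambda index: print("checkpoint at item", index),
        stop=stop,
    )
    print("completed", completed)
```

Checking the event only between items keeps each work item atomic; if a single item can take longer than the grace period, the check has to move inside the item's own loop.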
Goal
Allow canceled tasks time to perform cleanup or completion steps before exiting.
Requirements
Steps
You can use one of the following methods to change the Task Cancelation Grace Period to 10 seconds:
Expected results
Related Resources
None.