Compute Cluster Server Command Line Interface Reference

Applies To: Windows Compute Cluster Server 2003

This reference provides command descriptions and syntax for all cluster-specific command line executables used in Windows Compute Cluster Server 2003. The Command Line Interface (CLI) commands provide a keyboard alternative to most actions otherwise performed using the Job Manager or Administrator interfaces.

All commands in this reference except clusrun follow this general syntax:

<command> <operator> [options]

—Or—

<command> <operator> [options] <command_line> [arguments]

There are five commands in the CLI:

  • job

  • task

  • node

  • cluscfg

  • clusrun

Each command (except clusrun) has its own set of operators. It is the combination of a command and an operator that constitutes a CLI executable. For example, job new creates a new job.

Each operator has a set of options. Options are all preceded by a forward slash and take the form /<option>:<value>. For example, job new /jobname:my_job creates a job named my_job.

The command line parameter is the command line of a task. Arguments are the arguments associated with that command line.
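
As a rough illustration of the general syntax described above, the following Python sketch assembles an argument list from a command, an operator, a set of options, and an optional task command line. The helper name build_cli and its parameters are hypothetical and are not part of Compute Cluster Server. The sketch only formats strings and does not run any command.

```python
def build_cli(command, operator, options=None, command_line=None):
    """Assemble <command> <operator> [options] [<command_line> [arguments]].

    Hypothetical helper for illustration only; it builds a list of strings
    and never invokes the CLI.
    """
    parts = [command, operator]
    # Each option is preceded by a forward slash: /<option>:<value>.
    for name, value in (options or {}).items():
        parts.append(f"/{name}:{value}")
    # The task command line and its arguments, if any, come last.
    parts.extend(command_line or [])
    return parts


print(" ".join(build_cli("job", "new", {"jobname": "my_job"})))
```

The printed line corresponds to the job new /jobname:my_job example given above.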

To obtain help for a CLI command, type:

<command> /? or <command> /help

To obtain help for a CLI command operator, type:

<command> <operator> /? or <command> <operator> /help

Syntax conventions

This document uses the following special syntax conventions:

  • /scheduler:<host> is a universal option and is used to specify a host other than the local host.

  • /jobfile:<template_file> is the XML file in which the specifics of a job and its tasks are stored.

  • /taskfile:<template_file> is the equivalent of /jobfile:<template_file> when the template file is being used as a task property.

  • standard_job_options are one or more of a set of options applicable to a job. These are defined once for job new and afterward are referenced.

  • standard_task_options are a set of options applicable to a task. These are defined once for job add and afterward are referenced.

  • task_options_subset is the subset of standard_task_options that can be used with the job submit command.

  • credential_options is the option pair /user:<domain\user> /password:<password> and applies when the job is run under a user other than the invoking user.

  • jobId is the system-assigned identification number for a job.

  • jobId.taskID is the system-assigned identification number for a task.
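
A script that handles the jobId.taskID notation might split it as in the following Python sketch. The function name parse_task_id is hypothetical, and the sketch assumes that both identifiers are non-negative integers, as in the examples in this reference (for example, 101.0).

```python
def parse_task_id(text):
    """Split a 'jobId.taskID' string into a (job_id, task_id) pair.

    Assumption: both parts are non-negative integers.
    """
    job_part, separator, task_part = text.partition(".")
    if not separator or not job_part.isdigit() or not task_part.isdigit():
        raise ValueError(f"expected <jobId>.<taskID>, got {text!r}")
    return int(job_part), int(task_part)


job_id, task_id = parse_task_id("101.0")
print(job_id, task_id)
```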

Job command

The job command is used to create, submit, view, and manage jobs. The job command operators are:

  • add

  • cancel

  • list

  • listtasks

  • modify

  • new

  • requeue

  • submit

  • view

job add

Adds a task to the task queue for a specified job and returns a unique task ID. Tasks can be added only to jobs in the Not_Submitted, Queued or Running state.

SYNOPSIS

job add <JobID> [standard_task_options] [/scheduler:<host>] <command> [arguments]

Standard Task Options
Option Description Maximum Characters

/name:<task_name>

Name of the task.

80

/numprocessors:<min_processors> or <min_processors>-<max_processors>

Minimum and maximum number of processors to be allocated. The default is one processor.

N/A

/rerunnable: true | false

A flag indicating that a task can be rerun after a failure. Default is true.

The scheduler allows a failed job to be requeued if the failure is due to any error that can be fixed without changing the task command line. If the job or task fails because of a system failure (for example, a node crashes), the scheduler requeues the job automatically. Only incomplete tasks are rerun.

N/A

/requirednodes:<node1>,<node2>,…<nodeN>

Specifies by name the nodes to be allocated to the task. /requirednodes overrides /numprocessors and also forces the job to reserve the nodes that are specified.

2080

/env:<name1>=<val1> /env:<name2>=<val2> … /env:<nameN>=<valN>

Specifies the environment variables for the task. (For more information about environment variables, see Use Environment Variables.)

2048

/exclusive: true | false

A flag indicating that the task has exclusive use of reserved nodes.

N/A

/runtime:[[[days:<num>]hours:<num>]minutes: <num>| infinite]

Maximum run time in day-hour-minute format. The job will be cancelled rather than allowed to run past the maximum run time. Default is Infinite.

8

/workdir:<path>

The full path of the working directory (the directory for input, output, and error files). The path may contain environment variables. Default is %USERPROFILE%.

160

/stdin:<file_name>

Take standard input for the task from file <file_name>.

160

/stdout:<file_name>

Redirect standard output of the task to the file <file_name>.

160

/stderr:<file_name>

Redirect standard error of the task to the file <file_name>.

160

/depend:<task_name>

Specifies that this task depends on a task or tasks of the name <task_name>. (Multiple tasks of the same name will have different task IDs.) If multiple tasks of different names are depended on, the job add command needs to be repeated:

job add 21 /name:task3 /depend:task1 myapp3.exe

job add 21 /name:task3 /depend:task2 myapp3.exe

320

An additional task option is:

Option Description

/taskfile:<template_file>

Overwrites the contents of the task with the values in this job template file, except where a different value is explicitly set in the command line.
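
Because /depend: names one task at a time, a script that registers several dependencies repeats the job add command, as in the /depend example above. The following Python sketch generates one command string per dependency. The function name job_add_commands is hypothetical, and the sketch only builds text; it does not call the CLI.

```python
def job_add_commands(job_id, task_name, depends_on, command_line):
    """Return one 'job add' command string per dependency name.

    Hypothetical helper: mirrors the repeated job add pattern shown
    for /depend, without executing anything.
    """
    return [
        f"job add {job_id} /name:{task_name} /depend:{dependency} {command_line}"
        for dependency in depends_on
    ]


for line in job_add_commands(21, "task3", ["task1", "task2"], "myapp3.exe"):
    print(line)
```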

job cancel

Terminates the running job and cancels all of its resource reservations.

SYNOPSIS

job cancel [options] [/scheduler:<host>] <jobID>

Option Description

/message:<msg_string>

Msg_string is an optional user-written log entry for the cancellation. The default log entry is "cancelled by <invoking_user>". Messages that contain white space should be enclosed in double quotation marks.

job list

Lists all jobs in the cluster. The output contains a table for each job in the following form:

Job ID Description

USER

Submission user

NAME

User-specified name of job

STATUS

Not_Submitted, Queued, Running, Cancelled, Finished or Failed

PRIORITY

Highest, AboveNormal, Normal, BelowNormal, Lowest

By default, if the invoking user is an administrator, then all jobs are listed. If the invoking user is a user without administrative rights, only his or her jobs are listed.

By default, only active (queued or running) jobs are displayed.

SYNOPSIS

job list [options] [/scheduler:<host>]

Options
Option Description

/user:[<user_name> | *]

Show only the jobs of the user <user_name>. If the keyword * is specified, the jobs of all users are displayed.

/status:[<stat1,stat2,…,statN> | *]

Display the jobs in each status specified (Not_Submitted, Queued, Running, Cancelled, Finished, or Failed). If the keyword * is used, both active and completed jobs are displayed. If completed jobs are displayed, one more column, Complete_Time, is displayed.

/all

List all the jobs in the system. This is the equivalent of "job list /user:* /status:*".
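
The default filtering rules for job list can be modeled as in the following Python sketch. It is a simplified model of the behavior described in this section, not the scheduler's implementation. The function name visible_jobs and the dictionary keys are hypothetical, and the sketch assumes that /user:* shows every user's jobs regardless of who invokes it.

```python
ACTIVE_STATUSES = {"Queued", "Running"}


def visible_jobs(jobs, invoking_user, is_admin, user=None, status=None):
    """Model which jobs 'job list' would show (simplified assumption).

    user:   None (default), "*" (all users), or a user name.
    status: None (default: active jobs only), "*" (all), or "s1,s2,...".
    """
    if user is None:
        # Default: administrators see all jobs, other users only their own.
        user = "*" if is_admin else invoking_user
    if status is None:
        wanted = ACTIVE_STATUSES
    elif status == "*":
        wanted = None  # no status filter
    else:
        wanted = set(status.split(","))
    return [
        job for job in jobs
        if (user == "*" or job["user"] == user)
        and (wanted is None or job["status"] in wanted)
    ]


sample = [
    {"id": 1, "user": "alice", "status": "Running"},
    {"id": 2, "user": "bob", "status": "Queued"},
    {"id": 3, "user": "alice", "status": "Finished"},
]
# /all is modeled as user="*" together with status="*".
print([job["id"] for job in visible_jobs(sample, "alice", False, "*", "*")])
```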

job listtasks

Lists the tasks of the job <jobID>. Each task is displayed with the following fields:

Task ID Description

Status

Status of the task.

Name

Name of the task.

Command line

Command line of the task.

Number of processors

Minimum and maximum number of processors.

Execution nodes

Compute nodes the task runs on. If the task is not running, this value is blank.

SYNOPSIS

job listtasks [/scheduler:<host>] <jobId>

job modify

Modifies a queued or running job. For jobs in the Queued state, all modified terms take effect immediately. For jobs in the Running state, only changes to the following options take effect:

  • /runtime:

  • /rununtilcanceled:

  • /projectname:

The following modified terms will not take effect unless and until the job is requeued:

  • /jobname:

  • /numprocessors:

  • /askednodes:

  • /priority:

  • /license:

  • /exclusive

If /numprocessors: is specified at both the job and task level, the values for the job must be a superset of the values for the tasks.

If /runtime is specified at both the job and task level, the job run time must be equal to or greater than the longest task run time.
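
The two consistency rules above can be checked before a job is modified. The following Python sketch takes one literal reading of them: the job's processor range must contain each task's range, and the job run time must be at least as long as every task run time. The function name, the dictionary keys, and the use of None for an infinite run time are assumptions made for illustration.

```python
import math


def _minutes(value):
    # None stands for an infinite run time in this sketch (assumption).
    return math.inf if value is None else value


def job_covers_tasks(job, tasks):
    """Return True if the job-level values cover every task-level value."""
    for task in tasks:
        # Processor range of the task must lie inside the job's range.
        if task["min_procs"] < job["min_procs"] or task["max_procs"] > job["max_procs"]:
            return False
        # Job run time must be >= the run time of every task.
        if _minutes(task["runtime"]) > _minutes(job["runtime"]):
            return False
    return True


job = {"min_procs": 2, "max_procs": 8, "runtime": 120}
tasks = [
    {"min_procs": 2, "max_procs": 4, "runtime": 60},
    {"min_procs": 4, "max_procs": 8, "runtime": 120},
]
print(job_covers_tasks(job, tasks))
```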

SYNOPSIS

job modify [/jobfile:<template_file>] [credential_options] [/scheduler:<host>] <jobId>

job modify [standard_job_options] [credential_options] [/scheduler:<host>] <jobId>

job modify [/jobfile:<template_file>] [standard_job_options] [credential_options] [/scheduler:<host>] <jobId>

Option Description

Standard job options

For job options, see job new.

Credential options

These options are typically used to update the user’s password for the Job Scheduler.

/jobfile:<template_file>

Overwrites the contents of the job with the values in this job template file, except where a different value is explicitly set in the command line.

Examples:

job modify [/jobfile:<template_file>] [credential_options] [/scheduler:<host>] <jobId>

This command overwrites the content of an existing job jobId with the job options specified in the template file. The task options specified in the template file are ignored.

job modify [standard_job_options] [credential_options] [/scheduler:<host>] <jobId>

This command updates the job jobId with the job options specified.

job modify [/jobfile:<template_file>] [standard_job_options] [credential_options] [/scheduler:<host>] <jobId>

This command overwrites the content of an existing job jobId with the job options specified in the template file, then updates the job options values with those made explicit on the command line.
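
The order of precedence in the third form (existing job, then template file, then explicit command line options) can be pictured as successive dictionary updates. The following Python sketch is only a model of that ordering. The function name apply_job_modify is hypothetical, and leaving options untouched when they appear in neither the template nor the command line is an assumption.

```python
def apply_job_modify(existing, template=None, explicit=None):
    """Model the precedence: explicit options > template file > existing job."""
    updated = dict(existing)
    updated.update(template or {})   # values from /jobfile:<template_file>
    updated.update(explicit or {})   # values given on the command line win
    return updated


existing = {"jobname": "nightly", "priority": "Normal", "projectname": "p1"}
template = {"jobname": "nightly_v2", "priority": "Lowest"}
explicit = {"priority": "BelowNormal"}
print(apply_job_modify(existing, template, explicit))
```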

Note

Modifying the run time for a backfill job (one that has jumped the queue to take advantage of idle reserved nodes) is not permitted, because doing so could delay the reserving job.

job new

Creates a new job and returns a unique job ID. The job is created in the Not_Submitted state and contains no tasks.

SYNOPSIS

job new [standard_job_options] [/jobfile:<template_file>] [/scheduler:<host>]

Option Description

/jobfile:<template_file>

Use the settings in this job template XML file, except where a different value is explicitly set in the command line.

Standard job options
Option Description Maximum Characters

/jobname:<job_name>

Name of the job.

80

/numprocessors:<min_processors> or <min_processors>-<max_processors>

Minimum and maximum number of processors to be allocated. The default is one processor.

N/A

/askednodes:<node1>,<node2>,…<nodeN>

Specifies nodes to be allocated to the job by name. By default, all nodes in the cluster are candidates.

2080

/exclusive: true | false

By default, a job has exclusive use of nodes reserved by it. If /exclusive: is set to false, idle, reserved processors on these reserved nodes are available to other jobs. This is reciprocal, making nodes reserved to other jobs available to this job if they have also been flagged as nonexclusive.

N/A

/priority:<priority_class>

Schedule priority class: Highest, AboveNormal, Normal, BelowNormal, or Lowest. Highest and AboveNormal are available only to administrative users. The default is Normal. Within a priority class, the job is placed in the job queue in the order received unless requeued. If requeued, the job always goes to the top of its priority class.

N/A

/runtime:[[[days:<num>]hours:<num>]minutes: <num>| infinite]

Maximum run time in day-hour-minute format. The job will be cancelled rather than allowed to run past the maximum run time. Default is Infinite.

8

/rununtilcanceled: true | false

Flag indicating that the job will hold its resources until it is cancelled or reaches its run time limit. This way, additional tasks can be run on the nodes.

N/A

/projectname:<project_name>

Name of a project, if any, to which the job belongs.

80

/license:<feature1>:<amt1> /license:<feature2>:<amt2> …/license:<featureN>:<amtN>

License features required to run the tasks, and the number of tokens required for each feature.

160
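
Some of the limits in the standard job options table can be checked in a script before job new is called. The following Python sketch checks the maximum lengths of /jobname:, /askednodes:, and /projectname:, and the rule that Highest and AboveNormal are available only to administrative users. The function name check_job_options is hypothetical, and applying each limit to the option value alone (rather than to the whole option string) is an assumption.

```python
PRIORITIES = ("Highest", "AboveNormal", "Normal", "BelowNormal", "Lowest")
ADMIN_ONLY = {"Highest", "AboveNormal"}
# Maximum characters taken from the table above; assumed to apply to the value.
MAX_CHARS = {"jobname": 80, "askednodes": 2080, "projectname": 80}


def check_job_options(options, is_admin=False):
    """Return a list of problems found in a dict of job option values."""
    problems = []
    for name, limit in MAX_CHARS.items():
        value = options.get(name)
        if value is not None and len(value) > limit:
            problems.append(f"/{name}: exceeds {limit} characters")
    priority = options.get("priority", "Normal")  # Normal is the default
    if priority not in PRIORITIES:
        problems.append(f"/priority: unknown class {priority!r}")
    elif priority in ADMIN_ONLY and not is_admin:
        problems.append(f"/priority: {priority} requires administrative rights")
    return problems


print(check_job_options({"jobname": "my_job", "priority": "Highest"}))
```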

job requeue

Requeues the job specified by jobId. To requeue a job is to stop it and reinsert it as the topmost job in its priority class segment. The job retains its original submission time, not the requeue time. Requeuing can be performed on running, canceled, and, in some cases, failed jobs. Only unfinished tasks are rerun.

By default, a failed job is requeued automatically if the failure is due to a system failure, such as a node reboot. If automatic requeue is not desired, set the task property /rerunnable to false.

A failed job can also be requeued manually if the failure is due to any error that can be fixed without changing the task command line. For example, a task may call for an input file that is not there or contains errors. Such jobs are not requeued automatically, because the error must first be corrected.
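
The queue position of a requeued job (topmost in its priority class segment) can be modeled as in the following Python sketch. The queue is represented as a list of (job ID, priority) pairs ordered from next-to-run to last. This representation and the function name insert_requeued are assumptions for illustration, not the scheduler's data structures.

```python
RANK = {"Highest": 0, "AboveNormal": 1, "Normal": 2, "BelowNormal": 3, "Lowest": 4}


def insert_requeued(queue, job):
    """Return a new queue with the requeued job at the top of its priority class."""
    rank = RANK[job[1]]
    for index, (_, priority) in enumerate(queue):
        # The first queued job of the same or a lower class marks the insert point.
        if RANK[priority] >= rank:
            return queue[:index] + [job] + queue[index:]
    # No job of the same or a lower class is queued: append at the end.
    return queue + [job]


queue = [(1, "Highest"), (2, "Normal"), (4, "Lowest")]
print(insert_requeued(queue, (3, "Normal")))
```

In this model the requeued job 3 is placed ahead of job 2, which was received earlier in the same class, but behind job 1, which belongs to a higher class.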

SYNOPSIS

job requeue [/scheduler:<host>] <jobId>

job submit

Submits a new or existing job to the queue.

SYNOPSIS

job submit /id:<jobID> [credential_options] [/scheduler:<host>]

job submit /jobfile:<template_file> [credential_options] [/scheduler:<host>]

job submit [standard_job_options] [task_options_subset] [credential_options] [/scheduler:<host>] <command> [arguments]

Options Description

Standard job options

See job options for job new. Option /numprocessors: will apply to both job and task.

Task options subset

See standard task options for job add. Only this subset applies:

/name:

/rerunnable

/workdir:

/stdin:

/stdout:

/stderr:

Examples:

job submit /id:<jobID> [credential_options] [/scheduler:<host>]

This command submits a job created by the job new command by jobID. Only the creator of the job can submit the job.

job submit /jobfile:<template_file> [credential_options] [/scheduler:<host>]

This command submits a job based on a job template file. It returns jobID and taskID.

job submit [standard_job_options] [task_options_subset] [credential_options] [/scheduler:<host>] command [arguments]

This command creates and submits a job. It returns jobID and taskID.

job submit "cmd.exe /k myapp.exe 1> %my_resultdir%\myapp_%CCP_JOBID%.out 2> %my_resultdir%\myapp_%CCP_JOBID%.out"

This command submits the command line of a user executable as a job.
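
Because job submit accepts only a subset of the standard task options, a wrapper script may want to reject the others before building the command. The following Python sketch does this using the subset listed in this section. The function name unsupported_task_options is hypothetical.

```python
# Task options accepted by job submit, as listed in this section.
JOB_SUBMIT_TASK_OPTIONS = {"name", "rerunnable", "workdir", "stdin", "stdout", "stderr"}


def unsupported_task_options(task_options):
    """Return, sorted, the task option names that job submit does not accept."""
    return sorted(set(task_options) - JOB_SUBMIT_TASK_OPTIONS)


print(unsupported_task_options(["name", "depend", "env", "stdout"]))
```

Note that /numprocessors: is treated as a job option here, because with job submit it applies to both the job and the task.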

job view

Displays the details of a specified job.

SYNOPSIS

job view [/scheduler:<host>] <JobID>

The details displayed include:

Term Description

Job ID

Job ID.

Status

Not_Submitted, Queued, Running, Cancelled, Finished, or Failed.

Name

Job name specified by the user.

Submitted by

Cluster user that submitted the job.

Number of processors

Minimum and maximum number of processors.

Allocated Nodes

Execution nodes.

Submit time

Submission time in date-hour-minute format.

Start time

Time job started, in date-hour-minute format.

End time

Time job ended, in date-hour-minute format.

Number of Tasks

Number of tasks.

Notsubmitted

Number of tasks not yet submitted to the cluster nodes.

Queued

Number of queued tasks.

Running

Number of running tasks.

Finished

Number of finished tasks.

Failed

Number of failed tasks.

Cancelled

Number of cancelled tasks.

Task command

The task command is used to view, cancel, and requeue tasks. The task command operators are:

  • cancel

  • requeue

  • view

task cancel

Terminates the running task and cancels all of its resource reservations.

SYNOPSIS

task cancel [<options>] <jobID.taskID>

Option Description

/message:<msg_string>

Msg_string is an optional user-written log entry for the cancellation. The default log entry is "cancelled by <invoking_user>". Messages that contain white space should be enclosed in double quotation marks.

task requeue

Requeues the task specified by jobId.taskID, stopping it and reinserting it as the next task in the queue.

SYNOPSIS

task requeue [/scheduler:<host>] <jobId.taskId>

task view

Displays the details of a task in the following form:

Term Description

Task ID

Task ID.

Status

Status of the task (for example, Finished).

Name

Task name.

Command line

Task command line.

Allocated nodes

Execution node list.

Exit code

Exit code of the task: 0=task finished; any other exit code = task failed.

Submit time

For a failed task, the error message.

For a cancelled task, the cancellation message provided by the user. Default is "cancelled by <invoking_user>".

Start time

Values for the current usage of a task. These include:

  • Kernel mode CPU time of all the processes since the start of the task.

  • User mode CPU time of all the processes since the start of the task.

  • Current total working set size of all the processes.

End time

For each node, displays the processes created on the node.

Kernel time

Kernel mode CPU time used by all processes since the start of the task.

User time

User mode CPU time used by all processes since the start of the task.

Working set

Current total working set size of all processes in the task.

SYNOPSIS

task view [/scheduler:<host>] <jobId.taskId> | <jobId>

EXAMPLE:

task view 101.0

Displays the details of the first task of job 101.

Node command

The node command is used to approve, list, pause, and resume nodes. The node operators are:

  • approve

  • list

  • pause

  • resume

node approve

Approves an added node. After a node has been added, that node is in a pending state, awaiting approval by an administrative user.

SYNOPSIS

node approve [/scheduler:<host>] <node_name>

node list

Lists the nodes and the status and statistics for each. The output is a table with each row containing the following fields:

Term Description

NODE_NAME

Name of the node.

STATUS

Pending, Ready, Paused, Unreachable.

MAX

Maximum number of job slots available.

RUN

Number of job slots used by running jobs on this node.

IDLE

Number of idle job slots available on this node.

SYNOPSIS

node list [/scheduler:<host>]

node pause | resume

Pauses a node or resumes the activity of a node that is paused. When a node is paused, jobs running on the node continue to run, but no new jobs from users without administrative rights are started. New administrator jobs are accepted.

SYNOPSIS

node [pause |resume] [/scheduler:<host>] {node_name} [/all]

Option Description

/all

Pause or resume all nodes.
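
The effect of pausing a node on newly arriving jobs can be summarized as in the following Python sketch. The function name node_starts_new_job is hypothetical. The behavior for Ready and Paused nodes follows the description above, while treating Pending and Unreachable nodes as never starting jobs is an assumption.

```python
def node_starts_new_job(node_status, submitter_is_admin):
    """Return True if a node in this status would start a new job (model only)."""
    if node_status == "Ready":
        return True
    if node_status == "Paused":
        # Paused nodes accept new jobs only from administrators.
        return submitter_is_admin
    # Pending and Unreachable nodes: assumed not to start jobs.
    return False


print(node_starts_new_job("Paused", submitter_is_admin=True))
```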

Cluscfg command

The cluscfg command allows monitoring and manipulation of the queue. cluscfg operators include:

  • delcreds

  • listenvs

  • listparams

  • setcreds

  • setenvs

  • setparams

  • view

cluscfg delcreds

Deletes the cached credential of the named user from the invoking user’s cache. If /user is not supplied, the invoking user is assumed.

SYNOPSIS

cluscfg delcreds [/user:<DOMAIN>\<user>] [/scheduler:<host>]

cluscfg listenvs

Lists the cluster-wide environment variables of the cluster.

SYNOPSIS

cluscfg listenvs [/scheduler:<host>]

cluscfg listparams

Returns the following cluster parameters, which are stored in HKLM\System\CurrentControlSet\Services\CCPSchedSvc\Enum.

Parameter Description Default Value

ActivationFilterProgram

Activation filter executable file name.

""

ActivationFilterTimeout

Activation filter program time-out.

15 seconds

BackFillLookahead

Specification of backfill behavior or number of jobs the scheduler searches to find jobs that can backfill the jobs at the top of the job queue.

<0=search through the entire job queue (default)

0=no backfill

>0=number of jobs to search

EventLogLevel

Sets the level of Job Scheduler events that appear in the Event Viewer. Levels are:

ActivityTracing: Stop, Start, Suspend, Transfer, and Resume events are displayed.

All: All events are displayed.

Critical: Critical events are displayed.

Error: Critical and Error events are displayed.

Information: Critical, Error, Warning, and Information events are displayed.

Off: No events are displayed.

Verbose: Critical, Error, Warning, Information, and Verbose events are displayed.

Warning: Critical, Error, and Warning events are displayed.

For more information about event levels, see TraceEventType Enumeration (https://go.microsoft.com/fwlink/?LinkId=60988).

Error

HeartbeatInterval

Interval at which the scheduler sends health probes to the Node Manager.

60 seconds

InactivityCount

Number of missing beats (no reply from the health probes) before the Job Scheduler declares the node Unreachable.

3

JobRetryCount

Maximum number of times the system reruns a job.

3

JobRuntime

Format:<dd>:<hh>:<mm>.

Infinite

SpoolDir

The directory where the output of the clusrun command is redirected.

\\<head_node>\spooldir

SubmissionFilterProgram

Submission filter executable file name.

""

SubmissionFilterTimeout

Time-out value (in seconds) for the submission filter.

15 seconds

TaskRetryCount

Maximum number of times the system reruns a task.

3

TTLCompletedJobs

Time in days for completed job records to remain in the MSDE.

5 days

SYNOPSIS

cluscfg listparams [/scheduler:<host>]
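
Two of the parameters above lend themselves to a short worked example. The following Python sketch models the three BackFillLookahead cases and estimates how long a node can stay silent before it is declared Unreachable, using the default HeartbeatInterval and InactivityCount. Both function names are hypothetical. Excluding the top job from the lookahead count and multiplying the interval by the count are simplifying assumptions, not documented scheduler behavior.

```python
def backfill_window(queue, lookahead):
    """Return the queued jobs examined as backfill candidates (model only).

    lookahead < 0: search the entire job queue (default)
    lookahead = 0: no backfill
    lookahead > 0: number of jobs to search
    """
    behind_top = queue[1:]  # assumption: the top job itself is not a candidate
    if lookahead < 0:
        return behind_top
    return behind_top[:lookahead]


def seconds_until_unreachable(heartbeat_interval=60, inactivity_count=3):
    """Approximate silence, in seconds, before a node is declared Unreachable."""
    return heartbeat_interval * inactivity_count


print(backfill_window(["job_a", "job_b", "job_c", "job_d"], 2))
print(seconds_until_unreachable())
```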

cluscfg setcreds

Sets the credential of a named user in the credential cache of the invoking user. If /user is not supplied, the invoking user is assumed. If /password is not provided, the user is prompted for it through the Stored User Names and Passwords UI.

SYNOPSIS

cluscfg setcreds [/user:<DOMAIN>\<user>] [/password:<password>] [/scheduler:<host>]

cluscfg setenvs

Sets cluster-wide environment variables to the specified values.

Cluster-wide environment variables are variables that apply to the entire cluster. Cluster-wide environment variables can be viewed or set using the cluscfg command. To add or set a cluster-wide environment variable, you must have administrative credentials. For more information, see Compute Cluster Server Command Line Interface Reference (https://go.microsoft.com/fwlink/?LinkID=64065).

The following preexisting cluster-wide environment variables are set during system deployment and are rarely changed manually or used in commands:

Environment Variable Description

CCP_CLUSTER_NAME

Name of the cluster.

CCP_MPI_NETMASK

Subnet mask for the interface to be used by the MPI process, if a separate MPI network exists. Example:

CCP_MPI_NETMASK=172.30.0.0/255.255.0.0

MPICH_SOCKET_SBUFFER_SIZE

Send buffer size for the socket and shared memory channel (CH3) used by MS MPI. Default size is 32*1024=32768 bytes. 


The most common example of an added cluster-wide environment variable is Path. Path functions identically to the Windows Path environment variable, but applies to all nodes in the cluster and only in the context of a job task.
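
Returning to CCP_MPI_NETMASK in the table above: a value in address/mask form, such as 172.30.0.0/255.255.0.0, can be checked in a script with the Python standard library ipaddress module. The following sketch is illustrative only. The function name parse_mpi_netmask and the sample node address are hypothetical, and the sketch does not reflect how MS MPI itself interprets the variable.

```python
import ipaddress


def parse_mpi_netmask(value):
    """Parse an 'address/mask' string into an IPv4Network object."""
    return ipaddress.ip_network(value, strict=True)


network = parse_mpi_netmask("172.30.0.0/255.255.0.0")
# Check whether a (hypothetical) node interface address is on the MPI network.
print(ipaddress.ip_address("172.30.12.7") in network)
```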

SYNOPSIS

cluscfg setenvs "<name1=value1>" "<name2=value2>" … "<nameN=valueN>" [/scheduler:<host>]

To unset an environment variable, use an empty string as the value. Example:

cluscfg setenvs "MY_VAR="

This unsets the environment variable MY_VAR.
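
A script that sets or unsets several cluster-wide variables could assemble the cluscfg setenvs command line as in the following Python sketch. The function name build_setenvs and the host name headnode are hypothetical. The sketch only builds the command text and assumes that names and values contain no double quotation marks.

```python
def build_setenvs(variables, scheduler=None):
    """Build a 'cluscfg setenvs' command line from a dict of name -> value.

    An empty string as the value unsets the variable.
    """
    parts = ["cluscfg", "setenvs"]
    # Each pair is quoted as "name=value".
    parts += [f'"{name}={value}"' for name, value in variables.items()]
    if scheduler:
        parts.append(f"/scheduler:{scheduler}")
    return " ".join(parts)


print(build_setenvs({"MY_VAR": ""}))
print(build_setenvs({"A": "1", "B": "2"}, scheduler="headnode"))
```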

cluscfg setparams

Sets the named parameters to the values specified. Refer to cluscfg listparams for parameter definitions.

SYNOPSIS

cluscfg setparams [TTLCompletedJobs=val] [JobRetryCount=val] [TaskRetryCount=val] [JobRuntime=val|Infinite] [SubmissionFilterProgram=val] [SubmissionFilterTimeout=val] [ActivationFilterProgram=val] [ActivationFilterTimeout=val] [BackFillLookahead=val] [HeartbeatInterval=val] [InactivityCount=val] [SpoolDir=val] [eventloglevel=off|critical|warning|error|information|verbose|activitytracing|all]

[/scheduler:<host>]

cluscfg view

Displays the details of a cluster. The output contains:

Term Description

Cluster name

Name of the cluster.

Total number of compute nodes

Number of nodes in cluster.

Number of ready compute nodes

Number of nodes with Ready status.

Number of paused compute nodes

Number of nodes with Paused status.

Number of unreachable compute nodes

Number of nodes with Unreachable status.

Number of compute nodes pending for approval

Number of nodes with Pending for Approval status.

Total number of processors

Number of processors in the cluster.

Number of idle processors

Number of processors not running tasks.

Number of busy processors

Number of processors running tasks.

Number of jobs not submitted

Number of pending jobs not submitted to the Job Scheduler.

Number of queued jobs

Number of jobs in the CCS job queue.

Number of jobs running

Number of jobs currently running.

Number of finished jobs

Number of jobs that completed successfully.

Number of failed jobs

Number of jobs that have failed.

Number of cancelled jobs

Number of jobs that have been cancelled.

SYNOPSIS

cluscfg view [/scheduler:<host>]

Clusrun command

clusrun is an administrative command that runs an instance of a specified command on multiple nodes, redirecting output to the client node. The client node can be a head node or any compute node on the cluster, accessed directly or remotely. Redirected output includes the standard output and error streams as well as run time system error messages. The output from each node is delimited by a header indicating the node.

If clusrun is interrupted or terminated, the remote command instances are also terminated.

clusrun requires administrative rights.

Running MPI applications through clusrun is not supported.

SYNOPSIS

clusrun [/scheduler:host] [credential_options]

[/nodes:node1,node2…nodeN]

[/all] [/pausednodes] [/oknodes]

[/stdin:file] [/workdir:dir] [/env:name1=val1] [/env:name2=val2] command [arguments]

Options
Option Description

/nodes:[<node1>[,<node2>…]]

Specify a list of nodes on which the command is invoked. Default is all Ready and Paused nodes.

/all

Run command on all Ready and Paused nodes. This is the default.

/oknodes

Run command on all Ready nodes. Default is all Ready and Paused nodes.

/pausednodes

Run command on all Paused nodes. Default is all Ready and Paused nodes.

/stdin:<file>

Take standard input for all command instances from file <file>.

/workdir:<dir>

Work directory for input, output, and error files. Default is %USERPROFILE%.

/env:<name1>=<val1> /env:<name2>=<val2> … /env:<nameN>=<valN>

Specify the environment variables for the task. For more information, see Use Environment Variables. Environment variable expansion on the compute node side is not supported. For example:

/env:myvar=^%XYZ_HOME^%

will NOT cause the %XYZ_HOME% to be expanded on the remote node side.
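
The clusrun options can be assembled programmatically in the same way as the other commands. The following Python sketch builds an argument list in the order of the synopsis. The function name build_clusrun and the node names are hypothetical, and the sketch does not run clusrun. Values passed through /env: are copied verbatim, so a value such as ^%XYZ_HOME^% reaches the command line unchanged and, as noted above, is not expanded on the remote node.

```python
def build_clusrun(command, nodes=None, workdir=None, env=None):
    """Build a clusrun argument list (illustrative; nothing is executed)."""
    parts = ["clusrun"]
    if nodes:
        parts.append("/nodes:" + ",".join(nodes))
    if workdir:
        parts.append(f"/workdir:{workdir}")
    # One /env:<name>=<val> option per variable, value copied verbatim.
    for name, value in (env or {}).items():
        parts.append(f"/env:{name}={value}")
    parts.extend(command)
    return parts


print(" ".join(build_clusrun(["hostname"], nodes=["node1", "node2"])))
```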