Compute Cluster Server Command Line Interface Reference

Article
08/16/2010

Applies To: Windows Compute Cluster Server 2003

This reference provides command descriptions and syntax for all cluster-specific command-line executables used in Windows Compute Cluster Server 2003. The Command Line Interface (CLI) commands provide a keyboard alternative to most actions otherwise performed using the Job Manager or Administrator interfaces.

All commands in this reference except clusrun follow this general syntax:

<command> <operator>[options]

—Or—

<command> <operator>[options] <command_line>[arguments]

There are five commands in the CLI:

job
task
node
cluscfg
clusrun

Each command (except clusrun) has its own set of operators. It is the combination of a command and an operator that constitutes a CLI executable. For example, job new creates a new job.

Each operator has a set of options. Options are all preceded by a forward slash and take the form /<option>:<value>. For example, job new /jobname:my_job creates a job named my_job.

The command line parameter is the command line of a task. Arguments are the arguments associated with that command line.
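
The general syntax can be illustrated with a short Python sketch that assembles a CLI string from a command, an operator, options, and an optional task command line. The helper name build_cli and the dictionary representation of options are assumptions made for this example; the sketch only formats text and does not contact the scheduler.

```python
def build_cli(command, operator, options=None, command_line=None, arguments=()):
    """Assemble '<command> <operator> [options] [<command_line> [arguments]]'.

    build_cli is a hypothetical helper; it only builds a string.
    """
    parts = [command, operator]
    # Each option is written as /<option>:<value>.
    for name, value in (options or {}).items():
        parts.append(f"/{name}:{value}")
    # The task command line and its arguments, if any, come last.
    if command_line is not None:
        parts.append(command_line)
        parts.extend(arguments)
    return " ".join(parts)


print(build_cli("job", "new", {"jobname": "my_job"}))
# job new /jobname:my_job
```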

To obtain help for a CLI command, type:

<command> /? or <command> /help

To obtain help for a CLI command operator, type:

<command> <operator> /? or <command> <operator> /help

Syntax conventions

This document uses the following special syntax conventions:

/scheduler:<host> is a universal option and is used to specify a host other than the local host.
/jobfile:<template_file> is the XML file in which the specifics of a job and its tasks are stored.
/taskfile:<template_file> is the equivalent of /jobfile:<template_file> when the template file is being used as a task property.
standard_job_options are one or more of a set of options applicable to a job. These are defined once for job new and afterward are referenced.
standard_task_options are a set of options applicable to a task. These are defined once for job add and afterward are referenced.
task_options_subset is the subset of standard_task_options that can be used with the job submit command.
credential_options is the option pair /user:<domain\user> /password:<password> and applies when the job is run under a user other than the invoking user.
jobId is the system-assigned identification number for a job.
jobId.taskID is the system-assigned identification number for a task.

Job command

The job command is used to create, submit, view, and manage jobs. The job command operators are:

add
cancel
list
listtasks
modify
new
requeue
submit
view

job add

Adds a task to the task queue for a specified job and returns a unique task ID. Tasks can be added only to jobs in the Not_Submitted, Queued or Running state.

SYNOPSIS

job add <JobID> [standard_task_options] [/scheduler:<host>] <command> [arguments]

Standard Task Options

Option	Description	Maximum Characters
/name:<task_name>	Name of the task.	80
/numprocessors:<min_processors> or <min_processors>-<max_processors>	Minimum and maximum number of processors to be allocated. The default is one processor.	N/A
/rerunnable: true | false	A flag indicating that a task can be rerun after a failure. Default is true. The scheduler allows a failed job to be requeued if the failure is due to any error that can be fixed without changing the task command line. If the job or task fails for reasons of system failure (for example, a node crashes), the scheduler requeues the job automatically. Only incomplete tasks are rerun.	N/A
/requirednodes:<node1>,<node2>,…<nodeN>	Specifies by name the nodes to be allocated to the task. /requirednodes overrides /numprocessors and also forces the job to reserve the nodes that are specified.	2080
/env:<name1=val1> /env:<name2=val2> … /env:<nameN=valN>	Specifies the environment variables for the task. (For more information about environment variables, see Use Environment Variables.)	2048
/exclusive: true | false	A flag indicating that the task has exclusive use of reserved nodes.	N/A
/runtime:[[[days:<num>]hours:<num>]minutes:<num> | infinite]	Maximum run time in day-hour-minute format. The job will be cancelled rather than allowed to run past the maximum run time. Default is Infinite.	8
/workdir:<path>	The full path of the working directory (the directory for input, output, and error files). The path may contain environment variables. Default is %USERPROFILE%.	160
/stdin:<file_name>	Take standard input for the task from file <file_name>.	160
/stdout:<file_name>	Redirect standard output of the task to the file <file_name>.	160
/stderr:<file_name>	Redirect standard error of the task to the file <file_name>.	160
/depend:<task_name>	Specifies that this task depends on a task or tasks of the name <task_name>. (Multiple tasks of the same name will have different task IDs.) If the task depends on multiple tasks of different names, repeat the job add command once for each dependency. For example, run job add 21 /name:task3 /depend:task1 myapp3.exe and then job add 21 /name:task3 /depend:task2 myapp3.exe.	320

An additional task option is:

Option	Description
/taskfile:<template_file>	Overwrites the contents of the task with the values in this job template file, except where a different value is explicitly set in the command line.
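
The /depend rule above (one job add invocation per dependency) can be sketched in Python as follows. The function name job_add_commands is an assumption made for illustration, and the sketch only produces the command strings; it does not run them.

```python
def job_add_commands(job_id, task_name, depends_on, command_line):
    """Return one 'job add' command string per dependency.

    Hypothetical helper: follows the rule that job add is repeated
    for each task name listed in depends_on.
    """
    if not depends_on:
        return [f"job add {job_id} /name:{task_name} {command_line}"]
    return [
        f"job add {job_id} /name:{task_name} /depend:{dep} {command_line}"
        for dep in depends_on
    ]


for line in job_add_commands(21, "task3", ["task1", "task2"], "myapp3.exe"):
    print(line)
# job add 21 /name:task3 /depend:task1 myapp3.exe
# job add 21 /name:task3 /depend:task2 myapp3.exe
```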

job cancel

Terminates the running job and cancels all of its resource reservations.

SYNOPSIS

job cancel [options] [/scheduler:<host>] <jobID>

Option	Description
/message:<msg_string>	<msg_string> is an optional user-written log entry for the cancellation. The default log entry is “cancelled by <invoking_user>”. Messages containing white space should be enclosed in double quotes.

job list

Lists all jobs in the cluster. The output contains a table for each job in the following form:

Job ID	Description
USER	Submission user
NAME	User-specified name of job
STATUS	Not_Submitted, Queued, Running, Cancelled, Finished or Failed
PRIORITY	Highest, AboveNormal, Normal, BelowNormal, Lowest

By default, if the invoking user is an administrator, then all jobs are listed. If the invoking user is a user without administrative rights, only his or her jobs are listed.

By default, only active (queued or running) jobs are displayed.

SYNOPSIS

job list [options] [/scheduler:<host>]

Options

Option	Description
/user:[<user_name> | *]	Show only jobs of the user <user_name>. If the keyword * is specified, the jobs of all users are displayed.
/status:[<stat1,stat2,…,statN> | *]	Display the jobs of each status specified (Not_Submitted, Queued, Running, Cancelled, Finished, or Failed). If the keyword /all is used, both active and completed jobs are displayed. If completed jobs are displayed, one more column, Complete_Time, is displayed onscreen.
/all	List all the jobs in the system. This is the equivalent of “job list /user:* /status:*”.

job listtasks

Lists the tasks of the job <jobID>. Each task is displayed with the following fields:

Task ID	Description
Status	Status of the task.
Name	Name of the task.
Command line	Command line of the task.
Number of processors	Minimum and maximum number of processors.
Execution nodes	Compute nodes the task runs on. If the task is not running, this value is blank.

SYNOPSIS

job listtasks [/scheduler:<host>] <jobId>

job modify

Modifies a queued or running job. For jobs in the Queued state, all modified terms take effect immediately. For jobs in the Running state, only changes to the following options take effect:

/runtime:
/rununtilcanceled:
/projectname:

The following modified terms will not take effect unless and until the job is requeued:

/jobname:
/numprocessors:
/askednodes:
/priority:
/license:
/exclusive

If /numprocessors: is specified at both the job and task level, the values for the job must be a superset of the values for the tasks.

If /runtime is specified at both the job and task level, the job run time must be equal to or greater than the longest task run time.
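
The two consistency rules above can be checked before calling job modify. The following Python sketch is one possible reading of them: it assumes that "superset" means the job's processor range contains every task's range, and that run times are expressed in minutes with None standing for Infinite. The function names are hypothetical.

```python
INFINITE = float("inf")


def to_minutes(runtime):
    """Treat None as the Infinite run time; other values are minutes."""
    return INFINITE if runtime is None else runtime


def job_limits_are_consistent(job_procs, task_procs, job_runtime, task_runtimes):
    """Check job-level limits against task-level limits.

    job_procs and each entry of task_procs are (min, max) processor tuples.
    Assumption: the job range must contain each task range.
    """
    job_min, job_max = job_procs
    procs_ok = all(
        job_min <= t_min and t_max <= job_max for t_min, t_max in task_procs
    )
    # The job run time must be at least the longest task run time.
    longest_task = max((to_minutes(r) for r in task_runtimes), default=0)
    return procs_ok and to_minutes(job_runtime) >= longest_task


print(job_limits_are_consistent((2, 8), [(2, 4), (4, 8)], 120, [60, 120]))  # True
print(job_limits_are_consistent((2, 4), [(2, 8)], 120, [60]))  # False
```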

SYNOPSIS

job modify [/jobfile:<template_file>] [credential_options] [/scheduler:<host>] <jobId>

job modify [standard_job_options] [credential_options] [/scheduler:<host>] <jobId>

job modify [/jobfile:<template_file>] [standard_job_options] [credential_options] [/scheduler:<host>] <jobId>

Option	Description
Standard job options	For job options, see job new.
Credential options	This is typically used when there is a need to update the user’s password for the Job Scheduler.
/jobfile:<template_file>	Overwrites the contents of the job with the values in this job template file, except where a different value is explicitly set in the command line.

Examples:

job modify [/jobfile:<template_file>] [credential_options] [/scheduler:<host>] <jobId>

This command overwrites the content of an existing job jobId with the job options specified in the template file. The task options specified in the template file are ignored.

job modify [standard_job_options] [credential_options] [/scheduler:<host>] <jobId>

This command updates the job jobId with the job options specified.

job modify [/jobfile:<template_file>] [standard_job_options] [credential_options] [/scheduler:<host>] <jobId>

This command overwrites the content of an existing job jobId with the job options specified in the template file, then updates the job option values with those made explicit on the command line.

Note

Modifying the run time for a backfill job (one that has jumped the queue to take advantage of idle reserved nodes) is not permitted, because doing so could delay the reserving job.

job new

Creates a new job and returns a unique job ID. The job is created in the Not_Submitted state and contains no tasks.

SYNOPSIS

job new [standard_job_options] [/jobfile:<template_file>] [/scheduler:<host>]

Option	Description
/jobfile:<template_file>	Use the settings in this job template XML file, except where a different value is explicitly set in the command line.

Standard job options

Option	Description	Maximum Characters
/jobname:<job_name>	Name of the job.	80
/numprocessors:<min_processors> or <min_processors>-<max_processors>	Minimum and maximum number of processors to be allocated. The default is one processor.	N/A
/askednodes:<node1>,<node2>,…<nodeN>	Specifies by name the nodes to be allocated to the job. By default, all nodes in the cluster are candidates.	2080
/exclusive: true | false	By default, a job has exclusive use of the nodes reserved by it. If /exclusive: is set to false, idle reserved processors on these nodes are available to other jobs. This is reciprocal, making nodes reserved to other jobs available to this job if they have also been flagged as nonexclusive.	N/A
/priority:<priority_class>	Schedule priority class: Highest, AboveNormal, Normal, BelowNormal, or Lowest. Highest and AboveNormal are available only to administrative users. The default is Normal. Within a priority class, the job is placed in the job queue in the order received unless requeued. If requeued, the job always goes to the top of its priority class.	N/A
/runtime:[[[days:<num>]hours:<num>]minutes:<num> | infinite]	Maximum run time in day-hour-minute format. The job will be cancelled rather than allowed to run past the maximum run time. Default is Infinite.	8
/rununtilcanceled: true | false	Flag indicating that the job will hold its resources until it is cancelled or reaches its run time limit. This way, additional tasks can be run on the nodes.	N/A
/projectname:<project_name>	Name of a project, if any, to which the job belongs.	80
/license:<feature1>:<amt1> /license:<feature2>:<amt2> …/license:<featureN>:<amtN>	License features required to run the tasks, and the number of tokens of each.	160
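
As an illustration of the standard job options, the following Python sketch builds a job new command and enforces the rule that Highest and AboveNormal are available only to administrative users. The helper name job_new_command and its is_admin flag are assumptions for this example; the real check is performed by the scheduler, not by the caller.

```python
PRIORITIES = ("Highest", "AboveNormal", "Normal", "BelowNormal", "Lowest")
ADMIN_ONLY = {"Highest", "AboveNormal"}


def job_new_command(job_name, priority="Normal", min_procs=1, max_procs=None,
                    is_admin=False):
    """Build a 'job new' command string (hypothetical helper, no execution)."""
    if priority not in PRIORITIES:
        raise ValueError(f"unknown priority class: {priority}")
    if priority in ADMIN_ONLY and not is_admin:
        raise PermissionError(f"{priority} requires administrative rights")
    # /numprocessors takes either <min> or <min>-<max>.
    procs = str(min_procs) if max_procs is None else f"{min_procs}-{max_procs}"
    return f"job new /jobname:{job_name} /numprocessors:{procs} /priority:{priority}"


print(job_new_command("my_job", min_procs=2, max_procs=4))
# job new /jobname:my_job /numprocessors:2-4 /priority:Normal
```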

job requeue

Requeues the job specified by jobId. To requeue a job is to stop it and reinsert it as the topmost job in its priority class segment. The job retains its original submission time, not the requeue time. Requeuing can be performed on running, canceled, and, in some cases, failed jobs. Only unfinished tasks are rerun.

By default, a failed job is requeued automatically if the failure is due to a system failure, such as a node reboot. If automatic requeue is not desired, set the task property /rerunnable to false.

A failed job can also be requeued manually if the failure is due to any error that can be fixed without changing the task command line. For example, a task may call for an input file that is missing or contains errors. Such jobs are not requeued automatically, because the error must first be corrected.
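
The queue placement described for requeued jobs can be modeled with a simple sort. This Python sketch is only a mental model, not the scheduler's implementation: it orders jobs by priority class, places requeued jobs ahead of other jobs in the same class, and otherwise keeps submission order. Breaking ties between several requeued jobs by submission time is an assumption.

```python
PRIORITY_RANK = {"Highest": 0, "AboveNormal": 1, "Normal": 2,
                 "BelowNormal": 3, "Lowest": 4}


def queue_order(jobs):
    """Return jobs in modeled queue order.

    Each job is a dict with 'id', 'priority', 'submitted' (sortable) and
    'requeued' (bool). Requeued jobs sort first within their priority class.
    """
    return sorted(
        jobs,
        key=lambda j: (PRIORITY_RANK[j["priority"]], not j["requeued"], j["submitted"]),
    )


jobs = [
    {"id": 1, "priority": "Normal", "submitted": 1, "requeued": False},
    {"id": 2, "priority": "Normal", "submitted": 2, "requeued": True},
    {"id": 3, "priority": "Highest", "submitted": 3, "requeued": False},
]
print([j["id"] for j in queue_order(jobs)])
# [3, 2, 1]
```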

SYNOPSIS

job requeue [/scheduler:<host>] <jobId>

job submit

Submits a new or existing job to the queue.

SYNOPSIS

job submit /id:<jobID> [credential_options] [/scheduler:<host>]

job submit /jobfile:<template_file> [credential_options] [/scheduler:<host>]

job submit [standard_job_options] [task_options_subset] [credential_options] [/scheduler:<host>] <command> [arguments]

Options	Description
Standard job options	See job options for job new. Option /numprocessors: will apply to both job and task.
Task options subset	See standard task options for job add. Only this subset applies: /name: /rerunnable /workdir: /stdin: /stdout: /stderr:

Examples:

job submit /id:<jobID> [credential_options] [/scheduler:<host>]

This command submits the job identified by jobID, which was created earlier with the job new command. Only the creator of the job can submit the job.

job submit /jobfile:<template_file> [credential_options] [/scheduler:<host>]

This command submits a job based on a job template file. It returns jobID and taskID.

job submit [standard_job_options] [task_options_subset] [credential_options] [/scheduler:<host>] command [arguments]

This command creates and submits a job. It returns jobID and taskID.

job submit "cmd.exe /k myapp.exe 1> %my_resultdir%\myapp_%CCP_JOBID%.out 2> %my_resultdir%\myapp_%CCP_JOBID%.out"

This command submits the command line of a user executable as a job.
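
The third form of job submit accepts only a subset of the task options. The Python sketch below builds such a command and rejects task options outside that subset. The helper name job_submit_command is hypothetical, and the sketch formats a string without submitting anything.

```python
# Task options that job submit accepts, per the task options subset.
TASK_OPTIONS_SUBSET = {"name", "rerunnable", "workdir", "stdin", "stdout", "stderr"}


def job_submit_command(command_line, job_options=None, task_options=None):
    """Build 'job submit [job options] [task options] <command>' as a string."""
    task_options = task_options or {}
    rejected = sorted(set(task_options) - TASK_OPTIONS_SUBSET)
    if rejected:
        raise ValueError(f"not allowed with job submit: {', '.join(rejected)}")
    parts = ["job", "submit"]
    # Job options first, then the permitted task options.
    for name, value in {**(job_options or {}), **task_options}.items():
        parts.append(f"/{name}:{value}")
    parts.append(command_line)
    return " ".join(parts)


print(job_submit_command("myapp.exe", {"numprocessors": "4"}, {"stdout": "myapp.out"}))
# job submit /numprocessors:4 /stdout:myapp.out myapp.exe
```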

job view

Displays the details of a specified job.

SYNOPSIS

job view [/scheduler:<host>] <JobID>

Display the details of the specified job, including:

Option	Description
Job ID	Job ID.
Status	Not_Submitted, Queued, Running, Cancelled, Finished, or Failed.
Name	Job name specified by the user
Submitted by	Cluster user that submitted the job.
Number of processors	Minimum and maximum number of processors.
Allocated Nodes	Execution nodes.
Submit time	Submission time in date-hour-minute format.
Start time	Time job started, in date-hour-minute format.
End time	Time job ended, in date-hour-minute format.
Number of Tasks	Number of tasks.
Notsubmitted	Number of tasks not yet submitted to the cluster nodes.
Queued	Number of queued tasks.
Running	Number of running tasks.
Finished	Number of finished tasks.
Failed	Number of failed tasks.
Cancelled	Number of cancelled tasks.

Task command

The task command is used to view, cancel, and requeue tasks. The task command operators are:

cancel
requeue
view

task cancel

Terminates the running task and cancels all of its resource reservations.

SYNOPSIS

task cancel [<options>] <jobID.taskID>

Option	Description
/message:<msg_string>	<msg_string> is an optional user-written log entry for the cancellation. The default log entry is “cancelled by <invoking_user>”. Messages containing white space should be enclosed in double quotes.

task requeue

Requeues the task specified by jobId.taskID, stopping it and reinserting it as the next task in the queue.

SYNOPSIS

task requeue [/scheduler:<host>] <jobId.taskId>

task view

Displays the details of a task in the following form:

Term	Description
Task ID	Task ID.
Status	Status of the task (for example, Finished).
Name	Task name.
Command line	Task command line.
Allocated nodes	Execution node list.
Exit code	Exit code of the task: 0=task finished; any other exit code = task failed.
Submit time	For a failed task, the error message. For a cancelled task, the cancellation message provided by the user. Default is “cancelled by <invoking user>”.
Start time	Values for the current usage of a task. These include: Kernel mode CPU time of all the processes since the start of the task. User mode CPU time of all the processes since the start of the task. Current total working set size of all the processes.
End time	For each node, displays the processes created on the node.
Kernel time	Kernel mode CPU time used by all processes since the start of the task.
User time	User mode CPU time used by all processes since the start of the task.
Working set	Current total working set size of all processes in the task.

SYNOPSIS

task view [/scheduler:<host>] <jobId.taskId> | <jobId>

EXAMPLE:

task view 101.0

Displays the details of the first task of job 101.
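
Scripts that wrap task view often need to split the jobId.taskId identifier. The following Python sketch shows one way to do that; the function name parse_task_id is an assumption, and a bare jobId (the second form in the synopsis) is represented by a task value of None.

```python
def parse_task_id(identifier):
    """Split 'jobId.taskId' into integers; a bare 'jobId' yields task None."""
    job_part, separator, task_part = identifier.partition(".")
    job_id = int(job_part)
    task_id = int(task_part) if separator else None
    return job_id, task_id


print(parse_task_id("101.0"))
# (101, 0)
print(parse_task_id("101"))
# (101, None)
```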

Node command

The node command allows you to approve, list, pause, and resume nodes. The node operators are:

approve
list
pause
resume

node approve

Approves an added node. After a node has been added, that node is in a pending state, awaiting approval by an administrative user.

SYNOPSIS

node approve [/scheduler:<host>] <node_name>

node list

Lists the nodes and the status and statistics of each. The output is a table with each row containing the following fields:

Term	Description
NODE_NAME	Name of the node.
STATUS	Pending, Ready, Paused, Unreachable.
MAX	Maximum number of job slots available.
RUN	Number of job slots used by running jobs on this node.
IDLE	Number of idle job slots available on this node.

SYNOPSIS

node list [/scheduler:<host>]

node pause | resume

Pause a node or resume the activity of a node that is paused. When a node is paused, jobs running on the node continue to run but no new jobs from users without administrative rights will be started. New administrator jobs will be accepted.

SYNOPSIS

node [pause |resume] [/scheduler:<host>] {node_name} [/all]

Option	Description
/all	Pause or resume all nodes.

Cluscfg command

The cluscfg command allows you to view and set cluster-wide configuration, including cached credentials, environment variables, and cluster parameters. The cluscfg operators are:

delcreds
listenvs
listparams
setcreds
setenvs
setparams
view

cluscfg delcreds

Deletes the cached credential of the named user from the invoking user’s cache. If /user is not supplied, the invoking user is assumed.

SYNOPSIS

cluscfg delcreds [/user:<DOMAIN>\<user>] [/scheduler:<host>]

cluscfg listenvs

Lists the cluster-wide environment variables of the cluster.

SYNOPSIS

cluscfg listenvs [/scheduler:<host>]

cluscfg listparams

Returns the following cluster parameters, which are stored in HKLM\System\CurrentControlSet\Services\CCPSchedSvc\Enum.

Parameter	Description	Default Value
ActivationFilterProgram	Activation filter executable file name.	“”
ActivationFilterTimeout	Activation filter program time-out.	15 seconds
BackFillLookahead	Specification of backfill behavior: the number of jobs the scheduler searches to find jobs that can backfill the jobs at the top of the job queue.	<0 = search through the entire job queue (default); 0 = no backfill; >0 = number of jobs to search
EventLogLevel	Sets the level of Job Scheduler events that appear in the Event Viewer. Levels are: ActivityTracing (Stop, Start, Suspend, Transfer, and Resume events are displayed); All (all events are displayed); Critical (Critical events are displayed); Error (Critical and Error events are displayed); Information (Critical, Error, Warning, and Information events are displayed); Off (no events are displayed); Verbose (Critical, Error, Warning, Information, and Verbose events are displayed); Warning (Critical, Error, and Warning events are displayed). For more information about event levels, see TraceEventType Enumeration (https://go.microsoft.com/fwlink/?LinkId=60988).	Error
HeartbeatInterval	Interval at which the scheduler sends health probes to the Node Manager.	60 seconds
InactivityCount	Number of missing beats (no reply from the health probes) before the Job Scheduler declares the node Unreachable.	3
JobRetryCount	Maximum number of times the system reruns a job.	3
JobRuntime	Format:<dd>:<hh>:<mm>.	Infinite
SpoolDir	The directory where the output of the clusrun command is redirected.	\\<head_node>\spooldir
SubmissionFilterProgram	Submission filter executable file name.	“”
SubmissionFilterTimeout	Time-out value (in seconds) for the submission filter.	15 seconds
TaskRetryCount	Maximum number of times the system reruns a task.	3
TTLCompletedJobs	Time in days for completed job records to remain in the MSDE.	5 days
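
Two of the parameters above interact, and one has sign-dependent meaning. The Python sketch below gives a rough estimate of how long a silent node takes to be declared Unreachable, and spells out the BackFillLookahead cases. It assumes that the node is declared Unreachable after exactly InactivityCount consecutive missed probes spaced HeartbeatInterval apart; actual timing may differ. Both function names are hypothetical.

```python
def seconds_until_unreachable(heartbeat_interval=60, inactivity_count=3):
    """Approximate detection time: probe interval times allowed missed beats."""
    return heartbeat_interval * inactivity_count


def describe_backfill(lookahead):
    """Translate a BackFillLookahead value into its documented meaning."""
    if lookahead < 0:
        return "search through the entire job queue"
    if lookahead == 0:
        return "no backfill"
    return f"search {lookahead} jobs"


print(seconds_until_unreachable())  # 180
print(describe_backfill(-1))        # search through the entire job queue
```

With the default values, the estimate is 60 × 3 = 180 seconds.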

SYNOPSIS

cluscfg listparams [/scheduler:<host>]

cluscfg setcreds

Sets the credential of a named user in the credential cache of the invoking user. If /user is not supplied, the invoking user is assumed. If /password is not provided, the Stored User Names and Passwords UI prompts for it.

SYNOPSIS

cluscfg setcreds [/user:<DOMAIN>\<user>] [/password:<password>] [/scheduler:<host>]

cluscfg setenvs

Sets cluster-wide environment variables to specified values.

Cluster-wide environment variables are variables that apply to the entire cluster. Cluster-wide environment variables can be viewed or set using the cluscfg command. To add or set a cluster-wide environment variable, you must have administrative credentials. For more information, see Compute Cluster Server Command Line Interface Reference (https://go.microsoft.com/fwlink/?LinkID=64065).

The following cluster-wide environment variables are preexisting. They are set during system deployment and are rarely changed manually or used in commands:

Environment Variable	Description
CCP_CLUSTER_NAME	Name of the cluster.
CCP_MPI_NETMASK	Subnet mask for the interface to be used by the MPI process, if a separate MPI network exists. Example: CCP_MPI_NETMASK=172.30.0.0/255.255.0.0.
MPICH_SOCKET_SBUFFER_SIZE	Send buffer size for the socket and shared memory channel (CH3) used by MS MPI. Default size is 32*1024=32768 bytes.

The most common example of an added cluster-wide environment variable is Path. Path functions identically to the Windows Path environment variable, but applies to all nodes in the cluster and only in the context of a job task.

SYNOPSIS

cluscfg setenvs "<name1=value1>" "<name2=value2>" … "<nameN=valueN>" [/scheduler:<host>]

To unset an environment variable, use an empty string as the value. Example:

cluscfg setenvs "MY_VAR="

This unsets the environment variable MY_VAR.
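
The set and unset forms can be generated from one mapping. In this Python sketch, a value of None (or an empty string) produces the unset form. The helper name setenvs_command is an assumption, and the sketch only builds the command text.

```python
def setenvs_command(variables):
    """Build a 'cluscfg setenvs' command; None or '' unsets a variable."""
    pairs = []
    for name, value in variables.items():
        text = "" if value is None else str(value)
        # Each pair is quoted as "<name=value>".
        pairs.append(f'"{name}={text}"')
    return "cluscfg setenvs " + " ".join(pairs)


print(setenvs_command({"MY_VAR": None}))
# cluscfg setenvs "MY_VAR="
```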

cluscfg setparams

Sets the named parameters to the values specified. Refer to cluscfg listparams for parameter definitions.

SYNOPSIS

cluscfg setparams [TTLCompletedJobs=val] [JobRetryCount=val] [TaskRetryCount=val] [JobRuntime=val|Infinite] [SubmissionFilterProgram=val] [SubmissionFilterTimeout=val] [ActivationFilterProgram=val] [ActivationFilterTimeout=val] [BackFillLookahead=val] [HeartbeatInterval=val] [InactivityCount=val] [SpoolDir=val] [eventloglevel=off|critical|warning|error|information|verbose|activitytracing|all]

[/scheduler:<host>]
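
A wrapper script can guard against misspelled parameter names before calling cluscfg setparams. The Python sketch below checks names against the parameters listed for cluscfg listparams. The helper name setparams_command is hypothetical, and whether the CLI treats parameter names as case-sensitive is not stated here, so the sketch simply uses the spelling from that table.

```python
KNOWN_PARAMS = {
    "ActivationFilterProgram", "ActivationFilterTimeout", "BackFillLookahead",
    "EventLogLevel", "HeartbeatInterval", "InactivityCount", "JobRetryCount",
    "JobRuntime", "SpoolDir", "SubmissionFilterProgram",
    "SubmissionFilterTimeout", "TaskRetryCount", "TTLCompletedJobs",
}


def setparams_command(**params):
    """Build a 'cluscfg setparams' command after validating parameter names."""
    unknown = sorted(set(params) - KNOWN_PARAMS)
    if unknown:
        raise ValueError(f"unknown cluster parameter(s): {', '.join(unknown)}")
    assignments = " ".join(f"{name}={value}" for name, value in params.items())
    return f"cluscfg setparams {assignments}"


print(setparams_command(HeartbeatInterval=30, InactivityCount=5))
# cluscfg setparams HeartbeatInterval=30 InactivityCount=5
```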

cluscfg view

Displays the details of a cluster. The output contains:

Term	Description
Cluster name	Name of the cluster.
Total number of compute nodes	Number of nodes in cluster.
Number of ready compute nodes	Number of nodes with Ready status.
Number of paused compute nodes	Number of nodes with Paused status.
Number of unreachable compute nodes	Number of nodes with Unreachable status.
Number of compute nodes pending for approval	Number of nodes with Pending for Approval status.
Total number of processors	Number of processors in the cluster.
Number of idle processors	Number of processors not running tasks.
Number of busy processors	Number of processors running tasks.
Number of jobs not submitted	Number of pending jobs not submitted to the Job Scheduler.
Number of queued jobs	Number of jobs in the CCS job queue.
Number of jobs running	Number of jobs from the queue that are currently running.
Number of finished jobs	Number of jobs that completed successfully.
Number of failed jobs	Number of jobs that have failed.
Number of cancelled jobs	Number of jobs that have been cancelled.

SYNOPSIS

cluscfg view [/scheduler:<host>]

Clusrun command

clusrun is an administrative command that runs an instance of a specified command on multiple nodes, redirecting output to the client node. The client node can be a head node or any compute node on the cluster, accessed directly or remotely. Redirected output includes the standard output and error streams as well as run time system error messages. The output from each node is delimited by a header indicating the node.

If clusrun is interrupted or terminated, the remote command instances are also terminated.

clusrun requires administrative rights.

Running MPI applications through clusrun is not supported.

SYNOPSIS

clusrun [/scheduler:host] [credential_options]

[/nodes:node1,node2…nodeN]

[/all] [/pausednodes] [/oknodes]

[/stdin:file] [/workdir:dir] [/env:name1=val1] [/env:name2=val2] command [arguments]

Options

Option	Description
/nodes:[<node1>[,<node2>…]]	Specify a list of nodes on which the command is invoked. Default is all Ready and Paused nodes.
/all	Run command on all Ready and Paused nodes. This is the default.
/oknodes	Run command on all Ready nodes. Default is all Ready and Paused nodes.
/pausednodes	Run command on all Paused nodes. Default is all Ready and Paused nodes.
/stdin:<file>	Take standard input for all command instances from file <file>.
/workdir:<dir>	Work directory for input, output, and error files. Default is %USERPROFILE%.
/env:<name1>=<val1> /env:<name2>=<val2> … /env:<nameN>=<valN>	Specify the environment variables for the task. For more information, see Use Environment Variables. Environment variable expansion on the compute node side is not supported. For example, /env:myvar=^%XYZ_HOME^% will NOT cause %XYZ_HOME% to be expanded on the remote node.
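
The clusrun options can be combined as in the following Python sketch, which builds a command line for a chosen set of nodes. The helper name clusrun_command and the node names are assumptions for illustration; the sketch does not run the command and does not expand environment variables.

```python
def clusrun_command(command_line, nodes=None, env=None, workdir=None):
    """Build a clusrun command string from the documented options."""
    parts = ["clusrun"]
    if nodes:
        # /nodes takes a comma-separated list of node names.
        parts.append("/nodes:" + ",".join(nodes))
    if workdir:
        parts.append(f"/workdir:{workdir}")
    # One /env:<name>=<val> option per environment variable.
    for name, value in (env or {}).items():
        parts.append(f"/env:{name}={value}")
    parts.append(command_line)
    return " ".join(parts)


print(clusrun_command("hostname", nodes=["node1", "node2"]))
# clusrun /nodes:node1,node2 hostname
```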
