Understanding Job and Task States

Applies To: Windows HPC Server 2008

In Windows HPC Server 2008, jobs and tasks share the same life cycle states, except that tasks do not have the ExternalValidation state. The main life cycle states are Configuring, Queued, Running, Finished, Failed, and Canceled. Jobs and tasks also pass through brief transitional states (Submitted, Validating, Finishing, and Canceling, plus ExternalValidation for jobs). The following table summarizes all life cycle states.

Job and task states

State Definition

Configuring

The job or task is in the system, but has not been submitted to the queue.

Submitted

The job or task has been submitted and is awaiting validation before it can be queued.

ExternalValidation

The job is running through a submission filter application that is defined by the cluster administrator. Examples of the conditions for these filters include:

  • Project validation: This condition verifies that the project name is that of a valid project and that the user is a member of the project.

  • Usage time: This condition ensures that the user’s time allocations are not exceeded. Unlike the mandatory policy, this filter limits jobs to the overall time allocation that each user has across all of their jobs.

If the job passes external validation, it moves to the Validating state. If the job does not pass external validation, you receive an error message and the job moves to the Failed state.

Validating

The Job Scheduler service is validating the job or task. During validation, the Job Scheduler service confirms permissions, applies default settings for any properties that you have not specified, and validates each property against its constraints. Default settings and constraints are defined by the job template. For more information about job templates, see Understanding Job Templates. The Job Scheduler service also confirms that the job properties encompass all task properties (for example, that no task has a run time greater than the run time of the job).

If the job passes validation, it moves to the Queued state. If the job does not pass validation, you receive an error message and the job moves to the Failed state.

Queued

The job or task has passed validation and is waiting to be scheduled and activated (run).

Running

The job or task is running on one or more nodes.

Finishing

The job or task has completed, and clean-up is in progress.

Finished

The job or task completed successfully.

Failed

The job or task failed to complete or stopped running.

Tasks that return non-zero exit codes are marked as Failed. For more information, see Tasks That Complete Successfully Are Marked As Failed.

If a running task is canceled, the task is marked as Failed. Job owners and cluster administrators can manually cancel jobs or tasks. The Job Scheduler service cancels tasks if they exceed their run time or are preempted. Typically, the Job Scheduler service automatically requeues preempted jobs.

See also Troubleshooting Jobs.

Canceling

The job or task has been canceled, and clean-up is in progress.

Canceled

The job or task was canceled before it started running. If a running task is canceled, the task is marked as Failed instead, as described under Failed.
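The state definitions in the table imply a simple state machine. The following Python sketch models it under stated assumptions: the transition map is inferred from the definitions above rather than taken from the Job Scheduler service, preemption and automatic requeuing are not modeled, and the names `State`, `TRANSITIONS`, `move`, `after_exit`, and `after_cancel` are hypothetical, not part of any HPC Server API.

```python
from enum import Enum


class State(Enum):
    # Life cycle states listed in the table; tasks never use EXTERNAL_VALIDATION.
    CONFIGURING = "Configuring"
    SUBMITTED = "Submitted"
    EXTERNAL_VALIDATION = "ExternalValidation"
    VALIDATING = "Validating"
    QUEUED = "Queued"
    RUNNING = "Running"
    FINISHING = "Finishing"
    FINISHED = "Finished"
    FAILED = "Failed"
    CANCELING = "Canceling"
    CANCELED = "Canceled"


# Assumed, simplified transitions inferred from the state definitions.
# Preemption and automatic requeuing are deliberately left out.
TRANSITIONS = {
    State.CONFIGURING: {State.SUBMITTED, State.CANCELING},
    State.SUBMITTED: {State.EXTERNAL_VALIDATION, State.VALIDATING},
    State.EXTERNAL_VALIDATION: {State.VALIDATING, State.FAILED},
    State.VALIDATING: {State.QUEUED, State.FAILED},
    State.QUEUED: {State.RUNNING, State.CANCELING},
    State.RUNNING: {State.FINISHING, State.FAILED},
    State.FINISHING: {State.FINISHED},
    State.CANCELING: {State.CANCELED},
    State.FINISHED: set(),
    State.FAILED: set(),
    State.CANCELED: set(),
}


def move(current: State, target: State, is_task: bool = False) -> State:
    """Return target if the (assumed) transition is allowed; otherwise raise."""
    if is_task and target is State.EXTERNAL_VALIDATION:
        raise ValueError("tasks do not have the ExternalValidation state")
    if target not in TRANSITIONS[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target


def after_exit(exit_code: int) -> State:
    """Next state after a run ends: a non-zero exit code marks it Failed."""
    return State.FINISHING if exit_code == 0 else State.FAILED


def after_cancel(current: State) -> State:
    """Next state when a cancel request arrives in the given state."""
    if current is State.RUNNING:
        # A task canceled while running is marked Failed, not Canceled.
        return State.FAILED
    if State.CANCELING in TRANSITIONS[current]:
        return State.CANCELING
    raise ValueError(f"cannot cancel from {current.value} in this sketch")


# Example: a job that is submitted, validated, run, and exits with code 0.
state = State.CONFIGURING
for step in (State.SUBMITTED, State.VALIDATING, State.QUEUED, State.RUNNING):
    state = move(state, step)
state = move(state, after_exit(0))
state = move(state, State.FINISHED)
print(state.value)  # Finished
```

Keeping the transitions in a single dictionary makes the assumed life cycle easy to compare against the table: each Failed or Canceled path in the definitions corresponds to one entry in `TRANSITIONS`.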
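A submitted job can be rejected at two points: by the optional administrator-defined submission filter (ExternalValidation, jobs only) and by the Job Scheduler service's own checks (Validating). The following Python sketch models both steps; it is a hypothetical illustration, not the actual submission filter interface. The names `Job`, `Task`, `external_filter`, `validate`, and `submit`, the project membership map, the per-user remaining-time map, and the 3600-second default run time are all assumptions standing in for the job template and the administrator's filter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Task:
    name: str
    run_time_s: Optional[int] = None  # None means "not specified"


@dataclass
class Job:
    owner: str
    project: str
    run_time_s: Optional[int] = None
    tasks: List[Task] = field(default_factory=list)


def external_filter(job: Job, projects: Dict[str, Set[str]],
                    remaining_s: Dict[str, int]) -> Optional[str]:
    """Hypothetical admin-defined filter (ExternalValidation); returns an error or None."""
    # Project validation: the project must exist and the owner must be a member.
    members = projects.get(job.project)
    if members is None or job.owner not in members:
        return "project validation failed"
    # Usage time: the requested run time must fit in the owner's remaining allocation.
    # A job with no run time yet is let through here (a simplifying assumption).
    if job.run_time_s is not None and job.run_time_s > remaining_s.get(job.owner, 0):
        return "usage time allocation exceeded"
    return None


def validate(job: Job, default_run_time_s: int) -> Optional[str]:
    """Validating: apply a template default, then check the job encompasses its tasks."""
    if job.run_time_s is None:
        job.run_time_s = default_run_time_s
    for task in job.tasks:
        if task.run_time_s is not None and task.run_time_s > job.run_time_s:
            return f"task {task.name} run time exceeds job run time"
    return None


def submit(job: Job, projects: Dict[str, Set[str]], remaining_s: Dict[str, int],
           default_run_time_s: int = 3600) -> Tuple[str, Optional[str]]:
    """Return ("Queued", None) or ("Failed", reason) for a submitted job."""
    error = external_filter(job, projects, remaining_s)
    if error is None:
        error = validate(job, default_run_time_s)
    return ("Failed", error) if error else ("Queued", None)


projects = {"alpha": {"alice"}}
remaining = {"alice": 7200}
print(submit(Job("alice", "alpha", 3600, [Task("t1", 1800)]), projects, remaining))
# ('Queued', None)
print(submit(Job("alice", "alpha", 600, [Task("t1", 1200)]), projects, remaining))
# ('Failed', 'task t1 run time exceeds job run time')
```

A rejection in either step sends the job to Failed with an error message, matching the ExternalValidation and Validating definitions; only a job that passes both reaches Queued.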
