
Understanding Policy Configuration

Updated: December 10, 2012

Applies To: Microsoft HPC Pack 2008 R2, Microsoft HPC Pack 2012, Windows HPC Server 2008 R2

The policy configuration settings control how resources are allocated to queued jobs. The Scheduling Mode lets you optimize resource allocation for large batch and MPI workloads or for service workloads. For information about how to change the configuration options, see Configure the HPC Job Scheduler Service.

The following comparison summarizes the two scheduling modes and their default configurations.

Queued mode

Optimized for:

  • Large MPI and batch jobs

  • Long running tasks

  • Stateful tasks (rely on state)

  • Parametric sweeps

Description: Start jobs in queue order, and attempt to allocate the maximum requested resources to running jobs.

  • Finish highest priority jobs as soon as possible

  • Give jobs their maximum requested resources

  • Minimize job run time

  • Longer wait in the job queue for lower priority jobs

  • Available resources are first used to meet the maximum resource request of running jobs, and then to start new jobs (when automatic growth is enabled)

Additional settings (see Queued settings in this topic):

  • Preemption

    Default: Graceful preemption

  • Adaptive resource allocation (grow/shrink)

    Default: Increase resources automatically and Decrease resources automatically are both enabled

Balanced mode

Optimized for:

  • Interactive workloads, such as service-oriented architecture (SOA) jobs and service-type applications

  • Short running tasks

  • Stateless tasks (do not rely on state)

  • Parametric sweeps

Description: Attempt to start all incoming jobs as soon as possible at their minimum resource requirements. If additional resources are available, grow jobs based on priority.

  • Start all incoming jobs as soon as possible

  • Give jobs their minimum requested resources

  • Minimize wait time in the job queue

  • Jobs take longer to complete

  • Resources can be taken from running jobs in order to start new jobs, even if the running job has a higher priority

Additional settings (see Balanced settings in this topic):

  • Priority bias

    Default: Medium Bias

  • Rebalancing interval

    Default: 10 seconds
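The contrast between the two modes can be illustrated with a toy Python model. It is a simplified sketch, not the HPC Job Scheduler Service algorithm: the `Job` class, the integer priority encoding, and both functions are hypothetical, and the model ignores preemption, backfilling, node groups, and periodic rebalancing. The Queued function starts jobs in queue order and gives each as much of its maximum as is free; the Balanced function first gives every job that fits its minimum, then hands out the remaining cores strictly by priority (a simplification of how additional resources are shared).

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job record for illustration only (not an HPC Pack type).
    name: str
    priority: int   # larger number = higher priority (assumed encoding)
    min_cores: int
    max_cores: int


def queued_allocation(jobs, total_cores):
    """Toy Queued mode: start jobs in queue order at up to their maximum request."""
    free = total_cores
    alloc = {}
    for job in jobs:  # `jobs` is assumed to be in queue order already
        if free < job.min_cores:
            break  # the head of the queue waits; later jobs are not started ahead of it
        grant = min(job.max_cores, free)
        alloc[job.name] = grant
        free -= grant
    return alloc


def balanced_allocation(jobs, total_cores):
    """Toy Balanced mode: minimum for every job first, then extra cores by priority."""
    free = total_cores
    alloc = {}
    for job in jobs:
        if free >= job.min_cores:  # start each job that fits at its minimum
            alloc[job.name] = job.min_cores
            free -= job.min_cores
    started = [j for j in jobs if j.name in alloc]
    # sorted() is stable, so equal-priority jobs keep their queue order.
    for job in sorted(started, key=lambda j: -j.priority):
        extra = min(job.max_cores - alloc[job.name], free)
        alloc[job.name] += extra
        free -= extra
    return alloc


queue = [Job("A", 2, 4, 12), Job("B", 1, 2, 8), Job("C", 1, 2, 8)]
print(queued_allocation(queue, 16))    # {'A': 12, 'B': 4}
print(balanced_allocation(queue, 16))  # {'A': 12, 'B': 2, 'C': 2}
```

With 16 cores, the Queued model leaves job C waiting so that A and B run closer to their maximum, while the Balanced model starts all three jobs at their minimum and gives the remaining cores to the highest priority job.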

Queued settings

In Queued mode, the HPC Job Scheduler Service starts jobs in queue order and attempts to allocate the maximum requested resources to running jobs. The following sections describe the preemption and adaptive resource allocation settings that are associated with this mode.

Preemption

Preemption allows higher priority jobs that are waiting in the queue to start sooner by taking resources away from lower priority, preemptable jobs that are already running. If you enable the Grow by preemption policy (see “Adaptive resource allocation” below), preemption is also used to help grow higher priority running jobs to their maximum resource request. Grow by preemption is available starting with HPC Pack 2008 R2 with Service Pack 2 (SP2).

Note
The Preemptable job property is defined by the administrator in job templates. Use job templates to define the types of jobs that can be preempted, or the sets of users who can submit preemptable or nonpreemptable jobs. The Preemptable property cannot be set when submitting a job through HPC Cluster Manager, HPC Job Manager, HPC PowerShell, or the HPC command-line tools. It can be set only through the HPC API, and only if the selected job template allows both True and False as valid values for the Preemptable job property.

Preemption has the following options:

  • Graceful preemption (Default): Take resources from the preempted job as its running tasks complete so that work is not lost.

  • Immediate preemption: Take resources from the preempted job by canceling all running tasks so that resources can be allocated to the high priority job immediately. For more information about job and task cancelation, see the Additional Considerations section in Cancel a Job or Task.

  • Task level preemption (introduced in HPC Pack 2008 R2 with SP3): Enable preemption of individual tasks instead of entire jobs. With the default immediate preemption settings, the scheduler will cancel an entire job if any of its resources are needed for a higher priority job. When you enable task level preemption, the scheduler will cancel individual tasks instead. For example, if a Normal priority job is running 100 tasks on 1 core each, and a High priority job is submitted that requires 10 cores, task level preemption will cancel 10 tasks, rather than canceling the entire job. This option can improve job throughput by minimizing the amount of rework that must be done due to preemption.



    Note
    Starting with HPC Pack 2012, in Queued scheduling mode, the default option for preemption behavior is task-level immediate preemption, rather than job-level preemption. This default behavior means that only as many tasks of low priority jobs are preempted as are needed to provide the resources required for the higher priority jobs, rather than preempting all of the tasks in the low priority jobs.



  • No preemption: Do not preempt jobs.
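The example in the task level preemption option (100 one-core tasks, 10 cores needed) can be checked with a small sketch. The function name and parameters are hypothetical; the model assumes there are no idle cores, so every needed core comes from the one lower priority job, that all tasks use the same number of cores, and it does not model graceful preemption.

```python
import math


def tasks_to_cancel(running_tasks, cores_per_task, cores_needed, task_level):
    """Toy count of tasks canceled by immediate preemption of one lower priority job.

    Assumes no idle cores, so every needed core must come from this job.
    """
    if cores_needed <= 0:
        return 0
    if not task_level:
        # Job-level preemption: the entire job is canceled.
        return running_tasks
    # Task-level preemption: cancel just enough tasks to free the needed cores.
    return min(running_tasks, math.ceil(cores_needed / cores_per_task))


print(tasks_to_cancel(100, 1, 10, task_level=False))  # 100
print(tasks_to_cancel(100, 1, 10, task_level=True))   # 10
```

If the tasks used 4 cores each, the same sketch would cancel 3 tasks (12 cores) to free the 10 cores, since a task's cores are released together.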

Adaptive resource allocation

Adaptive resource allocation dynamically adjusts the resources allocated to a job based on its tasks. Enabling resource adjustments can significantly improve cluster utilization and reduce job queue times, especially for clusters that run jobs composed of multiple tasks, such as parametric sweep computations. Only jobs that contain more than one task or subtask can benefit from automatic resource adjustment.

Adaptive allocation has the following settings that can be enabled or disabled:

  • Increase resources automatically (enabled by default): Use available resources to grow higher priority, running jobs to their maximum before starting lower priority jobs. With automatic growth enabled, the HPC Job Scheduler Service can allocate free resources to running jobs that have additional tasks to run. The service will not allocate more resources than the maximum requested for the job. This results in jobs spending more time in the queue waiting for resources, but they finish more quickly after they are started. Available resources are allocated first to the highest-priority job in the system, whether this job is running or queued.

    • Grow by preemption (introduced in HPC Pack 2008 R2 with SP2): To help grow higher priority running jobs to their maximum, use preemption to take resources away from lower priority, running jobs. Preemption must be enabled to use this setting.

  • Decrease resources automatically (enabled by default): With automatic shrink enabled, the HPC Job Scheduler Service can release unused resources from running jobs that have no additional tasks to run. The service will not shrink resources below the minimum requested for the job. Automatic shrink results in better overall cluster utilization, but it may cause problems if you add tasks to jobs that are already in progress.
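The grow and shrink bounds described above amount to keeping a job's allocation between its minimum and maximum request. The sketch below is a hypothetical single-job model: `demand` stands for the cores that the job's running and remaining tasks could use, and the sketch does not model Grow by preemption or the order in which competing jobs are grown.

```python
def adjust_allocation(current, demand, min_cores, max_cores, free_cores,
                      grow=True, shrink=True):
    """Toy adaptive allocation for one running job; returns its new core count.

    demand: cores the job's running plus not-yet-started tasks could use (assumed input).
    """
    if grow and demand > current:
        # Grow with free cores, never beyond the job's maximum request.
        target = min(demand, max_cores)
        return current + max(0, min(target - current, free_cores))
    if shrink and demand < current:
        # Release unused cores, never below the job's minimum request.
        return max(demand, min_cores)
    return current


print(adjust_allocation(4, 20, 2, 16, free_cores=100))  # 16
print(adjust_allocation(8, 1, 2, 16, free_cores=0))     # 2
```

Turning off `shrink` in this model keeps the 8 cores in place, which mirrors why disabling automatic shrink can help when tasks are added to a job that is already running.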

Note
In the default job template, the job properties Auto Calculate Maximum and Auto Calculate Minimum are set to a default value of True. If a job template specifies that True is the only valid value for these properties, the submitting user will not have the option of specifying maximum and minimum resources for a job submitted with that template, and resources will be automatically calculated based on the tasks in the job.

Balanced settings

In Balanced mode, the HPC Job Scheduler Service attempts to start all incoming jobs as soon as possible at their minimum resource requirements. After all the jobs in the queue have their minimum resources, additional cluster resources are allocated to jobs based on their priority. Resource allocation is periodically rebalanced to fill idle resources, start new jobs, and adjust allocation according to the Priority Bias setting. The following sections describe the settings associated with this mode.

Note
Balanced scheduling is limited in situations where node groups overlap. Balanced mode is more effective in non-overlapping node groups.

Priority Bias

Priority Bias controls how additional resources are allocated to jobs. In Balanced mode, “additional resources” refers to cluster resources above the total minimum resources for all running jobs. Tasks that are running on additional resources can be canceled with immediate preemption to accommodate new jobs or to converge on the desired allocation pattern.

Priority Bias has the following options:

  • High Bias: All additional resources are allocated to higher priority jobs.

  • Medium Bias (Default): Each priority band is given a higher proportion of additional resources than the band below it. The priority bands are Highest, Above Normal, Normal, Below Normal, and Lowest.

  • No Bias: Additional resources are allocated equally across the job queue.
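The three Priority Bias options can be compared with a toy distribution of additional cores. The band numbers, the `headroom` input (maximum minus minimum cores for each job), and especially the Medium Bias weights (each band weighted twice the band below it) are assumptions for illustration; this topic does not state the actual proportions the scheduler uses.

```python
BANDS = {"Lowest": 0, "BelowNormal": 1, "Normal": 2, "AboveNormal": 3, "Highest": 4}


def distribute_additional(jobs, extra_cores, bias):
    """Toy split of additional cores. jobs: list of (name, band, headroom)."""
    granted = {name: 0 for name, _, _ in jobs}
    if bias == "high":
        # High Bias: higher priority jobs take additional cores first.
        for name, band, headroom in sorted(jobs, key=lambda j: -BANDS[j[1]]):
            give = min(headroom, extra_cores)
            granted[name] = give
            extra_cores -= give
        return granted
    # Medium Bias: each band weighs twice the band below it (assumed weights).
    # No Bias: every job has the same weight.
    weight = {name: (2 ** BANDS[band] if bias == "medium" else 1)
              for name, band, _ in jobs}
    room = {name: headroom for name, _, headroom in jobs}
    while extra_cores > 0:
        candidates = [n for n in granted if granted[n] < room[n]]
        if not candidates:
            break  # every job is already at its maximum
        # Hand the next core to the job furthest below its weighted share.
        n = min(candidates, key=lambda c: granted[c] / weight[c])
        granted[n] += 1
        extra_cores -= 1
    return granted


jobs = [("A", "Highest", 100), ("B", "Normal", 100), ("C", "Lowest", 100)]
print(distribute_additional(jobs, 21, "high"))    # {'A': 21, 'B': 0, 'C': 0}
print(distribute_additional(jobs, 21, "medium"))  # {'A': 16, 'B': 4, 'C': 1}
print(distribute_additional(jobs, 21, "none"))    # {'A': 7, 'B': 7, 'C': 7}
```

The weighted one-core-at-a-time loop is a swapped-in fair-share technique chosen for readability; it shows the qualitative ordering (High gives everything to the top band, Medium favors higher bands, No Bias splits evenly), not the scheduler's exact numbers.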

Rebalancing Interval

The Rebalancing Interval represents the time, in seconds, between rebalancing passes. The default value is 10 seconds.

A longer interval can improve scheduler performance, but it can take longer to respond to new jobs and converge on the desired allocation pattern. Longer intervals are good if you do not need instant growing and shrinking. If your cluster has a high turnaround rate (jobs are submitted frequently and finish quickly), you might want a longer interval to avoid excessive growing and shrinking.

A shorter rebalancing interval provides a faster response when new jobs are submitted, at the cost of additional load on the head node. If you need faster responses, also review the Task Cancel Grace Period and Release Task Timeout settings, because long values for these settings can increase the time it takes for running work to be moved out of the way.
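The Rebalancing Interval trade-off can be put in numbers with a toy model. It assumes, hypothetically, that passes run exactly every `interval_s` seconds starting at time 0 and take no time, and it ignores the Task Cancel Grace Period and Release Task Timeout.

```python
import math


def rebalance_tradeoff(interval_s, submit_time_s):
    """Return (passes per hour, seconds until the next pass after a submission)."""
    passes_per_hour = 3600 / interval_s  # more passes = more head node work
    next_pass = math.ceil(submit_time_s / interval_s) * interval_s
    return passes_per_hour, next_pass - submit_time_s


print(rebalance_tradeoff(10, 23))  # (360.0, 7)
print(rebalance_tradeoff(60, 23))  # (60.0, 37)
```

In this model, with the default 10-second interval a job submitted at 23 seconds is first considered at 30 seconds; a 60-second interval runs six times fewer passes but can delay the same job by 37 seconds.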

© 2013 Microsoft. All rights reserved.