Understanding the Windows Azure Node Availability Policy

01/14/2014

Applies To: Microsoft HPC Pack 2008 R2, Microsoft HPC Pack 2012, Microsoft HPC Pack 2012 R2

The Windows Azure node availability policy determines how and when the Windows Azure nodes are started (the role instances are deployed in Windows Azure) and stopped (the role instances are removed in Windows Azure).

You have the following two options to configure availability for your Windows Azure nodes:

Automatic: The nodes are automatically configured to be started (provisioned) and then brought to the Online state during one or more scheduled intervals each week. You can specify multiple times each week when you want the nodes to be available to run jobs. At the end of each time block, the nodes are automatically stopped: the nodes are taken offline and the role instances are removed. Optionally, you can specify a time interval before the end of an online block when any jobs running on the nodes are drained.
Manual: To make the Windows Azure nodes available to run jobs, you must first start (provision) the nodes, and then bring them online.
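
To make the automatic option concrete, the following minimal sketch models a weekly schedule of online time blocks and checks whether nodes would be online, and whether they would still accept new tasks, at a given moment. It is illustrative only: `OnlineBlock`, `active_block`, and `accepts_new_tasks` are hypothetical names, not part of HPC Pack, and the sketch assumes each block starts and ends on the same day.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class OnlineBlock:
    """One scheduled online interval in a weekly policy (hypothetical model)."""
    weekday: int            # 0 = Monday ... 6 = Sunday, as in datetime.weekday()
    start: time             # time the nodes are scheduled to start
    end: time               # scheduled end; assumed later than start on the same day
    drain_minutes: int = 0  # optional drain interval before the end of the block


def active_block(schedule: List[OnlineBlock], now: datetime) -> Optional[OnlineBlock]:
    """Return the block that contains `now`, or None outside every block."""
    for block in schedule:
        if block.weekday == now.weekday() and block.start <= now.time() < block.end:
            return block
    return None


def accepts_new_tasks(schedule: List[OnlineBlock], now: datetime) -> bool:
    """True if `now` is inside a block and before that block's drain interval."""
    block = active_block(schedule, now)
    if block is None:
        return False
    block_end = datetime.combine(now.date(), block.end)
    return now < block_end - timedelta(minutes=block.drain_minutes)
```

For example, with a Monday block from 6:00 PM to 8:00 PM and a 10-minute drain interval, `accepts_new_tasks` returns True at 7:00 PM and False at 7:55 PM.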

Additional considerations

Provisioning the Windows Azure role instances can take several minutes under some conditions, and stopping and deleting the instances can also take several minutes.
The nodes are available to run jobs in an online time block only after the role instances have been provisioned in Windows Azure. The scheduled time to start (and bring online) the nodes does not include the time that Windows Azure takes to provision the role instances.
If an automatic availability policy is configured, as a best practice, plan for 60 minutes in each online time block for node deployment, in addition to the time that you want the nodes to be available to run jobs. You should also avoid scheduling online time blocks at short intervals.
Editing the Windows Azure node availability policy changes the policy for nodes that are already added to the HPC cluster by using the node template, as well as for nodes that you add later. For example, you can edit the Windows Azure node template so that nodes that are configured to start and stop automatically according to a weekly schedule are now configured to start and stop manually.
Depending on the configuration of the availability policy in the Windows Azure node template and the Task Cancel Grace Period setting in Job Scheduler Configuration, the exact time when Windows Azure nodes are stopped and the deployment ends can differ from the scheduled end of an online time block. This can occur when HPC tasks are still running near the end of the online time block. For more information, see the section Interaction of the availability policy with the Task Cancel Grace Period setting.
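
The 60-minute planning guidance above amounts to simple arithmetic: schedule the online time block to start earlier than the time you actually need the nodes. A minimal sketch follows; `plan_online_block` and `DEPLOYMENT_ALLOWANCE` are hypothetical names used only for illustration, and actual provisioning time can be shorter or longer than the allowance.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Best-practice allowance for node deployment from the guidance above.
DEPLOYMENT_ALLOWANCE = timedelta(minutes=60)


def plan_online_block(
    available_from: datetime,
    available_until: datetime,
    allowance: timedelta = DEPLOYMENT_ALLOWANCE,
) -> Tuple[datetime, datetime]:
    """Return (scheduled_start, scheduled_end) for an online time block.

    The scheduled start is moved earlier by `allowance` so that provisioning
    can finish before the time the nodes are needed to run jobs.
    """
    if available_until <= available_from:
        raise ValueError("available_until must be later than available_from")
    return available_from - allowance, available_until
```

For example, to have nodes ready to run jobs from 6:00 PM to 8:00 PM, this sketch yields an online time block scheduled from 5:00 PM to 8:00 PM.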

Interaction of the availability policy with the Task Cancel Grace Period setting

When an automatic availability policy is configured, the Windows Azure nodes do not start new jobs after an online time block ends. However, HPC tasks that are still running at the end of an online time block can continue to run for a period if the Task Cancel Grace Period setting is configured. The Task Cancel Grace Period cluster property gives applications a period to save state information and clean up before exiting (the default period is 15 seconds). The exact time that a task ends depends on whether and how quickly the task responds to the CTRL_BREAK event (the equivalent of the CTRL+BREAK key combination). Tasks that do not process the event exit immediately, while tasks that do process the event can take as long as the Task Cancel Grace Period to exit gracefully.
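
As an illustration of a task that processes the event, the following sketch shows one way a Python application might react to CTRL_BREAK by saving its progress and exiting before the grace period runs out. It is a sketch under stated assumptions, not a prescribed HPC Pack pattern: on Windows, Python exposes the event as `signal.SIGBREAK`; the fallback to `SIGTERM` exists only so the code can be tried on other platforms; and `run_task`, `process`, and `save_state` are hypothetical names.

```python
import signal
import threading

# SIGBREAK is defined only on Windows; SIGTERM is an assumed stand-in elsewhere.
CANCEL_SIGNAL = getattr(signal, "SIGBREAK", signal.SIGTERM)

cancel_requested = threading.Event()


def on_cancel(signum, frame):
    """Signal handler: only record the request; cleanup happens in the main loop."""
    cancel_requested.set()


def install_cancel_handler():
    """Register the handler. Must be called from the main thread."""
    signal.signal(CANCEL_SIGNAL, on_cancel)


def run_task(work_items, process, save_state):
    """Process items until done or until cancellation is requested.

    Returns the number of items completed. When cancelled, save_state(done)
    is called once so that the work can be resumed later.
    """
    done = 0
    for item in work_items:
        if cancel_requested.is_set():
            save_state(done)  # should finish well within the grace period
            return done
        process(item)
        done += 1
    return done
```

Because the flag is checked only between work items, each item should take much less time than the configured Task Cancel Grace Period; otherwise the task may be terminated before it saves its state.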

The following table summarizes when HPC tasks will stop running as a result of the interaction between the Windows Azure node availability policy and the Task Cancel Grace Period setting. Possible impacts and workarounds are listed. The interaction differs depending on whether a “drain” period is configured in the availability policy. The drain period is an optional setting that specifies the number of minutes before the end of an online time block during which no new tasks will start on the nodes.

Task drain period configured in the availability policy	When the Task Cancel Grace Period begins	When running HPC tasks end	Impacts	Workarounds
Yes	Beginning of the drain period	Between the beginning and the end of the Task Cancel Grace Period, depending on whether the task exits as soon as it receives the signal or uses the time that the Task Cancel Grace Period provides. This can be before the scheduled end of the online time block. Example: with a scheduled end of 8:00 PM, a grace period of 5 minutes, and a drain period of 10 minutes, running tasks end between 7:50 and 7:55 PM.	Windows Azure nodes are stopped and the deployment is taken down earlier than expected. Usage of Windows Azure resources for HPC tasks may not be optimal.	Adjust the Task Cancel Grace Period to be the same as the drain period, or as similar as possible. Specify small values for the drain period and grace period, if your applications allow them.
No	End of the configured online time block	Between the beginning and the end of the Task Cancel Grace Period, depending on whether the task exits as soon as it receives the signal or uses the time that the Task Cancel Grace Period provides. This can be after the scheduled end of the online time block. Example: with a scheduled end of 8:00 PM and a grace period of 5 minutes, running tasks end between 8:00 and 8:05 PM.	HPC tasks can continue running beyond the end of the online time block for as long as the Task Cancel Grace Period. The Windows Azure node deployment can be extended beyond the end of the online time block for as long as the Task Cancel Grace Period. If the online time block ends on the hour (for example, 8:00 PM), the node deployment might incur one additional hour of charges. For more information, see Windows Azure Pricing Overview.	If your applications allow it, adjust the Task Cancel Grace Period to be a smaller value. If your applications need a longer Task Cancel Grace Period, configure the availability policy to end the online time block on the half-hour (for example, 7:30 PM) instead of on the hour. This helps ensure that the Windows Azure deployment does not incur additional hourly charges.
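
The timing in the table can be worked out with a few lines of arithmetic. The sketch below computes the window in which running tasks end, with or without a drain period, and checks whether the latest possible end runs past an on-the-hour boundary. The function names are hypothetical, and the clock-hour check rests on the assumption, suggested by the table, that usage is charged per clock hour; consult the current pricing information before relying on it.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def task_end_window(
    block_end: datetime,
    grace: timedelta,
    drain: Optional[timedelta] = None,
) -> Tuple[datetime, datetime]:
    """Return (earliest, latest) time at which running tasks end.

    With a drain period, the grace period begins at the start of the drain
    period; without one, it begins at the scheduled end of the online block.
    """
    grace_start = block_end - drain if drain is not None else block_end
    return grace_start, grace_start + grace


def spills_into_next_clock_hour(block_end: datetime, latest_end: datetime) -> bool:
    """True if an on-the-hour boundary at or after block_end precedes latest_end."""
    on_the_hour = block_end.replace(minute=0, second=0, microsecond=0)
    boundary = on_the_hour if on_the_hour == block_end else on_the_hour + timedelta(hours=1)
    return boundary < latest_end
```

With a scheduled end of 8:00 PM and a 5-minute grace period, the window is 8:00 to 8:05 PM and the check returns True; ending the block at 7:30 PM gives a window of 7:30 to 7:35 PM and the check returns False, which matches the half-hour workaround in the table.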

