Deploying Applications to Windows Azure Nodes in a Windows HPC Cluster
Applies To: Microsoft HPC Pack 2008 R2, Microsoft HPC Pack 2012, Microsoft HPC Pack 2012 R2, Windows HPC Server 2008 R2
Starting with HPC Pack 2008 R2 with Service Pack 1 (SP1), HPC Pack includes built-in utilities and mechanisms that help cluster administrators deploy applications (such as executable files, SOA services, XLL files, and cluster-enabled Microsoft Excel workbooks) to Windows Azure nodes that are joined to an on-premises cluster in a Windows Azure “burst” scenario. By default, Windows Azure nodes cannot access on-premises resources and shared folders directly, so the methods that you use to deploy applications on Windows Azure nodes differ from the methods that you use to deploy applications on-premises. Additionally, Windows Azure nodes added to a Windows HPC cluster are deployed and reprovisioned dynamically, so the recommended methods for application deployment help ensure that applications are automatically available on new Windows Azure node instances.
In this topic:
Considerations for HPC application workloads for Windows Azure burst
Selecting a method to deploy applications and data to Windows Azure nodes
Using hpcpack and hpcsync
How to stage application files to Windows Azure storage
How to deploy staged packages to Windows Azure nodes
Accessing files from Windows Azure nodes
Additional references
Considerations for HPC application workloads for Windows Azure burst
Before deploying applications to Windows Azure nodes, evaluate whether your existing or planned HPC workloads will run or scale efficiently in Windows Azure. The detailed migration, development, and design considerations for HPC applications that run in a Windows Azure burst scenario are beyond the scope of this topic. In addition, the capabilities of the Windows Azure platform are continuously evolving. However, workloads that currently burst successfully to Windows Azure with HPC Pack, particularly in large-scale deployments of Windows Azure nodes, share the following characteristics:
Highly distributed, single-node computations: These include many parametric sweep and certain service-oriented architecture (SOA) jobs. Other job types, including Message Passing Interface (MPI) jobs, can run in a Windows Azure burst configuration. However, the current Windows Azure infrastructure is not designed for jobs that require high-bandwidth, low-latency communication between Windows Azure nodes, or that use task dependencies between nodes. For general information about HPC job types, see Understanding Parallel Computing Jobs. For detailed considerations to run MPI jobs, see Guidelines for Running MPI Applications in Windows Azure.
Computation time exceeds data movement time: Certain HPC workloads require uploading large amounts of data to Windows Azure for computation, or return large amounts of processed data. Ensure that data movement is not a bottleneck in your HPC workflow.
File-based data access: Existing HPC applications that access data files on-premises can be readily migrated to run in Windows Azure by accessing data files that are uploaded to Windows Azure blob storage. New HPC applications can be developed to access the variety of storage types in Windows Azure. However, depending on the sensitivity of the data, legal requirements, cost considerations, and other factors, it may not be possible to store application data in Windows Azure.
“Bursty” workload pattern: The Windows Azure burst scenario is ideal for resource-intensive workloads that are not easily completed by using the fixed resources of an on-premises cluster. The workloads can include irregular computation spikes, regularly scheduled jobs, or one-off jobs.
For more information about running applications on Windows Azure nodes, see Guidelines for Running HPC Applications on Windows Azure Nodes.
Selecting a method to deploy applications and data to Windows Azure nodes
The method that you use to deploy applications (and, in some cases, data) to Windows Azure nodes depends on what is required to install your application, as outlined in the following table.
Installation requirements | Method | Availability |
---|---|---|
Installation can be accomplished by copying files and any dependencies such as DLLs or configuration files to the node. | | HPC Pack 2008 R2 with SP1 |
Installation requires silently running an installer or requires additional configuration steps such as setting environment variables or creating firewall exceptions. | | HPC Pack 2008 R2 with SP2 |
Installation and data distribution to Windows Azure nodes can occur during a preparation task at the time the job is run. | | HPC Pack 2008 R2 with SP1 |
Installation requires steps that are not easily scripted, and the application and application data can be accessed from a durable drive in Windows Azure. | | HPC Pack 2012 |
Using hpcpack and hpcsync
Each Windows Azure node deployment is associated with a Windows Azure storage account that is specified in the node template. A cluster administrator can stage files (such as applications, SOA services, XLLs, cluster-enabled Excel workbooks, and utilities) to the storage account by using the hpcpack commands. You can use hpcpack create to package files or folders in a compressed format (.zip) that can be uploaded to Windows Azure storage. Each application, SOA service, or XLL must be packaged separately, and the package must include any required dependencies such as DLLs or configuration files. You can then use hpcpack upload to upload the package to the storage account. You can run the hpcpack commands from the head node, or on a computer that has the HPC client utilities installed.
All packages in the storage account are automatically deployed to new Windows Azure node instances during the provisioning process. This happens when you deploy a set of Windows Azure nodes by using the HPC management utilities, and when your node instances are reprovisioned automatically by the Windows Azure system. The hpcsync command runs on each Windows Azure node, copies all packages from storage to the node, and then extracts the files. If you upload packages to storage after the Windows Azure nodes are started, you can deploy the packages by running the hpcsync command manually on each Windows Azure node.
Note
If you create multiple Windows Azure node templates that reference the same storage account, the same staged files will be deployed to all Windows Azure node sets. To deploy different files to different node sets, create a separate Windows Azure storage account for each Windows Azure node template.
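The note above implies that the storage account, not the node template, determines which packages a node set receives. As a rough illustration only, the following Python sketch groups templates by storage account to show which node sets would share staged files. The template and account names are hypothetical, and the mapping is typed in by hand; nothing here queries a real cluster.

```python
from collections import defaultdict


def templates_by_storage_account(template_accounts):
    """Group node template names by the storage account they reference.

    template_accounts maps template name -> storage account name
    (hypothetical, hand-entered data for illustration).
    """
    groups = defaultdict(list)
    for template, account in template_accounts.items():
        groups[account].append(template)
    return {account: sorted(names) for account, names in groups.items()}


def shared_accounts(template_accounts):
    """Return only the accounts used by more than one template.

    Node sets deployed from these templates receive the same staged packages.
    """
    grouped = templates_by_storage_account(template_accounts)
    return {acct: names for acct, names in grouped.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical names, not taken from any real deployment.
    example = {
        "TemplateA": "storageone",
        "TemplateB": "storageone",
        "TemplateC": "storagetwo",
    }
    for account, names in shared_accounts(example).items():
        print(f"{account}: same packages go to node sets from {', '.join(names)}")
```

If the result is not empty and the node sets need different files, the note's advice applies: give each template its own storage account.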
The following diagram illustrates the basic workflow and mechanisms for copying applications to Windows Azure nodes:
By default, hpcsync extracts files to a location that is specified by the CCP_PACKAGE_ROOT environment variable. This variable is set on Windows Azure nodes during the provisioning process. The extracted files are placed in a folder that is determined as follows: %CCP_PACKAGE_ROOT%\<packageName>\<uploadTimeStamp>. This is the expected location for SOA services, XLLs, and Excel workbooks. However, this is not convenient for applications that cluster users will be calling in their command lines. To simplify the folder structure for executable files, you can set the relative path property for the package when you upload it to storage. hpcsync applies the relative path when extracting the files, so that the path is determined as follows: %CCP_PACKAGE_ROOT%\<relativePath>. Users can then specify the path to their application as in the following example of a job submit command:
job submit %CCP_PACKAGE_ROOT%\myRelativePath\myapp.exe
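The two folder rules described above can be summarized in a small Python sketch. It only models the path construction stated in the text; it does not call hpcsync. The function name is hypothetical, and the package name and upload timestamp are treated as opaque strings because their exact format is not specified here.

```python
from pathlib import PureWindowsPath


def extraction_dir(package_root, package_name, upload_timestamp, relative_path=None):
    """Return the folder where a package's files are expected to be extracted.

    With a relative path:    <package_root>\\<relativePath>
    Without a relative path: <package_root>\\<packageName>\\<uploadTimeStamp>
    """
    root = PureWindowsPath(package_root)
    if relative_path:
        return root / relative_path
    return root / package_name / upload_timestamp


if __name__ == "__main__":
    root = "%CCP_PACKAGE_ROOT%"
    # Default location (the expected one for SOA services, XLLs, and workbooks).
    print(extraction_dir(root, "<packageName>", "<uploadTimeStamp>"))
    # Location when the package was uploaded with a relative path.
    print(extraction_dir(root, "<packageName>", "<uploadTimeStamp>", "myRelativePath") / "myapp.exe")
```

The second call reproduces the path used in the job submit example, which is why the relative path property suits executables that users call from the command line.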
The following are important considerations about hpcsync and CCP_PACKAGE_ROOT:
On Windows Azure worker nodes, the %CCP_PACKAGE_ROOT% folder is created on a 10 GB disk partition. This means that the application files on a node instance cannot exceed 10 GB in total. If an application has considerable input and output files, you can use a startup script to grant user permissions on the C:\ drive so that users can write to all available scratch space on the node.
When you run hpcsync manually, you can override the default location (%CCP_PACKAGE_ROOT%). For example, you can create a folder on each Windows Azure node and then specify that location when you run hpcsync. All packages will be extracted to that folder. However, any new node instances that are deployed (or automatically reprovisioned) will not include that folder, and packages will be automatically deployed to the default location. Additionally, cluster users have write permissions only to folders in %CCP_PACKAGE_ROOT%. Unless you modify folder permissions on the Windows Azure nodes, only administrators can run applications outside of %CCP_PACKAGE_ROOT%.
When hpcsync deploys a package, none of the extracted files can have a full path length longer than 256 characters. The root directories where the extracted files are temporarily and then finally placed can take up to 136 characters, leaving 120 characters for the file name, subdirectories (if any), and the relativePath (if specified). If the path for the extracted files exceeds 256 characters, the package deployment fails.
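Because a path that is too long causes the whole package deployment to fail, it can be worth checking a folder before packaging it. The following Python sketch is one possible pre-flight check, not part of HPC Pack. It applies the 120-character budget described above to the relative path (if any) plus each file's subdirectories and file name. Counting the separator between the relative path and the file path is an assumption made here to stay on the conservative side, and all function names are hypothetical.

```python
import os

PATH_BUDGET = 120  # characters left after the root directories, per the text above


def list_relative_files(folder):
    """List every file under folder as a backslash-separated relative path."""
    found = []
    for current, _dirs, files in os.walk(folder):
        for name in files:
            rel = os.path.relpath(os.path.join(current, name), folder)
            found.append(rel.replace(os.sep, "\\"))
    return found


def deployed_length(relative_file, relative_path=""):
    """Length of relativePath + subdirectories + file name, joined by backslashes."""
    parts = [p for p in (relative_path.strip("\\/"), relative_file.strip("\\/")) if p]
    return len("\\".join(parts))


def files_over_budget(relative_files, relative_path="", budget=PATH_BUDGET):
    """Return the files whose deployed length would exceed the budget."""
    return [f for f in relative_files if deployed_length(f, relative_path) > budget]


if __name__ == "__main__":
    # Checks the current directory as if it were uploaded with relative path "myApp".
    too_long = files_over_budget(list_relative_files("."), relative_path="myApp")
    for path in too_long:
        print("Too long for deployment:", path)
```

A shorter relative path or a flatter folder layout are the two levers available when files are reported.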
The hpcsync mechanism is sufficient for deploying SOA services, XLL files, and applications that can be installed by simply copying files to a node. If you need to run an installer to install an application, or if the application requires additional configuration steps such as setting environment variables, adding firewall exceptions, modifying folder permissions, or creating folders, you can include a startup script in the node template. This script will run during the provisioning process after hpcsync runs, and can be used to configure the nodes and perform the required application installation steps.
How to stage application files to Windows Azure storage
This section provides information about how to package applications and stage them to Windows Azure storage by using hpcpack. Staged packages are automatically deployed to new Windows Azure node instances that you provision (or that are automatically reprovisioned by the Windows Azure system).
Note
You must be a cluster administrator or at least have the Windows Azure subscription ID and storage account key to stage files to Windows Azure storage.
If you are packaging a SOA service:
The name of the package must be the name of the SOA service (that is, the service name that the SOA client specifies in the SessionStartInfo constructor). For example, serviceName.zip or serviceName_serviceVersion.zip.
You must include the service DLL, any dependent DLLs, and the service configuration files in the package.
The service configuration file must also be deployed to the head node. All settings are determined by the on-premises copy of the configuration file.
Do not specify a relative path when you upload the package. SOA services must be unpacked to the default location.
If you are packaging an XLL file:
The name of the package must be the name of the XLL file. For example, XLLName.zip.
If the XLL has dependencies, place the XLL and supporting files in a folder and package the folder. The XLL must be in the top level of the folder (not in a subfolder).
Do not specify a relative path when you upload the package. XLLs must be unpacked to the default location.
If you are packaging an Excel workbook:
The name of the package must be the name of the workbook. For example, workbookName.zip.
If the workbook has dependencies, place the workbook and supporting files in a folder and package the folder. The workbook must be in the top level of the folder (not in a subfolder).
Do not specify a relative path when you upload the package. Workbooks must be unpacked to the default location.
If you are packaging an executable file (such as an MPI application), application installer, or utility that you will call from a startup script:
You must include any dependent DLLs or files in the package.
When you upload the package, specify the relative path property.
If you are packaging a startup script:
The name of the package must be the name of the startup script. For example, startup.bat.zip.
Do not specify a relative path when you upload the package. The startup script must be unpacked to the default location.
If your startup script calls installers or utilities, ensure that you package and stage the required files separately.
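The naming and relative path rules above are easy to mix up, so the following Python sketch collects them in one place. It is an illustration of the rules as stated in this topic, not an HPC Pack API. The kind labels and function names are hypothetical, and dropping the file extension for XLLs and workbooks is inferred from the XLLName.zip and workbookName.zip examples.

```python
from pathlib import PureWindowsPath

# Package kinds that must be unpacked to the default location (no relative path).
DEFAULT_LOCATION_ONLY = {"soa", "xll", "workbook", "startup"}


def package_name(kind, name, version=None):
    """Return the package file name that the rules above call for.

    kind: "soa", "xll", "workbook", or "startup" (hypothetical labels).
    name: SOA service name, or the XLL / workbook / script file name or path.
    """
    if kind == "soa":
        # serviceName.zip or serviceName_serviceVersion.zip
        return f"{name}_{version}.zip" if version else f"{name}.zip"
    if kind in ("xll", "workbook"):
        # Assumption: the extension is dropped, as in XLLName.zip.
        return PureWindowsPath(name).stem + ".zip"
    if kind == "startup":
        # The full script name is kept, as in startup.bat.zip.
        return PureWindowsPath(name).name + ".zip"
    raise ValueError(f"no naming rule is stated for kind {kind!r}")


def may_use_relative_path(kind):
    """Only executables, installers, and utilities take a relative path."""
    return kind not in DEFAULT_LOCATION_ONLY


if __name__ == "__main__":
    print(package_name("soa", "myServiceName"))
    print(package_name("xll", r"C:\myFiles\myXLL.xll"))
    print(package_name("startup", "startup.bat"))
```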
As examples, the following procedures illustrate how to stage various types of application files to Windows Azure storage.
Note
You do not need an elevated command prompt (run as Administrator) to run hpcpack create. However, hpcpack upload requires elevation. To perform the following procedures, run the commands in an elevated command prompt window.
To package and stage a SOA service:
If the SOA service is not already registered and deployed to the on-premises cluster, register the SOA service by placing a copy of the service configuration file in the service registration folder on the head node (typically this is %CCP_HOME%\ServiceRegistration). For detailed information, see Deploy and Edit the Service Configuration File.
Copy the service configuration file, the service assembly, and any dependent DLLs to an empty folder. For example, copy the files to a folder named C:\myFiles\myServiceFiles.
At an elevated command prompt, run hpcpack create and specify a name for your package and the folder that contains your service files.
Important
The name of the package must be the name of the SOA service (that is, the service name that the SOA client specifies in the SessionStartInfo constructor).
For example, to package the content of C:\myFiles\myServiceFiles as myServiceName.zip (and save the package to a folder called AzurePackages):
hpcpack create C:\AzurePackages\myServiceName.zip C:\myFiles\myServiceFiles
Run hpcpack upload to upload the package to Windows Azure storage. In the following example, myHeadNode is the name of your head node, and myAzureNodeTemplate is the name of the node template that you used to deploy the Windows Azure nodes:
hpcpack upload C:\AzurePackages\myServiceName.zip /nodetemplate:myAzureNodeTemplate /scheduler:myHeadNode
To package and stage an XLL or Excel workbook:
If the XLL or workbook has dependencies on DLLs or other files, copy the XLL or workbook and its dependencies to a folder, such as C:\myFiles\myExcelFiles.
At an elevated command prompt, run hpcpack create to package your XLL or workbook. Specify a name for the package and specify the XLL or workbook. The name of the package must be the name of the XLL file or of the Excel workbook.
For example, if your XLL or workbook has dependencies, package the entire folder (and save the package to a folder called AzurePackages):
hpcpack create C:\AzurePackages\myXLL.zip C:\myFiles\myExcelFiles
If your XLL or workbook has no dependencies, you can package it directly. For example, to package C:\myFiles\myXLL.xll as myXLL.zip:
hpcpack create C:\AzurePackages\myXLL.zip C:\myFiles\myXLL.xll
Run hpcpack upload to upload the package to Windows Azure storage. In the following example, myHeadNode is the name of your head node, and myAzureNodeTemplate is the name of the node template that you used to deploy the Windows Azure nodes:
hpcpack upload C:\AzurePackages\myXLL.zip /nodetemplate:myAzureNodeTemplate /scheduler:myHeadNode
To package and stage an executable file:
Copy the executable and any dependencies or DLLs to a folder, such as C:\myFiles\myAppFiles.
At an elevated command prompt, run hpcpack create to package your application files. Specify a name for your package and specify the folder that contains your application files.
For example, to package the content of c:\myFiles\myAppFiles as myApp.zip (and save the package to a folder called AzurePackages):
hpcpack create c:\AzurePackages\myApp.zip c:\myFiles\myAppFiles
Run hpcpack upload to upload the package to Windows Azure storage, and specify a relative path for the application files. In the following example, myHeadNode is the name of your head node, and myAzureTemplate is the name of the node template that you used to deploy the Windows Azure nodes:
hpcpack upload c:\AzurePackages\myApp.zip /scheduler:myHeadNode /nodetemplate:myAzureTemplate /relativepath:myApp
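When several packages are staged regularly, the create and upload steps in the procedures above can be scripted. The following Python sketch only assembles the argument lists, using the hpcpack subcommands and the /nodetemplate, /scheduler, and /relativepath options that appear in the examples above. The function names are hypothetical, and whether to pass the lists to subprocess.run is left to the reader, because the commands need the HPC client utilities and an elevated prompt.

```python
def hpcpack_create(package, source):
    """Arguments for packaging a file or folder, as in the examples above."""
    return ["hpcpack", "create", package, source]


def hpcpack_upload(package, node_template, head_node, relative_path=None):
    """Arguments for uploading a package to the template's storage account.

    Pass relative_path only for executables, installers, and utilities.
    SOA services, XLLs, workbooks, and startup scripts must go to the
    default location, so leave it as None for those.
    """
    args = [
        "hpcpack",
        "upload",
        package,
        f"/nodetemplate:{node_template}",
        f"/scheduler:{head_node}",
    ]
    if relative_path:
        args.append(f"/relativepath:{relative_path}")
    return args


if __name__ == "__main__":
    steps = [
        hpcpack_create(r"c:\AzurePackages\myApp.zip", r"c:\myFiles\myAppFiles"),
        hpcpack_upload(r"c:\AzurePackages\myApp.zip", "myAzureTemplate", "myHeadNode", "myApp"),
    ]
    for step in steps:
        # Print instead of executing; on a configured machine each list
        # could be passed to subprocess.run(step, check=True).
        print(" ".join(step))
```

Keeping the commands as lists rather than one string avoids quoting problems if a path ever contains spaces.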
How to deploy staged packages to Windows Azure nodes
Packages that are staged to Windows Azure storage are automatically deployed to new node instances. You can also deploy packages manually, for example to verify that a package contains all the necessary dependencies before you automate deployment to all new nodes, or to deploy packages to nodes that are already running. To do so, use clusrun and hpcsync to deploy the files from the Windows Azure storage account to the Windows Azure nodes.
For example:
clusrun /nodegroup:AzureWorkerNodes hpcsync
To see a list of folders or files that have been deployed to the Windows Azure nodes, you can run the following command:
clusrun /nodegroup:AzureWorkerNodes dir %CCP_PACKAGE_ROOT% /s
Accessing files from Windows Azure nodes
If your HPC application requires file access, the following are options for accessing files from the applications that are deployed to Windows Azure nodes.
Option | Prerequisites | Notes |
---|---|---|
Windows Azure drive | Administrator configures and mounts an application VHD on Windows Azure nodes. | |
File server on Windows Azure Virtual Machine | Administrator configures a Windows Azure virtual machine instance, attaches a data disk to the virtual machine, enables the File Server role, and creates a file share folder. | |
Mirror local files to Windows Azure blob storage | Administrator uses a Windows Azure storage tool such as AzCopy to mirror on-premises files to a container in Windows Azure blob storage. See AzCopy – Uploading/Downloading files for Windows Azure Blobs. | |
Access Windows Azure blob storage directly | Application is architected to perform data access operations directly on Windows Azure blobs. | |