Deploying a Windows HPC Server Cluster to Run Jobs Using the LINQ to HPC Components (Preview)

Updated: November 14, 2011

Applies To: Windows HPC Server 2008 R2

[This topic is pre-release documentation.]

This topic provides information about deploying an on-premises Windows® HPC Server 2008 R2 cluster that can perform LINQ to HPC jobs by using the LINQ to HPC components (Preview). LINQ to HPC jobs run applications that are based on the Language-Integrated Query (LINQ) technology. These jobs are performed on a group of compute nodes that are registered with a distributed storage catalog (DSC) that runs on the head node of the cluster. Using the nodes in the DSC, you can store, manipulate, and analyze very large data sets.

Note
LINQ to HPC is a preview, not a production release. For more information, see the Windows HPC team blog.

Important
To run LINQ to HPC jobs, the head node and the compute nodes in your cluster must be upgraded to at least Microsoft® HPC Pack 2008 R2 with Service Pack 3 (SP3).

For information about developing LINQ to HPC applications, see the LINQ to HPC Programmer’s Guide.

The following are guidelines for a Windows HPC Server 2008 R2 cluster that can run LINQ to HPC jobs. These guidelines supplement the general guidance in the Design and Deployment Guide for Windows HPC Server 2008 R2.

HPC Pack 2008 R2 with SP3

  • The head node must be running HPC Pack 2008 R2 with Service Pack 3 (SP3), or must be updated to it. For more information about SP3, see Release Notes for Microsoft HPC Pack 2008 R2 Service Pack 3.

  • If you updated the head node from HPC Pack 2008 R2 with SP2 to HPC Pack 2008 R2 with SP3, you must separately run the installation program for the HPC Pack 2008 R2 LINQ to HPC components (Preview) (DISC_x64.msi). This installation program configures the necessary DSC databases and services for LINQ to HPC jobs. The installation program is included in the HPC Pack 2008 R2 SP3 download package available at the Microsoft Download Center. Save the installation program to a network share or other location from which you can install it on the head node and compute node computers in your cluster. For more information, see Install LINQ to HPC Preview components on the cluster in this topic.

  • Although the LINQ to HPC components (Preview) can be installed on a head node configured for high availability in a failover cluster, running LINQ to HPC jobs in this configuration is not supported.

HPC databases

  • The SQL Server instance or instances for the HPC databases can be installed either on the head node or on a remote server.

  • LINQ to HPC jobs and operations require a large number of transactions in the HPC databases. If the HPC databases are installed in the default SQL Server Express instance on the head node, this may affect the performance of the head node in larger deployments.

  • If you are deploying a new cluster, consider installing the HPC databases on one or more remote servers. For detailed information and step-by-step procedures for installing the HPC databases on remote servers, see the Deploying an HPC Cluster with Remote Databases Step-by-Step Guide.

Network configuration

  • The head node, compute nodes, and client computers that submit LINQ to HPC jobs must connect in a topology that allows them to access all the resources they need to interact with. You should generally choose a topology that includes the Enterprise network (topology 2, 4, or 5). For information about each network topology and each HPC cluster network, see HPC Cluster Networking in the Design and Deployment Guide for Windows HPC Server 2008 R2.

  • If a private network connects the compute nodes, Gigabit Ethernet or faster is recommended.

    Note
    Network capacity can limit the performance of LINQ to HPC jobs.

Job scheduler configuration

  • In Policy Configuration, the Scheduling Mode should be set to Queued. For more information about the scheduling mode, see Policy Configuration.

  • You must use the default job template to submit LINQ to HPC jobs. For more information, see Verify the default job template in this topic.

Other configuration

  • If you need to stage LINQ to HPC job resources on the Runtime$ file share for the cluster (installed by default on the head node), the file share must be configured with enough capacity. By default, the Runtime$ file share on the head node is configured with a quota of 25 GB. You can use File Server Resource Manager on the head node to configure this quota.

The following are guidelines for the compute nodes that will run LINQ to HPC jobs. For general compute node requirements in a Windows HPC Server 2008 R2 cluster, see Prepare for your Deployment in the Design and Deployment Guide for Windows HPC Server 2008 R2.

General considerations

  • LINQ to HPC jobs can only run on compute nodes (and the head node, if it has the compute node role enabled). The compute nodes that run the LINQ to HPC jobs must be added to the DSC as storage nodes, as described in Add compute nodes to the DSC in this topic. Although your cluster can include other nodes such as workstation nodes, Windows Communication Foundation (WCF) broker nodes, and Windows Azure nodes, you cannot use these nodes to run LINQ to HPC jobs in an on-premises HPC cluster.

  • If you are deploying an HPC cluster that is not dedicated to LINQ to HPC jobs, ensure that you create a node group for the compute nodes that will be added to the DSC and to which you will submit LINQ to HPC jobs. See Create a node group for LINQ to HPC jobs in this topic.

  • You can use any supported method for compute node deployment in Windows HPC Server 2008 R2, including deploying nodes from bare metal, using a node XML file, and adding preconfigured nodes. For more information about these methods, see Add Nodes to the Cluster.

Hardware guidelines

  • RAM: 4-8 GB minimum is recommended

    Note
    The effect of RAM on performance depends on the LINQ to HPC job and the underlying application. In some LINQ to HPC jobs, more than 4 GB of RAM may not improve performance.

  • Available disk space: 200 GB minimum, but larger amounts are strongly recommended

    • The appropriate amount of hard disk drive storage depends on the amount of data that you expect to process. Production compute nodes should typically have 1-3 TB of hard disk drive space each.

    • For greater performance and reliability, we recommend that you configure hard disk drive storage on each node as a redundant array of independent disks (RAID).

    • Each node added to the DSC is configured with two file shares, HpcData and HpcTemp. These are used to store the DSC data and temporary files associated with LINQ to HPC jobs. The recommended configuration for these shares is to associate both shares with one volume where that volume is constructed using software striping across multiple physical disks. The striped physical disks should be different from the system disk. For more information about the HpcData and HpcTemp shares, see Data management on the nodes in this topic.

      Note
      Assigning the HpcData and HpcTemp shares to the same partition is recommended, but can lead to small inaccuracies in the size of the available free space calculated by the DSC. This is because the DSC assumes that the whole partition is allocated to the DSC data and no other files are present. Typically HpcTemp is small, relative to HpcData, so this inaccuracy is insignificant.
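The inaccuracy described in the preceding note can be illustrated with simple arithmetic. The following Python sketch is only an illustration: the function names are hypothetical, and the formula is an assumption drawn from the note (free space is computed as if only DSC data occupied the partition), not the actual DSC implementation.

```python
def assumed_free_gb(partition_gb: float, hpcdata_gb: float) -> float:
    """Free space if only DSC data occupied the partition (assumed DSC view)."""
    return partition_gb - hpcdata_gb


def actual_free_gb(partition_gb: float, hpcdata_gb: float, hpctemp_gb: float) -> float:
    """Free space once the temporary files on the same partition are counted."""
    return partition_gb - hpcdata_gb - hpctemp_gb


# Example values (made up): a 2000 GB partition with 1200 GB of DSC data
# and 2 GB of temporary files.
overestimate = assumed_free_gb(2000, 1200) - actual_free_gb(2000, 1200, 2)
print(overestimate)  # the gap equals the space used by HpcTemp
```

Because the gap is exactly the space used by HpcTemp, it stays small whenever HpcTemp is small relative to HpcData.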

Software requirements

Other configuration

  • Windows Firewall ports 8050 and 8051 must be open for inbound TCP traffic on each compute node. A firewall rule is automatically configured to allow this traffic when the LINQ to HPC Preview components are installed on a compute node.

  • If you are adding compute nodes that run the Windows Server 2008 operating system to the DSC, you must enable the remote administration exception in Windows Firewall on each node.
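To spot-check the port guideline above from another computer, a small connectivity probe can help. The following Python sketch is a minimal example and not part of HPC Pack; the function names are hypothetical. Note that a failed connection can mean either that the firewall blocks the port or that no service is listening on it yet.

```python
import socket

# Ports named in the guideline above.
DSC_PORTS = (8050, 8051)


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def check_node(host: str, ports=DSC_PORTS) -> dict:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port) for port in ports}
```

For example, `check_node("MyComputeNode")` returns a dictionary keyed by port number, where MyComputeNode is a placeholder for one of your compute nodes.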

The following are specific requirements for a client computer that will submit LINQ to HPC jobs. For general requirements for a client computer in a Windows HPC Server 2008 R2 cluster, see Prepare for your Deployment in the Design and Deployment Guide for Windows HPC Server 2008 R2.

  • HPC Pack 2008 R2 with SP3 client utilities are installed.

  • You must install the LINQ to HPC Preview components on each client computer. For more information, see Install LINQ to HPC Preview components on the cluster in this topic.

  • To run a LINQ to HPC job from a client computer, you need user or administrator permissions on the cluster or must be a member of a domain group that has been added as a user or administrator on the cluster.

  • Optionally, if you are developing a LINQ to HPC application, install Microsoft Visual Studio® 2010.

    Important
    .NET Framework 3.5 and .NET Framework 4.0 are both supported for building a LINQ to HPC application. If your LINQ to HPC application makes use of .NET 4.0, then the .NET Framework 4.0 must be installed on each compute node that is registered with the DSC.

Install LINQ to HPC Preview components on the cluster

To install the HPC Pack 2008 R2 LINQ to HPC Preview components on a head node, compute node, or client computer that has been updated from HPC Pack 2008 R2 with SP2 to HPC Pack 2008 R2 with SP3, run the installation program that is appropriate for the operating system on each computer (DISC_x64.msi or DISC_x86.msi). You can use the installation wizard to install the Preview components on a single head node, compute node, or client computer. To install the components on multiple compute nodes in your HPC cluster, you can use a clusrun command to run DISC_x64.msi with unattended installation. You can also use other methods, including using a task in a node template.

Important
You do not have to install the LINQ to HPC Preview components separately on a computer where you have performed a clean installation of HPC Pack 2008 R2 with SP3. The LINQ to HPC Preview components are installed automatically when you perform this installation.

To install the LINQ to HPC Preview components by using the installation wizard

  1. On the computer, run DISC_x64.msi (or DISC_x86.msi, if appropriate) from the location where you downloaded the file.

  2. On the Getting Started page, click Next.

  3. On the Microsoft Software License Terms page, read or print the software license terms in the license agreement, and accept or reject the terms of that agreement. If you accept the terms, click Next.

  4. On the Installation Folder page, type the path or browse to the folder where you want to install the HPC Pack 2008 R2 LINQ to HPC Preview components.

  5. Continue to follow the steps in the installation wizard.

To install the LINQ to HPC Preview components on multiple compute nodes by using clusrun

  1. Copy DISC_x64.msi to a network share.

  2. On the head node or client computer that you use to manage the cluster, open an elevated command prompt. To do this, click Start, point to All Programs, click Accessories, right-click Command Prompt, and then click Run as administrator. If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.

  3. Type a command similar to the following:

    clusrun /nodegroup:computenodes \\<Share>\DISC_x64.msi -unattend -computenode:<HeadNode>
    

    where <Share> is the name of the network share and <HeadNode> is the name of the head node computer of the HPC cluster.

For more information about running clusrun, see clusrun.
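If you generate the unattended installation command for several clusters or shares, a small helper can assemble it consistently. The following Python sketch only builds the command text and does not run anything. The function name is hypothetical, and writing the switches with plain ASCII hyphens is an assumption about how the installer expects them.

```python
def clusrun_install_command(share: str, head_node: str,
                            node_group: str = "computenodes",
                            installer: str = "DISC_x64.msi") -> str:
    """Build the clusrun command line for an unattended installation.

    share is given without leading backslashes, for example "fileserver\\install".
    """
    # UNC path to the installer: \\<Share>\DISC_x64.msi
    unc_path = "\\\\" + share + "\\" + installer
    return (f"clusrun /nodegroup:{node_group} {unc_path} "
            f"-unattend -computenode:{head_node}")


# Placeholder share and head node names.
print(clusrun_install_command("fileserver\\install", "MyHeadNode"))
```

Building the string in one place makes it easier to review the exact command before you paste it into an elevated command prompt.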

Create a node group for LINQ to HPC jobs

If you are deploying a test cluster for a proof of concept of LINQ to HPC features, it may be easiest to add all compute nodes to the DSC and use the cluster only to submit LINQ to HPC jobs. Alternatively, you can ensure that LINQ to HPC jobs and other types of jobs do not run at the same time. LINQ to HPC runs tasks on persistent data that is stored on the compute nodes that are registered with the DSC. It schedules jobs so that a node runs only a single task at a time. This means that jobs are scheduled sequentially on the cluster.

If you are using an HPC cluster that is not dedicated to LINQ to HPC jobs, we strongly recommend that you create a custom node group for the compute nodes that will be added as storage nodes to the DSC and that will run LINQ to HPC jobs. This ensures that you run LINQ to HPC jobs only on the storage nodes. If you run these jobs on a mixture of compute nodes (that is, both storage nodes and nodes that are not configured as storage nodes), the jobs might fail. For information about creating a node group, see Grouping Nodes.

You can name the group according to your preference, but a descriptive name such as LinqToHpcNodes is suggested. The sample code that accompanies the HPC Pack 2008 R2 with SP3 SDK, available from the Microsoft Download Center, refers by default to a node group named LinqToHpcNodes to calculate the number of available DSC nodes.

Add compute nodes to the DSC

Before you run LINQ to HPC jobs, you must add the compute nodes that you want to function as storage nodes to the DSC. To do this, you can use the dsc node add command to add a single node at a time, or use the dsc-nodes-add PowerShell function (provided separately in the Admin.ps1 HPC PowerShell script) to add a group of nodes.

Note
  • If you are adding compute nodes that run the Windows Server 2008 operating system to the DSC, ensure that the remote administration Windows Firewall exception is configured. If you need to configure the exception on the compute nodes in the LinqToHpcNodes group, you can run a clusrun command similar to the following:

    clusrun /nodegroup:LinqToHpcNodes netsh firewall set service type=remoteadmin
    
  • To function properly, the DSC needs at least as many nodes as the cluster replication factor. The default cluster replication factor is 3, so a minimum of 3 nodes must be added to the DSC in order to create file sets in it.

  • 256 nodes is the maximum number of DSC nodes that is supported in Windows HPC Server 2008 R2 with SP3.

  • See Data management on the nodes in this section for information about the HpcData and HpcTemp file shares that are configured on each compute node that is added to the DSC.

  • If you are adding a node to the DSC that already has the HpcData and HpcTemp shares configured, it is recommended that you manually delete all the data in the existing shares before you recreate them by running dsc node add. Otherwise, the dsc node add action can take a long time.
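The node-count limits in the preceding notes (at least as many nodes as the replication factor, and no more than 256 nodes) can be checked before you start adding nodes. The following Python sketch is a hypothetical planning helper, not an HPC Pack tool; the constants restate the values given in the notes.

```python
DEFAULT_REPLICATION_FACTOR = 3  # default stated in the notes above
MAX_DSC_NODES = 256             # supported maximum stated in the notes above


def check_dsc_node_count(node_count: int,
                         replication_factor: int = DEFAULT_REPLICATION_FACTOR) -> list:
    """Return a list of problems with the planned number of DSC nodes."""
    problems = []
    if node_count < replication_factor:
        problems.append(
            f"{node_count} node(s) is fewer than the replication factor "
            f"({replication_factor}); file sets cannot be created.")
    if node_count > MAX_DSC_NODES:
        problems.append(
            f"{node_count} nodes exceeds the supported maximum of {MAX_DSC_NODES}.")
    return problems


for message in check_dsc_node_count(2):
    print(message)
```

An empty list means the planned node count satisfies both limits.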

To add a single compute node to the DSC by using the dsc node add command

  1. Log on to the head node or client computer as an HPC administrator.

  2. Open a command prompt:

    • If you are on the head node, click Start, point to All Programs, click Accessories, and then click Command Prompt.

    • If you are on a client computer, click Start, point to All Programs, click Accessories, right-click Command Prompt, and then click Run as administrator. If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.

  3. To add the compute node MyComputeNode to the DSC specifying a local data path C:\LinqHPC\Data, local temp path C:\LinqHPC\Temp, and cluster head node MyHeadNode, type the following command:

    dsc node add MyComputeNode /datapath:C:\LinqHPC\Data /temppath:C:\LinqHPC\Temp /service:MyHeadNode
    

    where

    /datapath sets the local path for the directory that is used by the HpcData file share on the node.

    /temppath sets the local path that is used by the HpcTemp file share on the node.

    /service specifies the computer where the DSC service instance is installed. Specify /service if the CCP_SCHEDULER environment variable is not set on the computer where you are running the command. On the head node computer, CCP_SCHEDULER is set by default to the name of the head node. The CCP_SCHEDULER variable is not set by default on a client computer.

    Note
    • After you add a node running the Windows Server 2008 operating system to the DSC, you must restart the node.

    • A node that is already in the DSC must be removed (using the dsc node remove command) before it can be added again. The dsc node add command may not function as expected if it is run on a node that is already added to the DSC.

    For more information about dsc node add, type dsc node help, or see the LINQ to HPC Programmer’s Guide.
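The rule for the /service parameter (specify it when the CCP_SCHEDULER environment variable is not set) can be sketched as follows. This Python example only assembles the argument list for dsc node add and does not run the command; the function name and its parameters are hypothetical.

```python
import os


def dsc_node_add_args(node, data_path, temp_path, head_node=None, env=None):
    """Assemble the arguments for 'dsc node add' as a list of strings."""
    env = os.environ if env is None else env
    args = ["dsc", "node", "add", node,
            f"/datapath:{data_path}", f"/temppath:{temp_path}"]
    if head_node:
        # An explicit head node is always passed through /service.
        args.append(f"/service:{head_node}")
    elif not env.get("CCP_SCHEDULER"):
        # Neither /service nor CCP_SCHEDULER identifies the DSC service.
        raise ValueError("CCP_SCHEDULER is not set; specify the head node.")
    return args


# Placeholder names taken from the example in step 3.
print(" ".join(dsc_node_add_args("MyComputeNode", r"C:\LinqHPC\Data",
                                 r"C:\LinqHPC\Temp", head_node="MyHeadNode")))
```

On a client computer, where CCP_SCHEDULER is not set by default, the helper raises an error unless a head node is supplied.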

To add the nodes in a node group to the DSC by using an HPC PowerShell script

  1. Log on to the head node or client computer as an HPC administrator.

  2. Start HPC PowerShell:

    • If you are on the head node, click Start, point to All Programs, click Microsoft HPC Pack, right-click HPC PowerShell, and then click Run as administrator. If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.

    • If you are on a client computer, click Start, point to All Programs, click Microsoft HPC Pack, and then click HPC PowerShell.

  3. To add all the nodes in the LinqToHpcNodes group to the DSC, specifying a local data path C:\LinqHPC\Data and a local temp path C:\LinqHPC\Temp, type the following script:

    # Get every node in the LinqToHpcNodes node group.
    $nodes = Get-HpcNode -GroupName "LinqToHpcNodes"
    foreach ($n in $nodes)
    {
      # Add each node to the DSC by its NetBIOS name.
      $name = $n.NetBiosName
      dsc node add $name /temppath:c:\LinqHpc\Temp /datapath:c:\LinqHpc\Data /service:MyHeadNode
    }
    
    
To add the nodes in a node group to the DSC by using the dsc-nodes-add function

  1. Log on to the head node or client computer as an HPC administrator.

  2. Start HPC PowerShell:

    • If you are on the head node, click Start, point to All Programs, click Microsoft HPC Pack, right-click HPC PowerShell, and then click Run as administrator. If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.

    • If you are on a client computer, click Start, point to All Programs, click Microsoft HPC Pack, and then click HPC PowerShell.

  3. If you have not already done so, download and extract the Admin.ps1 script.

  4. Change directory to the location of the Admin.ps1 script.

  5. Load the script by typing the following:

    . .\Admin.ps1

  6. To add all the nodes in the LinqToHpcNodes node group to the DSC, specifying a local data path C:\LinqHPC\Data and a local temp path C:\LinqHPC\Temp, type the following function command:

    dsc-nodes-add LinqToHpcNodes /temppath:c:\LinqHpc\Temp /datapath:c:\LinqHpc\Data
    

Data management on the nodes

Usage of the HpcData share is governed by the size of the file sets that users create and by the replication factor. We recommend that you use the default replication factor of 3. Replication factors of 1 to 4 are supported, depending on the desired tradeoff between storage overhead and tolerance to node failures. In addition to the file sets that users create, temporary file sets are created during query execution. These file sets have a lease time of 24 hours, after which they are automatically removed.
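The storage overhead side of this tradeoff can be estimated with simple arithmetic. The following Python sketch assumes that each replica is a full copy of the data and that replicas are spread evenly across the nodes; both are simplifying assumptions, the function names are hypothetical, and temporary file sets are ignored.

```python
def raw_storage_needed_gb(data_size_gb: float, replication_factor: int = 3) -> float:
    """Total HpcData space consumed across the cluster for one data set."""
    if not 1 <= replication_factor <= 4:
        raise ValueError("Supported replication factors are 1 to 4.")
    return data_size_gb * replication_factor


def average_per_node_gb(data_size_gb: float, node_count: int,
                        replication_factor: int = 3) -> float:
    """Average HpcData usage per node, assuming an even spread of replicas."""
    return raw_storage_needed_gb(data_size_gb, replication_factor) / node_count


# Example values (made up): 500 GB of user data on a 10-node DSC.
print(raw_storage_needed_gb(500), average_per_node_gb(500, 10))
```

Comparing the per-node figure with the disk space available on each node gives a rough sense of how much data the cluster can hold at a given replication factor.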

The HpcTemp share is used to store information related to LINQ to HPC queries. Information is stored in a UserName\jobID folder for each job. This share is automatically cleaned up, and files relating to jobs older than 24 hours are removed. Typically each job will create megabytes of associated data.

Note
In on-premises deployments, some management and security utilities (such as firewall and antivirus technologies) can add performance overhead. Customers who want to maximize performance can configure these utilities to reduce their impact. For example, because LINQ to HPC primarily reads and writes data to the HpcData and HpcTemp shares, some customers may want to exclude these shares from antivirus scanning. In these cases, customers have other options for restricting access to these shares, such as using NTFS ACLs.

Verify the default job template

LINQ to HPC jobs can only be submitted using the default job template (the template named Default). You cannot submit a LINQ to HPC job using a different job template.

When you submit LINQ to HPC jobs, the values of certain job properties in the default template are set or can be overridden as shown in the following table. If a job property is not shown in the table, it is set to its default configuration in the default template. For information about job template properties, see Job Template Properties in the Design and Deployment Guide for Windows HPC Server 2008 R2.

Job property              Notes
Auto Calculate Maximum    Default value is set to False
Auto Calculate Minimum    Default value is set to False
Fail on Task Failure      Default value is set to False
Job Name                  Can be set by the LINQ to HPC application or user
Licenses                  N/A
Maximum Cores             N/A
Maximum Nodes             Can be set by the LINQ to HPC application or user
Maximum Sockets           N/A
Minimum Cores             N/A
Minimum Nodes             Can be set by the LINQ to HPC application or user
Minimum Sockets           Ignored
Node Groups               Can be set by the LINQ to HPC application or user
Node Ordering             Ignored
Preemptable               Default value is set to False
Priority                  Ignored
Project                   Ignored
Requested Nodes           Ignored
Run Time                  Can be set by the LINQ to HPC application or user
Run Until Canceled        Default value is set to False
Service Name              Ignored
Unit Type                 Default value is set to Node

Important
In addition, when you submit a LINQ to HPC job, you must select node as the type of resource for the job. You cannot select another resource type.
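The submission rules in this section (the Default template, the node resource type, and the short list of properties that the application or user can set) can be expressed as a simple pre-submission check. The following Python sketch is a hypothetical helper and does not call the HPC job scheduler; the set of property names restates the entries marked as settable in the table.

```python
# Properties the table lists as settable by the application or user.
USER_SETTABLE = {"Job Name", "Maximum Nodes", "Minimum Nodes",
                 "Node Groups", "Run Time"}


def check_job_settings(template: str, unit_type: str, overrides=()) -> list:
    """Return a list of problems with the planned LINQ to HPC job settings."""
    problems = []
    if template != "Default":
        problems.append("LINQ to HPC jobs must use the template named Default.")
    if unit_type.lower() != "node":
        problems.append("The resource type for the job must be node.")
    for name in overrides:
        if name not in USER_SETTABLE:
            problems.append(f"'{name}' is not listed as a settable property.")
    return problems


print(check_job_settings("Default", "Node", ["Job Name", "Maximum Nodes"]))
```

An empty list means the planned settings are consistent with the table and the requirement above.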

To verify that the cluster, DSC, and the storage nodes are configured properly, you can compile and run a sample project such as the Histogram sample. This sample counts the occurrences of words in a DSC file set that contains text files. It first loads the example data onto the DSC. It then executes a query that breaks up each line into words, and then counts the occurrences of each word. It then returns an ordered list of the 200 most frequently used words.
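The logic that the Histogram sample applies (split each line into words, count each word, and return the most frequent words) can be sketched on a single computer as follows. This Python example is only an illustration of the computation, not the sample's LINQ to HPC code; splitting on whitespace and treating words as case-sensitive are assumptions, because the sample's exact tokenization is not described here.

```python
from collections import Counter


def top_words(lines, k=200):
    """Return the k most frequent words as (word, count) pairs, most frequent first."""
    counts = Counter()
    for line in lines:
        # Break the line into words on whitespace and tally each one.
        counts.update(line.split())
    return counts.most_common(k)


# Made-up sample text.
sample = ["the quick brown fox", "the lazy dog", "the fox"]
for word, count in top_words(sample, k=2):
    print(word, count)
```

In the distributed sample, the same splitting and counting runs in parallel over the file set stored in the DSC rather than over an in-memory list.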

© 2013 Microsoft. All rights reserved.