Windows Server
This topic has not yet been rated - Rate this topic

Database Capacity Planning and Tuning in Windows HPC Server 2008 R2

Applies To: Windows HPC Server 2008 R2

The Windows HPC management features in Microsoft® HPC Pack 2008 R2 rely on four databases to support management, job scheduling, diagnostic, and reporting functionality. When you install HPC Pack 2008 R2 on a server to create a head node, the default setup installs the Express edition of Microsoft SQL Server 2008 (if no other edition of SQL Server is detected) and creates the four databases on the head node. The Express edition has no additional licensing fees, and is included to provide out-of-box experience for proof-of-concept or development clusters, and for smaller production clusters. Depending on the size, throughput, and requirements of your cluster, you can install a different edition of SQL Server or install the databases on remote servers. The information in this document is intended to help you determine the database configuration and additional tuning options that are appropriate for your cluster.

In this topic:

Before considering the editions of SQL Server or performance tuning options, it is helpful to understand the three basic options for database setup.

SQL Express on the head node

This is the out-of-box experience. This is typically used for proof-of-concept or development clusters, or for smaller production clusters. SQL Server 2008 R2 Express allows larger database sizes than the SQL Server 2008 Express edition, and is automatically included if you create a new head node with the HPC Pack 2008 R2 SP2 installer. The basic steps for this setup are as follows:

  1. Install HPC Pack on a server to create a head node.

  2. Optionally, specify database and log file locations in the installation wizard (or accept defaults).

  3. SQL Express is installed automatically, and the HPC databases are created automatically.

  4. Deploy nodes.

noteNote
HPC Pack 2008 R2 (version 3.0.xxxx) ships with the Express edition of SQL Server 2008. With this edition, each database is limited to 4 GB. Service Pack 2 of HPC Pack 2008 R2 (version 3.2.xxxx) ships with the Express edition of SQL Server 2008 R2. With this edition, each database is limited to 10 GB. If you install SP2 as a patch, no changes are made to the edition of SQL server. If you install SP2 to create a new head node, then SQL Server 2008 R2 Express is the default SQL edition that will be used if no other edition is detected during setup.

SQL Standard on the head node

This is a basic configuration for medium size clusters. SQL Server Standard edition (or another full edition, not Compact) allows larger databases and additional management abilities to support more nodes and higher job throughput. The basic steps for this setup are as follows:

  1. Install SQL Server Standard edition (2008 or 2008 R2) on the server that will act as the head node.

  2. Install HPC Pack on the server to create a head node.

  3. Optionally, specify database and log file locations in the installation wizard (or accept defaults).

  4. The HPC databases are created automatically.

  5. Optionally, tune databases as required using SQL Server Management Studio.

  6. Deploy nodes.

Remote databases (SQL Standard or Express)

Installing one or more of the HPC databases on a remote server is a recommended configuration for larger clusters or for clusters that are configured for high-availability of the head node. For more information, see Deploying a Windows HPC Server 2008 R2 Cluster with Remote Databases Step-by-Step Guide. For high-availability head nodes, you would typically use the Standard edition of SQL Server to support high-availability of the databases (which is different than high-availability of the HPC management services). For more information, see Configuring Windows HPC Server 2008 R2 for High Availability. The basic steps for this setup are as follows:

  1. Install SQL Server (2008 or 2008 R2) on a remote server.

  2. Create the remote HPC databases manually and tune as required using SQL Server Management Studio.

  3. Install HPC Pack on a server to create a head node.

  4. Specify the connection information for the remote databases in the installation wizard.

  5. Deploy nodes.

The following general guidelines can help you determine what edition of SQL Server to use for your cluster. The node and job throughput numbers are meant as general guidelines only, as performance will vary according to the hardware and topology that you select for the cluster, and the workload that your cluster supports.

Consider using the Standard edition (or another full edition, not Compact) of SQL Server 2008 or SQL Server 2008 R2 if any of the following conditions apply:

  • The cluster has many nodes. Information such as node properties, configurations, metrics and performance history are stored in the databases. Larger clusters require more room in the databases. As a general guideline, the Express edition is sufficient for up to 64 nodes with SQL Server Express 2008, or up to 128 nodes with SQL Server Express 2008 R2.

  • The cluster supports a very high rate of job throughput. For example, greater than 10,000 tasks/subtasks per day.Every job, task, and subtask has entries in the database to store properties and allocation information and history. The default retention period for this data is 5 days. You can adjust the retention period to reduce your capacity requirements (see HPC data retention settings in this topic).

  • The cluster is configured for high-availability of the head node and you also want to configure high-availability for SQL Server.

  • You need to store job and task data in the scheduler database for an extended period of time and will exceed the 4 GB limit imposed by SQL Server Express 2008 or the 10 GB limit imposed by SQL Server Express 2008 R2.

  • You use the reporting database heavily, and possibly use the data extensibility features for custom reporting. For information about disabling the reporting extensibility and reducing the size requirements for the reporting database, see HPC data retention settings in this topic.

  • You require the additional reliability, performance, and flexibility provided by the SQL Server Management Studio tools (including support for maintenance plans). For example, the SQL Standard provides the following features (among others) that can be helpful to HPC Server administrators:

    • Unlimited database size

    • Support for high-availability configurations

    • Unlimited RAM usage for database caching

noteNote
SQL Server Management Studio is not automatically included with the Express editions of SQL Server. You can download it separately if you want to change settings for your HPC databases.

This section includes some guidelines and best practices for performance tuning the HPC databases. Sample configuration settings for a larger scale cluster are outlined in the list below. More information about these options is provided in the sections that follow.

  • On a server with three platters (physical disks) configure:

    • The operating system on a dedicated platter.

    • The four databases on a dedicated platter.

    • The four database log files on a dedicated platter.

  • In SQL Server Management Studio configure:

    • Scheduler database: Initial size 30 GB, grow rate 100%

    • Scheduler database logs: Initial size 2 GB

    • Management database: Initial size 20 GB, grow rate 100%

    • Management database logs: Initial size 2 GB

    • Reporting database: Initial size 30 GB, grow rate 100%

    • Reporting database logs: Initial size 2 GB

    • Diagnostics database and logs: Use defaults

  • In SQL Server Management Studio, configure the memory for the database to 8-10 GB (based on a 16 GB head node).

  • In SQL Server Management Studio, set the parallelization flag to 1 (0 is the default).

By default in SQL Server Standard, the SQL Server recovery model for each database is set to Full. This model can cause the log files to grow very large. To reclaim log space and keep the disk space requirements small, you can change the recover model for each of the databases to Simple. The recovery model that you select depends on your recovery requirements. If you use the Full model, ensure that you plan enough space for the log files. For more information, see Recovery Model Overview.

Autogrowth means that when a database or log file runs out of space, it will automatically increase its size by a predefined percentage (as defined by the autogrowth parameter). During the autogrowth process, the database is locked. This impacts cluster operations and performance, and can cause operation deadlocks and timeouts. Pre-sizing your databases helps you avoid these performance issues, and by configuring a larger autogrowth percentage, you reduce the frequency of the autogrowth operations. Each database has an associated log file. You can also tune the initial size and autogrowth settings of the log files.

The default configurations for the databases and log files (regardless of SQL edition) are as follows:

 

HPC DB and log Initial size (MB) Autogrowth

Management

Database: 1024

Log: 128

Database: 50%

Log: 50%

Scheduling

Database: 256

Log: 64

Database: 10%

Log: 10%

Reporting

Database: 128

Log: 64

Database: 10%

Log: 10%

Diagnostics

Database: 256

Log: 64

Database: 10%

Log: 10%

As an example, the following table lists initial size and autogrowth settings that might be appropriate for a cluster with 500 or more nodes:

noteNote
The initial size in this table is expressed in gigabytes (GB), not megabytes (MB) as in the previous table.

 

HPC database and log Initial size (GB) Autogrowth

Management

Database: 20

Log: 2

Database: 100%

Log: 10%

Scheduling

Database: 30

Log: 2

Database: 100%

Log: 10%

Reporting

Database: 30

Log: 2

Database: 100%

Log: 10%

Diagnostics

Database: default

Log: default

Database: default

Log: default

The following screen snip illustrates the HPC databases in the SQL Server Management Studio, and the database properties dialog box that you can use to configure initial size and autogrowth for the databases.

Configure HPC Databases in SQL Management Studio

You can improve performance by creating the databases on a separate platter (physical disk) than the log files. This applies to databases that are on the head node and to remote databases. For databases on the head node, you can specify the database and log file locations during setup (in the installation wizard). Ideally, place the system partition, data, and logs on separate platters.

If reporting is heavily used, consider moving the reporting database to a separate platter.

To minimize memory paging, ensure that your SQL server instance has a sufficient allocation of memory. You can set the memory for your SQL Server instance through SQL Manager. For example, if your databases are on a head node that has 16GB of memory, you might want to allocate 8-10GB for SQL Server.

To minimize core contention on the head node between SQL processes and HPC processes, set parallelization flag for the SQL Server instance to 1. By default, the flag is set to 0, which means that there are no limits to how many cores SQL will use. By setting it to 1, you limit SQL process to 1 core.

Monitor index fragmentation using the SQL Server Management Studio and defragment indexes when appropriate through a maintenance plan.

HPC Scheduling database

Job properties, allocation, and history are stored in the HPC Scheduling database. By default, data about completed jobs is kept for five days. The job record retention period (TTLCompletedJobs) determines how long to store data for the following records:

  • Data about completed jobs (Finished, Failed, or Canceled) in the HPC Scheduling database.

  • Diagnostic test results and data in the HPC Diagnostics database.

  • Messages for completed durable sessions that are stored by the broker node using MSMQ.

Jobs that are in the Configuring state are not deleted from the database. The jobs must be canceled by the job owner or a cluster administrator (or completed in some other way) and then they will be deleted according to the job history policy.

You can configure this property by using the Set-HpcClusterProperty cmdlet. For example, to set the job record retention period to three days, run HPC PowerShell as an administrator and type the following cmdlet:

Set-HpcClusterProperty –TTLCompletedJobs 3

This property can also be configured in the Job History settings of the HPC Job Scheduler configuration dialog box. For more information, see Configure the HPC Job Scheduler Service.

HPC Management database

The node performance counter data is stored in the HCP Management database. By default, this data is kept for three days. The data retention period for the node performance data is defined by the MinuteCounterRetention cluster property. You can configure this property by using the Set-HpcClusterProperty cmdlet. For example, to retain performance counter data for one week, run HPC PowerShell as an administrator and type the following cmdlet:

Set-HpcClusterProperty –MinuteCounterRetention 7

HPC Reporting database

Historical data about the cluster such as cluster utilization, node availability, and job statistics is aggregated and stored in the HPC Reporting database. The database also stores raw data about jobs that is available to support custom reporting when data extensibility is enabled (it is enabled by default). For example, you can create custom charge back reports that correspond to the charging methods that are used by your organization. For information about using the raw data for custom reporting, see the Reporting Extensibility Step-by-Step Guide.

The following table describes the settings that control the data extensibility and retention periods for the raw data. These settings do not affect the aggregated data that is used for the built-in reports. You can view the values of the settings with the Get-HPCClusterProperty cmdlet and set the values with the Set-HpcClusterProperty cmdlet. For example, to disable data extensibility, run HPC PowerShell as an administrator and type the following cmdlet:

Set-HpcClusterProperty –DataExtensibilityEnabled $false

 

Setting Description

DataExtensibilityEnabled

Specifies whether the cluster stores information for custom reporting about jobs, nodes, and the allocation of jobs to nodes.

True indicates that the cluster stores information for custom reporting about jobs, nodes, and the allocation of jobs to nodes. False indicates that the cluster does not store this information. The default value is True.

DataExtensibilityTtl

Specifies the number of days that the HPCReporting database should store all of the information about jobs and nodes except for the allocation of jobs to nodes. This parameter has a default value of 365.

AllocationHistoryTtl

Specifies the number of days that the HPCReporting database should store information about the allocation of jobs to nodes. This parameter has a default value of 5.

ReportingDBSize

Contains the current size of the HPCReporting database. This value is a string that includes the units of measurement for the size. This parameter is read-only.

To view this property, the computer that is running HPC PowerShell must be able to access the HPCReporting database. For more information about enabling remote database access, see deploying a cluster with remote databases step-by-step guide (http://go.microsoft.com/fwlink/?LinkId=194989).

HPC Diagnostics database

Information and results from diagnostic test runs is stored in the HPC Diagnostics database. The job record retention period (TTLCompletedJobs) determines how long to store data about completed test runs.

A typical SQL Server maintenance plan covers the following:

  • Database backup

  • Consistency checks

  • Index defragmentation

We generally recommend that you rebuild your indexes after 250,000 jobs or one month (whichever is shorter), if not more often.  How often you do consistency checks and backups will depend on your business requirements. We recommend running maintenance only when there is little to no user activity, preferably during a scheduled downtime (especially for larger clusters), as it can severely impact job throughput and user experience.

noteNote
For important information about backing up and restoring the HPC databases, see Back Up and Restore in Windows HPC Server.

Did you find this helpful?
(1500 characters remaining)