Best practices for configuring and operating server clusters
Updated: January 21, 2005
Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2
The following guidelines will help you effectively use a server cluster:
Secure your server cluster.
To prevent your server cluster from being adversely affected by denial of service attacks, data tampering, and other malicious attacks, it is highly recommended that you plan for and implement the security measures detailed in Best practices for securing server clusters.
Check that your server cluster hardware is listed in the Windows Catalog.
For Windows Server 2003, Enterprise Edition and Windows Server 2003, Datacenter Edition, Microsoft supports only complete server cluster systems chosen from the Windows Catalog. To see if your system or hardware components, including your cluster disks, are compatible, see Support resources. For a geographically dispersed cluster, both the hardware and software configuration must be certified and listed in the Windows Catalog.
The network interface controllers (NICs) used in certified cluster configurations must be selected from the Windows Catalog.
It is recommended that your cluster configuration consist of identical storage hardware on all cluster nodes to simplify configuration and eliminate potential compatibility problems.
Partition and format disks before adding the first node to your cluster.
Partition and format all disks on the cluster storage device before adding the first node to your cluster. You must format the disk that will be the quorum resource. All partitions on the cluster storage device must be formatted with NTFS (they can be either compressed or uncompressed), and all partitions on one disk are managed as one resource and move as a unit between nodes.
Cluster disks on the cluster storage device must be partitioned as master boot record (MBR) and not as GUID partition table (GPT) disks.
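The MBR requirement can be checked by inspecting sector 0 of a disk. The following is a minimal sketch, not a supported tool: it assumes you have already read the first 512 bytes of the disk by some other means, and the helper names `partition_style` and `make_sector` are hypothetical. It relies on the fact that a GPT disk carries a protective MBR whose partition entry has type 0xEE.

```python
SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # first of four 16-byte partition entries
PARTITION_ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a valid sector 0
GPT_PROTECTIVE_TYPE = 0xEE     # partition type used by a GPT protective MBR


def partition_style(sector0: bytes) -> str:
    """Classify sector 0 of a disk as 'MBR', 'GPT', or 'unknown'."""
    if len(sector0) < SECTOR_SIZE or sector0[510:512] != BOOT_SIGNATURE:
        return "unknown"
    # The partition type byte sits at offset 4 inside each 16-byte entry.
    types = [
        sector0[PARTITION_TABLE_OFFSET + i * PARTITION_ENTRY_SIZE + 4]
        for i in range(4)
    ]
    return "GPT" if GPT_PROTECTIVE_TYPE in types else "MBR"


def make_sector(partition_type: int) -> bytes:
    """Build a synthetic sector 0 with one partition entry (demo only)."""
    sector = bytearray(SECTOR_SIZE)
    sector[PARTITION_TABLE_OFFSET + 4] = partition_type
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)


if __name__ == "__main__":
    # 0x07 is the partition type used for NTFS volumes.
    for label, ptype in (("NTFS data disk", 0x07), ("GPT disk", 0xEE)):
        print(label, "->", partition_style(make_sector(ptype)))
```

A disk reported as GPT by a check like this would need to be converted to MBR before it is used as a cluster disk.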
Correctly set up your server cluster's networks.
Follow the guidelines below to reduce network problems in your server cluster:
Use identical network adapters in all cluster nodes; that is, make sure each adapter is the same make, model, and firmware version.
Use at least two interconnects. Although a server cluster can function with only one interconnect, at least two interconnects are necessary to eliminate a single point of failure and are required for the verification of original equipment manufacturer (OEM) clusters.
Reserve one network exclusively for internal node-to-node communication (the private network). Do not use network adapter teaming on the private network.
Set the order of the network adapter binding as follows:
External public network
Internal private network (Heartbeat)
[Remote Access Connections]
Manually set the speed and duplex mode for multiple speed adapters to the same values and settings. If the adapters are connected to a switch, ensure that the port settings of the switch match those of the adapters. For more information, see Change network adapter settings.
Use static IP addresses for each network adapter on each node.
For private networks, define the TCP/IP properties for static IP addresses following the guidelines at Private network addressing options. That is, specify a class A, B, or C private address.
Do not configure a default gateway, DNS server, or WINS server on the private network adapters. Also, do not configure private network adapters to use name resolution servers on the public network; otherwise, a name resolution server on the public network might map a name to an IP address on the private network. If a client then receives that IP address from the name resolution server, it might fail to reach the address because no route exists from the client to the private network.
Configure WINS and/or DNS servers on the public network adapters. If Network Name resources are used on the public networks, set up the DNS servers to support dynamic updates; otherwise, the Network Name resources may not fail over correctly. For more information, see Configure TCP/IP settings.
Configure a default gateway on the public network adapters. If there are multiple public networks in the cluster, configure a default gateway on only one of these. For more information, see Configure TCP/IP settings.
Clearly identify each network by changing the default name. For example, you could change the name of the private network connection from the default Local Area Connection to Private Cluster Network.
Change the role of the private network from the default setting of All communications (mixed network) to Internal cluster communications only (private network) and verify that each public network is set to All communications (mixed network). For more information, see Change how the cluster uses a network.
Place the private network at the top of the Network Priority list for internal node-to-node communication in the cluster. For more information, see Change network priority for communication between nodes.
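Several of the addressing guidelines above can be expressed as a simple check. The following is a minimal sketch under stated assumptions: the `Adapter` record and the `check_node` function are hypothetical, the adapter settings are supplied by hand rather than read from the operating system, and the number of adapters is used as a rough stand-in for the number of interconnects. It checks for static addressing, a private-range address on the private network, no default gateway or name servers on private adapters, and a default gateway on exactly one public adapter.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional

# The class A, B, and C private address blocks.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


@dataclass
class Adapter:
    """Hypothetical record of one network adapter's settings on a node."""
    name: str
    role: str                      # "private" or "public"
    ip: str
    dhcp: bool = False
    gateway: Optional[str] = None
    name_servers: List[str] = field(default_factory=list)  # DNS and WINS


def check_node(adapters: List[Adapter]) -> List[str]:
    """Return a list of guideline violations for one node's adapters."""
    problems = []
    if len(adapters) < 2:
        problems.append("fewer than two interconnects")
    for a in adapters:
        if a.dhcp:
            problems.append(f"{a.name}: uses DHCP instead of a static address")
        if a.role == "private":
            addr = ipaddress.ip_address(a.ip)
            if not any(addr in net for net in PRIVATE_RANGES):
                problems.append(f"{a.name}: {a.ip} is not a private address")
            if a.gateway:
                problems.append(f"{a.name}: default gateway set on private adapter")
            if a.name_servers:
                problems.append(f"{a.name}: DNS or WINS server set on private adapter")
    with_gateway = [a for a in adapters if a.role == "public" and a.gateway]
    if len(with_gateway) != 1:
        problems.append("exactly one public adapter should have a default gateway")
    return problems


if __name__ == "__main__":
    node = [
        Adapter("Public", "public", "192.0.2.10",
                gateway="192.0.2.1", name_servers=["192.0.2.53"]),
        Adapter("Private Cluster Network", "private", "10.0.0.1"),
    ]
    for problem in check_node(node) or ["no problems found"]:
        print(problem)
```

The addresses in the example are documentation placeholders. Settings such as binding order, speed and duplex, network role, and network priority are not covered by this sketch and still need to be verified in the operating system and in Cluster Administrator.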
Do not install applications into the default Cluster Group.
Do not delete or rename the default Cluster Group or remove any resources from that resource group.
The default Cluster Group contains the settings for the cluster and some typical resources that provide generic information and failover policies. This group is essential for connectivity to the cluster. It is therefore very important to keep application resources out of the default Cluster Group, so that clients do not connect to the Cluster Group's IP address and network name resources. If a resource for an application is added to this group and the resource fails, it may cause the Cluster Group to fail as well, thereby reducing the availability of the entire cluster. It is highly recommended that you create separate resource groups for application resources.
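One way to audit this guideline is to list the resources in each group and flag anything in the default Cluster Group that looks like an application resource. The following is a minimal sketch, assuming the inventory has been exported by hand as (name, resource type, group) tuples. The function `misplaced_resources` is hypothetical, and the set of resource types treated as belonging in the default group is an assumption that you should adjust for your own cluster.

```python
from typing import Iterable, List, Set, Tuple

DEFAULT_GROUP = "Cluster Group"

# Assumption: resource types expected in the default group
# (cluster IP address, cluster name, and quorum resource).
ASSUMED_CORE_TYPES = {
    "IP Address",
    "Network Name",
    "Physical Disk",
    "Majority Node Set",
}

Resource = Tuple[str, str, str]  # (resource name, resource type, group name)


def misplaced_resources(
    resources: Iterable[Resource],
    core_types: Set[str] = ASSUMED_CORE_TYPES,
) -> List[str]:
    """Return names of resources in the default group that are not core types."""
    return [
        name
        for name, rtype, group in resources
        if group == DEFAULT_GROUP and rtype not in core_types
    ]


if __name__ == "__main__":
    inventory = [
        ("Cluster IP Address", "IP Address", "Cluster Group"),
        ("Cluster Name", "Network Name", "Cluster Group"),
        ("Disk Q:", "Physical Disk", "Cluster Group"),
        ("Payroll Share", "File Share", "Cluster Group"),
        ("Reports Share", "File Share", "Reports Group"),
    ]
    for name in misplaced_resources(inventory):
        print("Move to a separate resource group:", name)
```

The resource and group names in the example are illustrative only.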
Back up your server cluster.
To be able to effectively restore your server cluster in the event of application data or quorum loss, or individual node or complete cluster failure, follow these steps when preparing backups:
Perform an Automated System Recovery (ASR) backup on each node in the cluster.
Back up the cluster disks from each node.
Back up each individual application (for example, Microsoft Exchange Server or Microsoft SQL Server) running on the nodes.
By default, Backup Operators do not have the user rights necessary to create an Automated System Recovery (ASR) backup on a cluster node. However, Backup Operators can perform this procedure if that group is added to the security descriptor for the Cluster service. You can do that using Cluster Administrator or cluster.exe. For more information, see Give a user permissions to administer a cluster and Cluster.
For more information, see Backing up and restoring server clusters. For more information on backing up applications in a cluster, see the documentation for that application.
Maintain a backup of the RAID controller configuration.
In a single quorum device server cluster, the RAID controller is a single point of failure. Always maintain a backup of the RAID controller configuration in case the RAID controller is replaced.
Do not use APM/ACPI power-saving features.
APM/ACPI power-saving features must not be enabled on server cluster members. A cluster member that turns off disk drives or enters "system standby" or "hibernate" mode can initiate a failure in the cluster. If multiple cluster nodes have power saving enabled, the entire cluster can become unavailable.
Cluster members must use a power scheme that sets the Turn off hard disks option to Never, for example, the Always On power scheme. For more information on choosing a power scheme (located under Power Options in Control Panel), see Choose a power scheme.
For cluster nodes without Terminal Services installed, see Configure the Always On power scheme without Terminal Services installed.
For cluster nodes with Terminal Services installed, see Configure the Always On power scheme with Terminal Services installed.
Installing Terminal Services on a system reduces the power management options available to the user. The System standby and System hibernates options are not available.
Give the Cluster service account full rights to administer computer objects if Kerberos authentication is enabled for virtual servers.
If you enable Kerberos authentication for a virtual server's Network Name resource, the Cluster service account does not strictly need full access rights to the computer object associated with that Network Name resource. The Cluster service can use the default access rights given to members of the Authenticated Users group, but certain operations (for example, renaming the computer object) will be restricted. To avoid these restrictions, it is recommended that you work with your domain administrator to set up appropriate administration rights and permissions for the Cluster service account.
For more information about Kerberos authentication, see Virtual servers.
Do not install scripts used by Generic Script resources on cluster disks.
It is recommended that you install script files used by Generic Script resources on local disks, not on cluster disks. Incorrectly written script files can cause the cluster to stop responding. Installing the script files on a local disk makes it easier to recover from this scenario. For guidelines on writing scripts for the Generic Script resource, see the Microsoft Platform Software Development Kit (SDK). For information on troubleshooting Generic Script resource issues, see article Q811685, "A Server Cluster with a Generic Script Resource Stops Responding" in the Microsoft Knowledge Base.