Determining failover and move policies for groups

10/08/2009

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

For every resource group, the Cluster service maintains a prioritized list of the nodes that can act as its host. To make full use of the processing power of a cluster, you should understand how the Cluster service uses this prioritized list to move, fail over, and fail back resource groups. With that understanding, you can better balance groups among all nodes and maximize the performance of your server cluster.

Failover policies

You can assign the failover policies for each group of resources in your cluster. These policies determine exactly how a group behaves when failover occurs. You can choose which policies are most appropriate for each resource group you set up. For more information on setting failover policies, see Setting group properties.

Failover policies for groups include three settings:

Failover timing

Preferred nodes list

Failback timing

Failover timing

You can set a group for immediate failover when a resource fails, or you can instruct the Cluster service to try to restart the failed resource a number of times before failover occurs. If it is possible that the resource failure can be overcome by restarting all resources within the group, then set the Cluster service to restart the group.

Verify that the failover threshold does not exceed the number of nodes in the cluster. For example, if the cluster contains three nodes, set the failover threshold to no more than three.

Preferred nodes list

The Cluster service maintains an ordered, internal list of available nodes for resource group moves, failovers, and failbacks. Optionally, you can set up a prioritized list of nodes that are preferred owners for a group so that the Cluster service can choose the best available node on that list rather than choose a node at random. This list, known as the preferred nodes list, is useful if one or more of the nodes is better equipped to host the group.

Important

When setting up a preferred nodes list for a resource group, it is highly recommended that you define a complete preferred nodes list, that is, one listing all nodes in your server cluster, in order of priority.

Failback timing

You can set a group to fail back to its preferred node as soon as the Cluster service detects that the failed node has been restored, or you can instruct the Cluster service to wait until a specified hour of the day, such as after peak business hours.

Important

Failback only occurs when you have defined a preferred nodes list for a resource group and failback is allowed for that resource group.

If you specify that a group should fail back to a preferred node and then restart that node to test the failback policy, the resource group will not fail back. A resource group does not fail back when a node is restarted after a planned shutdown. To test the failback policy, you must press the reset button on the node. For more information on testing failback policies, see Test node failure.
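The failback conditions described in this section can be summarized as a small decision function. The following Python sketch is illustrative only and is not Cluster service code: the function name `failback_permitted`, its parameters, and the treatment of the time window as inclusive whole hours are all assumptions made for the example.

```python
def failback_permitted(has_preferred_list, failback_allowed, hour,
                       window_start=None, window_end=None):
    """Decide whether a group may fail back at the given hour (0-23).

    Hypothetical model: window_start/window_end are whole hours, and
    leaving both as None stands for "fail back immediately".
    """
    # Failback requires a preferred nodes list and failback being allowed.
    if not (has_preferred_list and failback_allowed):
        return False
    # No window configured: fail back as soon as the node is restored.
    if window_start is None or window_end is None:
        return True
    if window_start <= window_end:
        return window_start <= hour <= window_end
    # Assumption: a window such as 22 -> 2 wraps past midnight.
    return hour >= window_start or hour <= window_end


# Failback restricted to after peak business hours (18:00 to 23:00).
print(failback_permitted(True, True, 14, 18, 23))  # False
print(failback_permitted(True, True, 20, 18, 23))  # True
```

Modeling the window as a start and end hour keeps the check simple; a real deployment would rely on the group's configured failback settings rather than on code like this.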

Moving groups

You can use Cluster Administrator or cluster.exe to manually move a resource group from one node to another. If you specify the destination node for the move, the Cluster service will move the group to the node you specify. However, if you do not specify the destination node, but instead select the Best Possible option, the Cluster service will select a destination node. The logic that the Cluster service uses to select the destination node will be different for each of these scenarios:

You have specified a list of nodes as preferred owners for that resource group.
You have not specified a list of nodes as preferred owners for that resource group.

For more information on moving resource groups, see Move a group to another node.

The table below summarizes the move, failover, and failback logic that the Cluster service uses depending on whether the user defines a preferred nodes list.

Scenario	Move group	Resource or node failures	Failback enabled
Preferred Nodes List Defined	Fails over to the first node on the list that is available.	Fails over to the next node on the list.	Fails back to the original node.
Preferred Nodes List Not Defined	Fails over to a randomly selected node.	Fails over to a randomly selected node.	No failback.

The section below provides details on the logic behind manual moves, failovers, and failbacks.

Moves, failovers, and failbacks

Preferred nodes list not defined

If you have not set up a preferred nodes list for a group, the Cluster service creates an internal list of nodes ordered by their node IDs. In general, this is the same as the order in which the nodes were added to the server cluster. It may, however, be ordered differently if nodes have been evicted from the server cluster. When a resource group or node fails and you have not defined a preferred nodes list for that group, the Cluster service randomly selects a node from the internal list and attempts to transfer the group to that node.

If you manually move a resource group off a node without specifying a destination node, the Cluster service transfers the group to a randomly selected node.
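As a rough illustration of the behavior without a preferred nodes list, the Python sketch below builds a list ordered by node ID and picks a random destination. It is a simplified model, not the Cluster service's implementation: the function name `random_target`, the dictionary of node IDs, and the exclusion of the current owner and of unavailable nodes are assumptions made for the example.

```python
import random


def random_target(cluster_nodes, current_owner, available, rng=random):
    """Pick a random destination when no preferred nodes list is defined.

    cluster_nodes: dict mapping node name -> node ID (hypothetical model).
    available: set of node names that are currently up.
    """
    # Internal list: all nodes ordered by node ID.
    internal = sorted(cluster_nodes, key=cluster_nodes.get)
    # Assumption: the current owner and unavailable nodes are not candidates.
    candidates = [n for n in internal
                  if n != current_owner and n in available]
    return rng.choice(candidates) if candidates else None


nodes = {"Node1": 1, "Node2": 2, "Node3": 3}
print(random_target(nodes, "Node2", {"Node1", "Node3"}))
# Prints either Node1 or Node3; the choice is not predictable.
```

Because the destination cannot be predicted, a group can land on a node that is poorly suited to host it, which is the motivation for defining a preferred nodes list.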

Preferred nodes list defined

If you define a complete preferred nodes list for a group (that is, one listing all the nodes in the cluster), then the Cluster service uses this defined list as its internal list. However, if you define a partial preferred nodes list for a group, then the Cluster service uses this defined list as its internal list and appends any other installed nodes not on the preferred list, ordered by their node IDs. For example, if you created a five-node cluster (installing the nodes in the order Node1, Node2, Node3, Node4, and Node5) and defined Node3, Node4, and Node5 as preferred owners for the resource group PRINTGR1, the Cluster service would maintain this ordered list for PRINTGR1: Node3, Node4, Node5, Node1, Node2. How the Cluster service uses this list depends on whether the resource group is being moved because of a resource or node failure or because of a manual move request.
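The list-building rule above can be sketched in a few lines of Python. This is an illustrative model only, not Cluster service code: the function name `effective_node_list` and the representation of the cluster as a dictionary of node names and node IDs are assumptions made for the example.

```python
def effective_node_list(cluster_nodes, preferred):
    """Return the ordered node list the service would keep for a group.

    cluster_nodes: dict mapping node name -> node ID (hypothetical model).
    preferred: node names in priority order; may be complete or partial.
    """
    # Preferred owners come first, in the order they were listed.
    ordered = [name for name in preferred if name in cluster_nodes]
    # Any remaining nodes are appended, ordered by node ID.
    remaining = sorted(
        (name for name in cluster_nodes if name not in ordered),
        key=lambda name: cluster_nodes[name],
    )
    return ordered + remaining


# The five-node example from the text, with a partial preferred list.
nodes = {"Node1": 1, "Node2": 2, "Node3": 3, "Node4": 4, "Node5": 5}
printgr1_list = effective_node_list(nodes, ["Node3", "Node4", "Node5"])
print(printgr1_list)
# ['Node3', 'Node4', 'Node5', 'Node1', 'Node2']
```

With a complete preferred list the second step appends nothing, so the result is exactly the list the administrator defined.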

Important

When setting up a preferred nodes list for a resource group, it is recommended that you define a complete preferred nodes list, that is, one listing all nodes in your server cluster, in order of priority.

Preferred lists and resource or node failures

For resource group or node failures, the group fails over to the node that follows the current owner on the preferred nodes list. In the example above, if the resource group PRINTGR1 fails on Node3, the Cluster service fails that group over to the next node on the list, Node4. If you allow failback for that group, then when Node3 comes back online, the Cluster service fails PRINTGR1 back to that node.
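The "next node on the list" rule can be modeled as follows. This Python sketch is a simplified illustration, not the Cluster service's actual logic: the function name `failover_target` is hypothetical, and both skipping unavailable nodes and wrapping around to the start of the list are assumptions, since the text only states that the group goes to the next node.

```python
def failover_target(node_list, current_owner, available):
    """Pick the node that follows current_owner on node_list.

    Assumption: unavailable nodes are skipped and the search wraps
    around to the start of the list.
    """
    start = node_list.index(current_owner)
    for offset in range(1, len(node_list)):
        candidate = node_list[(start + offset) % len(node_list)]
        if candidate in available:
            return candidate
    return None  # no other node can host the group


# PRINTGR1's list from the example; Node3 has just failed.
printgr1_list = ["Node3", "Node4", "Node5", "Node1", "Node2"]
target = failover_target(
    printgr1_list, "Node3", {"Node1", "Node2", "Node4", "Node5"}
)
print(target)  # Node4
```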

Preferred lists and manual moves

If you manually move a resource group without specifying a destination node, the Cluster service moves that group to the first available node on the preferred nodes list.
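A manual move differs from a failure in that the search starts at the top of the list rather than after the current owner. The Python sketch below illustrates that difference under stated assumptions: the function name `best_possible_target` is hypothetical, and excluding the current owner from the candidates is an assumption not spelled out in the text.

```python
def best_possible_target(node_list, current_owner, available):
    """Return the first available node on node_list for a manual move.

    Assumption: the current owner is never chosen as the destination.
    """
    for candidate in node_list:
        if candidate != current_owner and candidate in available:
            return candidate
    return None  # no other node is available


# PRINTGR1 currently runs on Node5 and is moved without a destination.
printgr1_list = ["Node3", "Node4", "Node5", "Node1", "Node2"]
all_up = {"Node1", "Node2", "Node3", "Node4", "Node5"}
print(best_possible_target(printgr1_list, "Node5", all_up))  # Node3
# With Node3 down, the next entry on the list is used instead.
print(best_possible_target(printgr1_list, "Node5", all_up - {"Node3"}))  # Node4
```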
