When deploying an application, it is always important to consider what demands it will make on a server's resources. With clustering, there is a related issue that also needs to be taken into account: how is the load redistributed after a failover?
Consider one of the simplest cases, an active/active, 2-node file server cluster, with node A and node B each serving a single share. If node A fails, its resources will move to node B, placing an additional load on node B. In fact, if node A and B were each running at only 50% capacity before the failure, node B will be completely saturated (100% of capacity) after the failover is completed, and performance may suffer.
While this situation may not be optimal, it is important to remember that having all of the applications still running, even in a reduced performance scenario, is a 100% improvement over what you would have without the high availability protection that clusters provide. But this does bring up the notion of risk, and what amount of it you are willing to accept in order to protect the performance, and ultimately the availability, of your applications.
We have intentionally chosen the worst case (an active/active, 2-node cluster with each node running a single application that consumes half of the server's resources) for the purpose of clarity. With an additional node, the equation changes: there are more servers to support the workload, but if all three nodes are running at 50% capacity and there are two failures, the single remaining server will simply not be able to handle the accumulated load of the applications from both of the failed servers. Of course, the likelihood of two failures is considerably less than that of a single failure, so the risk is mitigated somewhat.
Nevertheless, the load/risk tradeoff must be considered when deploying applications on a cluster. The more nodes in a cluster, the more options you have for distributing the workload. If your requirements dictate that all of your clustered applications must run with no performance degradation, then you may need to consider some form of active/passive configuration. But even in this scenario, you must consider the risks of the various configurations. If you cannot accept even the slightest risk of any reduced performance under any conditions whatsoever, you will need a dedicated passive node for each active node.
If, on the other hand, you are convinced that the risk of multiple failures is small, you have other choices. If you have a 4-node or 8-node cluster, you may want to consider an N+I configuration. N+I, which is discussed in more detail in section 1.4.3, is a variant of active/passive in which N nodes are active and I nodes are passive, or reserve, nodes. Typically, the value for I is less than the value for N, and an N+I cluster topology can handle I failures before any performance degradation is likely. The risk is that with more than I failures, performance will likely decline, but once again, the likelihood of multiple failures is increasingly remote.
For this reason, N+I clusters are a useful configuration that balances the hardware cost of having 100% passive server capacity against the low risk of multiple cluster node failures.
Server Load: Some More Realistic Configurations
The scenarios above were intentionally simplistic, assuming that one application imposed a monolithic load on each server, and thus its resource utilization could not be spread among more than one other server in the event of a failover. That is often not the case in the real world, especially for file and print servers, so we will take a look at some additional scenarios with a 4-node cluster, named ABCD, and having nodes A, B, C, and D.
Typically, a single server will support the load of more than one application. If, under normal conditions, each server were loaded at 25%, then the ABCD cluster could survive the loss of three members before a likely loss of application availability, which would be nearly a worst-case scenario.
The following series of figures illustrates what would happen with the application load in a 4-node cluster for successive node failures. The shaded, or patterned, areas indicate the capacity demands of the running applications. Further, the example below assumes that the application load on any given server is divisible, and can be redistributed among any of the surviving nodes.
Figure 1.1: Cluster under normal operating conditions (each node loaded at 25%)
Figure 1.2: Cluster after a single node failure. Note redistribution of application load.
Figure 1.3: Cluster after two node failures. Each surviving node is now approximately 50% loaded.
Figure 1.4: After three node failures, single surviving node is at full capacity.
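The arithmetic behind Figures 1.1 through 1.4 can be sketched in a few lines of Python. This is only an illustration, assuming identical nodes and a load that is perfectly divisible and spread evenly over the surviving nodes; the function name load_after_failures is hypothetical and not part of any cluster API.

```python
def load_after_failures(nodes: int, load_per_node: float, failures: int) -> float:
    """Per-node load (percent) once the total load is spread evenly over survivors."""
    survivors = nodes - failures
    if survivors <= 0:
        raise ValueError("no surviving nodes")
    # Total demand stays constant; only the number of nodes carrying it shrinks.
    return nodes * load_per_node / survivors


# The ABCD cluster, each node initially loaded at 25%.
for failed in range(4):
    print(f"{failed} failed: {load_after_failures(4, 25.0, failed):.1f}% per surviving node")
```

With three failures the single surviving node carries the whole 100%, which is the situation shown in Figure 1.4.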
If each node were running at 75% capacity, then without sensible failover policies, even a single node failure could result in loss of application availability. Depending on the application(s), however, you can specify what percentage of the applications should fail over to each of the other nodes in the event of a server failure. If the applications are spread evenly among the surviving nodes, then this cluster can survive the loss of a single machine, because one third of the failed server's load (one third of 75% is 25%) is allocated to each of the three surviving machines. The result is three fully loaded servers (each node that was running at 75% capacity now carries an additional 25%), but all applications are still available.
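The same reasoning can be turned around to estimate how many failures a cluster can absorb before some node would be pushed past full capacity. The sketch below assumes identical nodes, an evenly divisible load, and integer percentages; max_survivable_failures is a hypothetical helper name used only for illustration.

```python
def max_survivable_failures(nodes: int, load_pct: int, capacity_pct: int = 100) -> int:
    """Largest number of failed nodes for which the survivors can still carry the total load."""
    total = nodes * load_pct
    failures = 0
    # Keep failing nodes while at least one node survives and the survivors have enough capacity.
    while failures + 1 < nodes and total <= capacity_pct * (nodes - failures - 1):
        failures += 1
    return failures


print(max_survivable_failures(4, 75))  # 4 nodes at 75%: one failure leaves three nodes at 100%
print(max_survivable_failures(4, 25))  # 4 nodes at 25%: three failures leave one node at 100%
print(max_survivable_failures(3, 50))  # 3 nodes at 50%: a second failure would need 150%
```

Integer percentages are used deliberately so that boundary cases such as exactly 100% are not affected by floating-point rounding.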
As a variation on the example that was illustrated previously, the following series of figures depicts what happens to a 4-node cluster in which each node is running at approximately 33% capacity. Further, in this case, the application load is indivisible and cannot be spread among multiple other servers in the event of a failover (perhaps it is a single application, or multiple applications that depend on the same resource).
Figure 2.1: Cluster under normal operating conditions (each node loaded at approximately 33%)
Figure 2.2: Cluster after a single node failure. Note redistributed application load.
Figure 2.3: Cluster after second node failure. Each surviving node is now running at approximately 66% capacity. Note that in the event of another node failure, this cluster will no longer be capable of supporting all four of these applications.
Figure 2.4: After third failure, the single surviving server can only support three of the four applications.
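The indivisible case in Figures 2.1 through 2.4 can be modeled as a simple placement problem. The following sketch assumes four applications of 33% each and nodes with 100% capacity, and it places every application from scratch onto the least-loaded survivor rather than tracking individual moves; place_indivisible and the application names are hypothetical.

```python
def place_indivisible(app_loads, survivors, capacity=100):
    """Place each application whole onto the least-loaded surviving node that can hold it."""
    used = {node: 0 for node in survivors}
    hosted = {node: [] for node in survivors}
    unplaced = []
    for app, load in sorted(app_loads.items()):
        # The least-loaded node has the most free room; if it cannot fit the app, no node can.
        target = min(used, key=used.get)
        if used[target] + load <= capacity:
            used[target] += load
            hosted[target].append(app)
        else:
            unplaced.append(app)
    return hosted, unplaced


apps = {"App1": 33, "App2": 33, "App3": 33, "App4": 33}
print(place_indivisible(apps, ["A", "B"]))  # two survivors: two applications each
print(place_indivisible(apps, ["A"]))       # one survivor: the fourth application does not fit
```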
Another style of failover policy is Failover Pairs, also known as Buddy Pairs. Assuming each of the four servers is loaded at 50% or less, a failover buddy can be associated with each machine. This allows a cluster in this configuration to survive two failures, provided the two failures occur in different pairs. More details on Failover Pairs can be found in section 1.4.1.
Taking the previous Failover Pair example, we can convert it to an active/passive configuration by loading two servers at 100% capacity, and having two passive backup servers. This active/passive configuration can also survive two failures, but note that under ordinary circumstances, two servers remain unused. Furthermore, performance of these servers at 100% load is not likely to be as good as with the Failover Pair configuration, where each machine is only running at 50% utilization. Between these two examples, note that you have the same number of servers, the same number of applications, and the same survivability (in terms of how many nodes can fail without jeopardizing application availability). However, the Failover Pair configuration clearly comes out ahead of active/passive in terms of both performance and economy.
Still using our ABCD 4-node cluster, consider what happens if we configure it as an N+I cluster (explained in more detail in section 1.4.3) where three nodes are running at 100% capacity, and there is a single standby node. This cluster can only survive a single failure. As before, however, comparing it to the example where each server is running at 75% capacity, you again have the same number of servers, applications, and same survivability, but the performance and economy can suffer when you have passive servers backing up active servers running at 100% load.
Failover is the mechanism that single instance applications and the individual partitions of a partitioned application typically employ for high availability (the term Pack has been coined to describe a highly available, single instance application or partition).
In a 2-node cluster, defining failover policies is trivial. If one node fails, the only option is to failover to the remaining node. As the size of a cluster increases, different failover policies are possible and each one has different characteristics.
Failover Pairs
In a large cluster, failover policies can be defined such that each application is set to failover between two nodes. The simple example below shows two applications App1 and App2 in a 4-node cluster.
Figure 8: Failover pairs
This configuration has pros and cons:
| Pro | Good for clusters that are supporting heavy-weight applications (see footnote 2), such as databases. This configuration ensures that in the event of failure, two applications will not be hosted on the same node. |
| Pro | Very easy to plan capacity. Each node is sized based on the application that it will need to host (just like a 2-node cluster hosting one application). |
| Pro | Effect of a node failure on availability and performance of the system is very easy to determine. |
| Pro | Get the flexibility of a larger cluster. In the event that a node is taken out for maintenance, the buddy for a given application can be changed dynamically (may end up with standby policy below). |
| Con | In simple configurations, such as the one above, only 50% of the capacity of the cluster is in use. |
| Con | Administrator intervention may be required in the event of multiple failures. |
Failover pairs are supported by server clusters on all versions of Windows by limiting the possible owner list for each resource to a given pair of nodes.
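As a rough model of how a restricted possible owner list produces failover pairs, the sketch below picks a target only from the owners listed for an application. It is a simplified illustration in Python, not the behavior of the actual cluster service; failover_target, the application names, and the node names are all hypothetical.

```python
def failover_target(current, possible_owners, up_nodes):
    """Return the first possible owner, other than the failed host, that is still up."""
    for node in possible_owners:
        if node != current and node in up_nodes:
            return node
    return None  # no eligible node left: administrator intervention is needed


# Two applications in a 4-node cluster, each limited to its own pair of nodes.
owners = {"App1": ["Node1", "Node2"], "App2": ["Node3", "Node4"]}

print(failover_target("Node1", owners["App1"], {"Node2", "Node3", "Node4"}))  # buddy is up
print(failover_target("Node2", owners["App1"], {"Node3", "Node4"}))           # both pair members down
```

The second call returns None even though two nodes are still running, which mirrors the point above that multiple failures may require administrator intervention.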
Hot-Standby Server
To reduce the overhead of failover pairs, the spare node for each pair may be consolidated into a single node, providing a hot standby server that is capable of picking up the work in the event of a failure.
Figure 9: Standby Server
The standby server configuration has pros and cons:
| Pro | Good for clusters that are supporting heavy-weight applications such as databases. This configuration ensures that in the event of a single failure, two applications will not be hosted on the same node. |
| Pro | Very easy to plan capacity. Each node is sized based on the application that it will need to host, the spare is sized to be the maximum of the other nodes. |
| Pro | Effect of a node failure on availability and performance of the system is very easy to determine. |
| Con | Configuration is targeted towards a single point of failure. |
| Con | Does not really handle multiple failures well. This may be an issue during scheduled maintenance where the spare may be in use. |
Server clusters support standby servers today using a combination of the possible owners list and the preferred owners list. The preferred node should be set to the node that the application will run on by default and the possible owners for a given resource should be set to the preferred node and the spare node.
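The owner-list layout described above can be sketched as plain data. The Python below is an illustrative model only, assuming three applications with one home node each and a single spare; standby_owner_lists and hosts_after_failures are hypothetical helpers, and the model always prefers the home node when it is up.

```python
def standby_owner_lists(home_nodes, spare):
    """Build preferred and possible owner lists for a hot-standby layout."""
    return {
        app: {"preferred": [home], "possible": [home, spare]}
        for app, home in home_nodes.items()
    }


def hosts_after_failures(owner_lists, failed):
    """Host for each application: the first possible owner that has not failed, else None."""
    placement = {}
    for app, owners in owner_lists.items():
        candidates = [node for node in owners["possible"] if node not in failed]
        placement[app] = candidates[0] if candidates else None
    return placement


lists = standby_owner_lists({"App1": "Node1", "App2": "Node2", "App3": "Node3"}, "Node4")
print(hosts_after_failures(lists, {"Node1"}))           # App1 moves to the spare
print(hosts_after_failures(lists, {"Node1", "Node2"}))  # the spare now hosts two applications
```

The second scenario shows why this layout does not handle multiple failures well: both displaced applications land on the one spare.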
N+I
Standby server works well for 4-node clusters in some configurations; however, its ability to handle multiple failures is limited. N+I configurations are an extension of the standby server concept where there are N nodes hosting applications and I nodes which are spares.
Figure 10: N+I Spare node configuration
N+I configurations have the following pros and cons:
| Pro | Good for clusters that are supporting heavy-weight applications such as databases or Exchange. This configuration ensures that in the event of a failure, an application instance will failover to a spare node, not one that is already in use. |
| Pro | Very easy to plan capacity. Each node is sized based on the application that it will need to host. |
| Pro | Effect of a node failure on availability and performance of the system is very easy to determine. |
| Pro | Configuration works well for multiple failures. |
| Con | Does not really handle multiple applications running in the same cluster well. This policy is best suited to applications running on a dedicated cluster. |
Server clusters support N+I scenarios in the Windows Server 2003 release using a cluster group public property, AntiAffinityClassNames. This property can contain an arbitrary string of characters. In the event of a failover, if a group being failed over has a non-empty string in the AntiAffinityClassNames property, the failover manager will check all other nodes. If there are any nodes in the possible owners list for the resource that are NOT hosting a group with the same value in AntiAffinityClassNames, then those nodes are considered a good target for failover. If all nodes in the cluster are hosting groups that contain the same value in the AntiAffinityClassNames property, then the preferred node list is used to select a failover target.
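The selection rule described above can be modeled in a few lines of Python. This is a simplified sketch of the described behavior, not the failover manager itself: it treats the anti-affinity value as a single string, and the function pick_failover_target, the dictionary layout, and the group and node names are all assumptions made for illustration.

```python
def pick_failover_target(name, groups, up_nodes):
    """Choose a failover node for a group, avoiding nodes that host the same anti-affinity class."""
    group = groups[name]
    # Candidates: possible owners that are up, excluding the node that just failed.
    candidates = [n for n in group["possible"] if n in up_nodes and n != group["node"]]
    if group["anti_affinity"]:
        # Nodes already hosting another group with the same anti-affinity value.
        occupied = {
            other["node"]
            for other_name, other in groups.items()
            if other_name != name and other["anti_affinity"] == group["anti_affinity"]
        }
        free = [n for n in candidates if n not in occupied]
        if free:
            return free[0]
    # Every candidate hosts a group of the same class: fall back to the preferred list.
    for node in group["preferred"]:
        if node in candidates:
            return node
    return candidates[0] if candidates else None  # last resort, an assumption of this sketch


nodes = ["Node1", "Node2", "Node3", "Node4"]  # 3 active nodes plus Node4 as the spare
groups = {
    f"Group{i}": {
        "node": f"Node{i}",
        "anti_affinity": "Exchange",
        "possible": nodes,
        "preferred": [f"Node{i}", "Node4"],
    }
    for i in (1, 2, 3)
}

first = pick_failover_target("Group1", groups, {"Node2", "Node3", "Node4"})
print(first)  # the spare is the only node not hosting an "Exchange" group
```

After the first failover the spare is occupied, so a second failure has no free node and the choice falls back to the preferred list.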
Failover Ring
Failover rings allow each node in the cluster to run an application instance. In the event of a failure, the application on the failed node is moved to the next node in sequence.
Figure 11: Failover Ring
This configuration has pros and cons:
| Pro | Good for clusters that are supporting several small application instances where the capacity of any node is large enough to support several at the same time. |
| Pro | Effect on performance of a node failure is easy to predict. |
| Pro | Easy to plan capacity for a single failure. |
| Con | Configuration does not work well for all cases of multiple failures. If Node 1 fails, Node 2 will host two application instances and Nodes 3 and 4 will host one application instance. If Node 2 then fails, Node 3 will be hosting three application instances and Node 4 will be hosting one instance. |
| Con | Not well suited to heavy-weight applications since multiple instances may end up being hosted on the same node even if there are lightly-loaded nodes. |
Failover rings are supported by server clusters on the Windows Server 2003 release. This is done by defining the order of failover for a given group using the preferred owner list. A node order should be chosen and then the preferred node list should be set up with each group starting at a different node.
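A ring of preferred owner lists, each starting at a different node, can be generated by rotating one node order. The sketch below builds those lists and replays the two-failure case from the table above; it is an illustrative Python model that assumes a failed group moves to the next node in its list that is still up, and all function and application names are hypothetical.

```python
from collections import Counter


def ring_preferred_lists(nodes):
    """One preferred owner list per application, each a rotation starting at a different node."""
    return {f"App{i + 1}": nodes[i:] + nodes[:i] for i in range(len(nodes))}


def fail_node(placement, preferred, failed, down):
    """Move every application on the failed node to the next node in its list that is up."""
    down.add(failed)
    for app, node in list(placement.items()):
        if node != failed:
            continue
        order = preferred[app]
        for candidate in order[order.index(failed) + 1:]:
            if candidate not in down:
                placement[app] = candidate
                break
        # If no later node is up, the application is left where it was (offline) in this sketch.


nodes = ["Node1", "Node2", "Node3", "Node4"]
preferred = ring_preferred_lists(nodes)
placement = {app: order[0] for app, order in preferred.items()}
down = set()

fail_node(placement, preferred, "Node1", down)
after_first = Counter(placement.values())
fail_node(placement, preferred, "Node2", down)
after_second = Counter(placement.values())

print(dict(after_first))   # Node2 hosts two instances, Node3 and Node4 one each
print(dict(after_second))  # Node3 hosts three instances, Node4 one
```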
Random
In large clusters, or even 4-node clusters that are running several applications, defining specific failover targets or policies for each application instance can be extremely cumbersome and error-prone. The best policy in some cases is to allow the target to be chosen at random, with a statistical probability that this will spread the load around the cluster in the event of a failure.
Random failover policies have pros and cons:
| Pro | Good for clusters that are supporting several small application instances where the capacity of any node is large enough to support several at the same time. |
| Pro | Does not require an administrator to decide where any given application should failover to. |
| Pro | Provided that there are sufficient applications or the applications are partitioned finely enough, this provides a good mechanism to statistically load-balance the applications across the cluster in the event of a failure. |
| Pro | Configuration works well for multiple failures. |
| Pro | Very well tuned to handling multiple applications or many instances of the same application running in the same cluster. |
| Con | Can be difficult to plan capacity. There is no real guarantee that the load will be balanced across the cluster. |
| Con | Effect on performance of a node failure is not easy to predict. |
| Con | Not well suited to heavy-weight applications since multiple instances may end up being hosted on the same node even if there are lightly-loaded nodes. |
The Windows Server 2003 release of server clusters randomizes the failover target in the event of node failure. Each resource group that has an empty preferred owners list will be failed over to a random node in the cluster in the event that the node currently hosting it fails.
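To see how random target selection spreads load statistically, the following Python sketch fails one node in a 4-node cluster hosting 40 small application instances. It is an illustration under simple assumptions (uniform random choice among survivors, a fixed seed for repeatability); random_failover and the instance names are hypothetical and do not reflect how the cluster service chooses a node internally.

```python
import random
from collections import Counter


def random_failover(placement, failed, nodes, rng):
    """Reassign every instance on the failed node to a randomly chosen surviving node."""
    survivors = [n for n in nodes if n != failed]
    return {
        app: rng.choice(survivors) if node == failed else node
        for app, node in placement.items()
    }


nodes = ["Node1", "Node2", "Node3", "Node4"]
placement = {f"App{i}": nodes[i % 4] for i in range(40)}  # 10 instances per node
rng = random.Random(0)  # fixed seed so the run is repeatable

after = random_failover(placement, "Node1", nodes, rng)
print(Counter(after.values()))
```

The resulting counts are usually close to even but are not guaranteed to be, which is the capacity-planning drawback listed among the cons above.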
Customized control
There are some cases where specific nodes may be preferred for a given application instance.
A configuration that ties applications to nodes has pros and cons:
| Pro | Administrator has full control over what happens when a failure occurs. |
| Pro | Capacity planning is easy, since failure scenarios are predictable. |
| Con | With many applications running in a cluster, defining a good policy for failures can be extremely complex. |
| Con | Very hard to plan for multiple, cascaded failures. |
Server clusters provide full control over the order of failover using the preferred node list feature. The full semantics of the preferred node list can be defined as:
| Preferred Node List | Move group to "best possible", initiated via administrator | Failover due to node or group failure |
|---|---|---|
| Contains all nodes in cluster | Group is moved to highest node in preferred node list that is up and running in the cluster. | Group is moved to the next node on the preferred node list. |
| Contains a subset of the nodes in the cluster | Group is moved to highest node in preferred node list that is up and running in the cluster. If no nodes in the preferred node list are up and running, the group is moved to a random node. | Group is moved to the next node on the preferred node list. If the node that was hosting the group is the last on the list or was not in the preferred node list, the group is moved to a random node. |
| Empty | Group is moved to a random node. | Group is moved to a random node. |
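The semantics in the table above can be expressed as two small functions. The Python below is a hedged sketch of those rules, not the cluster service's implementation: it assumes that "next node" means the next listed node that is up, that at least one other node is running, and that any case the table leaves open falls back to a random node. The names best_possible and failover are hypothetical.

```python
import random


def best_possible(preferred, up_nodes, rng):
    """Administrator-initiated move: highest preferred node that is up, else a random node."""
    for node in preferred:
        if node in up_nodes:
            return node
    return rng.choice(sorted(up_nodes))


def failover(preferred, current, up_nodes, rng):
    """Failure-driven move: next preferred node after the current host, else a random node."""
    survivors = sorted(n for n in up_nodes if n != current)
    if current in preferred:
        for node in preferred[preferred.index(current) + 1:]:
            if node in survivors:
                return node
    # Host was last on the list, not on the list, or the list is empty.
    return rng.choice(survivors)


rng = random.Random(0)
print(best_possible(["B", "C"], {"A", "C", "D"}, rng))  # B is down, so C is chosen
print(failover(["A", "B", "C"], "A", {"B", "C", "D"}, rng))  # next on the list is B
print(failover([], "A", {"B", "C", "D"}, rng))  # empty list: a random surviving node
```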
1 This is really a misnomer: the applications have state; however, the state does not span individual client requests.
2 A heavy-weight application is one that consumes a significant number of system resources such as CPU, memory or IO bandwidth.