Server Cluster Concepts (Server Clusters: Frequently Asked Questions for Windows 2000 and Windows Server 2003)

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

Q. What hardware do you need to build a Server cluster?

A. The most important criterion for Server cluster hardware is that it be part of a validated cluster configuration on the Microsoft Hardware Compatibility List (HCL), indicating that it has passed the Microsoft Cluster Hardware Compatibility Test. All qualified solutions appear on the Microsoft HCL (https://go.microsoft.com/fwlink/?linkid=67738). Only cluster solutions listed on the HCL are supported by Microsoft.

In general, the criteria for building a server cluster include the following:

  • Servers: Two or more PCI-based machines running one of the operating system releases that support Server clusters (see below). Server clusters can run on all hardware architectures supported by the base Windows operating system; however, you cannot mix 32-bit and 64-bit architectures in the same cluster.

  • Storage: Each server must be attached to one or more shared, external storage buses that are separate from the bus containing the system disk, the startup disk, or the pagefile disk. Applications and data are stored on one or more disks attached to this shared bus. There must be enough storage capacity on the shared cluster bus(es) for all of the applications running in the cluster environment. This shared storage configuration allows applications to fail over between servers in the cluster.

    Microsoft recommends hardware Redundant Array of Inexpensive Disks (RAID) for all cluster disks to eliminate disk drives as a potential single point of failure. This can mean using a RAID storage unit, a host-based RAID adapter that implements RAID across disks, or a similar solution.

    SCSI and Fibre Channel arbitrated loop are supported for 2-node cluster configurations only. Microsoft recommends Fibre Channel switched fabrics for clusters of more than two nodes.

  • Network: Each server needs at least two network adapters. Typically, one connects to the public network and the other to a private network between the cluster nodes. A static IP address is needed for each group of applications that moves as a unit between nodes. Server clusters can project the identity of multiple servers from a single cluster by using multiple IP addresses and computer names; this is known as a virtual server.

Q. What is a cluster resource?

A. A cluster resource is the lowest-level unit of management in a Server cluster. A resource represents a physical object or an instance of running code. For example, a physical disk, an IP address, an MSMQ queue, and a COM object are all considered resources. From a management perspective, resources can be independently started and stopped, and each is monitored to ensure that it is healthy.

Server clusters can monitor any arbitrary resource type. This is possible because Server clusters define a resource plug-in model: each resource type has an associated resource plug-in, or resource DLL, that is used to start and stop the resource and to provide health information specific to that resource type. For example, starting and stopping SQL Server is different from starting and stopping a physical disk; the resource DLL takes care of the differences. Application developers and system administrators can build new resource DLLs for their applications and register them with the cluster service.

Server clusters provide two generic plug-ins, Generic Service and Generic Application, that can be used to make existing applications cluster-aware very quickly. Windows Server 2003 added a Generic Script resource plug-in that allows the resource logic to be written in any scripting language supported by the Windows operating system.
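
A conceptual sketch may help. The Python class below is purely illustrative (it is not the actual Cluster Resource API or a real resource DLL); the method names only mirror the spirit of the online, offline, and health-check entry points that a plug-in provides, and FileSharePlugin and share_path are hypothetical names:

    import os

    class FileSharePlugin:
        """Conceptual stand-in for a resource plug-in (resource DLL).
        The cluster service would call entry points like these to manage
        and monitor one resource of a given type."""

        def __init__(self, share_path):
            self.share_path = share_path   # hypothetical resource property
            self.online_flag = False

        def online(self):
            # Bring the resource into service on the current node.
            self.online_flag = os.path.isdir(self.share_path)
            return self.online_flag

        def offline(self):
            # Take the resource out of service cleanly.
            self.online_flag = False

        def looks_alive(self):
            # Cheap, frequent health check used for routine polling.
            return self.online_flag

        def is_alive(self):
            # Thorough, less frequent health check.
            return self.online_flag and os.path.isdir(self.share_path)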

Q. What is a resource dependency?

A. A complete application actually consists of multiple pieces, or resources; some pieces are code and others are physical resources required by the application. The resources are related in different ways. For example, an application that writes to a disk cannot come online until the disk is online, and if the disk fails, the application cannot continue to run, since it writes to that disk. Resource dependencies can be defined by the application developer or system administrator to capture these relationships. Resource dependencies define the order in which resources are brought online and control how failures are propagated to the various pieces of the application.
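
As an illustration of how dependencies drive start order, the following Python sketch (names and structures are assumptions for the example, not cluster APIs) brings a resource online only after everything it depends on is online:

    def online_order(dependencies):
        """dependencies: dict mapping a resource to the resources it depends on.
        Returns an order in which the resources can be brought online."""
        order, done = [], set()

        def bring_online(res, seen=()):
            if res in done:
                return
            if res in seen:
                raise ValueError("circular dependency involving " + res)
            for dep in dependencies.get(res, []):
                bring_online(dep, seen + (res,))
            order.append(res)
            done.add(res)

        for res in dependencies:
            bring_online(res)
        return order

    # Example: a file share depends on a network name (which depends on an
    # IP address) and on a physical disk.
    deps = {
        "File Share": ["Network Name", "Physical Disk"],
        "Network Name": ["IP Address"],
        "IP Address": [],
        "Physical Disk": [],
    }
    print(online_order(deps))
    # ['IP Address', 'Network Name', 'Physical Disk', 'File Share']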

Q. What is a resource group?

A. A resource group is a collection of one or more resources that are managed and monitored as a single unit. A resource group can be started or stopped. If a resource group is started, each resource in the group is started (taking into account any start order defined by the dependencies between resources in the group). If a resource group is stopped, all of the resources in the group are stopped. Dependencies between resources cannot cross group boundaries; in other words, the set of resources within a group is an autonomous unit that can be started and stopped independently of any other group. A group is a single, indivisible unit that is hosted on one server in a Server cluster at any point in time, and it is the unit of failover.

Q. Can I have dependencies between resources in different groups?

A. No, resource dependencies are confined to a single group.

Q. What is a virtual server?

A. A virtual server is a resource group that contains an IP address resource and a network name resource. When an application is hosted in a virtual server, clients can access the application by using the IP address or network name in that resource group. As the resource group fails over across the cluster, the IP address and network name remain the same, so the client is unaware of the physical location of the application and continues to work in the event of a failure of one of the servers in the cluster.

Q. What is failover?

A. Server clusters monitor the health of the nodes and resources in the cluster. In the event of a server failure, the cluster software restarts the failed server's workload on one or more of the remaining servers. If an individual resource or application fails (but the server does not), Server clusters typically try to restart the application on the same server; if that fails, they move the application's resources and restart it on another server in the cluster. The process of detecting failures and restarting the application on another server in the cluster is known as failover.

The cluster administrator can set various recovery policies, such as whether or not to restart an application on the same server, and whether or not to automatically "failback" (rebalance) workloads when a failed server comes back online.
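
The following Python sketch illustrates the kind of decision such a recovery policy implies; the function and parameter names are hypothetical and only loosely modeled on per-resource restart settings:

    def handle_resource_failure(failures_in_period, restart_threshold,
                                other_nodes_available):
        """Hypothetical recovery decision: restart in place until a threshold
        of failures within the restart period is reached, then fail the
        resource's whole group over to another node."""
        if failures_in_period < restart_threshold:
            return "restart the resource on the same node"
        if other_nodes_available:
            return "fail the resource's group over to another node"
        return "leave the resource in the failed state"

    print(handle_resource_failure(1, 3, other_nodes_available=True))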

Q. Is failover transparent to users?

A. Server clusters do not require any special software on client computers, so the user experience during failover depends on the nature of the client side of the client-server application. Client reconnection can be made transparent because the Server cluster software restarts the applications, file shares, and so on, at exactly the same IP address.

If a client is using "stateless" connections, such as a standard browser connection, the client will be unaware of a failover that occurs between server requests. If a failure occurs while a client is connected to the failed resource, the client receives whatever standard notification is provided by the client side of the application when the server side becomes unavailable. This might be, for example, the standard "Abort, Retry, or Cancel?" prompt you get when using Windows Explorer to download a file at the time a server or network goes down. In this case, client reconnection is not automatic (the user must choose "Retry"), but the user is fully informed of what is happening and has a simple, well-understood method of re-establishing contact with the server. In the meantime, the cluster service is restarting the service or application so that, when the user chooses "Retry", it reappears as if it had never gone away.

Q. What is failback?

A. In the event of the failure of a server in a cluster, its applications and resources are failed over to another node in the cluster. When the failed node rejoins the cluster (after a reboot, for example), it is again available to host applications. A cluster administrator can set policies on resources and resource groups that allow an application to automatically move back to a node when that node becomes available, thus automatically taking advantage of a node rejoining the cluster. These policies are known as failback policies. Take care when defining automatic failback policies: depending on the application, automatically moving an application that was working fine may have undesirable consequences for the clients using it.

Q. When an application restarts after failover, does it restore the application state at the time of failure?

A. No, Server clusters provide a fast crash restart mechanism. When an application is failed over and restarted, the application is restarted from scratch. Any persistent data written out to a database or to files is available to the application, but any in-memory state that the application had before the failover is lost.

Q. At what level does failover exist?

A. At the resource group level.

Q. What is a Quorum Resource and how does it help Server clusters provide high availability?

A. Server clusters require a quorum resource to function. The quorum resource, like any other resource, can be owned by only one server at a time, and servers can negotiate for its ownership. Negotiating for the quorum resource allows Server clusters to avoid "split-brain" situations, in which multiple servers are active and each believes the other servers are down. This can happen when, for example, the cluster interconnect is lost and network response time is problematic. The quorum resource is also used to store the definitive copy of the cluster configuration, so that regardless of any sequence of failures, the cluster configuration always remains consistent.

Q. What is active/active versus active/passive?

A. Active/Active and Active/Passive are terms used to describe how applications are deployed in a cluster. Unfortunately, they mean different things to different people and so the terms tend to cause confusion.

From the perspective of a single application or database:

  • Active/Active means that the same application, or pieces of the same service, can run concurrently on different nodes in the cluster. For example, SQL Server 2000 can be configured so that the database is partitioned and each node runs an instance of the database. SQL Server provides the notion of views to present a single image of the entire database.

  • Active/Passive means that only one node in the cluster can be hosting the given application. For example, a single file share is active/passive. Any given file share can only be hosted on one node at a time.

From the perspective of a set of instances of an application or service:

  • Active/Active means that different instances of the same application can be running concurrently on different cluster nodes. For example, each node in a cluster can be running SQL Server against a different database. A single cluster can support many file shares that are hosted on the nodes in a cluster concurrently.

  • Active/Passive means that only one instance of a service can be running anywhere in the cluster. For example, there must only be a single instance of the DHCP service running in the cluster at any point in time.

From the perspective of the cluster:

  • Active/Active means that all nodes in the cluster are running applications. These may be multiple instances of the same application or different applications (for example, in a 2-node cluster, WINS may be running on one node and DHCP may be running on the other node).

  • Active/Passive means that one of the cluster nodes is spare and not being used to host applications.

Server clusters support all of these different combinations; the terms are really about how specific applications or sets of applications are deployed.

With the advent of clusters of more than two servers, starting with Windows 2000 Datacenter Server, the term active/active becomes confusing because there may be, for example, four servers. When there are multiple servers, the set of deployment options becomes more flexible, allowing different configurations such as N+I.

Q. How do I benefit from more than two nodes in a cluster?

A. Failover is the mechanism that single-instance applications and the individual partitions of a partitioned application typically employ for high availability (the term pack has been coined to describe a highly available, single-instance application or partition).

In a 2-node cluster, defining failover policies is trivial: if one node fails, the only option is to fail over to the remaining node. As the size of a cluster increases, different failover policies become possible, and each has different characteristics.

Failover Pairs

In a large cluster, failover policies can be defined so that each application is set to fail over between a specific pair of nodes. The simple example below shows two applications, App1 and App2, in a 4-node cluster.

Figure 1: Failover pairs

This configuration has pros and cons:

  • Pro: Good for clusters that support heavy-weight¹ applications such as databases. This configuration ensures that, in the event of a failure, two applications will not be hosted on the same node.

  • Pro: Capacity is very easy to plan. Each node is sized for the application it will need to host (just like a 2-node cluster hosting one application).

  • Pro: The effect of a node failure on the availability and performance of the system is very easy to determine.

  • Pro: You get the flexibility of a larger cluster. If a node is taken out for maintenance, the failover partner for a given application can be changed dynamically (which may end up resembling the standby policy described below).

  • Con: In simple configurations such as the one above, only 50% of the capacity of the cluster is in use.

  • Con: Administrator intervention may be required in the event of multiple failures.

¹ A heavy-weight application is one that consumes a significant amount of system resources such as CPU, memory, or I/O bandwidth.

Failover pairs are supported by server clusters on all versions of Windows by limiting the possible owners list for each resource to a given pair of nodes.
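
As a rough illustration (Python; the dictionary and function are hypothetical stand-ins for per-resource possible-owners settings), restricting possible owners to a pair confines where each application can fail over:

    # Hypothetical configuration: each application's resources can be owned
    # only by its pair of nodes, so failover is confined to that pair.
    possible_owners = {
        "App1": ["Node1", "Node2"],
        "App2": ["Node3", "Node4"],
    }

    def failover_target(app, failed_node, nodes_up):
        # Only nodes on the application's possible owners list are candidates.
        candidates = [n for n in possible_owners[app]
                      if n != failed_node and n in nodes_up]
        return candidates[0] if candidates else None  # None: group stays offline

    print(failover_target("App1", "Node1", {"Node2", "Node3", "Node4"}))  # Node2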

Hot-Standby Server

To reduce the overhead of failover pairs, the spare node for each pair may be consolidated into a single node, providing a hot standby server that is capable of picking up the work in the event of a failure.

Figure 2: Standby Server

This configuration has pros and cons:

  • Pro: Good for clusters that support heavy-weight applications such as databases. This configuration ensures that, in the event of a single failure, two applications will not be hosted on the same node.

  • Pro: Capacity is very easy to plan. Each node is sized for the application it will need to host, and the spare is sized to match the largest of the other nodes.

  • Pro: The effect of a node failure on the availability and performance of the system is very easy to determine.

  • Con: The configuration is designed around a single failure.

  • Con: Does not handle multiple failures well. This may be an issue during scheduled maintenance, when the spare may already be in use.

Server clusters support standby servers today using a combination of the possible owners list and the preferred owners list. The preferred owner should be set to the node that the application runs on by default, and the possible owners for a given resource should be set to the preferred node and the spare node.
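
In the same illustrative style (hypothetical names, not cluster APIs), a hot-standby layout gives each application its own preferred node and lists the shared spare as its only other possible owner:

    # Hypothetical configuration: Node4 is the shared hot-standby spare.
    standby_config = {
        "App1": {"preferred": ["Node1"], "possible": ["Node1", "Node4"]},
        "App2": {"preferred": ["Node2"], "possible": ["Node2", "Node4"]},
        "App3": {"preferred": ["Node3"], "possible": ["Node3", "Node4"]},
    }

    def standby_target(app, nodes_up):
        # Prefer the application's own node; otherwise fall back to the spare.
        for node in standby_config[app]["possible"]:
            if node in nodes_up:
                return node
        return None

    print(standby_target("App2", {"Node1", "Node3", "Node4"}))  # Node4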

N+I

A standby server works well for 4-node clusters in some configurations; however, its ability to handle multiple failures is limited. N+I configurations are an extension of the standby server concept in which there are N nodes hosting applications and I spare nodes.

Figure 3: N+I Spare node configuration

This configuration has pros and cons:

  • Pro: Good for clusters that support heavy-weight applications such as databases or Exchange. This configuration ensures that, in the event of a failure, an application instance fails over to a spare node, not to a node that is already in use.

  • Pro: Capacity is very easy to plan. Each node is sized for the application it will need to host.

  • Pro: The effect of a node failure on the availability and performance of the system is very easy to determine.

  • Pro: The configuration works well for multiple failures.

  • Con: Does not handle multiple applications running in the same cluster well. This policy is best suited to applications running on a dedicated cluster.

Server clusters support N+I scenarios in the Windows Server 2003 release by using a public cluster group property, AntiAffinityClassNames, which can contain an arbitrary string of characters. In the event of a failover, if the group being failed over has a non-empty string in its AntiAffinityClassNames property, the failover manager checks all other nodes. If any nodes in the possible owners list for the resource are NOT hosting a group with the same value in AntiAffinityClassNames, those nodes are considered good targets for failover. If all nodes in the cluster are hosting groups that contain the same value in the AntiAffinityClassNames property, the preferred node list is used to select a failover target.
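
The following Python sketch illustrates the anti-affinity check described above; the function, parameter names, and data structures are assumptions made for the example, not the cluster service's internal interfaces:

    def choose_failover_node(group, nodes_up, possible_owners, preferred_owners,
                             hosted_groups, anti_affinity):
        """Pick a failover target for `group`.

        hosted_groups: dict mapping node -> groups currently hosted there.
        anti_affinity: dict mapping group -> its AntiAffinityClassNames value
        ("" if unset)."""
        candidates = [n for n in possible_owners[group] if n in nodes_up]
        tag = anti_affinity.get(group, "")
        if tag:
            # Prefer nodes not already hosting a group with the same tag.
            friendly = [n for n in candidates
                        if all(anti_affinity.get(g, "") != tag
                               for g in hosted_groups.get(n, []))]
            if friendly:
                return friendly[0]
        # Otherwise fall back to the preferred owners list.
        for n in preferred_owners.get(group, []):
            if n in candidates:
                return n
        return candidates[0] if candidates else None

    target = choose_failover_node(
        "Exchange1", {"Node2", "Node3", "Node4"},
        possible_owners={"Exchange1": ["Node1", "Node2", "Node3", "Node4"]},
        preferred_owners={"Exchange1": ["Node1"]},
        hosted_groups={"Node2": ["Exchange2"], "Node3": ["Exchange3"], "Node4": []},
        anti_affinity={"Exchange1": "EXCH", "Exchange2": "EXCH", "Exchange3": "EXCH"},
    )
    print(target)  # Node4: the only live node not already hosting an "EXCH" group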

Failover Ring

Failover rings allow each node in the cluster to run an application instance. In the event of a failure, the application on the failed node is moved to the next node in sequence.

Figure 4: Failover Ring

This configuration has pros and cons:

  • Pro: Good for clusters that support several small application instances, where the capacity of any node is large enough to support several at the same time.

  • Pro: The effect of a node failure on performance is easy to predict.

  • Pro: Capacity is easy to plan for a single failure.

  • Con: The configuration does not work well for all cases of multiple failures. If Node 1 fails, Node 2 will host two application instances while Nodes 3 and 4 each host one. If Node 2 then fails, Node 3 will host three application instances and Node 4 will host one.

  • Con: Not well suited to heavy-weight applications, since multiple instances may end up hosted on the same node even if other nodes are lightly loaded.

Failover rings are supported by server clusters on the Windows Server 2003 release. This is done by defining the order of failover for a given group using the preferred owners list: choose a node order, and then set up each group's preferred node list starting at a different node.
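
A small sketch (Python; the names are assumptions) of generating rotated preferred owner lists so that each group's failover order starts at a different node:

    nodes = ["Node1", "Node2", "Node3", "Node4"]

    def ring_preferred_owners(start_index, nodes):
        # Rotate the node order so each list begins at the group's home node.
        return nodes[start_index:] + nodes[:start_index]

    preferred = {"App%d" % (i + 1): ring_preferred_owners(i, nodes)
                 for i in range(len(nodes))}
    print(preferred["App2"])  # ['Node2', 'Node3', 'Node4', 'Node1']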

Random

In large clusters, or even 4-node clusters running several applications, defining specific failover targets or policies for each application instance can be extremely cumbersome and error-prone. In some cases the best policy is to allow the target to be chosen at random, with a statistical probability that this will spread the load around the cluster in the event of a failure.

This configuration has pros and cons:

  • Pro: Good for clusters that support several small application instances, where the capacity of any node is large enough to support several at the same time.

  • Pro: Does not require an administrator to decide where any given application should fail over to.

  • Pro: Provided that there are sufficient applications, or the applications are partitioned finely enough, this provides a good mechanism to statistically load-balance the applications across the cluster in the event of a failure.

  • Pro: The configuration works well for multiple failures.

  • Pro: Well suited to handling multiple applications, or many instances of the same application, running in the same cluster.

  • Con: Capacity can be difficult to plan, because there is no real guarantee that the load will be balanced across the cluster.

  • Con: The effect of a node failure on performance is not easy to predict.

  • Con: Not well suited to heavy-weight applications, since multiple instances may end up hosted on the same node even if other nodes are lightly loaded.

The Windows Server 2003 release of server clusters randomizes the failover target in the event of a node failure. Each resource group that has an empty preferred owners list is failed over to a random node in the cluster if the node currently hosting it fails.
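
A trivial sketch of the random policy (Python, illustrative only; the function name is an assumption):

    import random

    def random_failover_target(failed_node, nodes_up):
        # Any surviving node is an equally likely failover target.
        candidates = [n for n in nodes_up if n != failed_node]
        return random.choice(candidates) if candidates else None

    print(random_failover_target("Node1", ["Node2", "Node3", "Node4"]))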

Customized control

There are some cases where specific nodes may be preferred for a given application instance.

This configuration has pros and cons:

  • Pro: The administrator has full control over what happens when a failure occurs.

  • Pro: Capacity planning is easy, since failure scenarios are predictable.

  • Con: With many applications running in a cluster, defining a good policy for failures can be extremely complex.

  • Con: Very hard to plan for multiple cascaded failures.

Server clusters provide full control over the order of failover using the preferred node list feature. The full semantics of the preferred node list are as follows:

  • The preferred node list contains all nodes in the cluster:

    Move initiated by an administrator ("move group to best possible"): the group is moved to the highest node in the preferred node list that is up and running in the cluster.

    Failover due to a node or group failure: the group is moved to the next node on the preferred node list.

  • The preferred node list contains a subset of the nodes in the cluster:

    Move initiated by an administrator ("move group to best possible"): the group is moved to the highest node in the preferred node list that is up and running in the cluster. If no nodes in the preferred node list are up and running, the group is moved to a random node.

    Failover due to a node or group failure: the group is moved to the next node on the preferred node list. If the node that was hosting the group is the last node on the list, or was not in the preferred node list, the group is moved to a random node.

  • The preferred node list is empty:

    Move initiated by an administrator ("move group to best possible"): the group is moved to a random node.

    Failover due to a node or group failure: the group is moved to a random node.
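
These rules can be summarized in a short Python sketch (illustrative only; the function and parameter names are assumptions):

    import random

    def pick_node(preferred, nodes_up, current=None, admin_move=False):
        """Apply the preferred node list semantics summarized above.

        preferred:  the group's preferred node list (possibly empty), best node first.
        nodes_up:   nodes currently up and running in the cluster.
        current:    the node that was hosting the group (used for failover).
        admin_move: True for an administrator's "move group to best possible",
                    False for failover due to a node or group failure."""
        live_preferred = [n for n in preferred if n in nodes_up]
        if not live_preferred:
            # Empty list, or no preferred node is up: pick a random node.
            return random.choice(sorted(nodes_up))
        if admin_move:
            # Highest node in the preferred list that is up and running.
            return live_preferred[0]
        # Failover: move to the next node on the preferred list after `current`.
        if current in preferred:
            for n in preferred[preferred.index(current) + 1:]:
                if n in nodes_up:
                    return n
        # Current node was last on the list, or not on it: pick a random node.
        return random.choice(sorted(nodes_up))

    print(pick_node(["Node1", "Node2", "Node3"], {"Node2", "Node3", "Node4"},
                    current="Node1"))  # Node2: the next live node on the list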

Q. How many resources can be hosted in a cluster?

A. The theoretical limit for the number of resources in a cluster is 1,674. However, be aware that the cluster service periodically polls the resources to ensure that they are alive, and as the number of resources increases, the overhead of this polling also increases.