Step 5: Applying Fault Tolerance

Published: November 12, 2007 | Updated: February 25, 2008

Application fault tolerance requirements place specific technical demands on the virtualization host server, storage, and network infrastructure. In this step, select the most appropriate fault tolerance approach for each application that will be virtualized. The technical approach can vary based on the details of the underlying operating system and the applications running in the virtual environment.

Some workloads (such as Web servers, database servers, and messaging servers) have their own methods of implementing fault tolerance. For example, a Web server can store session state information in a shared memory space or in a database, so the services can automatically fail over to another node without causing a disruption in service. Cluster-aware applications can rely on operating system functionality such as Microsoft Cluster Services (MSCS) to provide automatic failover. For applications that do not provide their own fault tolerance methods, it is possible to use virtualization fault tolerance options.

In this step, map the requirements identified in step 3 to specific options for implementing high-availability virtual systems.

Option 1: Network Load Balancing

Stateless applications such as Web servers can be made fault tolerant by load balancing network traffic across multiple identical instances of the application. Network load balancing technology distributes the inbound traffic headed for the application across multiple machines running the same application, so that if one server fails, the remaining servers pick up the load. Windows Server has a software implementation of network load balancing built in.

A hardware network load balancing solution can distribute requests based on a variety of load-distribution algorithms. It can also monitor various nodes in the server farm and ensure that they are operating properly before sending requests to them.
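The combination of a load-distribution algorithm and node health monitoring can be illustrated with a minimal Python sketch. It assumes round-robin distribution and a caller-supplied health check; the class name `RoundRobinBalancer`, the `is_healthy` callback, and the node names are hypothetical and do not correspond to any Windows Server or hardware load balancer API.

```python
from typing import Callable, Iterable, List


class RoundRobinBalancer:
    """Illustrative round-robin distributor that skips unhealthy nodes."""

    def __init__(self, nodes: Iterable[str], is_healthy: Callable[[str], bool]):
        self._nodes: List[str] = list(nodes)
        self._is_healthy = is_healthy  # hypothetical health probe
        self._next = 0                 # index of the next node to try

    def pick(self) -> str:
        """Return the next healthy node, trying each node at most once."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if self._is_healthy(node):
                return node
        raise RuntimeError("no healthy nodes available")


# Example: web2 has failed, so requests alternate between web1 and web3.
down = {"web2"}
balancer = RoundRobinBalancer(["web1", "web2", "web3"], lambda n: n not in down)
targets = [balancer.pick() for _ in range(4)]
print(targets)  # ['web1', 'web3', 'web1', 'web3']
```

A real load balancer would probe nodes on a schedule rather than on every request, and might weight nodes by capacity; the sketch only shows why a failed node stops receiving traffic.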

This option requires that at least one additional VM be added for each application using network load balancing.

Option 2: Application-Specific Clustering

Many enterprise applications that customers consider mission critical have failover capabilities built into them through cluster awareness. These applications were designed and built to run on a Microsoft Cluster Services (MSCS) cluster. Examples include SQL Server and Exchange Server. An MSCS cluster can be configured by using multiple VMs that have a common shared disk.

This option requires that at least one additional VM be added for each VM that is being clustered.
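The VM overhead stated for Options 1 and 2 can be tallied with a small Python sketch. It assumes that each load-balanced application and each clustered workload starts as a single VM and receives exactly the minimum of one additional VM; the function and parameter names are hypothetical.

```python
def minimum_vm_count(nlb_apps: int, clustered_vms: int, other_vms: int) -> int:
    """Lower bound on total VMs after applying Options 1 and 2.

    Assumption: every NLB application and every clustered VM begins as one VM
    and gains exactly one more, which is the stated minimum.
    """
    nlb_total = nlb_apps * 2            # original VM + one extra per NLB application
    cluster_total = clustered_vms * 2   # original VM + one partner per clustered VM
    return nlb_total + cluster_total + other_vms


# Example: 3 load-balanced applications, 2 clustered VMs, 5 unprotected VMs.
total = minimum_vm_count(nlb_apps=3, clustered_vms=2, other_vms=5)
print(total)  # 15
```

The result is only a floor for host capacity planning; larger farms or multi-node clusters would add more VMs.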

Option 3: Host Clustering

A significant number of applications cannot effectively use network load balancing and were never designed to be cluster aware. However, one additional option can help reduce exposure to failures of the systems running these applications.

The Virtual Server 2005 host system itself can be configured in an MSCS cluster. In this configuration, if the host server running the VMs fails, the Virtual Server 2005 application and all its VMs fail over to another node in the MSCS cluster.

The cluster then attempts to restart each VM on the new node of the cluster. Note that because none of the applications inside each VM are cluster aware, there is no guarantee that the applications will restart correctly.
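The failover sequence described above can be modeled with a short Python sketch. It is a simulation only and does not call MSCS or Virtual Server 2005; the `fail_over` function, the `restart_vm` callback, and the choice of the first surviving node as the target are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Placement = Dict[str, List[str]]  # host name -> names of the VMs on that host


def fail_over(
    placement: Placement,
    failed_host: str,
    restart_vm: Callable[[str, str], bool],
) -> Tuple[Placement, Dict[str, bool]]:
    """Move every VM from the failed host to one surviving node and try to restart it.

    Returns the new placement and, per VM, whether the restart succeeded.
    """
    survivors = [host for host in placement if host != failed_host]
    if not survivors:
        raise RuntimeError("no surviving cluster node to fail over to")
    target = survivors[0]  # assumption: the first surviving node takes over

    new_placement = {host: list(placement[host]) for host in survivors}
    restarted: Dict[str, bool] = {}
    for vm in placement.get(failed_host, []):
        new_placement[target].append(vm)
        # The restart can fail because the guest application is not cluster aware.
        restarted[vm] = restart_vm(vm, target)
    return new_placement, restarted


# Example: hostA fails; vm2's application does not come back cleanly.
cluster = {"hostA": ["vm1", "vm2"], "hostB": ["vm3"]}
placement, restarted = fail_over(cluster, "hostA", lambda vm, host: vm != "vm2")
print(placement)  # {'hostB': ['vm3', 'vm1', 'vm2']}
print(restarted)  # {'vm1': True, 'vm2': False}
```

Tracking the restart result per VM reflects the caveat above: the VM moves to the new node, but the application inside it may still need manual recovery.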

Evaluating the Characteristics

The following tables compare the characteristics of the options.

Complexity	Justification	Rating
Network load balancing	Can be implemented independent of the application technology (assuming that workloads support this approach).	M
Application-specific clustering	Requires expertise in several high-availability approaches and procedures.	H
Host clustering	Uses a standard approach for protecting against host failures but requires cluster configuration.	H

Cost	Justification	Rating
Network load balancing	Can be implemented in software or commodity hardware.	M
Application-specific clustering	Shared storage and configuration requirements increase cost.	H
Host clustering	Protects against VM and host failures.	H

Fault Tolerance	Justification	Rating
Network load balancing	If appropriate for the application, provides a highly scalable and resilient method of ensuring reliability.	↑
Application-specific clustering	If available for the application, provides a highly resilient method of ensuring reliability.	↑
Host clustering	Protects against VM and host failures.	→

Performance	Justification	Rating
Network load balancing	Delivers a high performance solution through load balancing.	↑
Application-specific clustering	Clustering does not significantly affect performance.	→
Host clustering	Clustering does not significantly affect performance.	→

Scalability	Justification	Rating
Network load balancing	Can be scaled out to the largest implementations.	↑
Application-specific clustering	Can be scaled up, but at additional cost.	→
Host clustering	Can be scaled up, but at additional cost.	→

Validating with the Business

Because numerous technical considerations are involved in each fault tolerance approach, ensure that technical decisions meet business requirements. Specific questions to ask include:

Are all critical areas of the application infrastructure protected? It is easy to focus on protecting applications by themselves. However, fault tolerance requires a focus on areas such as the power infrastructure, the network, and storage devices. Applications might have dependencies on a wide array of services, all of which must remain available to support mission-critical activities.

Decision Summary

The process of determining the best fault tolerance approach for specific applications involves many considerations. For applications that support these approaches, application-level and network-level clustering offer simplified implementation and management.
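As a rough illustration of the decision, the following Python sketch maps two application traits to the three options in this step. The `Workload` type, its fields, and the precedence (load balancing first, then application clustering, then host clustering) are assumptions made for this example; a real decision would also weigh the complexity, cost, and business factors discussed above.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of one application to be virtualized."""
    name: str
    stateless: bool      # can run as identical, interchangeable instances
    cluster_aware: bool  # designed to run on an MSCS cluster


def choose_option(workload: Workload) -> str:
    """Pick a fault tolerance option from the workload's traits (assumed precedence)."""
    if workload.stateless:
        return "Option 1: Network Load Balancing"
    if workload.cluster_aware:
        return "Option 2: Application-Specific Clustering"
    # Fallback for applications with no fault tolerance support of their own.
    return "Option 3: Host Clustering"


for w in (
    Workload("web front end", stateless=True, cluster_aware=False),
    Workload("database", stateless=False, cluster_aware=True),
    Workload("legacy line-of-business app", stateless=False, cluster_aware=False),
):
    print(f"{w.name}: {choose_option(w)}")
```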

Additional Reading

The following white papers and articles discuss clustering options for VMs and Virtual Server 2005:
- “An Overview of Windows Clustering Technologies: Server Clusters and Network Load Balancing” at https://technet2.microsoft.com/windowsserver/en/library/c35dd48b-4fbc-4eee-8e5c-2a9a35cf63b21033.mspx?mfr=true
- “Server Clusters: Cluster Configuration Best Practices for Windows Server 2003” at https://technet2.microsoft.com/windowsserver/en/library/5172c43a-2e6d-4d94-bd44-163a8735ef921033.mspx?mfr=true
- “Clustering virtual machines” at https://technet2.microsoft.com/windowsserver/en/library/73b03235-bad1-4ca8-939f-c507d00e273f1033.mspx?mfr=true
- “NLB Design Process” at https://technet2.microsoft.com/windowsserver/en/library/251c6d81-b2c7-43eb-892c-2488a57ec9a81033.mspx?mfr=true, a Microsoft TechNet article that provides information about implementing Network Load Balancing Service (NLBS) on Windows Server 2003

This accelerator is part of a larger series of tools and guidance from Solution Accelerators.