Step 5: Applying Fault Tolerance

Published: November 12, 2007   |   Updated: February 25, 2008

 

Application fault tolerance requirements impose specific technical demands on the virtualization host server, storage, and network infrastructure. In this step, select the most appropriate fault tolerance approach for each application that will be virtualized. The technical approach can vary with the underlying operating system and the applications running in the virtual environment. Some workloads (such as Web servers, database servers, and messaging servers) have their own methods of implementing fault tolerance. For example, a Web server can store session state information in a shared memory space or in a database, so that the service can automatically fail over to another node without disruption. Cluster-aware applications can rely on operating system functionality such as Microsoft Cluster Service (MSCS) to provide automatic failover. For applications that do not provide their own fault tolerance methods, virtualization fault tolerance options are available.
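The Web server example can be illustrated with a minimal Python sketch. The `SharedSessionStore` and `WebNode` classes are hypothetical names invented for illustration, and an in-memory dictionary stands in for the shared memory space or database. Because session state lives outside any single node, a second node can continue a session after the first becomes unavailable.

```python
import copy


class SharedSessionStore:
    """Stands in for a shared memory space or database (hypothetical)."""

    def __init__(self):
        self._sessions = {}

    def load(self, session_id):
        # Deep copy so a node never holds a live reference to shared state.
        return copy.deepcopy(self._sessions.get(session_id, {}))

    def save(self, session_id, state):
        self._sessions[session_id] = copy.deepcopy(state)


class WebNode:
    """A Web server node that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        state = self.store.load(session_id)
        state.setdefault("cart", []).append(item)
        self.store.save(session_id, state)
        return state["cart"]


store = SharedSessionStore()
node_a = WebNode("node-a", store)
node_b = WebNode("node-b", store)

node_a.add_to_cart("session-1", "book")
# Suppose node-a fails here; node-b serves the next request for the same session.
cart = node_b.add_to_cart("session-1", "pen")
print(cart)
```

The second node sees the item added through the first node because both read and write the same store. In a real deployment the store itself would also need to be fault tolerant.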

In this step, map the requirements identified in Step 3 to specific options for implementing high-availability virtual systems.

Option 1: Network Load Balancing

Stateless applications such as Web servers can be made fault tolerant by establishing network load balancing across multiple identical instances of the application. Network load balancing distributes inbound traffic for the application across multiple machines running the same application, so that if one server fails, the remaining servers pick up the load. Windows Server includes a built-in software implementation of network load balancing.

A hardware network load balancing solution can distribute requests based on a variety of load-distribution algorithms. It can also monitor various nodes in the server farm and ensure that they are operating properly before sending requests to them.
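As a rough illustration of load distribution combined with node monitoring, the following Python sketch implements simple round-robin selection that skips nodes marked unhealthy. Round-robin is only one possible algorithm, and the `LoadBalancer` class and VM names are hypothetical. This is not how Windows Network Load Balancing or any particular hardware product works internally.

```python
from itertools import cycle


class LoadBalancer:
    """Round-robin distribution that skips nodes marked unhealthy."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = {node: True for node in self.nodes}
        self._rotation = cycle(self.nodes)

    def mark(self, node, is_healthy):
        # A real balancer would set this from periodic health probes.
        self.healthy[node] = is_healthy

    def next_node(self):
        # Try each node at most once per request.
        for _ in range(len(self.nodes)):
            node = next(self._rotation)
            if self.healthy[node]:
                return node
        raise RuntimeError("no healthy nodes available")


balancer = LoadBalancer(["web-vm-1", "web-vm-2", "web-vm-3"])
before = [balancer.next_node() for _ in range(3)]
balancer.mark("web-vm-2", False)  # simulate a failed VM
after = [balancer.next_node() for _ in range(4)]
print(before)
print(after)
```

After `web-vm-2` is marked unhealthy, requests rotate only between the two remaining VMs, which is the behavior that lets the surviving servers pick up the load.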

This option requires that at least one additional VM be added for each application using network load balancing.

Option 2: Application-Specific Clustering

Many enterprise applications that customers consider mission critical have failover capabilities built in through cluster awareness. These applications, such as SQL Server and Exchange Server, were designed and built to run on an MSCS cluster. An MSCS cluster can be configured by using multiple VMs that share a common disk.

This option requires that at least one additional VM be added for each VM that is being clustered.
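The minimum VM overhead stated for Options 1 and 2 can be worked through as simple arithmetic. The sketch below is a lower bound only, using the "at least one additional VM" figures from the text; the function name and the example inventory are hypothetical, and real deployments may need more instances.

```python
def minimum_additional_vms(nlb_applications, clustered_vms):
    """Lower bound on extra VMs, using the minimums stated for Options 1 and 2."""
    if nlb_applications < 0 or clustered_vms < 0:
        raise ValueError("counts must be non-negative")
    extra_for_nlb = nlb_applications      # one more VM per load-balanced application
    extra_for_clusters = clustered_vms    # one more VM per VM being clustered
    return extra_for_nlb + extra_for_clusters


# Hypothetical inventory: 3 load-balanced applications and 2 clustered VMs.
extra = minimum_additional_vms(nlb_applications=3, clustered_vms=2)
print(extra)
```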

Option 3: Host Clustering

A significant number of applications cannot effectively use network load balancing and were never designed to be cluster aware. For these applications, one additional option can help limit the impact of a system failure.

The Virtual Server 2005 host system itself can be configured in an MSCS cluster. In this configuration, if the host server running the VMs fails, the Virtual Server 2005 application and all its VMs fail over to another node in the MSCS cluster.

The cluster then attempts to restart each VM on the new node. Note that because none of the applications inside the VMs are cluster aware, there is no guarantee that an application will restart correctly.
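The failover sequence for a host cluster can be modeled with a short Python sketch. It is a simplified simulation, not MSCS or Virtual Server 2005 code: the `HostNode` class, the `fail_over` function, and the choice of the first surviving node as the target are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HostNode:
    name: str
    online: bool = True
    vms: list = field(default_factory=list)


def fail_over(failed, cluster):
    """Move every VM from a failed host to the first surviving node and restart it."""
    failed.online = False
    survivors = [node for node in cluster if node.online]
    if not survivors:
        raise RuntimeError("no surviving cluster node")
    target = survivors[0]
    report = []
    for vm in failed.vms:
        target.vms.append(vm)
        # The VM is restarted, but the application inside it is not
        # cluster aware, so its state after the restart is unverified.
        report.append((vm, "restarted; application state unverified"))
    failed.vms = []
    return target, report


host_a = HostNode("host-a", vms=["vm-1", "vm-2"])
host_b = HostNode("host-b", vms=["vm-3"])
target, report = fail_over(host_a, [host_a, host_b])
print(target.name, target.vms)
```

The report deliberately records the application state as unverified, which mirrors the caveat that restarting a VM does not guarantee that the application inside it recovers correctly.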

Evaluating the Characteristics

The following tables compare the characteristics of the options.

| Complexity | Justification | Rating |
| --- | --- | --- |
| Network load balancing | Can be implemented independent of the application technology (assuming that workloads support this approach). | M |
| Application-specific clustering | Requires expertise in several high-availability approaches and procedures. | H |
| Host clustering | Uses a standard approach for protecting against host failures but requires cluster configuration. | H |

 

| Cost | Justification | Rating |
| --- | --- | --- |
| Network load balancing | Can be implemented in software or commodity hardware. | M |
| Application-specific clustering | Shared storage and configuration requirements increase cost. | H |
| Host clustering | Protects against VM and host failures. | H |

 

| Fault Tolerance | Justification |
| --- | --- |
| Network load balancing | If appropriate for the application, provides a highly scalable and resilient method of ensuring reliability. |
| Application-specific clustering | If available for the application, provides a highly resilient method of ensuring reliability. |
| Host clustering | Protects against VM and host failures. |

 

| Performance | Justification |
| --- | --- |
| Network load balancing | Delivers a high performance solution through load balancing. |
| Application-specific clustering | Clustering does not significantly affect performance. |
| Host clustering | Clustering does not significantly affect performance. |

 

| Scalability | Justification |
| --- | --- |
| Network load balancing | Can be scaled out to the largest implementations. |
| Application-specific clustering | Can be scaled up, but at additional cost. |
| Host clustering | Can be scaled up, but at additional cost. |

Validating with the Business

Because numerous technical considerations are involved in each fault tolerance approach, ensure that technical decisions meet business requirements. Specific questions to ask include:

  • Are all critical areas of the application infrastructure protected? It is easy to focus on protecting applications by themselves. However, fault tolerance requires a focus on areas such as the power infrastructure, the network, and storage devices. Applications might have dependencies on a wide array of services, all of which must remain available to support mission-critical activities.
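One way to approach the question above is to list each dependency of an application and check whether it has any redundancy. The Python sketch below treats a dependency as unprotected when fewer than two instances are known, which is a deliberate simplification. The function name, the dependency names, and the instance counts are hypothetical.

```python
def unprotected_dependencies(dependencies, instance_counts):
    """Return dependencies with fewer than two known instances, sorted by name."""
    return sorted(
        name for name in dependencies
        if instance_counts.get(name, 0) < 2
    )


# Hypothetical inventory for one application.
app_dependencies = {"power", "network", "storage", "directory service"}
instance_counts = {"power": 2, "network": 2, "storage": 1}

gaps = unprotected_dependencies(app_dependencies, instance_counts)
print(gaps)
```

A dependency missing from the inventory is counted as unprotected, so gaps in documentation surface alongside gaps in redundancy.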

Decision Summary

The process of determining the best fault tolerance approach for specific applications involves many considerations. For applications that support them, application-specific clustering and network load balancing offer simplified implementation and management.
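The mapping from application characteristics to the three options can be sketched as a small Python helper. The order of the checks (network load balancing first, then application-specific clustering, then host clustering as the fallback) is an assumption made for illustration, as are the function name and parameters; actual selection should also weigh the cost, complexity, and business requirements discussed earlier.

```python
def choose_fault_tolerance(stateless, cluster_aware):
    """Suggest an option from two application traits (illustrative only)."""
    if stateless:
        # Option 1: stateless workloads such as Web servers.
        return "Network load balancing"
    if cluster_aware:
        # Option 2: applications built to run on a cluster.
        return "Application-specific clustering"
    # Option 3: applications that support neither approach.
    return "Host clustering"


# Hypothetical workloads described by (stateless, cluster_aware).
workloads = {
    "web front end": (True, False),
    "database server": (False, True),
    "legacy line-of-business app": (False, False),
}
choices = {name: choose_fault_tolerance(*traits) for name, traits in workloads.items()}
for name, option in choices.items():
    print(f"{name}: {option}")
```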

Additional Reading

This accelerator is part of a larger series of tools and guidance from Solution Accelerators.
