Step 5: Applying Fault Tolerance
Published: November 12, 2007 | Updated: February 25, 2008
Application fault tolerance requirements place specific technical requirements on the virtualization host server, storage, and network infrastructure. In this step, select the most appropriate fault tolerance approach for each application that will be virtualized. The technical approach can vary based on the details of the underlying operating system and the applications that are running in the virtual environment.
Some workloads (such as Web servers, database servers, and messaging servers) have their own methods of implementing fault tolerance. For example, a Web server can store session state information in a shared memory space or in a database, so that its services can automatically fail over to another node without causing a disruption in service. Cluster-aware applications can rely on operating system functionality such as Microsoft Cluster Services (MSCS) to provide automatic failover. For applications that do not provide their own fault tolerance methods, it is possible to use virtualization fault tolerance options.
In this step, map the requirements identified in step 3 to specific options for implementing high-availability virtual systems.
Option 1: Network Load Balancing
Stateless applications such as Web servers can be made fault tolerant by establishing network load balancing across multiple identical instances of the application. Network load balancing technology distributes the inbound traffic headed for the application across multiple machines running the same application, so that if one server fails, the remaining servers pick up the load. Windows Server has a software implementation of network load balancing built in. A hardware network load balancing solution can distribute requests based on a variety of load-distribution algorithms. It can also monitor the nodes in the server farm and ensure that they are operating properly before sending requests to them.
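The load-distribution and node-monitoring behavior described for network load balancing can be illustrated with a minimal sketch. It assumes simple round-robin distribution and a caller-supplied health check. The class name, node names, and health-check function are hypothetical and do not represent the Windows Server implementation or any hardware product.

```python
from itertools import cycle
from typing import Callable, Iterable, Optional


class RoundRobinBalancer:
    """Toy round-robin balancer that skips nodes failing a health check."""

    def __init__(self, nodes: Iterable[str], is_healthy: Callable[[str], bool]):
        self._nodes = list(nodes)
        self._is_healthy = is_healthy      # hypothetical health probe
        self._ring = cycle(self._nodes)    # endless round-robin iterator

    def pick(self) -> Optional[str]:
        """Return the next healthy node, or None if every node is down."""
        # Examine each node at most once per request.
        for _ in range(len(self._nodes)):
            node = next(self._ring)
            if self._is_healthy(node):
                return node
        return None


# Hypothetical farm of three identical Web server VMs; "web2" is simulated as failed.
down = {"web2"}
balancer = RoundRobinBalancer(["web1", "web2", "web3"], lambda n: n not in down)
picks = [balancer.pick() for _ in range(4)]
print(picks)  # ['web1', 'web3', 'web1', 'web3']
```

In this sketch the failed node is simply skipped and the remaining nodes absorb its share of requests, which is the behavior that makes the approach suitable for stateless applications.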
This option requires that at least one additional VM be added for each application using network load balancing.
Option 2: Application-Specific Clustering
Many enterprise applications that customers consider mission critical have failover capabilities built into them through cluster awareness. These applications were designed and built to run on an MSCS cluster. Examples include SQL Server and Exchange Server. An MSCS cluster can be configured by using multiple VMs that have a common shared disk. This option requires that at least one additional VM be added for each VM that is being clustered.
Option 3: Host Clustering
A significant number of applications cannot effectively use network load balancing and were never designed to be cluster aware. However, one additional option can help reduce the exposure to failure of the systems running these applications. The Virtual Server 2005 host system itself can be configured in an MSCS cluster. In this configuration, if the host server running the VMs fails, the Virtual Server 2005 application and all its VMs fail over to another node in the MSCS cluster. The cluster then attempts to restart each VM on the new node. Note that because none of the applications inside the VMs are cluster aware, there is no guarantee that each application will restart correctly.
Evaluating the Characteristics
The following tables compare the characteristics of the options.
Validating with the Business
Because numerous technical considerations are involved in each fault tolerance approach, ensure that technical decisions meet business requirements. Specific questions to ask include:
Decision Summary
The process of determining the best fault tolerance approach for specific applications involves many considerations. For applications that support these approaches, application-level and network-level clustering offer simplified implementation and management.
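The mapping from application characteristics to the three options can be sketched as a small decision function. This is an illustrative sketch only: the `App` fields, the example application names, and the rule that network load balancing is checked before application-specific clustering when both apply are assumptions, not part of the guidance above.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    stateless: bool       # can run as identical instances behind a load balancer
    cluster_aware: bool   # designed and built to run on an MSCS cluster


def choose_option(app: App) -> str:
    """Map an application's characteristics to a fault tolerance option."""
    if app.stateless:
        return "Option 1: Network Load Balancing"
    if app.cluster_aware:
        return "Option 2: Application-Specific Clustering"
    # Neither approach applies, so fall back to clustering the host itself.
    return "Option 3: Host Clustering"


# Hypothetical inventory of applications to be virtualized.
apps = [
    App("intranet-web", stateless=True, cluster_aware=False),
    App("sql-server", stateless=False, cluster_aware=True),
    App("legacy-billing", stateless=False, cluster_aware=False),
]
plan = {app.name: choose_option(app) for app in apps}
for name, option in plan.items():
    print(f"{name}: {option}")
```

Host clustering is the fallback here because it is the only option that does not depend on support from the application itself. The outcome for each application should still be validated against the business requirements.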