Cloud Computing: Virtual Clusters

Virtual clusters convey certain benefits over physical clusters in terms of speed, storage and flexibility.

Kai Hwang, Jack Dongarra and Geoffrey Fox

Adapted from “Distributed and Cloud Computing: From Parallel Processing to the Internet of Things” (Syngress, an imprint of Elsevier)

There are several differences and similarities between physical and virtual clusters, and different benefits conveyed by each. A physical cluster is a collection of servers (physical machines) connected by a physical network such as a LAN. Virtual clusters have different properties and potential applications. There are three critical design issues of virtual clusters: live migration of virtual machines (VMs), memory and file migrations, and dynamic deployment of virtual clusters.

When you initialize a traditional VM, you need to manually write configuration information or specify the configuration sources. As more VMs join a network, an inefficient configuration quickly causes overloading or underutilization. Amazon Elastic Compute Cloud (EC2) is a good example of a Web service that provides elastic computing power in a cloud. EC2 lets customers create VMs and manage user accounts for the duration of their use.
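As a rough illustration of that elasticity, the sketch below uses the boto3 SDK to launch an instance and release it when the work is done. The region, AMI ID and instance type are placeholders, not values from the text.

    # A hedged sketch, using the boto3 SDK, of the elastic provisioning EC2
    # offers: launch a VM, then terminate it when it is no longer needed.
    # The region, AMI ID and instance type below are hypothetical.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder machine image
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
    )
    instance_id = resp["Instances"][0]["InstanceId"]
    print("launched", instance_id)

    # Release the capacity once the work is done.
    ec2.terminate_instances(InstanceIds=[instance_id])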

Most virtualization platforms support a bridging mode that lets all domains appear on the network as individual hosts. Using this mode, VMs can communicate with one another freely through the virtual NIC and configure the network automatically.
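A minimal sketch of that bridged setup, assuming the libvirt Python bindings and an existing host bridge named br0 (both assumptions, not requirements from the text), might look like this:

    # Sketch: register a libvirt network that maps guests onto an existing
    # host bridge (assumed here to be "br0"), so every attached VM shows up
    # on the LAN as an individual host.
    import libvirt

    BRIDGE_NET_XML = """
    <network>
      <name>host-bridge</name>
      <forward mode='bridge'/>
      <bridge name='br0'/>
    </network>
    """

    conn = libvirt.open("qemu:///system")        # local hypervisor connection
    net = conn.networkDefineXML(BRIDGE_NET_XML)  # register the bridged network
    net.setAutostart(1)                          # bring it up on host boot
    net.create()                                 # bring it up now

    # A VM then attaches its virtual NIC to this network in its domain XML:
    #   <interface type='network'>
    #     <source network='host-bridge'/>
    #   </interface>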

Physical vs. Virtual Clusters

Virtual clusters are built with VMs installed at distributed servers from one or more physical clusters. The VMs in a virtual cluster are logically connected by a virtual network across several physical networks. Each virtual cluster is formed with physical machines or VMs hosted by multiple physical clusters, and each has its own distinct boundary.

Provisioning VMs to a virtual cluster dynamically gives it the following properties:

  • Virtual cluster nodes can be either physical machines or VMs, and of course you can have multiple VMs running different OSes on the same physical node.
  • A VM runs with a guest OS, which is often different from the host OS that manages the resources in the physical machine upon which the VM is running.
  • The purpose of using VMs is to consolidate multiple functionalities on the same server, which greatly enhances server utilization and application flexibility.
  • You can have VMs replicated in multiple servers for the purpose of promoting distributed parallelism, fault tolerance and disaster recovery.
  • The number of nodes within a virtual cluster can grow or shrink dynamically, similar to the way an overlay network varies in size in a peer-to-peer network.
  • The failure of a physical node may disable some of the VMs installed on it, but the failure of a VM won’t pull down the host system.

You need to consider effective strategies for managing the VMs running on a mass of physical computing nodes (which together form the virtual clusters). This involves virtual cluster deployment, monitoring and management of large-scale clusters, as well as resource scheduling, load balancing, server consolidation, fault tolerance and other tactics. In a virtual cluster system, you will also have a large number of VM images, and it’s essential to determine how to store those images efficiently.

There are common installations for most users or applications, such as OSes or user-level programming libraries. You can preinstall these as templates (called template VMs). With these templates, users can build their own software stacks and copy new OS instances from the template VM. User-specific components such as programming libraries and applications can then be installed on those instances.

You can install each VM on a remote server or replicate VMs on multiple servers belonging to the same or different physical clusters. The boundary of a virtual cluster can change as VM nodes are added, removed or migrated dynamically over time.

Fast Deployment and Effective Scheduling

Your virtual cluster system should have the capability for fast deployment. In this case, deployment means being able to construct and distribute software stacks (including OSes, libraries and applications) to a physical node within clusters as fast as possible. It also means the ability to quickly switch runtime environments from one user’s virtual cluster to another’s. When a user finishes using a system, the corresponding virtual cluster should shut down or suspend quickly to free its resources for other users’ VMs.
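As one hedged illustration of that quick switch, the sketch below uses the libvirt Python bindings to suspend every VM in a finished user’s virtual cluster to disk so the hosts can immediately take another user’s VMs. The VM names and save directory are hypothetical.

    # Sketch: suspend every VM of a finished user's virtual cluster to disk,
    # freeing the hosts for other users' VMs.
    import libvirt

    def suspend_cluster(vm_names, save_dir="/var/lib/vclusters"):
        conn = libvirt.open("qemu:///system")
        for name in vm_names:
            dom = conn.lookupByName(name)
            if dom.isActive():
                # save() writes the VM's memory state to disk and stops it;
                # it can be resumed later with conn.restore(<path>).
                dom.save(f"{save_dir}/{name}.sav")

    suspend_cluster(["userA-node0", "userA-node1"])   # hypothetical VM names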

The concept of “green computing” has attracted much attention recently. However, previous approaches have focused on saving the energy cost of components in a single workstation without a global vision. Consequently, they don’t necessarily reduce the power consumption of the whole cluster. Virtual clusters can go a long way toward accomplishing this.

Other cluster-wide energy-efficient techniques are only applicable to homogeneous workstations and specific applications. The live migration of VMs lets you transfer workloads from one node to another. However, that doesn’t mean VMs can be migrated among nodes arbitrarily; you can’t ignore the potential overhead caused by live migration.

This overhead may have serious negative effects on cluster utilization, throughput and quality of service. The challenge, then, is to design migration strategies that implement green computing without hurting cluster performance.

Another advantage of virtualization is load balancing of applications in a virtual cluster. You can achieve load balancing using a load index and the frequency of user logins, and implement the automatic scale-up and scale-down mechanism of a virtual cluster on top of these metrics.

Consequently, you can increase node resource utilization and shorten system response time. Mapping each VM onto the most appropriate physical node also promotes performance. Dynamically adjusting loads among nodes by live VM migration is a desirable approach when the loads on cluster nodes become unbalanced.
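A minimal sketch of that load-index idea follows: each node reports a load index and a user-login rate, and the cluster scales out or in when the cluster-wide average crosses thresholds. The weighting and threshold values here are assumptions for illustration, not figures from the text.

    # Sketch: decide whether a virtual cluster should scale out or in based
    # on a per-node load index and user-login rate (weights and thresholds
    # are illustrative).
    from dataclasses import dataclass

    @dataclass
    class Node:
        name: str
        load_index: float      # 0.0 (idle) .. 1.0 (saturated)
        logins_per_min: float

    def cluster_decision(nodes, scale_out_at=0.75, scale_in_at=0.25):
        # Blend raw load with login frequency so interactive nodes count more.
        scores = [min(1.0, n.load_index + 0.01 * n.logins_per_min) for n in nodes]
        avg = sum(scores) / len(scores)
        if avg > scale_out_at:
            return "scale-out"   # add VM nodes to the virtual cluster
        if avg < scale_in_at:
            return "scale-in"    # remove or suspend idle VM nodes
        return "steady"

    nodes = [Node("n0", 0.9, 12), Node("n1", 0.7, 3), Node("n2", 0.8, 6)]
    print(cluster_decision(nodes))   # -> "scale-out"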

High-Performance Virtual Storage

You can distribute the template VM to several physical hosts within the cluster to customize other VMs. In addition, pre-designed software packages reduce the time required for customization and for switching between virtual environments. It’s also important to manage disk space efficiently.

You can apply a storage architecture design that reduces duplicated blocks within the distributed file system of a virtual cluster, using hash values to compare the contents of data blocks. Users have their own profiles that store the data block identities for the corresponding VMs within a user-specific virtual cluster. When users modify the corresponding data, new data blocks are created and recorded in the users’ profiles.
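A minimal sketch of that deduplication scheme, assuming fixed-size blocks, SHA-256 content hashes and an in-memory store (all illustrative choices), might look like this:

    # Sketch: block-level deduplication keyed by content hash. Each block is
    # stored once under its SHA-256 digest; per-user profiles record which
    # block hashes make up each VM image. Modified data becomes new blocks.
    import hashlib

    BLOCK_SIZE = 4096
    block_store = {}      # hash -> block contents, shared by all users
    user_profiles = {}    # user -> {vm_name: [block hashes, in order]}

    def store_image(user, vm_name, data: bytes):
        hashes = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            block_store.setdefault(digest, block)   # duplicates stored once
            hashes.append(digest)
        user_profiles.setdefault(user, {})[vm_name] = hashes

    def read_image(user, vm_name) -> bytes:
        return b"".join(block_store[h] for h in user_profiles[user][vm_name])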

Basically, there are four steps to deploy a group of VMs onto a target cluster:

  1. Prepare the disk image
  2. Configure the VMs
  3. Choose the destination nodes
  4. Execute the VM deployment command on every host

Many systems use templates to simplify the disk image preparation process. A template is a disk image that includes a preinstalled OS with or without certain application software.

Users choose a suitable template according to their requirements and make a copy to use as their own disk image. Templates can use the Copy on Write (COW) format: a new COW file is small and easy to create and transfer, so it reduces disk space consumption and shortens VM deployment time compared with copying the whole raw image file.
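As a hedged example, the sketch below creates a per-user COW disk backed by a shared template using qemu-img (assumed to be installed and reasonably recent); the paths are hypothetical.

    # Sketch: create a small copy-on-write disk for one VM, backed by a
    # shared template image, via qemu-img. Only blocks the user changes are
    # written to the new file.
    import subprocess

    TEMPLATE = "/srv/templates/base-os.qcow2"    # hypothetical template path

    def clone_from_template(vm_disk_path):
        subprocess.run(
            ["qemu-img", "create",
             "-f", "qcow2",       # format of the new COW image
             "-b", TEMPLATE,      # backing file: the shared template
             "-F", "qcow2",       # format of the backing file
             vm_disk_path],
            check=True,
        )

    clone_from_template("/srv/vms/user1-node0.qcow2")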

Every VM is configured with a name, a disk image, network settings, and allocated CPU and memory. You simply need to record each VM’s configuration in a file.

This method can be inefficient, however, when managing a large group of VMs. VMs with the same configuration can instead use a pre-edited profile to simplify the process; the system then configures the VMs according to the chosen profile.

Most configuration items use the same settings, while some, such as the universally unique identifier (UUID), VM name and IP address, are assigned automatically calculated values. Normally, users don’t care which host is running their VM.
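A minimal sketch of such profile-driven configuration, with made-up profile values and address range, could look like this:

    # Sketch: configure a group of VMs from one pre-edited profile. Shared
    # settings come from the profile; the UUID, name and IP address are
    # assigned automatically per VM.
    import ipaddress
    import uuid

    profile = {
        "vcpus": 2,
        "memory_mb": 2048,
        "disk_template": "/srv/templates/base-os.qcow2",
        "network": "host-bridge",
    }

    def configure_cluster(prefix, count, subnet="10.0.0.0/24"):
        addresses = ipaddress.ip_network(subnet).hosts()
        vms = []
        for i in range(count):
            vms.append({
                **profile,                    # settings shared by every VM
                "name": f"{prefix}-{i}",      # auto-generated name
                "uuid": str(uuid.uuid4()),    # auto-generated UUID
                "ip": str(next(addresses)),   # next address in the range
            })
        return vms

    for vm in configure_cluster("userA", 3):
        print(vm["name"], vm["uuid"], vm["ip"])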

You’ll need a strategy to choose the proper destination host for each VM. The guiding principles are to fulfill each VM’s resource requirements and to balance workloads across the whole host network.
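One simple placement strategy along those lines, sketched below with hypothetical host data, is to pick the least-loaded host that can satisfy the VM’s CPU and memory requirements:

    # Sketch: place a VM on the least-loaded host that can satisfy its CPU
    # and memory requirements, reserving the resources as they are assigned.
    def choose_host(vm, hosts):
        candidates = [
            h for h in hosts
            if h["free_vcpus"] >= vm["vcpus"] and h["free_mb"] >= vm["memory_mb"]
        ]
        if not candidates:
            return None                        # no host fulfills the request
        best = min(candidates, key=lambda h: h["load"])   # least-loaded wins
        best["free_vcpus"] -= vm["vcpus"]      # reserve the resources
        best["free_mb"] -= vm["memory_mb"]
        return best["name"]

    hosts = [
        {"name": "host0", "free_vcpus": 8,  "free_mb": 16384, "load": 0.6},
        {"name": "host1", "free_vcpus": 16, "free_mb": 32768, "load": 0.2},
    ]
    print(choose_host({"vcpus": 2, "memory_mb": 2048}, hosts))   # -> host1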

Kai Hwang

Kai Hwang is a professor of computer engineering at the University of Southern California and a visiting Chair Professor at Tsinghua University, China. He earned a Ph.D. in EECS from the University of California, Berkeley. He has published extensively in computer architecture, digital arithmetic, parallel processing, distributed systems, Internet security and cloud computing.

Jack Dongarra

Jack Dongarra is a University Distinguished Professor of Electrical Engineering and Computer Science at the University of Tennessee, a Distinguished Research Staff member at Oak Ridge National Laboratory and a Turing Fellow at the University of Manchester. Dongarra pioneered the areas of supercomputer benchmarks, numerical analysis, linear algebra solvers and high-performance computing, and has published extensively in these areas.

Geoffrey Fox

Geoffrey Fox is a Distinguished Professor of Informatics, Computing and Physics and Associate Dean of Graduate Studies and Research in the School of Informatics and Computing at Indiana University. He received his Ph.D. from Cambridge University, U.K. Fox is well-known for his comprehensive work and extensive publications in parallel architecture, distributed programming, grid computing, Web services and Internet applications.

©2011 Elsevier Inc. All rights reserved. Printed with permission from Syngress, an imprint of Elsevier. Copyright 2011. “Distributed and Cloud Computing: From Parallel Processing to the Internet of Things” by Kai Hwang, Jack Dongarra and Geoffrey Fox. For more information on this title and other similar books, please visit elsevierdirect.com.