Resource management for an inherently complex system such as cloud computing requires different ways of measuring and allocating resources.
Adapted from “Cloud Computing: Theory and Practice” (Elsevier Science & Technology books)
Resource management is a core function required of any man-made system. It affects the three basic criteria for system evaluation: performance, functionality and cost. Inefficient resource management has a direct negative effect on performance and cost. It can also indirectly affect system functionality. Some functions the system provides might become too expensive or ineffective due to poor performance.
A cloud computing infrastructure is a complex system with a large number of shared resources. These are subject to unpredictable requests and can be affected by external events beyond your control. Cloud resource management requires complex policies and decisions for multi-objective optimization. It is extremely challenging because the complexity of the system makes it impossible to have accurate global state information, and because the system is subject to incessant and unpredictable interactions with the environment.
The strategies for cloud resource management associated with the three cloud delivery models, Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS), differ from one another. In all cases, cloud service providers face large, fluctuating loads that challenge the claim of cloud elasticity. In some cases, when they can predict a spike, they can provision resources in advance. For example, seasonal Web services may be subject to spikes.
For an unplanned spike, the situation is slightly more complicated. You can use Auto Scaling for unplanned spike loads, provided there’s a pool of resources you can release or allocate on demand and a monitoring system that lets you decide in real time to reallocate resources. Auto Scaling is supported by PaaS services such as Google App Engine. Auto Scaling for IaaS is complicated due to the lack of standards.
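The two ingredients named above, a pool of resources and a monitoring signal, can be illustrated with a minimal threshold rule. This is only a sketch under stated assumptions: the function name, the 80 percent and 30 percent thresholds, and the one-server-at-a-time step are hypothetical and do not describe any provider's Auto Scaling service.

```python
def scale_decision(utilizations, active, pool_size, high=0.80, low=0.30):
    """Return the number of servers to keep active for the next interval.

    utilizations: recent per-server utilization samples, each in [0, 1]
    active:       servers currently serving requests
    pool_size:    total servers in the pool (hypothetical upper bound)
    high, low:    illustrative thresholds, not taken from any real service
    """
    mean = sum(utilizations) / len(utilizations)
    if mean > high and active < pool_size:
        return active + 1  # allocate one more server from the pool
    if mean < low and active > 1:
        return active - 1  # release one server back to the pool
    return active          # load is within bounds, or no room to change


# Two busy servers out of a pool of four: the rule adds a third.
print(scale_decision([0.90, 0.85], active=2, pool_size=4))
```

A real controller would also smooth the monitoring signal and wait between decisions so that it does not oscillate; both are omitted here for brevity.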
In the cloud, where changes are frequent and unpredictable, centralized control is unlikely to provide continuous service and performance guarantees. Indeed, centralized control can’t provide adequate solutions to the host of cloud management policies you have to enforce.
Autonomic policies are of great interest due to the scale of the system, the large number of service requests, the large user population and the unpredictability of the load. The ratio of the peak to the mean resource needs can be large.
A policy typically refers to the principles guiding decisions, whereas mechanisms represent the means to implement policies. Separating policies from mechanisms is a guiding principle in computer science. Butler Lampson and Per Brinch Hansen offer solid arguments for this separation in the context of OS design.
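You can loosely group cloud resource management policies into five classes: admission control, capacity allocation, load balancing, energy optimization and quality of service (QoS) guarantees.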
The explicit goal of an admission control policy is to prevent the system from accepting workloads in violation of high-level system policies. For example, a system may not accept an additional workload that would prevent it from completing work already in progress or contracted. Limiting the workload requires some knowledge of the global system state. In a dynamic system, this information is often obsolete at best.
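As a rough illustration of such a policy, the sketch below accepts a new workload only if the work already committed still fits afterwards. The function name, the units and the 10 percent headroom are assumptions made for the example. The headroom stands in for the fact that the committed figure is only an estimate of the global state and may be out of date.

```python
def admit(new_demand, committed, capacity, headroom=0.10):
    """Decide whether to accept a workload.

    new_demand: resources the new workload would need
    committed:  estimated resources for work in progress or contracted
                (possibly stale, as global state often is)
    capacity:   total resources of the system, in the same units
    headroom:   hypothetical safety margin kept free to absorb estimate errors
    """
    usable = capacity * (1.0 - headroom)
    return committed + new_demand <= usable


print(admit(new_demand=10, committed=70, capacity=100))  # fits within the margin
print(admit(new_demand=25, committed=70, capacity=100))  # would exceed it
```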
Capacity allocation means allocating resources for individual instances. An instance is a service activation. Locating resources that are subject to multiple global optimization constraints requires you to search a large space while the state of individual systems is changing rapidly.
You can perform load balancing and energy optimization locally, but global load-balancing and energy-optimization policies encounter the same difficulties as the ones already discussed. Load balancing and energy optimization are correlated and affect the cost of providing the services.
The common meaning of the term load balancing is that of evenly distributing the load to a set of servers. For example, consider the case of four identical servers, A, B, C and D. Their relative loads are 80 percent, 60 percent, 40 percent and 20 percent, respectively, of their capacity. Perfect load balancing would result in all servers working with the same load—50 percent of each server’s capacity.
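The arithmetic of this example can be checked with a few lines. The sketch assumes loads are expressed as percentages of identical server capacities and that load can be moved in arbitrary amounts; the function name is hypothetical.

```python
def balance(loads):
    """Return the even-load target and the change each server needs.

    A negative change means the server must shed load; a positive one
    means it should take load on.
    """
    target = sum(loads.values()) / len(loads)
    changes = {name: target - load for name, load in loads.items()}
    return target, changes


target, changes = balance({"A": 80, "B": 60, "C": 40, "D": 20})
print(target)   # the common load every server ends up with
print(changes)  # A and B shed load, C and D absorb it
```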
In cloud computing, a critical goal is minimizing the cost of providing the service. In particular, this also means minimizing energy consumption. This leads to a different meaning of the term load balancing. Instead of having the load evenly distributed among all servers, we want to concentrate it and use the smallest number of servers while switching the others to standby mode, a state in which a server uses less energy. In our example, the load from D will migrate to A and the load from C will migrate to B. Thus, A and B will be loaded at full capacity, whereas C and D will be switched to standby mode.
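The consolidation in this example can be sketched with a simple greedy rule: take servers from the lightest to the heaviest and move each one's entire load to the most heavily loaded active server that still has room. This is one possible heuristic chosen for illustration, not a method prescribed by the text, and it assumes identical servers and loads that migrate as a whole.

```python
def consolidate(loads, capacity=100):
    """Pack load onto as few servers as this greedy rule manages.

    loads:    mapping of server name to load, as a percentage of capacity
    capacity: capacity of each (identical) server
    Returns the new loads and the sorted list of servers left idle.
    """
    result = dict(loads)
    for donor in sorted(loads, key=loads.get):  # lightest first
        # Try receivers from the heaviest down; skip idle servers.
        for receiver in sorted(result, key=result.get, reverse=True):
            if receiver == donor or result[receiver] == 0:
                continue
            if result[receiver] + result[donor] <= capacity:
                result[receiver] += result[donor]
                result[donor] = 0
                break
    standby = sorted(name for name, load in result.items() if load == 0)
    return result, standby


new_loads, standby = consolidate({"A": 80, "B": 60, "C": 40, "D": 20})
print(new_loads)  # A and B at full capacity
print(standby)    # C and D can be switched to standby
```

A production placement algorithm would also weigh migration cost and leave some spare capacity for spikes, which this sketch ignores.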
Quality of service is that aspect of resource management that’s probably the most difficult to address and, at the same time, possibly the most critical to the future of cloud computing. Resource management strategies often jointly target performance and power consumption.
Dynamic voltage and frequency scaling (DVFS) techniques such as Intel SpeedStep and AMD PowerNow lower the voltage and the frequency to decrease power consumption. Motivated initially by the need to save power for mobile devices, these techniques have migrated to virtually all processors, including those used in high-performance servers. As a result of lower voltages and frequencies, the processor performance decreases. However, it does so at a substantially slower rate than the energy consumption.
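A common first-order model helps show why performance falls more slowly than energy use. The sketch assumes dynamic power proportional to C * V^2 * f, with the voltage V scaled in proportion to the frequency f, and a CPU-bound workload whose speed tracks f. It ignores static (leakage) power, so it is an idealization rather than a description of SpeedStep or PowerNow.

```python
def relative_performance(freq_scale):
    """Speed relative to full frequency, assuming a CPU-bound workload."""
    return freq_scale


def relative_power(freq_scale):
    """Dynamic power relative to full speed, assuming P ~ C * V^2 * f and V ~ f."""
    return freq_scale ** 3


def relative_energy_per_task(freq_scale):
    """Energy for a fixed amount of work: power multiplied by the longer run time."""
    return relative_power(freq_scale) / relative_performance(freq_scale)


# Running at 80 percent of full frequency under this model:
scale = 0.8
print(round(relative_performance(scale), 3))      # speed drops to 0.8
print(round(relative_power(scale), 3))            # power drops to about 0.512
print(round(relative_energy_per_task(scale), 3))  # energy per task about 0.64
```

Under these assumptions, a 20 percent loss in speed buys roughly a 49 percent cut in power and a 36 percent cut in the energy needed per task.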
Virtually all optimal or near-optimal mechanisms to address the five policy classes don’t scale up. They typically target a single aspect of resource management, such as admission control, but ignore energy conservation. Many require complex computations that can’t be done effectively in the time available to respond. Performance models are complex, analytical solutions are intractable, and the monitoring systems used to gather state information for these models can be too intrusive and unable to provide accurate data.
As a result, many techniques concentrate on system performance, measured as throughput and time in system. They rarely include energy tradeoffs or QoS guarantees. Some techniques are based on unrealistic assumptions. For example, capacity allocation is viewed as an optimization problem, but under the assumption that servers are protected from overload.
Allocation techniques in computer clouds must be based on a disciplined approach, rather than ad hoc methods. The four basic mechanisms for implementing resource management policies are:
A distinction should be made between interactive and non-interactive workloads. The management techniques for interactive workloads (Web services, for example) involve flow control and dynamic application placement, whereas those for non-interactive workloads are focused on scheduling.
A fair amount of the work reported in the literature is devoted to resource management of interactive workloads, some to non-interactive workloads, and only a little to heterogeneous workloads, which combine the two. Planning ahead for how you are going to manage these will help ensure a smooth transition to working with the cloud.
Dan C. Marinescu was a professor of computer science at Purdue University from 1984 to 2001. Then he joined the Computer Science Department at the University of Central Florida. He has held visiting faculty positions at the IBM T. J. Watson Research Center, the Institute of Information Sciences in Beijing, the Scalable Systems Division of Intel Corp., Deutsche Telekom AG and INRIA Rocquencourt in France. His research interests cover parallel and distributed systems, cloud computing, scientific computing, quantum computing, and quantum information theory.