Cloud computing: Challenges in cloud configuration
The selection and configuration of your cloud-based applications will have a massive impact on their performance.
Adapted from “Cloud Computing: Theory and Practice” (Elsevier Science & Technology books)
Developing efficient cloud applications inherits the challenges posed by the natural imbalance among the computing, I/O and communication bandwidths of physical systems. These challenges are greatly amplified by the scale of the system, its distributed nature and the fact that virtually all applications are data-intensive.
Though any cloud-computing infrastructure will ideally attempt to automatically distribute and balance processing loads, you’re still left with the responsibility of placing the data close to the processing site and identifying its optimal storage strategy. One of the main advantages of cloud computing—the shared infrastructure—can also be a drawback.
Performance isolation is nearly impossible to achieve in a real system, especially when the system is heavily loaded, and it is even more difficult with cloud computing. The performance of virtual machines (VMs) fluctuates based on the load, infrastructure services, environment and number of users. Security isolation is also challenging on multi-tenant systems.
Reliability is also a major concern. You can expect node failures whenever a large number of nodes compete for compute resources. Choosing an optimal instance (in terms of performance isolation, reliability and security) from those offered by the cloud infrastructure is a critical factor. Cost considerations also play a role in the choice of the instance type.
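Why failures become routine at scale can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each node fails independently with the same probability p over a given time window, so the chance that at least one of n nodes fails is 1 - (1 - p)^n. Real failures are often correlated (a shared rack or network switch, for example), so treat the result as a rough illustration rather than a prediction.

```python
def prob_any_failure(p_node: float, n_nodes: int) -> float:
    """Probability that at least one of n_nodes fails in a time window.

    Assumes (hypothetically) that every node fails independently with the
    same probability p_node over that window.
    """
    if not 0.0 <= p_node <= 1.0:
        raise ValueError("p_node must be between 0 and 1")
    if n_nodes < 0:
        raise ValueError("n_nodes must be non-negative")
    # P(no node fails) = (1 - p)^n, so P(at least one fails) is its complement.
    return 1.0 - (1.0 - p_node) ** n_nodes


for n in (10, 100, 1000):
    print(f"{n:>5} nodes: {prob_any_failure(0.001, n):.1%}")
```

With a hypothetical per-node failure probability of 0.1%, this gives roughly a 10% chance of at least one failure for 100 nodes and roughly 63% for 1,000 nodes, which is why large deployments have to plan for failures rather than treat them as exceptions.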
The app experience
Many applications consist of multiple stages. Each stage may involve multiple instances running in parallel on the cloud and communicating with one another. Thus, efficiency, consistency and communication scalability are major concerns for an application developer. Due to shared networks and unknown topology, cloud infrastructures exhibit internode latency and bandwidth fluctuations that often affect application performance.
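One practical way to see these latency fluctuations is to collect round-trip samples between instances and summarize their spread. The helper below is a minimal sketch: how the samples are gathered (for example, by timing small request/response exchanges between two instances) is left as an assumption, and the nearest-rank 99th percentile and the coefficient of variation are just one reasonable choice of summary statistics.

```python
import statistics


def summarize_latency(samples_ms: list[float]) -> dict[str, float]:
    """Summarize round-trip latency samples (in milliseconds) between two instances."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples_ms)
    mean = statistics.fmean(ordered)
    if mean <= 0:
        raise ValueError("latencies must be positive")
    # Nearest-rank 99th percentile, using integer math to avoid rounding surprises.
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n)
    return {
        "mean": mean,
        "median": statistics.median(ordered),
        "p99": ordered[rank - 1],
        # Coefficient of variation: spread relative to the mean; higher means a noisier link.
        "cv": statistics.stdev(ordered) / mean,
    }
```

A p99 several times the median, or a coefficient of variation that grows when neighbors are busy, is one sign that shared networking is affecting communication between stages.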
Data storage plays a critical role in the performance of any data-intensive application. Organizing the storage, choosing storage location and managing storage bandwidth must all be carefully analyzed for optimal application performance. Clouds support many storage options, including off-instance cloud storage, mountable off-instance block storage and persistent storage for the instance lifetime.
Many data-intensive applications use metadata associated with individual data records. For example, the metadata for an MPEG audio file might include the name of the song, the singer, recording information and so on. Metadata should be stored so that it is easy to access, and the storage itself should be scalable and reliable.
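A minimal sketch of a metadata store for the audio example, using Python's built-in `sqlite3` module as a local stand-in. The table name `audio_metadata`, the `object_key` column that points at the audio file in cloud storage, and the helper functions are hypothetical; a production system would use a replicated, scalable database rather than a single SQLite file.

```python
import sqlite3
from typing import Optional


def open_metadata_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audio_metadata (
               object_key TEXT PRIMARY KEY,  -- location of the audio file in storage
               song       TEXT NOT NULL,
               singer     TEXT NOT NULL,
               recorded   TEXT               -- free-form recording information
           )"""
    )
    # Secondary index so lookups by singer do not scan the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_singer ON audio_metadata(singer)")
    return conn


def add_track(conn: sqlite3.Connection, object_key: str, song: str,
              singer: str, recorded: Optional[str] = None) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO audio_metadata VALUES (?, ?, ?, ?)",
            (object_key, song, singer, recorded),
        )


def tracks_by_singer(conn: sqlite3.Connection, singer: str) -> list:
    rows = conn.execute(
        "SELECT object_key, song FROM audio_metadata WHERE singer = ? ORDER BY song",
        (singer,),
    )
    return rows.fetchall()
```

Keeping the metadata separate from the (much larger) audio objects lets you query it quickly without touching the files themselves.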
Another important consideration for application performance is logging. It’s a delicate balance. Performance considerations limit how much data you can log, whereas frequent logging makes it easier to identify the source of unexpected results and errors. Logging typically uses instance storage, which is preserved only for the lifetime of the instance, so you should always take measures to preserve the logs for postmortem analysis.
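The sketch below shows one way to act on that advice with Python's standard `logging` module: the log level sets how much is written, and a `preserve` function copies the instance-local log file to a directory standing in for persistent, off-instance storage. The function names and paths are hypothetical. Note that `atexit` handlers run only on a normal interpreter exit, not when an instance is terminated abruptly, so calling `preserve()` periodically (for example, after each job) is the safer habit.

```python
import atexit
import logging
import shutil
from pathlib import Path
from typing import Callable


def setup_logging(
    instance_log: Path,
    persistent_dir: Path,
    level: int = logging.INFO,
    name: str = "app",
) -> tuple[logging.Logger, Callable[[], None]]:
    """Log to instance-local storage and return a function that copies the log
    to a (hypothetical) persistent location for postmortem analysis."""
    instance_log.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    # The level is the performance/diagnosability trade-off:
    # DEBUG records everything, WARNING keeps only problems.
    logger.setLevel(level)
    handler = logging.FileHandler(instance_log)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)

    def preserve() -> None:
        handler.flush()
        if instance_log.exists():
            persistent_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(instance_log, persistent_dir / instance_log.name)

    # Best effort on normal shutdown; abrupt termination skips atexit handlers.
    atexit.register(preserve)
    return logger, preserve
```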
You can divide existing cloud applications into several broad categories: processing pipelines, batch-processing systems and Web applications. Processing pipelines are data-intensive and often compute-intensive applications. These represent a fairly large segment of applications currently running on the cloud. There are several types of data-processing applications:
- Indexing: The processing pipeline supports indexing large datasets created by Web crawler engines.
- Data mining: The processing pipeline supports searching large collections of records to locate items of interest.
- Image processing: A number of companies, such as Flickr and Google, let you store images in the cloud. The image-processing pipelines support image conversion, compression and encryption.
- Video transcoding: The processing pipeline transcodes from one video format to another (for example, from AVI to MPEG).
- Document processing: The processing pipeline converts large collections of documents from one format to another (such as from Word to PDF) or encrypts the documents. It could also use optical character recognition (OCR) to convert scanned images of documents into machine-readable text.
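To make the indexing pipeline concrete, here is a minimal sketch of its core stage: building an inverted index that maps each term to the set of crawled pages containing it. The input format (a dictionary from URL to page text), the simple tokenizer and the function names are assumptions for illustration; a cloud deployment would run this stage in parallel over many partitions of the crawl and merge the partial indexes.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of page URLs in which it appears."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return pages containing every term of the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```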
Batch-processing systems also cover a broad spectrum of data-intensive applications in enterprise computing. Such applications typically have deadlines. Failure to meet these deadlines could have serious economic consequences. Security is also a critical aspect for many batch-processing applications. A non-exhaustive list of batch-processing applications includes:
- Generating daily, weekly, monthly, and annual activity reports for organizations in retail, manufacturing, and other economic sectors.
- Processing, aggregating, and summarizing daily transactions for financial institutions, insurance companies, and health-care organizations.
- Inventory management for large corporations.
- Billing- and payroll-record processing.
- Software development management (such as nightly updates of software repositories).
- Automatic testing and verification of software and hardware systems.
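Because deadlines define these workloads, it helps to check in advance whether a batch fits. The sketch below orders jobs earliest deadline first (EDF), a classic scheduling rule that minimizes the maximum lateness on a single worker; it is a swapped-in illustration, not a method from the text. The job names, run-time estimates and the single-worker assumption are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchJob:
    name: str
    runtime_h: float   # estimated run time, in hours
    deadline_h: float  # hours from now by which the job must be finished


def edf_plan(jobs: list[BatchJob]) -> tuple[list[str], list[str]]:
    """Run jobs back to back on one worker in earliest-deadline-first order.

    Returns the execution order and the names of jobs that would miss
    their deadlines under that order.
    """
    order = sorted(jobs, key=lambda job: job.deadline_h)
    clock = 0.0
    missed = []
    for job in order:
        clock += job.runtime_h
        if clock > job.deadline_h:
            missed.append(job.name)
    return [job.name for job in order], missed
```

Since EDF minimizes maximum lateness on one worker, a non-empty `missed` list means no ordering on that worker meets every deadline, and the batch needs more instances or faster ones.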
Finally, and of increasing importance, are cloud applications for Web access. Several categories of Web sites have a periodic or a temporary presence, such as the Web sites for conferences or other events. There are also Web sites that are active during a particular season, such as the holidays. Some support a particular type of activity, such as income tax reporting with its April 15 deadline each year. Other limited-time Web sites used for promotional activities “sleep” during the night and auto-scale during the day.
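The “sleep at night, scale during the day” behavior can be expressed as a simple rule: compute how many instances the current request rate needs, then clamp that number between a time-dependent minimum and a fixed maximum. This is a hypothetical sketch: the per-instance capacity, the overnight window and the minimum counts are illustrative values, and real cloud autoscalers are configured through their own policy settings rather than a function like this.

```python
import math


def min_instances_for_hour(hour: int) -> int:
    """Hypothetical schedule: scale to zero overnight (00:00-05:59), keep two warm otherwise."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be 0-23")
    return 0 if hour < 6 else 2


def desired_instances(
    requests_per_s: float,
    capacity_per_instance: float,
    min_instances: int,
    max_instances: int,
) -> int:
    """Instances needed for the current load, clamped to [min_instances, max_instances]."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_s / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))
```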
It makes economic sense to store the data in the cloud close to where the application runs. The cost per GB is low and processing is more efficient when the data is stored close to the servers. This could lead to several new classes of cloud-computing applications in the years to come. For example, there could be batch processing for decision support systems and other aspects of business analytics.
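A quick calculation shows why data location matters. The function below estimates how long it takes to move a dataset over a network link; the dataset sizes, link speeds and the `efficiency` factor (a stand-in for protocol overhead and bandwidth shared with other tenants) are illustrative assumptions, and decimal units are used (1 GB = 8 gigabits).

```python
def transfer_hours(dataset_gb: float, bandwidth_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move dataset_gb gigabytes over a bandwidth_gbps link.

    efficiency in (0, 1] is a hypothetical factor for protocol overhead and
    shared bandwidth. Decimal units: 1 GB = 8 gigabits.
    """
    if bandwidth_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    seconds = dataset_gb * 8 / (bandwidth_gbps * efficiency)
    return seconds / 3600
```

Moving 1,000 GB over a fully utilized 1 Gbps link takes 8,000 seconds, roughly 2.2 hours; at a hypothetical 50% effective rate it takes about 4.4 hours, and that cost recurs every time the data must travel to where it is processed.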
Another class of new applications could be parallel batch processing based on programming abstractions. Mobile interactive applications that process large volumes of data from different types of sensors, as well as services that combine more than one data source, are obvious candidates for cloud computing.
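MapReduce is a well-known example of such a programming abstraction: the user supplies a map function applied to each chunk of input and a reduce function that combines the partial results, and the framework handles the parallelism. The sketch below is a single-machine illustration using Python's `concurrent.futures`; the `map_reduce` helper and the word-count functions are hypothetical. Because threads in standard CPython builds do not execute Python code in parallel, a real deployment would swap in processes or a distributed framework while keeping the same two user-supplied functions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")  # input chunk type
R = TypeVar("R")  # partial and final result type


def map_reduce(
    chunks: Iterable[T],
    mapper: Callable[[T], R],
    reducer: Callable[[R, R], R],
    initial: R,
    workers: int = 4,
) -> R:
    """Apply mapper to each chunk concurrently, then fold the partial results with reducer."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(mapper, chunks))
    return reduce(reducer, partials, initial)


def count_words(chunk: str) -> Counter:
    """Map step: word frequencies for one chunk of text."""
    return Counter(chunk.lower().split())


def merge_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: add two partial frequency tables."""
    return a + b
```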
Science and engineering could greatly benefit from cloud computing because many applications in these areas are compute- and data-intensive. Similarly, a cloud dedicated to education would be extremely useful. Mathematical software such as MATLAB and Mathematica could also run on the cloud.
Application development, selection, configuration and performance tuning all become essential activities when balancing a cloud-computing environment. You’ll have many applications running in many different ways, and that application stack will need some monitoring and maintenance.
Dan C. Marinescu was a professor of computer science at Purdue University from 1984 to 2001. Then he joined the Computer Science Department at the University of Central Florida. He has held visiting faculty positions at the IBM T. J. Watson Research Center, the Institute of Information Sciences in Beijing, the Scalable Systems Division of Intel Corp., Deutsche Telekom AG and INRIA Rocquencourt in France. His research interests cover parallel and distributed systems, cloud computing, scientific computing, quantum computing and quantum information theory.