How Microsoft IT Manages a Private Cloud for Microsoft Research & Development
Technical Case Study
Published: June 2012
Microsoft IT implemented their first private cloud to host development and test environments internally for Microsoft. Their private cloud infrastructure supports their internal businesses, giving them the flexibility, scale, and cutting-edge technology to serve some of the most important product teams.
Microsoft IT was tasked with helping the Microsoft Research and Development community build and scale server environments that would enable them to test across the full software development life cycle. However, this was a time-consuming and costly process that often stretched budgets, resources, and physical server rack space to their limits while leaving server capacity underutilized.
Microsoft IT developed and built Agile Labs, a private cloud infrastructure that is able to maximize resource utilization by offering compute resources on an as-needed basis and by centralizing infrastructure management.
As the worldwide software development leader, Microsoft requires a robust IT infrastructure that can quickly adapt to a constantly changing and developing environment. Keeping pace with customer requests for storage, performance, and virtual machines built to unique customer specifications can tax an IT department's resources and its ability to scale in a timely manner.
Historically, each product group's equipment planning specifications frequently called for individual server environments built at a capacity that could test a single software application through the development life cycle. This meant that every environment was built to accommodate maximum usage when testing was at peak demand. These requirements resulted in server environments that strained budgets yet were underutilized and rarely operated at full capacity.
With multiple environments built to these specifications, physical lab space and potential carbon emissions were a growing concern. Incremental changes focused on improved utilization and allocation of resources did not resolve the issues or increase the efficiency of infrastructure operations. Even with consolidation and virtualization efforts, many problems remained:
- Machine allocation was a lengthy, time-consuming process that could hinder a software release timeline.
- Gaps and differences in infrastructure offerings impeded the application life cycle and release pipelines.
- Product groups were spending a lot of time reacting to infrastructure issues, rather than developing, deploying, and supporting great applications.
Microsoft IT identified the lab development issue with the Microsoft Research and Development (R & D) group and wanted to deliver a solution that would give internal Microsoft customers access to server environments that could be built to distinct specifications and scaled as needed, without exhausting budgets or physical resources. The challenge was to develop an environment that would allow for better server utilization and resource allocation. Microsoft IT recognized this as an opportunity to develop a private cloud infrastructure and to offer IT as a service. This would allow them to improve how server environments and resources were managed, used, and allocated. A private cloud environment offered customers deployment agility, reduced maintenance and management requirements, pooled resources (which could be allocated on demand), and a self-service environment that provided customers with the flexibility and scalability they needed, when they needed it.
During the evaluation period, Microsoft IT considered two potential cloud options: use System Center as the base on which to build a cloud environment, or build their own cloud solution. By building on top of the System Center platform, Microsoft IT gained all of the benefits of Windows Server® 2008 R2 and Microsoft Hyper-V Server® 2008 R2. Microsoft IT began development with System Center Virtual Machine Manager 2008 and created a Microsoft internal, hosted, enterprise private cloud environment code-named Agile Labs. As subsequent System Center releases were ready for internal review, Microsoft IT implemented each version, upgrading Agile Labs to System Center Virtual Machine Manager 2008 R2 and then, most recently, to Microsoft System Center 2012.
What Is Agile Labs?
Agile Labs is a comprehensive cloud and data center management platform that enables customers to efficiently manage their environments, server infrastructure, and client devices. It offers on-demand virtual resources with centralized infrastructure management, end-to-end environment design, compute provisioning, application deployment, and monitoring services. Agile Labs is able to maximize resource utilization by offering compute resources on an as-needed basis and by centralizing infrastructure management.
Microsoft IT manages Agile Labs as a supported service designed specifically for Microsoft Research and Development engineers. The needs of these diverse product groups range from sophisticated teams that have high-level tooling and automation and require an API to check out and consume resources on demand (and check them back in) to teams that have no automation but need a web portal to manage their virtual machine resources. As a result, Agile Labs is a traditional n-tiered application built on top of System Center that provides an API, a web portal, and chargeback and consumption data.
Agile Labs lets these customers request machine time (physical or virtual) on an as-needed basis, and user budgets are charged only while the machine is powered on. This gives R & D the ability to fine-tune their server budgeting models and the flexibility to pay for resources only when they are in use. It also allows Microsoft IT to increase utilization of physical assets while easing the constraints that come with managing power, space, and cooling.
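The pay-while-powered-on model described above can be illustrated with a short calculation. The following is a minimal sketch only: the names PowerInterval, billable_hours, and charge, the hourly rate, and the sample usage are all assumptions made for illustration and are not taken from the Agile Labs chargeback system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PowerInterval:
    """One continuous period during which a machine was powered on (hypothetical record)."""
    start: datetime
    end: datetime


def billable_hours(intervals):
    """Sum the powered-on time in hours; powered-off gaps contribute nothing."""
    total = sum((i.end - i.start for i in intervals), timedelta())
    return total.total_seconds() / 3600


def charge(intervals, hourly_rate):
    """Charge for one machine, rounded to two decimal places."""
    return round(billable_hours(intervals) * hourly_rate, 2)


# Hypothetical usage: two test runs on the same day, machine powered off in between.
usage = [
    PowerInterval(datetime(2012, 4, 2, 9, 0), datetime(2012, 4, 2, 12, 30)),
    PowerInterval(datetime(2012, 4, 2, 14, 0), datetime(2012, 4, 2, 16, 0)),
]
hours = billable_hours(usage)   # 3.5 + 2.0 hours
cost = charge(usage, 0.40)      # 0.40 is an invented rate, not a real Agile Labs price
```

In this sketch the ninety minutes between the two runs are never billed, which is the property that lets a product group tie cost to actual test activity.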
Why On-Premises Private Cloud with System Center?
The primary benefits of Agile Labs are:
- Decreased provisioning time - customers can spend more time in development and testing and less time setting up their lab environments.
- Increased consistency in operating system and application deployment.
- Test automation - testing no longer requires a human presence, which increases the level of test automation inside Microsoft.
- Abstraction - abstracting away the management of hosts and operating systems allows developers to focus on development and not on operational tasks.
- Chargeback for consumption - customers are charged only for the resources they use, which minimizes the costs associated with resources, infrastructure management, and employee time, and improves a product group's ability to determine the cost per test case or scenario.
- Tooling - Agile Labs provides tooling that can enhance development and test engineering efficiency.
- Decreased capital costs (sharing of resources) - by decreasing the cost of managing and maintaining thousands of servers, Agile Labs helps customers do more with less. Customers can improve server planning and reduce costs by buying only the servers they need for average usage and relying on Agile Labs for overflow when they need to meet peak demand. This reduces the capital expenses normally required to purchase and maintain a server farm and helps improve efficiency.
- Improved efficiency - ensures that Microsoft lab facilities, servers, and equipment are used to their fullest potential.
- Flexible customer options - customers have on-demand access to resources and can check resources out and in (based on the library concept), which provides the opportunity to share resources and pay only for what they use.
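The capital-cost benefit in the list above (own enough servers for average usage, rely on the shared pool for peaks) can be sketched as a simple split of demand. This is an illustrative sketch only: the function name split_demand and the demand figures are invented for the example and do not come from Agile Labs data.

```python
def split_demand(hourly_demand, owned_capacity):
    """Split each hour's machine demand into locally served machines and overflow."""
    local = [min(d, owned_capacity) for d in hourly_demand]
    overflow = [max(d - owned_capacity, 0) for d in hourly_demand]
    return local, overflow


# Hypothetical machine demand for six consecutive periods of a test cycle.
demand = [20, 25, 30, 80, 95, 30]

# Size the owned lab for average demand instead of peak demand.
average = sum(demand) // len(demand)
local, overflow = split_demand(demand, average)

# Servers the team avoids buying compared with sizing for the peak.
servers_not_purchased = max(demand) - average
```

With these invented numbers the team owns 46 machines instead of 95 and borrows capacity only during the two peak periods, paying for it through chargeback while it is in use.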
During the evaluation phase, Microsoft IT also looked at Windows Azure. The Windows Azure platform is an Internet-scale cloud services platform hosted through Microsoft data centers. The platform includes the Windows Azure operating system and a set of rich developer services. At the time of Agile Labs development, one of the essential client scenarios required that application teams be able to test on-premises installations in an on-premises private cloud. The Windows Azure global network of Microsoft-managed data centers solved a different set of problems than those faced by the R & D group within Microsoft.
Note: Windows Azure is a comprehensive cloud computing solution that offers an open and flexible cloud platform. It is important to understand which cloud technology makes sense for your business. To explore Windows Azure, visit http://www.windowsazure.com/en-us/
Microsoft IT Benefits
- By offering IT as a service, customers are better equipped to manage capital expenditure budgets, streamline processes, and reduce operational costs.
- Reduced management and maintenance needs for supporting server environments.
- Centralization and consolidation of resources allow economies of scale in lab architectures and purchasing decisions.
- Participation in the "First and Best" practice by deploying the System Center private cloud before it was publicly available and by providing valuable feedback to the product group before products go to market.
- Increased awareness of, and measurable improvements in, energy consumption. Previously, with every 100 kilowatts of deployed compute capacity, Microsoft IT could deploy 500 physical servers. With virtualization, that same 100 kilowatts of deployed compute capacity could support 5,000 servers. With a private cloud environment, 100 kilowatts of deployed compute capacity could see 50,000 or more servers deployed in the same period of time.
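The energy figures in the last benefit can be restated as servers per kilowatt. The short calculation below is only a worked example of that arithmetic, using the three figures quoted above; the names are arbitrary, and the private cloud figure is treated as a lower bound because the text says "50,000 or more".

```python
POWER_KW = 100  # deployed compute capacity used in all three comparisons

SERVERS_PER_100_KW = {
    "physical": 500,
    "virtualized": 5_000,
    "private cloud": 50_000,  # quoted as "50,000 or more", so a lower bound
}


def density_per_kw(server_count, power_kw=POWER_KW):
    """Servers supported per kilowatt of deployed compute capacity."""
    return server_count / power_kw


def gain_over_physical(model):
    """How many times more servers a model supports than the physical baseline."""
    return SERVERS_PER_100_KW[model] / SERVERS_PER_100_KW["physical"]


for model, count in SERVERS_PER_100_KW.items():
    print(f"{model}: {density_per_kw(count):g} servers per kW, "
          f"{gain_over_physical(model):g}x the physical baseline")
```

Each step in the progression is a tenfold increase: 5, 50, and at least 500 servers per kilowatt.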
R & D Customer Benefits
- Developers can focus on development and not on operations, improving their effectiveness and efficiency.
- Product groups can manage operational expenditures and not focus on capital budgeting.
- Customers experience consistent resourcing and capacity that appears effectively unlimited.
- Pay-by-use chargeback model provides the ability to associate direct costs to individual test cases.
- Improved speed and access; it is now quicker to get resources from the cloud than to provision physical servers.
Microsoft IT strives to make its onboarding process as light as possible to minimize the time from project inception to resource usage. Once onboarding is complete, customers can access their virtual machine resources through one of the following usage scenarios:
- Full .NET API through a WCF (Windows Communication Foundation) web service - gives customers full control to check out virtual machines; take, apply, save, and delete snapshots; and check virtual machines back in.
- Web portal access - this option offers customers check-in/out of virtual machines and other common user actions.
- Full support - customers can work with the Agile Labs support team to have their environments built and delivered ready to use.
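The check-out, snapshot, and check-in operations listed for the API scenario can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the Agile Labs API: the real service is a .NET API exposed through WCF whose contract is not shown in this case study, so the class VmPool and every method name below are hypothetical, and saving a snapshot to a library is left out.

```python
class VmPool:
    """In-memory stand-in for a pool of virtual machines (hypothetical, for illustration)."""

    def __init__(self, vm_names):
        self._available = set(vm_names)
        self._owner = {}                                    # vm name -> user holding it
        self._state = {name: "clean" for name in vm_names}
        self._snapshots = {name: {} for name in vm_names}   # vm -> {snapshot name: state}

    def available(self):
        return sorted(self._available)

    def state(self, vm):
        return self._state[vm]

    def check_out(self, user):
        """Reserve one free virtual machine for the user."""
        if not self._available:
            raise RuntimeError("no virtual machines available")
        vm = min(self._available)  # deterministic choice keeps the sketch predictable
        self._available.remove(vm)
        self._owner[vm] = user
        return vm

    def _require_owner(self, vm, user):
        if self._owner.get(vm) != user:
            raise PermissionError(f"{vm} is not checked out by {user}")

    def run_workload(self, vm, user, new_state):
        """Stand-in for whatever the user does that changes the machine."""
        self._require_owner(vm, user)
        self._state[vm] = new_state

    def take_snapshot(self, vm, user, name):
        self._require_owner(vm, user)
        self._snapshots[vm][name] = self._state[vm]

    def apply_snapshot(self, vm, user, name):
        self._require_owner(vm, user)
        self._state[vm] = self._snapshots[vm][name]

    def delete_snapshot(self, vm, user, name):
        self._require_owner(vm, user)
        del self._snapshots[vm][name]

    def check_in(self, vm, user):
        """Return the machine to the pool in a clean state."""
        self._require_owner(vm, user)
        del self._owner[vm]
        self._snapshots[vm].clear()
        self._state[vm] = "clean"
        self._available.add(vm)


# Hypothetical test-automation flow: check out, snapshot, test, roll back, check in.
pool = VmPool(["vm-01", "vm-02"])
vm = pool.check_out("tester")
pool.take_snapshot(vm, "tester", "baseline")
pool.run_workload(vm, "tester", "after-test-run")
pool.apply_snapshot(vm, "tester", "baseline")
state_after_restore = pool.state(vm)
pool.delete_snapshot(vm, "tester", "baseline")
pool.check_in(vm, "tester")
```

The flow mirrors the library concept mentioned earlier: a machine is borrowed, used, optionally rolled back to a known state, and returned so that another team can use it.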
Throughout the design and development of Agile Labs, Microsoft IT remained cognizant of these best practices:
Know Your Customer
The success of the private cloud infrastructure that you build hinges on a strong understanding of your customers' needs, because that understanding drives your ability to meet them. Microsoft IT knew that building a private cloud for the Microsoft Research and Development community would require a different set of specifications than those of typical internal Microsoft line-of-business groups, and kept those requirements at the forefront of the development process.
Understand Your Funding Model
By working collaboratively with their customers, Microsoft IT was able to build a strong business case and chargeback model that illustrated the benefits and services provided by Agile Labs. As a result, Microsoft IT has been able to demonstrate the value of Agile Labs to the company, which has enabled them to continue their work and further develop and expand the Agile Labs service.
Fail Safe and Fail Small
Understand and plan for the possibility of equipment (server, rack, and machine) failure so that you can avoid disrupting service availability when a failure occurs. As part of the private cloud infrastructure design process, Microsoft IT planned fault domains to minimize impact. This was done so that if a host was lost, the effect was minimal and did not affect any other fault domain.
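One way to picture the "fail small" idea is to spread virtual machines across fault domains and measure how much is lost if a single domain fails. The sketch below assumes a simple round-robin placement; the case study does not describe how Agile Labs actually assigns machines to fault domains, and the function names and rack labels are invented for the example.

```python
from collections import Counter
from itertools import cycle


def place_round_robin(vms, fault_domains):
    """Assign VMs to fault domains in rotation so that no domain holds much more than its share."""
    return {vm: fd for vm, fd in zip(vms, cycle(fault_domains))}


def worst_case_loss(placement):
    """Largest fraction of VMs lost if a single fault domain fails."""
    if not placement:
        return 0.0
    counts = Counter(placement.values())
    return max(counts.values()) / len(placement)


# Hypothetical layout: ten virtual machines spread over four racks.
vms = [f"vm-{i:02d}" for i in range(10)]
placement = place_round_robin(vms, ["rack-a", "rack-b", "rack-c", "rack-d"])
loss = worst_case_loss(placement)
```

With ten machines on four racks, the busiest rack holds three machines, so a single rack failure affects at most 30 percent of them and leaves the other fault domains untouched.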
Don't Reinvent the Wheel
Evaluate the research and understand the work that has already been done. Products will continue to evolve, and reference materials, case studies, existing content, data, and examples can provide a framework and architecture to build on instead of starting from scratch.
Results to Date
As of April 2012, Agile Labs has realized the following results:
- Delivered approximately 149,000 virtual machines
- Logged over 1 million build or rebuild requests
- Built capacity to host 10,000 virtual machines
- Achieved a 66 to 75 percent cost savings relative to purchasing the server capacity needed to run all of the virtual machines required at peak demand concurrently
What Is Next?
As Agile Labs continues to grow, its development goals are focused on:
- Continuing operational maturity
  - Increase the percentage of actionable versus non-actionable alerts.
  - Implement auto-healing features that enable data sources to remain consistent and stable.
  - Develop better service level agreements (SLA), metrics, and key performance indicators (KPI).
- Sustained service
  - Continue to add features and integrate additional functionality as new releases come out.
- Geographical expansion
  - Evaluate the international demand for access to Agile Labs and understand delivery options. For example, would smaller sites require a smaller cloud or leverage an existing cloud?
  - Document the benefit from economies of scale and the ability to offer blended rates for all sites, made possible by offering the Agile Labs service globally at the same cost.
- Improved availability options
  - Explore Windows Server 2012 options managed by System Center 2012 SP1.
  - Review scalability to determine any geographical distribution or application availability options that do not require additional investment in hardware.
- Simplified onboarding process
  - Provide a one-click onboarding process.
  - Evaluate workflow management solutions created by the Orchestrator component of System Center 2012, which could allow for the automation of resource creation, monitoring, and deployment.
  - Determine a process that allows Microsoft IT to complete the validation of billing after onboarding, so that customers get access to resources faster.
- Additional System Center features
  - When Microsoft IT was developing Agile Labs, the functionality available in System Center was not as robust as it is now. As a result, some of the investments Microsoft IT made are now built into the product. Where possible, Agile Labs is moving toward using in-box functionality throughout. As the code base is reduced, Microsoft IT is building in additional value by adding new features such as the Service Provider Foundation (SPF) API, the Orchestrator component of System Center, and the self-service experience delivered as part of System Center 2012.
By carefully evaluating existing server environments and new virtual machine environment requests, Microsoft IT was able to identify a deficiency in server resource utilization. Microsoft IT knew they had an opportunity to deliver an "agile" offering, based on increased utilization and flexibility, that would reduce costs, increase access to resources, improve employee productivity, streamline processes, and positively impact capital expense budgets.
The development of the Agile Labs private cloud using System Center 2012 provided improved infrastructure management and utilization while reducing the total cost of server ownership for individual product groups within Microsoft. Agile Labs is now a sustained, internal service that offers on-demand virtual resources with centralized infrastructure management. By remaining Microsoft's First and Best customer, and by implementing System Center, Microsoft IT was able to share some of the code base it developed and the best practices it established with the System Center product team for review before the product was publicly released.
Throughout this process, Microsoft IT recognized that with the implementation of the private cloud, the nature of the conversation has changed. The focus is now on the applications that can be delivered and the needs of the customer, and is no longer based on what operations can or cannot deliver. This type of operational shift better positions Microsoft IT to support the growing needs of its customers and enables them to be a strong partner and strategic asset to their customers.
For More Information
For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Order Centre at (800) 933-4750. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information via the World Wide Web, go to:
© 2012 Microsoft Corporation. All rights reserved.
This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, Hyper-V, Windows Azure, and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.