Delivering a Scalable Private Cloud Using Windows Server 2012
How Microsoft IT Designed a Private Cloud Infrastructure Using Windows Server 2012 and System Center 2012 SP1
Quick Reference Guide
The following content may no longer reflect Microsoft’s current position or infrastructure. This content should be viewed as reference documentation only, to inform IT business decisions within your own company or organization.
Microsoft Information Technology (MSIT) is adopting cloud computing as the first choice for developing new IT solutions and applications. Cloud computing is providing a new way to develop and imagine IT at Microsoft.
Situation: Microsoft Information Technology (Microsoft IT) is all in when it comes to moving to the cloud and is focused on developing ways to deliver new services faster than they have ever done before. As the company’s first and best customer, Microsoft IT adopts the latest versions of Microsoft products and applies them in real-world, enterprise-scale environments.
Leveraging Windows Server 2012 and Microsoft System Center 2012 SP1, Microsoft IT is in the process of finalizing the design of a new infrastructure platform (compute, network, and storage) that is optimized for the capabilities of the new operating system, hypervisor, and management tools. This design will form the basis for a private cloud infrastructure focused on delivering a highly dynamic, scalable, multitenant server platform to host 95% of their enterprise workloads and to close the gap for applications and services that are not candidates to move to public cloud services.
Why You Should Care
- Business user expectations for continuous services available on demand are increasingly challenging, costly, and taxing on IT resources.
- A private cloud offers IT organizations the ability to pool current datacenter resources, automate resource management, deliver greater agility, improve service availability, and realize cost efficiency.
- A private cloud infrastructure can consume a fraction of the space, energy, and maintenance cost of traditional physical server infrastructures.
Why a private cloud?
In a private cloud, key hardware resources—compute, storage, and networking—are pooled and abstracted into units that enable you to dynamically provision and scale applications and resources. Resources can quickly expand or contract through automation or workflow, so IT services can scale up or down almost instantly to meet demand.
When planning their next cloud infrastructure utilizing Windows Server 2012 and System Center 2012 SP1, Microsoft IT wanted to create an environment that would complement their public cloud offering and develop a private cloud that would deliver self-service management features to cloud consumers while offering elastic fabric management capabilities to Microsoft IT operations.
Private Cloud Benefits
- Pooled resources provide applications access to hardware resources that can be scaled across multiple applications so that the same physical storage, network, and compute can be used by multiple applications.
- Multi-tenancy eliminates the physical location from the data and system access equation; data from several corporate or individual clients are segmented.
- Scalable dynamic infrastructure ensures that fabric resources can expand or contract through automation or workflow, so the private cloud can appear to be limitless while properly balancing the risks and benefits of oversubscription.
- Self-service features drive agility benefits and deliver applications and resources as services. Users can request, configure, and manage IT services through an interactive portal providing automated provisioning.
- Usage-based metering enables users to pay for the resources they use, not resources they might use.
Windows Server 2012 Benefits
Microsoft IT used Windows Server 2012 to integrate a highly available, easy-to-manage multi-server platform with the following benefits:
- Designed to accommodate large cluster sizes.
- Live migration feature can manage simultaneous live migrations.
- Live storage migration available for both storage area network (SAN)-based and file-based storage.
- Virtual machines can migrate across cluster boundaries.
- Hyper-V Replica can asynchronously replicate virtual machines between sites.
Ease of management
- Windows PowerShell® cmdlets can be used to build command-line tools and automated scripts for configuring, monitoring and troubleshooting.
- Inbox management of profiles and personalized settings preserves settings and application cache data.
- BitLocker® Drive Encryption encrypts all data stored on the Windows Server 2012 operating system volume and configured data volumes, along with any Failover Cluster disks, including Cluster Shared Volumes.
- SMB encryption for data at rest and HPI workloads on SAN storage and ABM.
Hyper-V® host scale and scale-up workload support
- Windows Server 2012 Hyper-V scales up to 320 logical processors and 4 TB of physical memory per host, and up to 64 virtual processors and 1 TB of memory per virtual machine. A cluster supports up to 64 nodes and 8,000 virtual machines, and a single node supports up to 1,024 virtual machines.
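As an illustration of how the limits above interact, the following minimal Python sketch checks a planned cluster against them. The limit values come from the text; the `ClusterPlan` class and `check_plan` function are hypothetical names made up for this example and are not part of any Microsoft tool or API.

```python
from dataclasses import dataclass

# Limits as quoted in the text above for Windows Server 2012 Hyper-V.
CLUSTER_MAX_NODES = 64
CLUSTER_MAX_VMS = 8000
HOST_MAX_VMS = 1024
VM_MAX_VIRTUAL_PROCESSORS = 64
VM_MAX_MEMORY_TB = 1.0


@dataclass
class ClusterPlan:
    """Hypothetical description of a planned cluster (illustrative only)."""
    nodes: int
    vms_per_node: int
    vcpus_per_vm: int
    memory_tb_per_vm: float


def check_plan(plan: ClusterPlan) -> list:
    """Return limit violations as strings; an empty list means the plan fits."""
    problems = []
    if plan.nodes > CLUSTER_MAX_NODES:
        problems.append(f"nodes: {plan.nodes} > {CLUSTER_MAX_NODES}")
    if plan.vms_per_node > HOST_MAX_VMS:
        problems.append(f"VMs per node: {plan.vms_per_node} > {HOST_MAX_VMS}")
    # The per-cluster VM limit applies to the total across all nodes.
    total_vms = plan.nodes * plan.vms_per_node
    if total_vms > CLUSTER_MAX_VMS:
        problems.append(f"VMs per cluster: {total_vms} > {CLUSTER_MAX_VMS}")
    if plan.vcpus_per_vm > VM_MAX_VIRTUAL_PROCESSORS:
        problems.append(f"vCPUs per VM: {plan.vcpus_per_vm} > {VM_MAX_VIRTUAL_PROCESSORS}")
    if plan.memory_tb_per_vm > VM_MAX_MEMORY_TB:
        problems.append(f"memory per VM (TB): {plan.memory_tb_per_vm} > {VM_MAX_MEMORY_TB}")
    return problems


if __name__ == "__main__":
    # 64 nodes x 125 VMs = 8,000 VMs, exactly at the cluster limit.
    print(check_plan(ClusterPlan(nodes=64, vms_per_node=125, vcpus_per_vm=4, memory_tb_per_vm=0.016)))
    # One more VM per node pushes the cluster total over the limit.
    print(check_plan(ClusterPlan(nodes=64, vms_per_node=126, vcpus_per_vm=4, memory_tb_per_vm=0.016)))
```

One consequence of these numbers: a full 64-node cluster reaches the 8,000 virtual machine limit at an average of 125 virtual machines per node, well below the 1,024 single-node limit.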
Hyper-V support for VMs running from SMB 3.0 file shares
- Windows Server 2012 Hyper-V allows VMs to run from a file share hosted on an SMB 3.0 file server, separating storage from compute and allowing greater flexibility and scaling for storage options and utilization.
Lower Total Cost of Ownership
- Design a cost-effective infrastructure with improved scalability, performance, and quality.
- Remote Desktop Services helps increase desktop density on the host server and employs lower-cost deployment options.
- Server Message Block (SMB) file share uses lower-cost, mainstream hardware for VDI deployments.
System Center 2012 SP1 Benefits
System Center 2012 SP1 offers a highly integrated set of server management components that support Windows Server 2012 and provides a flexible platform capable of managing IT data centers and supported public and private clouds.
System Center 2012 Component
App Controller, Service Manager
Application Performance Management
Application Management Across Clouds
Virtual Machine Manager
Service Delivery & Automation
IT Service Management & Reporting
Process Automation & Orchestration
Cloud Creation & Delegation
Virtual Machine Manager
Data Protection & Disaster Recovery
Data Protection Manager, Orchestrator
Configuration & Compliance
Additional System Center 2012 SP1 benefits include:
- Service Provider Foundation (SPF) authentication, which allows App Controller to call the SPF API when performing tasks such as deploying services or changing configuration properties.
- System Center 2012 SP1 can also integrate with other emergent solutions like Windows Azure™ Services for Windows Server.
When planning their private cloud architecture, Microsoft IT made sure that the infrastructure they designed provided flexible and scalable cloud benefits for workloads and applications that may not be suitable for the public cloud. They also planned for the future by considering potential hybrid cloud needs, such as moving applications between public and private clouds depending on their requirements.
Microsoft IT Strategic Focus:
- Agility. A private cloud infrastructure lets you move much more quickly than a traditional approach to building a data center. Microsoft IT wanted to ensure effective hardware utilization through a one cloud/one fabric model of pooled resources designed to scale up and down quickly on demand.
- Customer Satisfaction. Use System Center tools to give customers self-service and billing options that charge them only for what they use, not what they could use.
- Organizational Effectiveness. Develop an IT organization that evolves with the technology and that can operate with efficiency and flexibility to meet customer requirements by removing traditional IT roadblocks and defining new processes, procedures, and roles that work in support of the new private cloud infrastructure.
- Economics. A private cloud infrastructure enables more efficient use of existing capital expenditure in network, storage, and compute as you drive higher utilization across an existing set of resources.
Microsoft IT’s private cloud solution is about automatically and efficiently delivering IT services on request and dynamically scaling those services on demand. In the private cloud environment, Microsoft IT becomes the service provider for cloud resources through virtualization and data center automation. By building their private cloud infrastructure on Windows Server 2012 and System Center 2012 SP1, they were able to realize greater scalability, maximize hardware usage, and provide customers with flexible access to data and resources through simplified management and improved self-service options.
Microsoft IT recognized that a key distinction between traditional data centers and a private cloud is the abstraction of physical resources. Physical resources are placed into higher-level resource pools, fault domains, upgrade domains, and so on. After those resource pools were created, Microsoft IT was able to map their physical resources to the infrastructure. Their private cloud could then aggregate resources (compute, storage, and network) so that capacity is dynamically adjusted to match fluctuations in the system workload.
It is important to note that the physical infrastructure design is a key solution element that must provide the ability to pinpoint, troubleshoot, and diagnose physical infrastructure failures. With the addition of cloud hardware abstraction, troubleshooting and locating a hardware failure can become complicated if the physical infrastructure design does not account for that need.
Resource Management and Fabric Capacity
Once the hardware infrastructure design was in place, Microsoft IT established logical compute, storage, and network resource groups, creating an abstraction layer for their physical hardware components. On top of the resource management plan, Microsoft IT then developed the cloud abstraction layer that categorized their fabric into groups of capacity and capabilities.
Managing cloud resources ensures that the available compute, storage, and network resources are efficiently utilized and can scale quickly to meet customer demand. A premise of the fabric is that all of its elements are highly available at all times, deliver the perception of limitless capacity, and are properly monitored to ensure that common servicing does not impact performance. While delivering the perception of limitless capacity, it is vital to manage the risk of oversubscription through the trending and monitoring offered by System Center 2012 SP1.
- Compute components must be clustered appropriately to meet VM expectations along with the fabric administration expectations during normal hardware servicing.
- Storage should be elastic with the ability to move between capacity units without impacting the service and while maintaining stability and reliability. Microsoft IT is leveraging Windows Server 2012 Storage and VM Live migration features to achieve this.
- Network must be resilient against outages, and maintenance of any portion of the fabric should not cause a disruption in the service or performance degradation. Microsoft IT is leveraging Windows Server 2012 NIC Teaming as well as SMB Multichannel to achieve this requirement.
Microsoft IT also took into consideration the diverse needs of their customers when developing their resource management plan. They determined that resource management should be modular in design and should be able to scale in any direction as quickly as it can be consumed—while also accounting for the probability that not all requests will require the same capacity or structure and that it is not a requirement of a cloud infrastructure to have a uniform set of components and capacity across the cloud.
Single Rack Fault Domain
Properly planning for hardware and system failures was of key importance to Microsoft IT when designing their next cloud hardware platform. Upon analyzing their existing infrastructure, Microsoft IT found that their traditional data center design had allowed physical resources to become intertwined with different levels of resources, availability and impact depending upon compute, storage, or network failures—meaning that a compute failure may impact a portion of a given rack, a storage failure may impact an entire floor of the data center, and a networking failure could impact as many as eight racks or as few as one rack. Microsoft IT also learned that during an outage their communication and identification of the impact was lacking in responsiveness.
Because private cloud hardware resources can be pooled and scaled, Microsoft IT had an opportunity to restructure how they provisioned, allocated, and reported on resources, which enabled them to prevent, reduce, or project the scope of impact for hardware-related failures. To define their fault domain, Microsoft IT determined that all potential impacts (physical, logical, or software related) must be consistent across all resources, and established a single rack as the predictable and communicable fault domain. This allowed Microsoft IT to place VMs across fault domains with little logic behind the scenes. A single rack fault domain also allowed automation systems to guarantee fault-tolerant VM placement even within a software abstraction over the private cloud hardware deployment.
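The idea of treating a rack as the unit of failure can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Microsoft IT's placement logic: the function names, rack names, and host names are hypothetical, and a simple round-robin over racks stands in for whatever placement engine is actually used.

```python
def place_across_racks(instances, hosts_by_rack):
    """Spread the instances of one service over racks in round-robin order.

    instances: list of VM names belonging to the same service.
    hosts_by_rack: dict mapping a rack name to the list of hosts in that rack.
    Returns a dict mapping each VM name to a (rack, host) tuple.
    """
    racks = sorted(rack for rack, hosts in hosts_by_rack.items() if hosts)
    if not racks:
        raise ValueError("no racks with available hosts")
    placement = {}
    for index, vm in enumerate(instances):
        # Consecutive instances land on different racks (fault domains).
        rack = racks[index % len(racks)]
        hosts = hosts_by_rack[rack]
        # Within a rack, rotate through its hosts on each full pass.
        host = hosts[(index // len(racks)) % len(hosts)]
        placement[vm] = (rack, host)
    return placement


def survivors_after_rack_failure(placement, failed_rack):
    """Return the VMs that keep running when one whole rack is lost."""
    return sorted(vm for vm, (rack, _host) in placement.items() if rack != failed_rack)


if __name__ == "__main__":
    # Hypothetical inventory: three racks, four hosts.
    inventory = {"rack-a": ["a1", "a2"], "rack-b": ["b1"], "rack-c": ["c1"]}
    placement = place_across_racks(["web-1", "web-2", "web-3"], inventory)
    print(placement)
    print(survivors_after_rack_failure(placement, "rack-a"))
```

Because every kind of failure is scoped to one rack, the impact of an outage can be stated in advance: with three instances on three racks, losing any single rack leaves two instances running.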
As Microsoft IT was planning the management implementation for the cloud infrastructure, they realized that the following traditional systems management functions and tool implementations did not represent cloud use cases.
- System Center Orchestrator to automate complex processes
- Virtual Machine Manager to simplify private cloud lifecycle management
- PowerShell V3 workflows
- Configuration Management
  - System Center Configuration Manager to ensure compliance and desired state
  - Extensive PowerShell integration
- Fault Management
  - System Center Operations Manager to monitor software and hardware faults
  - Enhanced debugging and diagnostics tools
  - Hardware fault isolation
- System Center integration with IT Service Management tools and processes to deliver capacity, availability, and service level reporting
A new approach was created that cut across traditional functions to describe administrative (hoster) and self-service user (tenant) scenarios. These scenarios almost always leverage more than one component of the manageability toolset, which is also a difference from the traditional environment.
Microsoft users expect self-service options that let them allocate and provision resources, limited only by what they are willing to pay for. To accommodate this demand, Microsoft IT used System Center 2012 SP1 technology to deliver applications and resources as services that customers can request, configure, and manage through an interactive portal with automated provisioning.
Microsoft self-service users can:
- Request a new application cloud
- Provision app services to the application cloud
- Monitor the state of their clouds and services running within
- Scale or duplicate an existing app pattern
- Share resources within their owned clouds
- Manage an app throughout its life cycle
When building their cloud management solution, Microsoft IT wanted to take advantage of and deliver on the "Cloud on Your Terms" concept and provide customers with the resources they need when they need them while creating an IT infrastructure that could manage those needs.
To create a positive user experience, Microsoft IT wanted to minimize a customer’s need for infrastructure awareness and develop portals and templates that were easily consumable with a consistent experience across all Microsoft IT clouds. In addition to transparent cost and status information, Microsoft IT focused their efforts on developing and improving the following experiences:
- Onboarding. Provide a single point of entry for all new deployment requests, using either a standard template with optional add-ons or a customer-provided VM template and cloud services. Any traditional capacity requests would require manual fulfillment.
- Self-Service Enabled Standard Change. Customers can control their own resources to meet business needs with the ability to execute routine tasks on demand.
- Delegated Admin Experience. Hosters are subject to Role-Based Access Control (RBAC), with the majority of Tier 1 and 2 tasks, as well as highly complex and conventional tasks, being automated. With this approach, service health would be visible across tiers and technologies.
- Life-Cycle Management. Hardware end of life is an issue that must be planned for with common processes and standards across the infrastructure. System design must account for end-of-life processes with consistency and efficiency through pre-defined life cycle tracks that dictate the level of automation and cost involved to migrate off of obsolete hardware.
Develop resiliency in the platform
When planning and designing a private cloud platform, it is paramount to understand your fabric capacity needs and demands and to minimize failure impact by developing resiliency in your platform. To develop resiliency in their cloud platform, Microsoft IT is deploying with two redundant paths and treats 45% consumption on each path as full utilization. At 45% consumption, if one full portion of the infrastructure is lost, the remaining resilient portion carries both loads and operates at 90% capacity, allowing for minimal performance impact or failure cascade during an outage. For network and SAN infrastructure, Microsoft IT planned for the load of a single device failure (or maintenance) to be spread across the other devices before customers are impacted.
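The 45% and 90% figures follow from simple arithmetic, shown here as a short Python sketch. It assumes that load redistributes evenly across the surviving paths; the function names and the generalization to more than two paths are illustrative assumptions, not part of the Microsoft IT design.

```python
def utilization_after_failure(per_path_utilization, paths=2, failed=1):
    """Utilization of each surviving path once `failed` paths are lost.

    Assumes the total load is spread evenly over the surviving paths.
    """
    if not 0 < failed < paths:
        raise ValueError("failed must be at least 1 and less than paths")
    total_load = per_path_utilization * paths
    return total_load / (paths - failed)


def planning_ceiling(post_failure_target, paths=2, failed=1):
    """Highest per-path utilization that stays within the target after a failure."""
    if not 0 < failed < paths:
        raise ValueError("failed must be at least 1 and less than paths")
    return post_failure_target * (paths - failed) / paths


if __name__ == "__main__":
    # Two paths at 45% each: losing one leaves the other at about 90%.
    print(utilization_after_failure(0.45))
    # Working backward: a 90% post-failure target gives a 45% planning ceiling.
    print(planning_ceiling(0.90))
```

The same calculation shows why a higher planning ceiling is risky with two paths: at 50% per path, the survivor would be fully saturated, with no headroom left.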
Understand the economies of scale
A private cloud infrastructure offers valuable economies of scale. However, when the perception of limitless resources is projected through a cloud abstraction, it’s vital to manage the risk of oversubscription (which is required for that perception to happen) against the potential impact to customers. Different technologies oversubscribe at different rates and with different impacts, so it is important to understand the behavior of compute, storage, and network at target oversubscription rates.
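One way to make oversubscription measurable is to compare what has been allocated to tenants with what physically exists, per resource type. The Python sketch below does this; all of the numbers and target ratios are hypothetical values chosen for illustration and do not come from Microsoft IT.

```python
def oversubscription_report(allocated, physical, targets):
    """Compare the allocated-to-physical ratio of each resource with its target.

    Returns a dict mapping each resource name to (ratio, within_target).
    """
    report = {}
    for resource, capacity in physical.items():
        ratio = allocated[resource] / capacity
        report[resource] = (ratio, ratio <= targets[resource])
    return report


if __name__ == "__main__":
    # Hypothetical figures for a single host or capacity unit.
    allocated = {"vcpu": 96, "memory_gb": 480, "storage_tb": 30}
    physical = {"vcpu": 32, "memory_gb": 512, "storage_tb": 20}
    # Hypothetical targets: each resource tolerates a different ratio.
    targets = {"vcpu": 4.0, "memory_gb": 1.0, "storage_tb": 1.25}
    for resource, (ratio, ok) in oversubscription_report(allocated, physical, targets).items():
        status = "within target" if ok else "over target"
        print(f"{resource}: {ratio:.2f}x ({status})")
```

Keeping a separate target per resource reflects the point above that compute, storage, and network do not behave the same way when oversubscribed.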
Plan for hardware maintenance and failure
Planning for hardware maintenance and failures is a critical success factor in any cloud platform design and must be done in a way that minimizes potential customer impact.
This important step is often overlooked or not properly planned for. Microsoft IT was careful in developing a plan that accounted for maintenance or failure. During planning, Microsoft IT found that adding the cloud abstraction layer and maintaining the perception of limitless capacity complicated the diagnosis and troubleshooting of physical infrastructure problems. The only way to mitigate this complexity and to ease troubleshooting was through proper physical infrastructure design that would allow Microsoft IT to quickly and accurately pinpoint and, optimally, avoid failures. Failure avoidance can be achieved through trending and business intelligence techniques offered through System Center Virtual Machine Manager 2012 SP1.
Prepare for organizational transition
As part of this development process, Microsoft IT realized that a paradigm shift in their IT organization would need to occur to successfully implement a private cloud infrastructure. Existing IT organization processes, roles, and infrastructure management services no longer aligned with the new private cloud platform and needed to be evaluated to ensure consistency and efficiency in new functions and processes.
In developing their private cloud solution, Microsoft IT was careful to plan for current and future cloud solutions that would meet the current need while remaining flexible enough to change as needs changed. To do this, Microsoft IT focused on developing a strong foundational platform hardware design that, when combined with a solid software manageability architecture, allowed them to build a cloud abstraction. That abstraction enables them to virtualize workloads and helps organizations control and cut costs while improving the scalability, flexibility, and reach of Microsoft IT systems.