IT Showcase On: Microsoft IT's Hybrid Cloud Strategy
Microsoft IT’s Journey to the Cloud
Quick Reference Guide
The following content may no longer reflect Microsoft’s current position or infrastructure. This content should be viewed as reference documentation only, to inform IT business decisions within your own company or organization.
Situation: Microsoft IT currently uses mostly conventional on-premises products. But it is moving rapidly to a mixed-use environment in which it utilizes some combination of on-premises software, software as a service (such as Microsoft® Exchange Online), and Windows Azure products.
Product: Microsoft developed the Windows Azure platform as a foundation for developing applications that run in the cloud. Approximately 20 million businesses and more than a billion people use Microsoft cloud services, which are the products, services, and customer experiences that Microsoft offers through hosted and online services. The Windows Azure platform provides a group of cloud technologies, each providing a specific set of services for application developers, and applications running either in the cloud or on local systems can use it.
Comparing Cloud Offerings
Infrastructure as a service (IaaS): IT departments typically use IaaS to run client and server applications on virtual machines. The vendor manages the network, servers, and storage resources so that IT managers no longer need to buy, track, or decommission hardware. However, IT managers must continue to manage operating systems, databases, and applications. Depending on the service provider, IT managers may be able to perform limited configuration of networking components. IT staff can manage configuration remotely, through application programming interfaces (APIs) or through a web portal, for example, to increase application instances when demand spikes occur.
Platform as a service (PaaS): Enterprises use PaaS to develop, deploy, monitor, and maintain applications, while the cloud provider manages everything else. Developers can manage configuration remotely, as with IaaS, but they do not have to configure the virtual machine’s image directly.
Software as a service (SaaS): This is perhaps the most familiar service-delivery model. Enterprises subscribe to prepackaged applications that run on a cloud infrastructure and that allow access from a variety of devices. Enterprises rarely are responsible for administration, beyond limited configuration and data-quality management.
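The division of responsibility described above can be summarized in a small lookup. The following Python sketch is illustrative only: the layer names come from the descriptions above, while the names LAYERS, CUSTOMER_MANAGED, and provider_managed are hypothetical and chosen for this example.

```python
# Layers named in the comparison above, ordered from infrastructure upward.
LAYERS = ["network", "servers", "storage",
          "operating system", "databases", "applications"]

# Layers that the customer's IT staff still manages under each model.
# SaaS is left empty: the text limits the customer's role to configuration
# and data-quality management rather than to any whole layer.
CUSTOMER_MANAGED = {
    "IaaS": {"operating system", "databases", "applications"},
    "PaaS": {"applications"},
    "SaaS": set(),
}


def provider_managed(model):
    """Return the layers that the cloud provider manages for a model."""
    customer = CUSTOMER_MANAGED[model]
    return [layer for layer in LAYERS if layer not in customer]


if __name__ == "__main__":
    for model in CUSTOMER_MANAGED:
        print(model, "-> provider manages:", ", ".join(provider_managed(model)))
```

Reading the output from IaaS to SaaS shows the provider's share growing as the customer's share shrinks, which is the trade-off between control and operational effort that the three models represent.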
Why You Should Care:
Embrace the right IT for your business – The combination of choice and flexibility is key to cloud adoption and provides the ability to balance control with cost and agility.
Improved agility and focus - Many challenges that organizations face relate to the infrastructure that they require to deploy, run, and manage applications. Windows Azure enables developers to focus on building applications rather than worrying about managing an infrastructure and its components.
Make more of existing investments - Windows Azure makes use of robust and familiar tools, including Microsoft Visual Studio® development system, Microsoft Silverlight®, Microsoft .NET, and Microsoft System Center products. It provides unlimited scalability at a lower cost than on-premises solutions, and a consistent environment for development, test, preproduction, and production.
Consistent experience - Microsoft IT’s cloud-services strategy is consistent with, and highly complementary to, the Microsoft on-premises offerings. Rather than taking an all-or-nothing approach, Microsoft IT (and Microsoft customers) can leverage both new cloud technologies and existing capabilities.
Better economics - Microsoft IT estimates that by moving applications to Windows Azure, an enterprise can save approximately 30% of overall addressable expenditures for support; application development and maintenance; and hardware, hosting, and software licenses.
Microsoft IT’s Journey to the Cloud
Microsoft IT has a three-pronged approach to developing applications in Windows Azure:
- Identify existing applications that are not mission-critical, that have solid fallback positions, and that have workload patterns that are suitable for the cloud. These are the first applications that Microsoft IT migrates to Windows Azure, and they are used to develop best practices and reusable components for other, more-complex migrations.
- Ensure that every new application that can be written for or deployed on Windows Azure is in fact written for and deployed on Windows Azure. Make Windows Azure the default application-development platform.
- Create multiyear plans, and then begin moving some of Microsoft IT’s biggest and most-critical applications to Windows Azure. This lets Microsoft IT apply the experience gained from earlier, less-complex migrations.
The New Normal – A Hybrid Cloud Environment
Embrace the right IT for your business. Microsoft IT is embracing the right IT for its business with a hybrid cloud solution composed of private and public cloud features. The solution provides improved connectivity options and a common cloud management platform. This gives internal customers the agility and scalability they demand, while allowing Microsoft IT to manage and maintain the infrastructure and resources effectively and to provide a consistent user experience.
As part of the solution, Microsoft IT also accounted for the transition of the following capabilities and applications:
- Commodity capabilities and workloads (CRM, payroll, mail, collaboration) - will move to Public SaaS.
- Legacy applications (those with little to no new investment) - will lift and shift to Public IaaS as well as Private IaaS. The choice depends primarily on the connectivity and manageability solutions as well as the security and compliance options.
- New applications - will be developed for Public PaaS.
- Differentiating applications (applications critical to the enterprise’s core business) - will move to Public PaaS. Falling back to Private IaaS is an option if that does not work.
- Cloud management capabilities - will unify the various cloud components under a single pane of glass.
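The placement rules in the list above can be written down as a simple lookup with fallbacks. The following Python sketch is one possible encoding, not Microsoft IT's tooling: the names Target, PLACEMENT, and pick_target are hypothetical, and listing Public IaaS before Private IaaS for legacy applications is an assumption, because the text does not state a preference between the two.

```python
from enum import Enum


class Target(Enum):
    PUBLIC_SAAS = "Public SaaS"
    PUBLIC_IAAS = "Public IaaS"
    PRIVATE_IAAS = "Private IaaS"
    PUBLIC_PAAS = "Public PaaS"


# Application category -> candidate targets, tried in order.
PLACEMENT = {
    "commodity": [Target.PUBLIC_SAAS],
    # Order for legacy applications is an assumption (see lead-in).
    "legacy": [Target.PUBLIC_IAAS, Target.PRIVATE_IAAS],
    "new": [Target.PUBLIC_PAAS],
    # Private IaaS is the stated fallback if Public PaaS does not work.
    "differentiating": [Target.PUBLIC_PAAS, Target.PRIVATE_IAAS],
}


def pick_target(category, is_viable):
    """Return the first candidate target that passes the viability check.

    is_viable is a caller-supplied function, for example a check of
    connectivity, manageability, security, and compliance options.
    Returns None when no candidate is viable.
    """
    for target in PLACEMENT[category]:
        if is_viable(target):
            return target
    return None


if __name__ == "__main__":
    # Example: Public PaaS is ruled out, so the fallback is chosen.
    choice = pick_target("differentiating",
                         lambda t: t is not Target.PUBLIC_PAAS)
    print(choice.value)
```

Keeping the candidates in an ordered list makes the fallback explicit, so a change in strategy becomes a change to the table rather than to the selection logic.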
To ensure a successful cloud adoption, planning must begin early and must take into account the potential for change within the existing organization.
- An executive-level cloud adoption steering committee, composed of representatives from every group within Microsoft IT, was tasked with overseeing and coordinating all aspects of the cloud adoption.
- Area-specific working groups were formed to develop standards and guidelines related to software development life cycle, security, operations, architecture, and the like.
- Integration and coordination across working groups is a key success factor in this process. All working groups shared an integrated (multi-year) milestone schedule describing key milestones and critical interdependencies between groups. This established a level of awareness and responsibility between the groups to communicate often, to integrate their work, and to facilitate clear roles and responsibilities.
- Employee training is critical. Microsoft IT offered free training to all employees and has trained over 50% of the organization, ranging from developers, project leaders, and program managers to accountants, lawyers, human resources staff, and executive administrators. The objective was to normalize cloud nomenclature at every level.
- A Human Resources cloud working group was also vital to ensure that employee transitions were accounted for as roles changed and that any additional training needs were identified.
Planning for security must be done concurrently with infrastructure and organizational planning. Microsoft IT manages a Security Development Life Cycle composed of training to understand implications, scheduled security reviews, analysis, and an incident response plan. It also uses Windows Azure Active Directory® to provide identity management and access control capabilities to cloud applications, whether those are Windows Azure applications, Microsoft Office 365, Microsoft Dynamics CRM Online, Windows Intune, or third-party cloud services.
Note: Every organization has its own unique security needs; it is vital to understand what yours are.
Windows Azure Active Directory
- Windows Azure Active Directory is a modern cloud service providing identity management and access control capabilities to cloud applications.
- It combines the proven enterprise capabilities of Active Directory with the scale and elasticity of Windows Azure, so you can bring your applications to the cloud easily.
Benefits of Windows Azure Active Directory include:
- Single sign-on across your cloud applications - Windows Azure Active Directory (AD) gives end users a seamless, single sign-on experience for Microsoft and third-party cloud services as well as applications built on Windows Azure, delivering a consistent experience to users.
- Simple integration with your on-premises Active Directory - Make more of your existing investments. You can quickly extend your existing on-premises Active Directory to apply policy and control and to authenticate users with their existing accounts to Windows Azure and other cloud services.
- Easily create and manage identities in the cloud - For organizations that don’t require an on-premises Active Directory deployment, Windows Azure Active Directory provides an easy way to create and manage identities in the cloud.
- Build social enterprise apps - Windows Azure Active Directory provides an enterprise social graph for applications to discover information and relationships between identities easily so that developers and ISVs can build a richer end user experience.
- Flexibility to use your development tools and social identities - With Windows Azure Active Directory, you can use Microsoft .NET, PHP, Java, and Node.js and get out-of-the-box support for popular web identity providers including Windows Live ID, Google, Yahoo!, and Facebook.
Moving Existing Applications
The decision process for moving applications considers the potential for both risk and success. Microsoft IT first filters for risk and other application attributes that may preclude an application from migrating. It evaluates factors such as regulatory or data sovereignty issues, special hardware requirements, and complex upstream or downstream interfaces. Microsoft IT then filters for success by identifying applications that add the most business value, that solve critical problems, and for which the platform is known to be well established. Finally, applications with the highest potential ROI are selected to move.
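The three-stage filter described above (risk, then success, then ROI) can be sketched as a short pipeline. This is a minimal illustration under stated assumptions, not Microsoft IT's actual process: the App fields, the 1-to-5 business value score, the thresholds, and the sample data are all hypothetical, and "solves critical problems" is folded into the business value score for brevity.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    regulated: bool             # regulatory or data sovereignty issues
    special_hardware: bool      # special hardware requirements
    complex_interfaces: bool    # complex upstream or downstream interfaces
    business_value: int         # hypothetical score from 1 (low) to 5 (high)
    platform_established: bool  # platform is well established for this workload
    estimated_roi: float        # hypothetical ROI estimate; higher is better


def select_candidates(apps, min_value=3, top_n=2):
    """Apply the three filters in order: risk, success, then ROI."""
    # Step 1: drop applications with attributes that may preclude migration.
    low_risk = [a for a in apps
                if not (a.regulated or a.special_hardware or a.complex_interfaces)]
    # Step 2: keep applications likely to succeed and to add business value.
    likely = [a for a in low_risk
              if a.business_value >= min_value and a.platform_established]
    # Step 3: rank what remains by estimated ROI and take the top few.
    return sorted(likely, key=lambda a: a.estimated_roi, reverse=True)[:top_n]


# Made-up sample data, for illustration only.
SAMPLE_APPS = [
    App("auction", False, False, False, 4, True, 2.5),
    App("payroll", True, False, False, 5, True, 3.0),
    App("wiki", False, False, False, 2, True, 1.2),
    App("reports", False, False, False, 3, True, 1.8),
    App("catalog", False, False, False, 5, True, 2.0),
]

if __name__ == "__main__":
    for app in select_candidates(SAMPLE_APPS):
        print(app.name, app.estimated_roi)
```

In the sample data, the application with the highest ROI estimate is excluded at the first step because it has regulatory exposure, which is the point of filtering for risk before ranking by return.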
When an organization moves existing applications to Windows Azure, it is important to remember that it:
- Can be more complex than building new applications designed for Windows Azure or cloud computing.
- Is better to choose applications with lower risk and complexity for the first migration to become familiar with the process. Enterprises then can attempt migration of more complex applications that have regulatory exposure, integration issues, downstream dependencies, or other complicated attributes.
- Helps to understand a migrating application’s technical aspects and how the enterprise is utilizing it before migration.
- Helps to understand the application’s monitoring implications and its life span, to ensure alignment with the Windows Azure solution plan.
- Is beneficial to understand workload patterns, which indicate suitability with Windows Azure. The following table illustrates these patterns:
On and Off: Typically includes seasonal or time-bounded workloads that have processing requirements only during certain periods.
Predictable Bursting: Includes predictable bursts of activity during certain days of the week or times of the year.
Unpredictable Bursting: Typically includes unpredictable events that trigger heavy usage requirements. Therefore, the design must account for scaling to demand that the enterprise cannot predict.
Growing Fast: Typically associated with new development and, in particular, with startups or specific groups in larger companies.
Windows Azure Projects
Moving a Low-Risk Application: the Auction Tool
The Auction Tool is a component of the Microsoft Annual Giving Campaign. The tool experienced very spiky usage (the Predictable Bursting workload pattern). Usage was low for most of the month, and capacity was then inadequate for the last-day spike, when 20% of all bids were made. This left many bidders unable to contribute because the Auction Tool was unavailable due to heavy last-day usage. The project had three sets of Microsoft Internet Information Services (IIS) and Microsoft SQL Server® virtual machines.
- Common platform - Utilized a hybrid Windows Azure cloud and on-premises architecture
- Common identity infrastructure - Utilized single sign-on (SSO) through Active Directory® Federation Services (AD FS)
Auction Tool Results
- The solution scaled easily from four to 24 instances for the last day’s high traffic.
- Every user who wanted to bid could actually do so.
- It was a record year for the Microsoft Annual Giving Campaign, with approximately $500,000 USD raised.
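The size of the last-day spike can be estimated from the figures given for the Auction Tool. The following Python sketch is a rough calculation under two assumptions that are not in the source: that the auction ran for 30 days and that bids on the other days were spread evenly. The function name last_day_multiplier is hypothetical.

```python
def last_day_multiplier(last_day_share, campaign_days):
    """Ratio of last-day load to the average load on each earlier day."""
    other_days = campaign_days - 1
    average_other_day = (1.0 - last_day_share) / other_days
    return last_day_share / average_other_day


if __name__ == "__main__":
    # 20% of all bids on the last day (from the text); 30 days is assumed.
    spike = last_day_multiplier(0.20, 30)
    # Scale-out factor reported in the results: four to 24 instances.
    scale_out = 24 / 4
    print(f"last-day load vs. a typical day: {spike:.2f}x")
    print(f"instance scale-out: {scale_out:.1f}x")
```

Under these assumptions the last day carries roughly seven times the load of a typical day, which is of the same order as the sixfold increase in instances. Provisioning fixed on-premises capacity for that peak would leave most of it idle for the rest of the month.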
Lessons Learned from Moving and Using the Auction Tool
- The learning curve for development was relatively small, because the Windows Azure development tools are similar to the current Visual Studio development environments.
- Scaling the environment to meet demand is a strong point of the Windows Azure platform, and the process was straightforward and easy to implement.
- The Windows Azure platform is consistent. Development, preproduction, and production environments are the same and allow for a smooth path to production.
Private Cloud: Redmond Ridge Lab
Microsoft IT has adopted a private cloud architecture to support its large development and test environments. Traditional development and test requirements drove ever-increasing server deployments, which were causing Microsoft data centers to reach capacity. Deployed servers were often underutilized, deployment times were increasing, and support costs were rising.
- Agility, focus, and scalability - Microsoft IT developed a private cloud and IT utility services strategy to deliver IT services to internal customers. This approach is based on the idea of creating large, shared, centralized services so that Microsoft IT customers do not have to worry about adding infrastructure capacity.
- Maximum control - With a private cloud Microsoft IT customers have maximum control of their critical and sensitive data and applications.
- Better economics - Pay for what you use. Microsoft IT finances its services by charging business groups for providing services. An additional benefit of a virtual machine is that the business groups do not pay for the cost of the underlying physical machine; the monthly hosting fee includes that cost. As a result, the business group does not need to worry about replacing the physical hardware when purchasing a virtual machine, which increases savings significantly over the server hardware’s life.
Redmond Ridge Private Cloud Results
- Building, deploying, managing, and supporting virtual servers has increased Microsoft IT operational efficiencies and reduced overall operational costs. On average, virtual servers cost approximately 35-40% less per month to support than physical servers.
- The goal of 50% virtualization translates into a server cost savings opportunity of $4.6M annually for Microsoft IT. With additional savings achieved through a reduction in "all-up" support costs, 50% virtualization could potentially save $16M.
- With this move to the private cloud, Microsoft IT was able to improve the SLAs promised to business units, improving customer satisfaction.
- Today, the Microsoft IT lab space requirements have stopped growing. In March 2010, Microsoft IT was at 34% virtualization, with 1,554 total hosts supporting 7,224 virtual machines, all of which helps to reduce deployment times.
Redmond Ridge Lessons Learned
- Ongoing IT operational costs and capital investment costs (servers, datacenter equipment) are reduced. Cloud technology is paid for incrementally, which saves organizations money.
- No longer having to worry about constant server updates and other computing issues, organizations will be free to concentrate on innovation.
- Highly automated – IT personnel no longer need to worry about keeping software up to date.
New Application: Social eXperience Platform (SXP)
The Video Showcase site on Microsoft.com provides access to more than 8,000 marketing videos. Personnel used the previous on-premises solution to manage the site’s comments and ratings, to filter profanity and spam, and to perform site moderation. However, there were problems with scalability, maintenance costs, upgrades, performance, and availability.
- Common platform - Utilized a hybrid Windows Azure cloud and on-premises architecture.
- Multitenant capabilities - Any subsite on Microsoft.com can use SXP. The first tenant was the Video Showcase site, and as of mid-2011, the service has 57 tenants.
- Existing resources - A separate SQL Azure database was used for each tenant to store comments and ratings.
- Existing toolset - Utilized the Microsoft System Center Operations Manager Connector to provide alerts and notifications via System Center Operations Manager.
SXP Results
- Better economics - Reduced monthly costs from $15,000 USD to $45 USD.
- Increased availability from 99.1% to 99.997%.
- Agility - Improved the speed of new releases dramatically. Previously, issuing a new release could take as much as six weeks; that time was reduced to 45 minutes.
- Facilitated easy push-button upgrades that had no planned down time. Ten upgrades have occurred, and no transactions have been missed.
- Enabled 10,000,000 error-free transactions since December 2011.
- Reduced the average response time to less than 10 milliseconds (ms). Microsoft IT’s service-level agreement mandates 250 ms.
- Enabled Microsoft IT to replace more than 37 separate instances of a third-party competitive product with a Microsoft cloud-based solution that operates at enterprise scale.
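The availability and cost figures above are easier to compare when converted to downtime per year and to a percentage reduction. The following Python sketch does only that arithmetic on the numbers quoted in the list; the function names are hypothetical, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours_per_year(availability_pct):
    """Hours of unavailability per year implied by an availability figure."""
    return (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR


def cost_reduction_pct(before, after):
    """Percentage by which a monthly cost fell."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    old = downtime_hours_per_year(99.1)
    new = downtime_hours_per_year(99.997)
    print(f"downtime at 99.1%:   {old:.1f} hours per year")
    print(f"downtime at 99.997%: {new * 60:.1f} minutes per year")
    print(f"monthly cost reduction: {cost_reduction_pct(15000, 45):.1f}%")
```

On these figures, the change in availability corresponds to moving from about 79 hours of downtime per year to about 16 minutes, and the monthly cost falls by 99.7%.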
SXP Lessons Learned
- Windows Azure greatly decreases the time and developer participation necessary for application maintenance and upgrades.
- Engineering and operations departments must partner early to ensure that application development includes operational integration, such as monitoring and maintenance.
- Windows Azure presents a fundamental shift in how IT departments approach spending and resource allocation. Spending patterns now focus on operational costs rather than capital expenditures, while resource allocation now focuses on development and innovation rather than maintenance and operations.
Hybrid Cloud: Volume Licensing System
The Volume Licensing system, one of Microsoft’s internal systems, processes the overwhelming majority of Microsoft revenue. After analyzing the system and its requirements, Microsoft IT determined that moving this system to the cloud would require a hybrid design to account for systems like SAP, which are not yet in the cloud but are a required component of the system.
- Common platform – Cloud technology builds on existing Microsoft IT expertise and familiarity with the Windows platform and helps to maximize productivity and to reduce complexity by leveraging a common platform to manage a hybrid cloud environment.
- Existing toolset and experience – By utilizing Microsoft management tools, Microsoft IT can manage the complete environment including applications that do not reside within the cloud.
- Common identity infrastructure – Utilized Active Directory Federation Services (AD FS) to allow secure access from the corporate network and Internet.
Volume Licensing Results
- Monthly savings on one of the Volume Licensing system applications averaged 32%, reducing the monthly bill from $10,000 to $7,000 on average.
Volume Licensing Lessons Learned
- Design for failure and disruption - When you’re working with cloud-scale systems, it becomes almost a certainty that things will fail. If you assume that your servers have a mean time between failures of 30 years (which would make them staggeringly reliable) and you have 10,000 of them in your data center, you should expect, on average, about one server to fail every day (10,000 servers divided by roughly 10,950 days). Therefore, our guidance is to build systems with the expectation of failure and to have them work in such a way that failure has little or no impact. No system will ever be perfect, but we can get closer and closer.
- Expect changes - The Volume Licensing team found that there were some small issues. Because the cloud is worldwide, all servers are on Coordinated Universal Time (UTC). And because of load balancing between multiple instances, all code has to be stateless: you don’t know anything about past requests from a client when the next request comes in.
- Monitoring is critical - As with SXP, the team found that monitoring is critical and has made use of various Microsoft products to make this work.
- Business continuity and disaster recovery - There’s a business continuity/disaster recovery (BCDR) requirement for any system that’s as critical to the company as Volume Licensing. Windows Azure Traffic Manager enables you to manage and to distribute incoming traffic to Windows Azure hosted services whether they are deployed in the same data center or in different centers across the world.
- Gateway communicates with on-premises systems - Microsoft IT still has many systems on premises and will likely continue to do so for some time. So the Volume Licensing team built a reusable gateway as an interface between SAP systems and the cloud, and between the internal taxonomy systems and the cloud.
- Third-party controls can be an issue - Although Microsoft IT has not had any issues in this area, it’s important to pay attention to third-party code. It may or may not support Windows Azure, and it may or may not be legal under current licensing agreements to scale an application that contains a third-party piece of code to 20 servers. Most vendors understand the cloud and adjust their agreements. But this is a new area, and it’s worthwhile to pay attention to it.
- The laws of physics apply - The cloud is worldwide. And with Windows Azure, you have a choice of data centers when you deploy code or store data. If the majority of an organization’s users are in the U.S., it probably doesn’t make sense to put the data in Singapore.
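The failure arithmetic in the "Design for failure and disruption" lesson can be checked with a short calculation. The following Python sketch uses the figures from that lesson (10,000 servers, 30-year mean time between failures). Treating failures as independent events at a constant rate, which gives a Poisson model, is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def expected_failures_per_day(servers, mtbf_years):
    """Average number of server failures per day across the fleet."""
    return servers / (mtbf_years * 365)


def prob_at_least_one_failure(servers, mtbf_years):
    """Chance of one or more failures on a given day.

    Assumes independent failures at a constant rate (Poisson model).
    """
    rate = expected_failures_per_day(servers, mtbf_years)
    return 1.0 - math.exp(-rate)


if __name__ == "__main__":
    rate = expected_failures_per_day(10_000, 30)
    chance = prob_at_least_one_failure(10_000, 30)
    print(f"expected failures per day: {rate:.2f}")
    print(f"chance of at least one failure on a given day: {chance:.0%}")
```

Under this model the fleet averages a little under one failure per day, and on any single day there is roughly a 60% chance of at least one failure. Either way, failure is a routine event at this scale, which is the case for designing systems that tolerate it.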
Knowledge is essential
- Application knowledge - Microsoft IT found that application knowledge resided in roles across the organization. While a portfolio management system did provide useful information, the team frequently needed knowledge from other sources, such as transaction volumes, database sizes, and traffic information, which are typically known by product managers, program managers, and operations staff.
- Cloud knowledge - You must know Windows Azure in order to make decisions about application suitability, and it is essential to have in-depth knowledge about the application.
- Use the available tools - Microsoft offers several, and many other vendors and partners also have tools and methodologies for categorizing and segmenting applications.
Design for the cloud first
- Microsoft IT is focusing on designing for the cloud first at all levels of planning and development so that when an application is ready to move to the cloud the process is defined and ready, and Microsoft IT can take full advantage of the platform.
- In an environment where there are multiple instances of the same application, it is important that the applications are stateless.
- Design for failure. The cloud is an intrinsically resilient environment. If one instance fails, there are other instances that continue to work.
- Cloud applications should be highly modular. If applications are well written and designed, operations can scale and optimize them with less effort, which minimizes cost and maximizes responsiveness.
- The cloud is worldwide. Choosing the right data center, deciding whether to use the Content Delivery Network, making sure that chatty applications are minimized, being thoughtful about caching, and generally paying attention to the fact that the laws of physics apply will pay dividends in the long run.
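The guidance on statelessness and on UTC can be illustrated with a small sketch. In the following Python example, the handler keeps nothing in memory between calls, so any instance behind a load balancer could serve the next request. The SharedStore class is a hypothetical in-memory stand-in for an external store that all instances share (a database or cache in practice), and the request shape is invented for this example.

```python
from datetime import datetime, timezone


class SharedStore:
    """Hypothetical stand-in for a store shared by all application instances."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


def handle_request(store, client_id, item):
    """Stateless handler: all state comes from the request or the store."""
    items = store.get(client_id, [])
    items = items + [item]          # build a new list; keep no local state
    store.put(client_id, items)
    return {
        "client": client_id,
        "items": items,
        # Record time in UTC so that every instance, in any data center,
        # produces comparable timestamps.
        "at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    store = SharedStore()
    # Each call could be served by a different instance.
    handle_request(store, "alice", "book")
    print(handle_request(store, "alice", "pen"))
```

Because the handler holds no per-client memory, adding or removing instances does not lose anything, which is what makes the scale-out described in this guide practical.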