Using Windows Azure to Implement a Hybrid Application
Technical Case Study
Published: June 2012
Microsoft IT used the Windows Azure platform to create a hybrid solution for the company’s performance management system, Performance@Microsoft. The hybrid solution enabled manageability and scalability, and provided Microsoft IT with valuable reusable components, lessons learned, and best practices to apply to future hybrid-application implementations.
Microsoft IT identified its human resources-focused employee performance management suite of applications as a candidate for migration to the Windows Azure platform. The development team believed that the solution's components would benefit from the improved scalability and elasticity that a migration to the Windows Azure platform assured.
The development team migrated several modules of the existing solution to Windows Azure. This created a hybrid solution, as some components were located on-premises and some were on Windows Azure. This enabled P@M to benefit from the significant strengths of the Windows Azure platform.
Performance@Microsoft (P@M) is a suite of applications used to support employee performance management throughout the entire Microsoft organization. Microsoft employees use P@M for several different functions, including to:
- Establish and regularly monitor employee commitments and career-development plans.
- Complete an annual performance review.
- Enable feedback from both managers and peers about an employee's performance in his or her job.
Although Microsoft employees use P@M throughout the year, there are two key periods of heavy use: February through March, during mid-year review; and June through July, for year-end review. The latter is prior to the close of Microsoft's fiscal year-end, which is June 30. Performance@Microsoft is critical to Microsoft's ongoing Human Resources operations, and it is considered a Tier 1 application in the Microsoft IT (MSIT) application infrastructure. Almost every Microsoft employee uses P@M, which contains sensitive data, and which supports a key function within the Microsoft organization.
Figure 1. The P@M architecture as end users see it
From a user's perspective, the interface in the P@M solution is relatively simple. Users use a webpage to enter commitments and performance review information. The underlying architecture, however, houses a large amount of data and complex application and database structures.
Previous Solution Architecture
Figure 2. Previous P@M architecture
MSIT architected the solution for P@M entirely by using the enterprise's on-premises infrastructure, located in Microsoft's corporate data centers. The architectural design was based on an Internet Information Services (IIS) cluster that housed the P@M websites, which include Performance, Promotions, and Feedback. The IIS cluster also included a set of web services that provide user authorization, retain domain data, and host P@M's primary data-input method.
P@M databases were all hosted on a Microsoft® SQL Server® 2008 R2 failover cluster, which contained the core performance-related data that P@M was designed to manage, and the configuration and log databases for a set of Foundation services. The solution also leveraged two external systems, Microsoft FeedStore (a data-warehouse solution) and SAP, to provide organizational and business data that P@M required.
The Foundation components provide a set of shared common services, including application configuration, email notification, and performance and error logging.
Targeting P@M for Windows Azure
P@M was identified as a candidate for Windows® Azure™ for several key reasons, including that it had:
- A web-based interface, based in IIS. IIS applications typically experience seamless and optimal migrations to Windows Azure.
- A usage pattern, known as predictable bursting, that was well-suited for the Windows Azure platform. An application that experiences predictable bursting has significant increases of activity during certain days of the week or times of the year, which results in significantly higher user demand than it otherwise receives. Windows Azure can scale resources to meet application demand, so only the resources that users require and consume are paid for. On the Windows Azure platform, P@M would consume minimal resources during low usage periods. During peak usage times, Windows Azure enables P@M to scale to meet demand, for as long as that demand lasts.
However, the development team realized that not all of P@M's components were ideally suited for the Windows Azure platform. Specifically, the team realized that migrating the data tier to SQL Azure™ would require a large investment due to the size and interdependencies of the P@M SQL databases. Therefore, the development team opted to keep the databases in the on-premises SQL Server 2008 R2 infrastructure.
Developing a solution based on Windows Azure meant that the development team had to establish design goals for the entire application, taking into account which pieces of functionality and which capabilities were a good fit for Windows Azure.
Establishing the Design Goals
The design goals for the P@M migration included:
- Increase the scalability and extensibility of P@M.
- Ensure secure connections through the application, and protect sensitive information.
- Increase availability and resiliency for P@M.
- Provide best practices and lessons learned for use with future migrations.
The development team recognized quickly that several components of the original solution needed to remain on-premises, making the new implementation a hybrid solution. This hybrid solution would contain Windows Azure components and those components from P@M that needed to remain in Microsoft's data centers.
Designing a Hybrid Architecture
Once it was clear that a hybrid solution was the best approach, the team selected which P@M components would migrate to Windows Azure and which would remain in the Microsoft data centers.
Determining Azure Components
The development team determined that the P@M interface, or front-end, would migrate to Windows Azure, while the web services and data-tier components would remain in the Microsoft data centers. The website could be hosted in a Windows Azure web role, which would require minimal refactoring because the project code for the website functions in Windows Azure much as it does in an on-premises deployment. Therefore, only minor changes to the Windows Azure web configuration file were necessary.
The team also planned to migrate P@M's foundation services, which provided the application's core services, to Windows Azure. A second web role was deployed to host the foundation services, which positioned these services for reuse and enabled them to scale independently of P@M. The team used SQL Azure to store the foundation services' databases, so that the services would benefit from the performance of using data hosted within the Windows Azure platform.
Determining On-Premises Components
The remaining application components, including the SQL Server 2008 R2 databases and the associated reporting services and maintenance pieces, would be managed by MSIT and remain in the on-premises data centers.
Integrating Windows Azure and the On-Premises Components
After determining that the P@M components would be split between Windows Azure and the Microsoft data centers, the development team began planning the integration of the two platforms and the exchange of data between them. The transfer of data between on-premises and cloud-based components was a critical aspect of the design plan.
The team used the following design components to provide hybrid integration:
- Active Directory® Federation Services (ADFS) 2.0
ADFS 2.0 provides claims-based access and single sign-on for on-premises and cloud-based applications in the enterprise, across organizations, and on the Internet. Integrating ADFS with Windows Azure enabled the team to leverage the domain-authentication information that users were providing to the on-premises components and continue the authenticated session in the Windows Azure cloud.
- Windows Azure Connect
Windows Azure Connect enables the Windows Azure-based components to connect to the on-premises resources via full network access that uses a virtual private network based on Internet Protocol Security (IPsec). Windows Azure Connect also enables tunneling to authenticate through the Windows Azure Connect agent to the on-premises Active Directory domain, thereby making the Azure components functioning domain members. Windows Azure Connect was used to establish communication between the P@M website, hosted in Windows Azure, and the on-premises SQL Server 2008 R2 databases.
- Windows Azure Service Bus
Windows Azure Service Bus, a second method of connecting Windows Azure to on-premises resources, also was used. Windows Azure Service Bus provides a message-based communication between endpoints, and relays messages through a pre-established socket connection. Windows Azure Service Bus does not require open inbound ports. Therefore, it was used when firewall limitations prevented the use of Windows Azure Connect, or where Windows Azure Connect's capabilities did not fit the design requirement. Additionally, Windows Azure Service Bus was used to enable web-service calls from the Windows Azure web role to service endpoints that were hosted in the data centers.
Designing for Resiliency and Scalability
One of the primary design goals for P@M in Windows Azure was to make the application more resilient and scalable, especially during peak usage times. Windows Azure's built-in functionality enabled the development team to achieve this for the P@M website without a significant investment in infrastructure and with only a moderate investment in development time.
The P@M Website was migrated to a Windows Azure web role, which provides a preconfigured instance of IIS. Additional complementary instances that are running the same code can be activated and deactivated to provide the web role with more or less system resources, as necessary. In the web role, the operations team can simply increase or decrease the number of web-role instances to meet user demand.
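The decision of how many web-role instances to run can be illustrated with a small calculation. The following is a minimal sketch, not MSIT's implementation: the function name, the per-instance capacity, and the minimum and maximum instance counts are hypothetical values chosen only for illustration.

```python
import math


def required_instances(concurrent_users: int,
                       users_per_instance: int = 500,    # assumed capacity, not from the case study
                       min_instances: int = 2,           # assumed floor for availability
                       max_instances: int = 20) -> int:  # assumed ceiling for cost control
    """Return how many web-role instances to run for the given demand."""
    if concurrent_users < 0:
        raise ValueError("concurrent_users must be non-negative")
    needed = math.ceil(concurrent_users / users_per_instance)
    # Clamp the result to the configured floor and ceiling.
    return max(min_instances, min(needed, max_instances))
```

During a quiet period the function returns the floor, and during a review-season burst it grows with demand until it reaches the ceiling. An operations team could use a rule of this shape to decide when to add or remove instances.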
Designing for Intellectual-Property Protection and Security
Because P@M contained so much sensitive data, information security was another primary design goal. Windows Azure Connect and Windows Azure Service Bus were used to facilitate secure connections between Windows Azure components and their on-premises counterparts. Several areas of P@M, including communication with Windows Azure Service Bus endpoints, required secure sockets layer (SSL) communication using a trusted client certificate. MSIT ensured that a trusted certification authority (CA) issued all certificates and that each certificate chained to a trusted root certificate.
MSIT also ensured that on-premises resources that Windows Azure accessed, which could have been exposed to untrusted networks, were hardened by using its corporate server-hardening procedures to prevent unauthorized access.
Designing for Configurability
The design of the foundation configuration service greatly increased the configurability of P@M. The team deployed the foundation configuration service, which contains most of the application's configuration data, into its own Windows Azure web role. This decoupled the configuration data from the application, and enabled a single cloud deployment package that was used across all application environments (development, test, staging, and production). Additionally, it enabled dynamic configuration updates through the foundation services user interface (UI), without requiring a redeployment or restart of the Windows Azure web roles.
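The idea of keeping configuration outside the deployment package can be sketched as follows. This is an illustrative example under stated assumptions: the case study does not describe the foundation configuration service's API, so the `ConfigClient` class, the `fetch` callable, the in-memory `store`, and the setting names are all hypothetical stand-ins.

```python
from typing import Callable, Dict

# Stand-in for the external configuration service (hypothetical data).
ConfigStore = Dict[str, Dict[str, str]]


class ConfigClient:
    """Reads the settings for one environment from an external store."""

    def __init__(self, environment: str,
                 fetch: Callable[[str], Dict[str, str]]) -> None:
        self._environment = environment
        self._fetch = fetch
        self._settings: Dict[str, str] = {}
        self.refresh()

    def refresh(self) -> None:
        # Re-read settings at run time; no redeployment or restart is involved.
        self._settings = dict(self._fetch(self._environment))

    def get(self, key: str, default: str = "") -> str:
        return self._settings.get(key, default)


store: ConfigStore = {
    "test": {"smtp_host": "smtp.test.example"},
    "production": {"smtp_host": "smtp.prod.example"},
}

# The same code (the same "package") runs everywhere; only the
# environment name differs between deployments.
client = ConfigClient("test", lambda env: store[env])
```

Because the client copies the settings on each `refresh()`, a change made in the store becomes visible to the running application only after the next refresh, which mirrors the dynamic-update behavior described above.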
Designing for Network Performance
Network latency was an anticipated design factor for the new implementation of P@M. Hosting the P@M website and several key application components in Windows Azure meant a greater network distance between these components and the on-premises data and services. The development team had to account for this latency in its design, and then refactor some of the application code to ensure that network performance service level agreements (SLAs) were met. Windows Azure Service Bus was used to perform load balancing on connections between Windows Azure and on-premises components, and the team refined SQL Server stored procedures to decrease the amount of network traffic between application database components.
Windows Azure Cache was used to improve performance in several areas. First, the development team used Windows Azure Cache as a Session store. When the solution was on-premises, several key application components relied on the in-process Session store, which was not usable in the hybrid solution. The team leveraged the Windows Azure Cache Session State Store Provider, and did not have to make any changes to existing Session store code.
Secondly, the solution caches most datasets returned from the Windows Azure Connect and Service Bus channels. This enabled Microsoft to meet performance benchmarks despite the increased latency incurred from the hybrid topology.
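The dataset caching described above follows what is commonly called the cache-aside pattern. The sketch below is a minimal illustration, not the P@M code: an in-process dictionary stands in for Windows Azure Cache, and the `TtlCache` class, the time-to-live expiry policy, and the `loader` callable are assumptions, since the case study does not state how cached entries expire.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlCache:
    """Cache-aside helper: return a cached value, or load and store it."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry time, cached value)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]            # cache hit: no call across the hybrid link
        value = loader()               # cache miss: fetch from the on-premises side
        self._entries[key] = (now + self._ttl, value)
        return value
```

A caller passes a `loader` that performs the remote query; repeated requests for the same key within the time-to-live are served locally, so only the first request pays the cross-premises latency.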
Solution Challenges and Design Refactoring
During the development and implementation phase, Microsoft IT encountered challenges that required modifications to the original design, specifically with respect to performance and security requirements. The development team had to alter their approach to the development process, and ensure that the hybrid solution would meet the needs of the user base.
Ensuring End-to-End Security
One of the earliest security-related concerns was authentication of the Windows Azure web role to the on-premises SQL Server 2008 R2 database servers. MSIT best practices have long required Windows domain-based authentication for SQL Server access. However, connecting the web role to the SQL Server instances by using Windows Azure Connect did not allow the use of domain-based authentication. Therefore, developers had to implement SQL Server-based authentication to connect to the database servers. Because connection strings are sent as plaintext by default, the team encrypted the connection strings containing the SQL Server credentials so that this authentication information could not be compromised.
MSIT also ran black-box and code-assisted attack and penetration testing of the P@M application to ensure satisfactory data security for the sensitive data that P@M contains.
Implementing a Resilient Connection Between the Cloud and the Data Centers
Hybrid solutions that utilize Windows Azure always involve at least one uncontrollable variable: the Internet. Windows Azure resides in the cloud, which relies on the Internet for data transmission. MSIT provided a resilient and consistently available solution by utilizing redundant connections, with multiple providers, to data centers. This ensured adequate bandwidth availability.
In a hybrid system, with one cloud-based service talking to another, or talking to an on-premises component, errors will occur because of temporary infrastructure conditions or network issues. Typically, if the operation is retried a few milliseconds later, it will succeed. These issues are called transient faults, and a resilient cloud application must handle them gracefully. The P@M development team used the Enterprise Library Integration Pack for Windows Azure to implement a pattern where any call that crosses a service boundary was retried three times, if necessary. This was crucial to ensuring consistent availability for P@M.
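The retry pattern can be sketched in a few lines. This example does not use the Enterprise Library and is not the P@M implementation; it is a generic illustration in which the `TransientError` type, the `call_with_retry` function, the delay value, and the linear back-off are assumptions. It also assumes that "retried three times" means up to three retries after the initial attempt.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for faults that are worth retrying."""


def call_with_retry(operation: Callable[[], T],
                    retries: int = 3,
                    delay_seconds: float = 0.05,
                    transient: Tuple[Type[BaseException], ...] = (TransientError,),
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation; on a transient fault, retry up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return operation()
        except transient:
            if attempt >= retries:
                raise                       # retries exhausted: surface the fault
            attempt += 1
            sleep(delay_seconds * attempt)  # simple linear back-off (assumed)
```

Only exception types listed in `transient` are retried; any other error propagates immediately, which keeps genuine bugs from being hidden behind retries.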
Optimizing for Performance
Performance in the Windows Azure environment depends on two key factors: the availability of computing resources to the Windows Azure role, and the data access that the application requires. For P@M, almost all application data was stored on-premises in SQL Server databases. Frequent modifications to data meant increased usage of the potentially high-latency connection over Windows Azure Connect and Windows Azure Service Bus. Therefore, MSIT mitigated this behavior by using Windows Azure Cache to hold important application data in the Windows Azure environment for easy and fast access.
Windows Azure Connect presented a small performance problem when large amounts of data were exchanged between on-premises and cloud-based components.
Implementing Instrumentation for a Hybrid Solution
With part of the P@M architecture in Windows Azure and part still on-premises, MSIT was confronted with two different environments that supported a single application. This introduced a challenge: how to consistently and efficiently implement monitoring for both the Azure and on-premises components. MSIT needed to unify the monitoring and instrumentation information gathered from both environments into a consolidated solution.
The development and operations teams used their existing Microsoft System Center 2012 - Operations Manager deployment to provide a single monitoring and instrumentation environment. Using the Windows Azure Management Pack for Ops Manager and custom-designed monitors, MSIT delivered the performance, availability, and security-monitoring information that an application as critical as P@M requires.
Mitigating Application Downtime and Implementing Disaster Recovery
Aside from the built-in high availability qualities of Windows Azure, MSIT implemented measures to ensure application availability and adequate disaster-recovery capabilities.
SQL Server provided database storage for all data, and data resiliency was provided by a Windows Server® 2008 R2 failover cluster hosting SQL Server. However, Windows Azure Connect does not support connections to SQL Server instances that are hosted in a failover cluster. Therefore, MSIT had to reconfigure SQL Server to provide high availability for the on-premises databases by using database mirroring instead of failover clustering.
Another aspect of resiliency and data protection was a disaster-recovery procedure for any data housed in SQL Azure databases. While SQL Azure provides a robust environment to ensure consistent data availability, MSIT needed to retain data on-premises to satisfy SLA and regulatory requirements. Therefore, MSIT established a scheduled process that backed up SQL Azure databases to on-premises locations, where the data was then backed up to removable devices for archiving.
Final Solution Architecture
Figure 3. Final P@M architecture on Windows Azure
The final solution architecture used the following technologies:
- Windows Azure Web role
- Windows Azure Cache
- Windows Azure Service Bus
- Windows Azure Connect
- SQL Azure (used by foundation services)
- System Center 2012 - Operations Manager
- Windows Azure Management Pack for Ops Manager
- SQL Server 2008 R2
- Windows Server 2008 R2
- IIS 7.5
- ADFS 2.0
The final implementation of P@M, using Windows Azure, provided MSIT with numerous benefits, from an end-user and business perspective, and from a technical perspective.
Business and End-User Benefits
- Scalability. The P@M website, now based in a Windows Azure web role, can be scaled up or down, according to user demand and resource requirements. P@M now can acquire resources to handle peak usage times during mid-year and end-of-year usage bursts. It also can release those resources when they are not needed.
- Extensibility. The Windows Azure components of P@M benefit from Windows Azure's straightforward development and release process, making it easier and faster to update the application. Updates happen more often and with less effect on availability, which provided a significantly better end-user experience.
Technical Benefits and Lessons Learned
MSIT learned a number of important lessons that can be leveraged in future Windows Azure migrations and implementations.
While Windows Azure development is quite similar to development in a traditional Microsoft environment, the development team also learned that there are significant differences for which it needed to account.
P@M's hybrid architecture required that developers consider aspects of the application that they previously had not considered, such as the efficiency of communication between Windows Azure and on-premises components. In an entirely on-premises solution, this communication matters little because of the bandwidth available. However, the connection between Windows Azure and on-premises components emphasizes the need for efficient coding and application communication.
Monitoring a Hybrid Solution
The development team established a number of reusable components that MSIT can use for monitoring with System Center Operations Manager in subsequent migrations. They used Operations Manager to implement a unified monitoring environment.
Building Resiliency into a Hybrid Solution with Azure
MSIT determined that several aspects of the Windows Azure environment required additional implementation to provide true application resiliency. SQL Server availability, connection retries, and connectivity between on-premises and Windows Azure all play a critical part in overall application resiliency.
Building Performance Management into a Hybrid Solution
MSIT also learned that adequate performance in a hybrid application is relatively simple to obtain, especially with Windows Azure components. However, performance of the entire application also depends on connectivity between Windows Azure and on-premises infrastructure, which can be unpredictable, since that is the nature of Internet connectivity.
MSIT was also able to obtain several best practices that can be applied to future Windows Azure migrations:
- Develop for Windows Azure from the beginning. Take advantage of cloud-based functionality as much as possible.
- Perform application testing in a live Windows Azure environment.
- Plan a performance testing methodology to ensure SLAs are met.
- Use System Center 2012 - Operations Manager for monitoring, in conjunction with the Windows Azure Management Pack.
- Store application-configuration data external to the deployment package, so that a single package can be deployed across all environments.
- Take advantage of Windows Azure Cache to mitigate the performance impact of increased hybrid latency.
- Use Windows Azure Service Bus to perform load balancing on connections between Windows Azure and on-premises components.
- When designing custom code, make it reusable for future implementations.
- Be aware of network bottlenecks through proxies or firewalls.
- Build retry logic into code to recover from transient faults.
- Evaluate database requirements against SQL Azure capabilities before establishing a migration scenario.
P@M provided an excellent opportunity for MSIT to implement a hybrid solution using the Windows Azure platform. Developers were able to leverage Windows Azure's capabilities in P@M to provide a more resilient and scalable solution, leading to increased availability and an improved end-user experience. They also were able to establish several best practices and develop reusable components that MSIT will leverage in future migrations to Windows Azure.
For More Information
For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information via the World Wide Web, go to:
For more information on the Enterprise Library Integration Pack for Windows Azure, visit the following link: http://www.microsoft.com/en-us/download/details.aspx?id=28189
For more information on the Windows Azure Management Pack for System Center Operations Manager, visit the following link: http://www.microsoft.com/en-us/download/details.aspx?id=11324
© 2012 Microsoft Corporation. All rights reserved.
This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, Azure, Windows, Windows Server, SQL Server, System Center, and Internet Information Services are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.