Monitoring Business Application and Platform Health with Windows Phone and the Cloud

Technical Case Study

Published: February 2012

Learn how Microsoft Information Technology (Microsoft IT) developed a Windows Phone app that provides near real-time status information and alerts by receiving Microsoft System Center Operations Manager 2007 R2 data through the cloud. This new mobile application enables IT administrators, managers, and executives to monitor business-critical platform and application status anywhere they have an Internet connection.

Situation

Microsoft IT needed a solution that could send business application and platform health information in near real time to IT administrators, managers, and executives. They wanted a system that could send updates to mobile devices.

Solution

Leveraging their System Center Operations Manager 2007 R2 servers that were monitoring business-critical systems, Microsoft IT developed a solution that incorporated a middle tier based on Windows Azure™ to push the Operations Manager data out to a monitoring application that runs on Windows Phone.

Benefits

  • Improved visibility into business-critical system status: The Windows Phone app provides senior management with at-a-glance status information in near real time.
  • Enhanced IT productivity: This automated solution relieves operations personnel from what used to be manual tasks, freeing them to perform more strategic work.
  • Improved Time To Resolution: Using the mobile application that provides near real-time alerts, Microsoft IT has met their Time To Resolution (TTR) service level agreements consistently.
  • Reduced number of outages: Implementation of the monitoring application has helped Microsoft IT respond proactively, reducing the annual rate of major system incidents (major incidents per number of monitored systems) from 130 percent to 21 percent.

Products & Technologies

  • System Center Operations Manager 2007 R2
  • Windows Phone
  • Microsoft SQL Server® 2008 SP2
  • Windows Azure

Situation

With so many of today's business processes being driven by technology, companies not only need to invest significant resources to implement systems such as business intelligence (BI) and customer relationship management (CRM), but also need to ensure that these business-critical systems are maintained properly. Any unexpected downtime in such an environment can quickly impact the bottom line.

As the group responsible for maintaining Microsoft's corporate network and infrastructure, Microsoft Information Technology (Microsoft IT) is tasked with monitoring all critical events in the company's production systems, including the Enterprise Commerce IT (ECIT) group's platforms and applications. The primary tool that Microsoft IT had been using to monitor these systems was Microsoft System Center Operations Manager 2007 R2.

Although System Center Operations Manager 2007 R2 supports enterprise-scale operations monitoring and has robust alert capabilities, it does not provide out-of-the-box categorization of information based on business processes or applications. Instead of receiving only the alerts that concern the performance of their business processes, managers receive all types of Operations Manager alerts, potentially flooding their mailboxes with unwanted data.

Another challenge with Microsoft IT's Operations Manager 2007 R2-based monitoring environment was the reliance on desktop systems. While Operations Manager 2007 R2 offers powerful system consoles, they only run on Windows servers or desktops. Although people could receive emails on their smartphones, they had to decipher cryptic alert messages. There was no way for support engineers to monitor server status easily or for management to view the overall business processes while away from their desks.

Microsoft IT needed a tool that could display system availability and performance impact for selected platforms and applications. The tool would offer a dynamic display so that support teams would no longer need to depend solely on emails for Operations Manager monitoring, and an intuitive interface that would provide a roll-up view of the health of a business capability. Finally, by developing a push-based application that could run on mobile devices, Microsoft IT could enable personnel to monitor system status and receive near real-time alerts wherever they were.

Solution

Microsoft IT decided to design a new solution that could receive data from their existing Operations Manager 2007 R2 servers that were monitoring business-critical processes. The new solution would pull the relevant Operations Manager data from the on-premises environment, push the information into the cloud, and make it available for a Windows Phone app to consume in near real time and present in an intuitive manner.

Implementation

This section describes the development and deployment activities involved in Microsoft IT's implementation of their new mobile platform health monitoring solution.

System Development

As previously noted, Microsoft IT enhanced their existing Operations Manager reporting capabilities by developing a mobile health monitor for business-critical systems. The key enhancements were:

  • Back end: Microsoft IT developed new SQL Server APIs that expose the data from their existing on-premises Operations Manager 2007 R2 environment. Microsoft IT also designed a data synchronization service that uses the APIs to extract the data and push it into the middle tier (described below).

  • Middle tier: Microsoft IT designed a cloud-based middle-tier component that uses Windows Azure to receive the appropriate Operations Manager monitoring data that comes from the data synchronization service and pushes it out to mobile devices. This middle tier uses the Windows Azure Web role to expose WCF services that mobile devices can consume on demand. The Windows Azure Worker role pushes the updates to the Windows Phone app, which displays a count of them in the app's Live Tile.

  • Windows Phone app: Finally, Microsoft IT developed a new Windows Phone app called ECIT Monitor that receives the Operations Manager system status information in near real time from the middle tier over a secure network connection.

    Figure 1 displays the topology of the new solution and its components.

    Figure 1: A topology view of the new application and platform monitoring solution
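
The flow through the three components can be illustrated with a minimal in-memory sketch. It is not Microsoft IT's implementation: the class and function names, the `sequence` change counter, and the server names are all hypothetical, and a plain Python object stands in for the Windows Azure middle tier. It only shows the shape of the idea: a synchronization step forwards new records, the middle tier answers on-demand queries, and a count of unhealthy servers is derived for display on the app's tile.

```python
from dataclasses import dataclass, field


@dataclass
class StatusRecord:
    """One health record exported from the on-premises monitoring data."""
    server: str
    application: str
    status: str      # "green", "yellow", or "red"
    sequence: int    # hypothetical, ever-increasing change counter


@dataclass
class MiddleTier:
    """In-memory stand-in for the cloud middle tier (an assumption)."""
    latest: dict = field(default_factory=dict)  # server name -> StatusRecord

    def receive(self, records):
        # Called by the synchronization step with newly extracted records.
        for rec in records:
            self.latest[rec.server] = rec

    def query(self, application):
        # On-demand read, as a phone client might request it.
        return sorted(
            (r for r in self.latest.values() if r.application == application),
            key=lambda r: r.server,
        )

    def tile_count(self):
        # Number of servers that are not healthy; a background worker could
        # push this number to the phone for display on the app's tile.
        return sum(1 for r in self.latest.values() if r.status != "green")


def synchronize(source_rows, last_sequence, middle_tier):
    """Forward rows newer than last_sequence; return the new high-water mark."""
    new_rows = [r for r in source_rows if r.sequence > last_sequence]
    middle_tier.receive(new_rows)
    return max((r.sequence for r in new_rows), default=last_sequence)


if __name__ == "__main__":
    rows = [
        StatusRecord("web-01", "Partner Portal", "green", 1),
        StatusRecord("web-02", "Partner Portal", "red", 2),
    ]
    tier = MiddleTier()
    mark = synchronize(rows, 0, tier)
    print(mark, tier.tile_count())
```

Tracking a high-water mark keeps each synchronization pass incremental, so only changed records leave the on-premises environment.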

Deployment

Microsoft IT first deployed the solution in a test environment and later rolled it out as a production service. As part of the deployment process, Microsoft IT:

  1. Created an initial test environment that comprised:

    1. Four virtual machines (VMs), each running its own Operations Manager 2007 R2 Root Management Server (RMS)

    2. Four physical servers running SQL Server 2008 SP1

    3. Windows Azure Sandbox subscription for testing the middle tier

  2. Worked with their leadership team to identify a set of people who might be interested in running a beta test of the new monitoring solution. They also developed a communication plan that described the new technology, discussed the beta program, and provided information about installation, accessing updates, and how to contact support for help.

  3. Confirmed that the solution functioned properly in the test environment before moving to a production rollout. To support a production-scale environment, Microsoft IT then made the following infrastructure changes:

    1. Leveraged their existing Operations Manager RMS servers; no new hardware was required for this component.

    2. Deployed the SQL Server APIs to the Operations Manager operations data warehouse.

    3. Converted the Windows Azure sandbox subscription to a production Windows Azure cloud subscription, adding multiple instances of the web and worker roles for redundancy.

Results

  • A total of 25 people participated in the test phase, including 24 participants in the company's Redmond domain and one participant based in India.

  • As of February 2012, the production version of the mobile application has been deployed to 120 users worldwide, monitoring 91 applications within the Enterprise Commerce IT (ECIT) group.

  • Microsoft IT's Time To Resolution (TTR) has improved for addressing monitored application or platform issues: Since implementation of the new solution, Microsoft IT has met their TTR SLAs consistently.

  • In addition, the implementation of the new monitoring application has helped Microsoft IT reduce the annual rate of major system incidents (major incidents per number of monitored systems) from 130 percent to 21 percent (see Figure 2).

    Figure 2: As of February 2012, major system incidents have dropped to less than 25 percent annually since the rollout of ECIT Monitor.

    *FY2012 values as of February 2012. Microsoft fiscal years begin July 1

The New End User Experience

In addition to the previously listed results, the new mobile ECIT Monitor application has significantly improved how IT administrators, managers, and executives monitor their business-critical systems. ECIT Monitor currently offers near real-time health monitoring for the following business processes:

  • Channel compensation

  • Reporting and business intelligence

  • Fulfillment

  • Entitlements

  • Ordering

  • Agreements

  • Quotations

  • Pricing

The following figures display some screens in the application's intuitive graphical user interface that enable users to monitor system status and identify any health issue in near real time.

Figure 3 displays three separate screens:

  • ECIT Monitor launch screen

  • Biz Process: Displays an at-a-glance, overall status indicator for entire business processes, including Entitlements, Ordering, and Agreement. The colored indicator next to the business process name indicates whether the entire process is functioning properly (green), has a performance degradation (yellow), or has a server down or critical service outage (red). Managers can also see the status of individual applications within a process; in this case, the Partner Portal application within the Entitlements process is flagged with a server down.

  • SQL Performance: Displays database performance information. Each database is marked as operating within normal performance parameters (green), running less than optimally (yellow), or having significant performance degradation (red).

    Figure 3: The ECIT Monitor offers executives and managers at-a-glance views of entire business processes, performance information, and more.

Figure 4 displays three separate screens:

  • Applications: Displays application status, using the same green, yellow, and red icons for each application as described in previous screens. Yellow icons are used in clusters where a single node is down, but the cluster is still functioning. The Application page also provides a link to another screen where IT administrators can review the system (or cluster of systems) that host the application. In this case, the Partner Portal application has an issue as indicated by the yellow icon and flag. Clicking the cluster icon at the right of the Partner Portal entry navigates the user to the Infrastructure screen (see below).

  • Infrastructure: Lists all the servers involved in running an application. The green, yellow, and red icons help IT administrators quickly identify which server in a cluster is generating errors. In this case, the server XXX.w.006 shows a red "x" icon, indicating a problem. Clicking on the server icon navigates the user to the Error Detail screen (see below).

  • Error Detail: Displays server status details. In this case, server XXX.w006, a web server for the Partner Portal application, failed to send a heartbeat.

    Figure 4: The ECIT Monitor offers IT administrators topology screens to identify problems with individual servers in a cluster and also provides description screens for more detail.
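
The way server health rolls up into application and business process indicators can be sketched as follows. The case study does not give the exact rules ECIT Monitor applies, so the rules here are assumptions chosen to match the screens described above: an application is yellow when some, but not all, of its servers report a problem, red when every server reports red, and a business process takes the worst status among its applications.

```python
# Assumed ordering of the three indicator colors, from best to worst.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}


def application_status(server_statuses):
    """Roll server statuses up to one application status (assumed rule)."""
    statuses = list(server_statuses)
    if not statuses:
        raise ValueError("an application needs at least one server")
    if all(s == "red" for s in statuses):
        return "red"       # every node is down: the application is out
    if any(s != "green" for s in statuses):
        return "yellow"    # degraded, but at least one node still serves
    return "green"


def process_status(application_statuses):
    """A business process shows the worst status among its applications."""
    statuses = list(application_statuses)
    if not statuses:
        raise ValueError("a business process needs at least one application")
    return max(statuses, key=SEVERITY.__getitem__)


if __name__ == "__main__":
    portal = application_status(["green", "red"])   # one node of two is down
    print(portal, process_status([portal, "green"]))
```

Under these assumed rules, a two-node cluster with one failed node shows yellow rather than red, which mirrors the Applications screen behavior described for clusters.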

Best Practices

In the course of developing and deploying their new business process and application health monitoring solution, Microsoft IT followed these best practices:

  • Promote internal collaboration among all teams involved. Due to the number of different teams in your organization that may need to be involved—including business owners, Security, Compliance, Legal, and those who provide infrastructure—it is important to ensure that all stakeholders can provide input at an early stage and work together to design a system that fulfills all key criteria.

  • Consider the best means to implement your middle tier. Regardless of what technology you employ, you should choose a scalable platform that interoperates with your existing infrastructure. Microsoft IT utilized Windows Azure due to its scalability, seamless switching between test and production environments, tight integration with ADFS, and its integration with on-premises services using Windows Azure Connect.

  • Determine the appropriate set of systems, applications, and platforms you need to monitor. Start by reviewing your systems and identify which are business-critical, plus what Line of Business (LOB) applications you have that support the business capabilities. Then identify all servers and application components that can affect LOB application availability or performance. With this information, you can create Operations Manager management packs to monitor the identified server infrastructure and application components.

  • Develop an appropriate Operations Manager schema and stored procedure. Although Operations Manager 2007 R2 utilizes a very rich schema for identifying and grouping servers based on roles, it does not provide a simple means to map servers to applications and business processes. In order to address this, Microsoft IT added a small schema that defines the appropriate groups that are displayed within the mobile application and includes Servers, Applications, Business Process, and associative (mapping) tables. Microsoft IT then created a stored procedure to pull the data from Operations Manager and filter it by business processes and application.
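
The mapping schema from the last practice can be sketched with a small, self-contained example. This is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, a parameterized query in place of the stored procedure, and table names, column names, and sample rows that are assumptions rather than Microsoft IT's actual schema. Only the table roles (Servers, Applications, Business Process, and associative tables) come from the description above.

```python
import sqlite3

# Hypothetical mapping schema: three entity tables plus two associative tables.
SCHEMA = """
CREATE TABLE BusinessProcess (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Application (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Server (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, status TEXT NOT NULL);
CREATE TABLE ProcessApplication (
    process_id INTEGER NOT NULL REFERENCES BusinessProcess(id),
    application_id INTEGER NOT NULL REFERENCES Application(id),
    PRIMARY KEY (process_id, application_id));
CREATE TABLE ApplicationServer (
    application_id INTEGER NOT NULL REFERENCES Application(id),
    server_id INTEGER NOT NULL REFERENCES Server(id),
    PRIMARY KEY (application_id, server_id));
"""


def build_demo_db():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO BusinessProcess VALUES (?, ?)",
                     [(1, "Entitlements"), (2, "Ordering")])
    conn.executemany("INSERT INTO Application VALUES (?, ?)",
                     [(1, "Partner Portal"), (2, "Order Entry")])
    conn.executemany("INSERT INTO Server VALUES (?, ?, ?)",
                     [(1, "web-01", "green"), (2, "web-02", "red"),
                      (3, "sql-01", "green")])
    conn.executemany("INSERT INTO ProcessApplication VALUES (?, ?)",
                     [(1, 1), (2, 2)])
    conn.executemany("INSERT INTO ApplicationServer VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 3)])
    return conn


def servers_for_process(conn, process_name):
    """Stand-in for the stored procedure: server health for one process."""
    query = """
        SELECT a.name, s.name, s.status
        FROM BusinessProcess AS p
        JOIN ProcessApplication AS pa ON pa.process_id = p.id
        JOIN Application AS a ON a.id = pa.application_id
        JOIN ApplicationServer AS asv ON asv.application_id = a.id
        JOIN Server AS s ON s.id = asv.server_id
        WHERE p.name = ?
        ORDER BY a.name, s.name
    """
    return conn.execute(query, (process_name,)).fetchall()


if __name__ == "__main__":
    for row in servers_for_process(build_demo_db(), "Entitlements"):
        print(row)
```

Keeping the mappings in associative tables lets one server support several applications, and one application support several business processes, without duplicating server rows.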

Benefits

By developing a new business process and application health monitoring solution that leveraged data from their existing Operations Manager infrastructure, Microsoft IT derived a number of benefits:

  • Improved Time To Resolution: By automating what used to be manual processes and pushing status updates to mobile devices, the new system has enabled Microsoft IT to respond to issues more quickly. ECIT Monitor has helped Microsoft IT meet their Time To Resolution (TTR) service level agreements consistently.

  • Reduced number of outages: Implementation of the monitoring application has helped Microsoft IT operate proactively and reduce the number of major incidents. Since implementing the new system, the percentage of major incidents per number of monitored systems has dropped from 130 percent in fiscal year 2010 to a current rate of 21 percent.

  • Improved visibility into business-critical system status: By using the new Windows Phone app, senior management can obtain high-level business process status information in near real time. The intuitive interface enables executives to view status by a supported business process (such as fulfillment) without needing to be familiar with an Operations Manager alert, servers, or applications.

  • Enhanced IT productivity: Automated alert and report delivery relieves operations personnel from emailing update messages to upper management manually, freeing them to perform more strategic tasks.

  • Enhanced Management productivity: The system's ability to filter alerts pushes the relevant information to mobile devices, and its ability to display critical information in a single window frees managers from the laborious process of wading through many different emails and spending time to interpret overall system status.

  • Intelligent, multi-status alerts: Before this solution, Microsoft IT could recognize only the red/green (break/fix) status of the overall system and had no details concerning the system topology. In contrast, the new solution offers granular, server-level insights into system health, so it can display a yellow or red alert on a single server in a cluster while the overall system is green.
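
The incident metric quoted in the benefits above is a ratio of major incidents to monitored systems, which is why it can exceed 100 percent. A minimal sketch of the arithmetic follows; the function name is hypothetical, and the counts in the usage lines are illustrative values chosen to reproduce the quoted percentages, not figures from the case study.

```python
def incident_rate_percent(major_incidents, monitored_systems):
    """Major incidents per monitored system, expressed as a percentage."""
    if monitored_systems <= 0:
        raise ValueError("monitored_systems must be positive")
    return 100.0 * major_incidents / monitored_systems


if __name__ == "__main__":
    # Illustrative counts only: 13 incidents across 10 systems is 130 percent,
    # meaning more than one major incident per monitored system on average.
    print(incident_rate_percent(13, 10))
    print(incident_rate_percent(21, 100))
```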

Conclusion

Microsoft IT enhanced their existing Operations Manager 2007 R2-based reporting capabilities by developing a mobile health monitor for their business-critical systems in the Enterprise Commerce IT (ECIT) group. The new solution incorporates three primary components: the backend, which leverages their Operations Manager 2007 R2 servers with new SQL Server APIs to expose the data; a Windows Azure-based middle tier to push the data out to the cloud; and a new Windows Phone platform-based application called ECIT Monitor that receives the information from the cloud and displays it in an intuitive interface.

As of February 2012, the mobile application has been deployed to more than 120 IT administrators, managers, and executives worldwide. ECIT Monitor's ability to receive data in near real-time and notify personnel even when they are away from their desks has improved Microsoft IT's average Time To Resolution (TTR), helping them to consistently meet their TTR service level agreements. Moreover, this new system has enabled Microsoft IT to respond proactively to issues, thereby reducing the percentage of major incidents per number of monitored systems from 130 percent in fiscal year 2010 to a current rate of 21 percent.

Finally, ECIT Monitor has become a key tool that the leadership team uses to gain better visibility into the overall health of their business-critical systems. The ECIT Monitor provides at-a-glance status information for an entire business process (such as fulfillment), freeing executives from wading through multiple emails that contain cryptic alert messages.

Microsoft IT expects to see continued worldwide adoption of the ECIT Monitor, estimating the number of users to reach 300 by the end of 2012. Additional system enhancements are also planned for future releases, including graphically displaying incident management metrics and trends, and new drill-down screens that will provide more detailed information about a selected platform or application.

More Information

For more information about Microsoft products and services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Order Centre at (800) 933-4750. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information via the World Wide Web, go to:

http://www.microsoft.com/

http://www.microsoft.com/technet/itshowcase/

© 2012 Microsoft Corporation. All rights reserved.

This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, SQL Server, Windows, and Windows Azure are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. All other trademarks are property of their respective owners.
