Microsoft Operations Manager 2005 Custom Management Pack Development at Microsoft
Note on IT
Published: December 24, 2004
The Microsoft Information Technology (Microsoft IT) group supports internal line of business (LOB) applications. To improve monitoring of these applications and to help reduce downtime, Microsoft develops custom management packs for the LOB applications and deploys them to Microsoft Operations Manager (MOM) servers inside Microsoft IT. This Note on IT explains the process that Microsoft IT uses and recommends to plan, develop, and deploy custom MOM management packs.
A Note on IT is a short, technically deep drill-down on a specific topic related to Microsoft IT and is usually associated with an existing How Microsoft Does IT document. A Note might illustrate how Microsoft IT performs a specific operational task step by step or configures a hardware device or software application. It might also relate details of a best practice or contain key information about Microsoft IT operations that is regularly requested by customers.
Enterprise MOM administrators, application developers, and application support personnel.
Microsoft IT uses Microsoft® Operations Manager (MOM) 2005 to monitor the health of its infrastructure and applications. Microsoft IT has about 400 critical internal LOB applications that are used by its various departments, such as Finance and Human Resources. To improve the health and reliability of these applications, Microsoft develops and deploys custom MOM management packs for each of these applications. This paper describes the process that Microsoft IT uses to plan, develop, and deploy MOM management packs for internal LOB applications.
This paper assumes that readers are MOM administrators, application developers, or support personnel who are familiar with MOM.
For a detailed discussion of how Microsoft IT uses MOM 2005 to manage its servers worldwide, see the Microsoft Operations Manager 2005 Deployment white paper.
Note: For security reasons, the sample names of forests, domains, internal resources, organizations, and internally developed applications and files used in this document do not represent actual names used within Microsoft and are for illustration purposes only. In addition, the contents of this document describe how Microsoft IT runs its enterprise data center. The procedures and processes described in this document are not intended to be prescriptive guidance on how to run a generic data center and may not be supported by Microsoft Product Support Services.
A MOM application management pack is a package that contains rules and knowledge useful in monitoring and supporting a particular application with MOM. Management packs are used in three general layers of monitoring at Microsoft.
- The first layer is infrastructure, such as hardware, Dynamic Host Configuration Protocol (DHCP), and DNS.
- The second layer is platform services, such as Microsoft SQL Server™ and Internet Information Services (IIS).
- The third layer is individual applications, such as the LOB applications discussed in this paper.
Monitoring is generally implemented at the lowest level possible so that functionality required by many applications can be monitored in one place. There is some overlap between monitoring layers.
Microsoft develops management packs internally for its LOB applications. These management packs are developed by the support team that owns each application, because this team has the knowledge, support experience, and development skills relevant to that application. The management pack incorporates this knowledge into a set of rules for monitoring the application's health and responding to health issues. The monitoring team, which is responsible for the central MOM infrastructure servers, provides management pack development standards, guidance, hosting services, and change control for the support teams during the development process.
Microsoft IT has a large number of business unit IT groups. Each group is responsible for its own LOB applications. Because of the wide variety of applications supported, management pack naming conventions are very important. The naming convention must identify the following:
- The support group that created the management pack and has responsibility for the application it monitors
- The application that the management pack monitors
Regular naming conventions are also used for management pack components, including computer groups, attributes, and rules. This makes it much easier to administer the MOM environment.
The naming convention that Microsoft uses for MOM management packs is "Apps XXX – YYY" where XXX is the acronym of the group that supports the application, and YYY is the name of the application. Including "Apps" in the name identifies the management pack as an application management pack. For example, if the Financial IT group manages an application named Financial Data, the management pack for this application will be named "Apps FinanceGroup–FinanceDATA".
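The naming convention above can be sketched as a small helper that builds and parses management pack names. This is a minimal illustration, not part of MOM: the function names are hypothetical, and the parser assumes that group acronyms contain no spaces or dashes.

```python
import re

PREFIX = "Apps"  # identifies an application management pack (from the convention)

# Assumption: the group acronym has no spaces or dashes, so the first dash
# after it separates the group from the application name. Both the en dash
# and the plain hyphen are accepted, with or without surrounding spaces.
_NAME_PATTERN = re.compile(
    r"^Apps\s+(?P<group>[^\s\u2013-]+)\s*[\u2013-]\s*(?P<application>.+)$"
)


def management_pack_name(group: str, application: str) -> str:
    """Build a name of the form 'Apps XXX – YYY'."""
    group = group.strip()
    application = application.strip()
    if not group or not application:
        raise ValueError("both the support group and the application are required")
    return f"{PREFIX} {group} \u2013 {application}"


def parse_management_pack_name(name: str) -> tuple:
    """Return (group, application) or raise ValueError if the name does not conform."""
    match = _NAME_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not an application management pack name: {name!r}")
    return match.group("group"), match.group("application").strip()
```

A check like `parse_management_pack_name` could be run against a list of proposed names before a management pack is accepted for hosting, so that every pack can be traced to a support group and an application.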
The management pack design process begins with the design phase of a new LOB application's development life cycle. However, management packs are also often designed for legacy applications.
During the design process, the support team gathers the following essential information:
- A comprehensive list of events that the application creates
- Locations and formats of log files created by the application
- A list of errors that the application may generate
- A list of performance counters relevant to application health
- A list of service dependencies for the application
- The root cause of major application health issues or errors and related corrective actions
When developing management packs for legacy applications, it is not uncommon to discover that the support team has already built services or tools that test and record various aspects of application health. These tools are often used as sources of monitoring data for the management pack.
When developing an application management pack for a legacy application, the support team answers the following questions:
- Does the application have a legacy monitoring system in place?
- Can the existing monitoring system be used as a data source for the management pack?
- What is or is not effective about the legacy approach to monitoring?
- How and when will the migration between the existing monitoring system and MOM take place?
- How will the new management pack be evaluated against the legacy monitoring system or tool?
- How can the application be modified to provide the data that a management pack needs?
The next step in the design process for a management pack is to establish a baseline health model for the application. This model provides a basis for determining the health of the application at any given time. First, a number of health states for the application are determined. An application and its components will typically have at least two health states, running and stopped. Additional health states may be included as needed.
The list of health states that are relevant to a particular application varies greatly from application to application. Application health states may, for example, include all or some of the following:
- Running – The application is healthy and is performing as expected
- Stopped – The application has stopped functioning completely
- Degraded – The application's performance has been degraded
- Critical – The application is running but in an extremely degraded state
- Error – The application has generated a significant error
- Warning – The application has generated an event that may indicate a serious problem
After establishing a baseline health model, the support team identifies a set of parameters that define each health state. These parameters are based on information sources that MOM can monitor, such as performance counters, the Windows NT Event Log, and information gathered by custom scripts. For example, it may be determined that an application is in the degraded state when the % Processor Time performance counter rises above a particular threshold.
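As a rough illustration of how parameters can define health states, the following sketch maps two observations (whether the service is running, and the % Processor Time value) to a subset of the health states listed above. The threshold values and the function name are assumptions chosen for the example; they do not come from any Microsoft IT health model.

```python
from enum import Enum


class HealthState(Enum):
    RUNNING = "Running"
    DEGRADED = "Degraded"
    CRITICAL = "Critical"
    STOPPED = "Stopped"


def classify(service_running: bool,
             processor_time_pct: float,
             degraded_above: float = 50.0,   # hypothetical threshold
             critical_above: float = 90.0    # hypothetical threshold
             ) -> HealthState:
    """Map monitored parameters to one health state, most severe first."""
    if not service_running:
        return HealthState.STOPPED
    if processor_time_pct > critical_above:
        return HealthState.CRITICAL
    if processor_time_pct > degraded_above:
        return HealthState.DEGRADED
    return HealthState.RUNNING
```

Checking the most severe condition first keeps each observation in exactly one state, which makes it easier to attach a single root cause and response to that state later.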
Finally, once the health model is complete, a root cause is identified for each health state, and the correct response to resolve this health issue is identified and described. For example, if an application can produce a critical error, the cause for the error is identified, along with the correct steps to take when the error occurs. This information will be used to formulate the responses that the management pack will take when various health states are encountered. This information can also be incorporated into the MOM knowledge base for the application.
Once the health model is complete, this information is used to design management pack rules. These rules monitor for the events or performance thresholds that indicate a particular health state and specify the response to take when that health state is encountered.
Microsoft IT makes extensive use of Windows events, performance counters, and MOM Internal Service Monitoring (ISM). These information providers give valuable information and are easily developed and deployed.
The support team designs the following types of rules:
- Event Rules – Event rules monitor event providers, such as the Windows NT Event Log and the Application Log, and raise alerts to indicate health status.
Microsoft IT organizes event rules into several categories:
- Event Collection – Events are collected from the Windows NT Event Log for reporting purposes.
- Missing Events – MOM can also raise an alert if an expected event, such as the startup of a service, is not encountered.
- Event Consolidation – When multiple events indicate the same health state, they are consolidated into a single rule. Consolidation makes it easier to maintain and tune the management pack.
- Event Filtering – MOM 2005 is used to filter out events that are not required for the management pack.
- Event Suppression – Once an alert is raised by an event, it may be necessary to suppress alerts for that event for a period of time. Doing so avoids creating multiple alerts for a problem that has already been identified.
- Performance Rules – Performance rules monitor NT performance counters that are relevant to the application. Relevant counters often include counters that indicate general server health, such as CPU and memory utilization. When these counters rise above or drop below the specified threshold level, an alert is raised. These performance values can also be collected for reporting and performance analysis.
- Service Rules – Windows services are evaluated by MOM ISM. Whenever a service state changes, MOM ISM raises an event. This behavior allows services to be reliably monitored. A service may change state several times while it starts up and moves to a running state. To avoid generating excessive alerts, Microsoft IT does not monitor these state changes. Microsoft IT does, however, monitor any state change that takes a service from a running state to any other state. To achieve this, an event rule is created with the following parameters:
- Event ID = 21207
- Source = "Microsoft Operations Manager"
- OldState = Running
- ServiceName = Monitored service name where "monitored service name" is the name of a specific service to be monitored.
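The service rule criteria above amount to a simple predicate over a state-change event. The sketch below shows that predicate outside of MOM; the `ServiceStateEvent` type and its field names are hypothetical stand-ins for the event parameters, and the case-insensitive service name comparison is an assumption.

```python
from dataclasses import dataclass

SERVICE_STATE_EVENT_ID = 21207                      # from the rule criteria
SERVICE_STATE_SOURCE = "Microsoft Operations Manager"


@dataclass(frozen=True)
class ServiceStateEvent:
    # Hypothetical representation of a MOM ISM state-change event.
    event_id: int
    source: str
    old_state: str
    service_name: str


def service_left_running(event: ServiceStateEvent, monitored_service: str) -> bool:
    """True only when the monitored service moved from Running to any other state."""
    return (
        event.event_id == SERVICE_STATE_EVENT_ID
        and event.source == SERVICE_STATE_SOURCE
        and event.old_state == "Running"
        # Assumption: service names are compared without regard to case.
        and event.service_name.lower() == monitored_service.lower()
    )
```

Because the predicate tests only `OldState`, the intermediate state changes that occur while a service starts up never match, which is what keeps the rule from generating excessive alerts.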
These rules are placed into a functional specification, along with all of the other components of the management pack. At Microsoft, an Excel spreadsheet is used for the functional specification. For a description of this spreadsheet, see the Appendix.
During the design process, the support team also determines a response to be taken for each rule. Each application rule defines a response to be carried out if the conditions of the rule are satisfied. Responses can include generating an alert, sending a notification to support staff, collecting information for analysis, or running a script. For example, if the % Processor Time performance counter rises above 25 percent, the management pack may collect that information for reporting purposes. If the same counter later rises above 50 percent, it may send a notification to the support staff.
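The tiered % Processor Time example can be expressed as a small function that returns the responses due for a given counter value. This is only a sketch of the decision logic: the response labels and the function name are hypothetical, and in MOM each tier would be a separate performance rule with its own response.

```python
def responses_for(processor_time_pct: float,
                  collect_above: float = 25.0,   # from the example in the text
                  notify_above: float = 50.0     # from the example in the text
                  ) -> list:
    """Return the responses triggered by one % Processor Time sample."""
    responses = []
    if processor_time_pct > collect_above:
        responses.append("collect")   # keep the value for reporting
    if processor_time_pct > notify_above:
        responses.append("notify")    # send a notification to support staff
    return responses
```

A value of 60 yields both responses, which matches the idea that the more severe tier adds to, rather than replaces, data collection.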
Often, some aspects of an application cannot be effectively monitored using the Windows NT Event Log and existing performance counters. In these cases, Microsoft IT uses custom scripts or managed code. These scripts may already be in use by the support team, or they may be created specifically for the management pack. For example, scripts may be used to get data from a SQL Server database; this information is not available through the Windows NT Event Log or any performance counter.
Custom scripts can be executed in response to a specified condition. A specific health issue may call for a script to be run to resolve the issue or to gather more information from the application before choosing a final course of action. For example, a script may be used to start critical services, such as anti-virus services that are stopped unexpectedly.
Microsoft IT takes advantage of MOM 2005 monitoring of remote connectivity. One way remote connectivity is monitored is with scripts or managed code deployed to a group of computers that act as simulated clients. These clients periodically perform a timed transaction against an application server and record the results. This technique allows MOM 2005 to monitor connectivity between applications and remote clients. For example, this technique may be used to verify connectivity over a slow WAN link. This technique is more complex than what is usually expected for LOB application management packs. It requires special attention to MOM agent deployment and a thorough knowledge of MOM actions. The remote connectivity technique used in developing management packs at Microsoft is similar to that used in the SQL Server 2000 Management Pack, available from Microsoft. Remote connectivity monitoring in the SQL Server 2000 Management Pack is shown in Figure 1.
Figure 1. Remote Connectivity Monitoring in the SQL Server 2000 Management Pack
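The simulated-client technique centers on a timed transaction whose outcome and duration are recorded. The sketch below shows only that core step, assuming the transaction is supplied as a callable; the names are hypothetical, and how the result reaches MOM (for example, as an event or a performance value) is left out because it depends on the management pack design.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    succeeded: bool
    elapsed_seconds: float
    error: Optional[str] = None


def timed_transaction(transaction: Callable[[], object]) -> ProbeResult:
    """Run one client transaction and record whether it worked and how long it took."""
    start = time.perf_counter()
    try:
        transaction()
    except Exception as exc:  # any failure counts as a failed probe
        return ProbeResult(False, time.perf_counter() - start, str(exc))
    return ProbeResult(True, time.perf_counter() - start)
```

A simulated client would call `timed_transaction` on a schedule with a callable that performs a real request against the application server, then compare `elapsed_seconds` with a threshold suited to the link being tested, such as a slow WAN connection.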
Once the functional specification is complete, the development process can begin. Most of the management packs that are created for Microsoft IT LOB applications are relatively small, consisting of between 10 and 20 rules, although some management packs have been created with as many as 60 rules.
MOM management packs are developed through the MOM Administrator Console. The development process for management packs includes the following steps:
- Create a rule group – A processing rule group contains event and performance rules. Typically, a Microsoft LOB management pack has only one processing rule group, which contains all of the processing rules for that management pack.
- Create a computer group and attributes – A computer group is created for the management pack. The computer group will contain computers that run the application to be managed. These computers are identified by a computer attribute, which is defined as a registry key that indicates the application is running on that computer. Microsoft LOB management packs typically target only a single computer group.
- Add computers to the computer group – Computers are added to the computer group, either manually or by scanning for the relevant computer attributes. As servers are added, removed, or repurposed, MOM evaluates them and adjusts computer group membership accordingly.
- Create rules – Rules are created for the application as defined by the functional specification. Each rule responds to some event, alert, or performance threshold and activates an appropriate response.
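The computer group step above relies on an attribute, defined as a registry key, to decide which computers belong to the group. The following sketch models that evaluation over an in-memory inventory. It is an illustration only: the inventory structure, computer names, and registry path are hypothetical, and MOM performs this discovery itself.

```python
def group_members(inventory: dict, attribute_key: str) -> list:
    """Return the computers whose discovered registry keys include the attribute key.

    inventory maps a computer name to the set of registry keys found on it.
    """
    return sorted(
        name for name, keys in inventory.items() if attribute_key in keys
    )


# Hypothetical inventory: server names and registry paths are made up.
EXAMPLE_KEY = r"HKLM\SOFTWARE\ExampleCorp\FinanceDATA"
inventory = {
    "APPSRV01": {EXAMPLE_KEY},
    "APPSRV02": {r"HKLM\SOFTWARE\ExampleCorp\OtherApp"},
    "APPSRV03": {EXAMPLE_KEY, r"HKLM\SOFTWARE\ExampleCorp\OtherApp"},
}
members = group_members(inventory, EXAMPLE_KEY)
```

Re-running the evaluation after the inventory changes mirrors how group membership is adjusted as servers are added, removed, or repurposed.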
Alert processing rules respond to alerts based on particular criteria of either the alerts themselves, or the rules that generated them. Although alert processing rules are typically used to send a notification response to a notification group, Microsoft IT instead uses the Notification Workflow Solution Accelerator. This solution accelerator is a product designed for routing e-mail notifications based on MOM alert criteria and other extended properties, such as computer group membership. In Microsoft IT, alerts are delivered to e-mail distribution lists that are co-owned by the support teams assigned to the servers. This mechanism provides greater flexibility for alert notification delivery. Only the support people assigned to the server with the error receive the notification.
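Routing by computer group membership can be pictured as a lookup from the alerting computer to the distribution lists of the groups it belongs to. The sketch below illustrates that idea only; it is not how the Notification Workflow Solution Accelerator is implemented, and the function, group names, and addresses are hypothetical.

```python
def recipients_for(alert_computer: str,
                   group_members: dict,
                   group_distribution_lists: dict) -> list:
    """Return the distribution lists for every computer group containing the computer."""
    recipients = set()
    for group, computers in group_members.items():
        if alert_computer in computers:
            recipients.update(group_distribution_lists.get(group, ()))
    return sorted(recipients)


# Hypothetical groups and distribution lists.
groups = {
    "FinanceDATA Servers": {"APPSRV01", "APPSRV03"},
    "Payroll Servers": {"APPSRV02"},
}
lists = {
    "FinanceDATA Servers": ["financedata-support@example.com"],
    "Payroll Servers": ["payroll-support@example.com"],
}
```

With this data, an alert from `APPSRV01` reaches only the FinanceDATA support list, so only the people assigned to the server with the error are notified.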
Management packs also incorporate troubleshooting knowledge directly into the rules and their associated alerts through the Product Knowledge and Company Knowledge properties of the various processing rules within the management pack. Product knowledge is information that is relevant to the product that the management pack monitors. Company knowledge is information that is relevant to the company that is using the management pack, such as its internal support practices and details of its infrastructure. Ideally, the knowledge base should provide information about each event that the management pack monitors, including summarization of any issues, the likely root cause, and the steps required to resolve the issue. In the Company Knowledge field, Microsoft IT often provides a link to technical support documentation relevant to the problem located on the company intranet. This approach makes it much easier to ensure the accuracy of the technical guidance. Knowledge Base information is added under the properties of a processing rule group in the MOM Administrator Console.
When development is completed on the management pack, the management pack is then deployed into a testing environment. This deployment is done by exporting the management pack from the authoring environment and importing it into the test environment. All of the processing rules are visually checked and then tested by simulating the events or conditions that they monitor. The computer groups are verified to contain the expected server population. Once all of the processing rules have been verified, the management pack is exported again and is imported into the production environment.
Once the management pack has been deployed, there is a tuning and maintenance or stabilization period. During this period, the management pack is monitored as it functions in the production environment.
One task that is often required during this period is tuning performance thresholds. For example, if a management pack has a performance rule that sends a notification to the support team whenever the % Processor Time performance counter rises above 50 percent, and this happens every day at peak usage time, the support staff will receive more notifications than are useful to them. In this case, raising the threshold slightly may resolve the problem.
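One way to reason about a threshold change is to replay collected counter values and count how many notification episodes each candidate threshold would have produced. The sketch below does this for a list of samples; the function name and the sample values are made up for illustration and do not come from MOM reports.

```python
def breach_episodes(samples: list, threshold: float) -> int:
    """Count how many times the counter crosses from at-or-below to above the threshold."""
    episodes = 0
    above = False
    for value in samples:
        if value > threshold and not above:
            episodes += 1          # a new episode starts on an upward crossing
        above = value > threshold
    return episodes


# Hypothetical % Processor Time samples covering a daily peak.
samples = [30, 45, 55, 58, 40, 52, 35]
at_50 = breach_episodes(samples, 50)   # episodes with the original threshold
at_60 = breach_episodes(samples, 60)   # episodes with a raised threshold
```

Comparing the two counts shows whether a slightly higher threshold would remove the routine peak-time notifications while still leaving room to catch unusual load.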
MOM reports are especially useful during the tuning and maintenance period. Microsoft IT uses the MOM Alert Tuning Solution Accelerator during this process to assess alert details and trends.
Microsoft IT found that effective planning and design are important to successful development of management packs for its LOB applications. The key steps in the management pack development process include:
- Developing a comprehensive health model for the application.
- Identifying correct responses to health issues.
- Codifying rules and other management pack components in a functional specification.
- Authoring the management pack.
- Testing and deployment.
- Post-deployment tuning.
For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information through the World Wide Web, go to:
For any questions, comments, or suggestions about this document, or to obtain additional information about Microsoft IT Showcase, send e-mail to:
Microsoft IT uses an Excel spreadsheet for the functional specification for a management pack during the development process. This specification reflects the requirements of the environment at Microsoft, and will not match the needs of every organization or application.
The functional specification spreadsheet at Microsoft has six worksheets, one for each major component of the management packs created at Microsoft:
- Event Rules – This worksheet has one row for each event rule to be included in the management pack. In many cases, these fields map directly to Windows NT Event Log fields. The following information is recorded for each event rule:
- Name – The friendly name for this rule
- Enabled – True if this rule will be enabled
- Provider Type – The type of provider for this rule (for example, application)
- Provider Name – The name of the provider that generates the event for this rule (for example, Windows NT Event Log)
- Source – The application or service that generates the event
- Event ID – The ID number of this event
- Event Type – The type of event (for example, as recorded in the Windows NT Event Log)
- Event Text Criteria – The criteria that will trigger this event
- Response – A description of the response to be taken (for example, run a particular script, execute a command, update a state variable, or transfer a file)
- Performance Rules – This worksheet has one row for each performance rule to be included in the management pack. The following information is recorded for each performance rule:
- Name – The friendly name for this rule
- Enabled – True if this rule will be enabled
- Type – The type of performance monitor (for example, processor)
- Provider Type – The type of provider for this rule (for example, Windows NT Performance Counter)
- Provider Name – The name of the provider that generates the event for this rule (for example, %CPU Utilization)
- Response – The response that will be taken (for example, collect for analysis)
- Attributes – Attributes are characteristics that can be used to identify computers that should belong to a computer group. The following information is recorded for each attribute:
- Name – The friendly name of the attribute
- Type – The type of attribute (for example, registry key)
- Enabled – True if this attribute will be enabled
- Description – A short description of the attribute
- Computer Groups – Computer groups are collections of computers to be managed by a management pack. Management packs for Microsoft LOB applications usually contain only a single computer group. The following information is recorded for each computer group:
- Name – The friendly name of this computer group
- Rules Enabled – A list of all event and performance rules that will be enabled for this computer group
- Contains Subgroups – True if this computer group will contain any subgroups
- Description – A short description of this group
- Expression – A description of the combination of attributes that defines this computer group
- Scripts – This worksheet contains information about all of the custom scripts that will be included in the management pack. The following information is recorded for each script:
- Name – The name of this script
- Description – A brief description of the script, including its purpose and use
- Reports – This worksheet contains information about any reports that will be included in the management pack. The following information is recorded for each report:
- Name – The name of this report
- Description – A brief description of the report
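For teams that prefer to keep the functional specification in a machine-checkable form, one row of the Event Rules worksheet could be modeled as a record with the fields listed above. This is a hypothetical sketch, not the spreadsheet that Microsoft IT uses; the class, the helper, and the sample values are assumptions.

```python
from dataclasses import dataclass, asdict


@dataclass
class EventRuleSpec:
    # One row of the Event Rules worksheet; fields follow the list above.
    name: str
    enabled: bool
    provider_type: str
    provider_name: str
    source: str
    event_id: int
    event_type: str
    event_text_criteria: str
    response: str


def missing_fields(rule: EventRuleSpec) -> list:
    """Return the names of text fields that were left blank in the specification."""
    return [
        field for field, value in asdict(rule).items()
        if isinstance(value, str) and not value.strip()
    ]


# Hypothetical row with the response not yet decided.
draft = EventRuleSpec(
    name="Service left running state",
    enabled=True,
    provider_type="Application",
    provider_name="Windows NT Event Log",
    source="Microsoft Operations Manager",
    event_id=21207,
    event_type="Warning",
    event_text_criteria="OldState = Running",
    response="",
)
```

Calling `missing_fields(draft)` flags the blank response, which is the kind of gap worth catching before authoring begins in the MOM Administrator Console.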
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Microsoft grants you the right to reproduce this White Paper, in whole or in part, specifically and solely for the purpose of personal education.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred.
© 2004 Microsoft Corporation. All rights reserved.
This Note on IT is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, Windows, Windows NT and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.