Systems Management

Five Solution Accelerators to Lend MOM a Helping Hand

Steve Rachui

 

At a Glance:

  • A hands-on guide to five solution accelerators for MOM
  • Integrate enterprise systems with MOM connectors
  • Customize notifications
  • Test and implement the management packs

MOM 2005

Windows Server System

SQL Server

Microsoft Operations Manager (MOM) 2005 is a scalable and flexible enterprise management platform designed to help simplify the job of managing the IT infrastructure. Suitable for both large and small networks, MOM 2005 combines event management and flexible scripting with powerful notification, reporting, and data warehousing features. You can add management packs that provide specialized rules and data views for specific products (such as Microsoft® Exchange Server or SQL Server™), while connectors allow you to share data and integrate management tasks among different MOM environments and third-party management systems.

All this flexibility is great, but you don’t want to get bogged down in configuring and managing a system that was originally intended to simplify things. This is where solution accelerators enter the picture.

A solution accelerator is a collection of documentation and samples that shows how other people have implemented a particular management pack, add-in, or other such solution. Solution accelerators are intended to give you guidance and examples that help you configure a particular solution more easily. For example, the Notification Workflow Solution Accelerator helps you extend the notification functionality in MOM.

In this article, I’ll examine five useful solution accelerators for MOM:

  • Multiple Management Group Rollup Solution Accelerator (MMGR)
  • Notification Workflow Solution Accelerator (NW)
  • Service Continuity Solution Accelerator (SCSA)
  • Autoticketing Solution Accelerator (ASA)
  • Alert Tuning Solution Accelerator (ATSA)

The accelerators discussed in this article and full documentation for each can be found at MOM 2005 Solutions.

Multiple Management Group Rollup Solution Accelerator

First, let’s look at the Multiple Management Group Rollup Solution Accelerator. As you will see demonstrated here, MMGR provides guidance for consolidating data from multiple MOM management groups into a single data warehouse.

A MOM implementation requires that there be at least one management group comprising at least one management server and an instance of the MOM database. A management group contains a distinct rule set and agent population. MOM agents, which reside on the systems being managed by MOM, report their system data back to their assigned management group (or groups). Agents can belong to a single management group or be managed by multiple management groups. Management groups, meanwhile, can be organized by concepts such as geography, function, or load balancing.

In many environments, MOM deployments consist of multiple management groups. Data from each is important for complete enterprise monitoring and can be retained on a long-term basis using the built-in data warehousing solution in MOM. MOM can generate reports based on this data warehouse, and over time those reports offer valuable insight such as long-term trend information.

The default configuration for MOM 2005 is one data warehouse for each management group. While this approach is sufficient in many environments, it doesn’t allow for consolidation of data from multiple management groups into a single warehouse for unified enterprise reporting and storage—a feature of particular interest for many organizations.

The traditional data warehouse and reporting solution in MOM 2005 is shown in Figure 1. It requires SQL Server installations for the OnePoint (MOM database), System Center (MOM Reporting), and SQL Server Reporting Services databases. You can configure a single SQL Server instance to host all three databases, but this may degrade performance.

During installation of MOM reporting, a Data Transformation Services (DTS) package is installed as a scheduled task. This is used for moving data from the OnePoint database to the data warehouse. The DTS package runs once per day by default. Command-line switches define parameters for the DTS package, including the location and name of the source database (OnePoint) and the location and name of the destination database (SystemCenterReporting).

The MOM Reporting installation adds information about the data warehouse to the Reporting Services database. Reports are accessed using the Report Manager interface of Reporting Services.

The MMGR provides guidance for consolidating data from multiple MOM management groups into a single data warehouse. If an organization doesn't consolidate its data warehouses, the infrastructure needs to support the configuration shown in Figure 1 for each of its management groups.

Figure 1 Data Transfer from a Single Management Group

Figure 2 shows multiple management groups configured to use a single data warehouse server. With consolidated data warehouses, the management group chosen to host the data warehouse is called the destination management group. The DTS package on the destination management group is modified so that it not only moves its own data to the data warehouse but also runs against the OnePoint database in each source management group and moves that data to the data warehouse as well. Only a single instance of the DTS job is needed, and it should be located on a server in the destination management group.

Figure 2 Data Transfer from Management Groups

This DTS instance moves data from all management groups, one at a time, to the data warehouse. To set this up, edit the scheduled task on the destination management group (the one that hosts the data warehouse) and create additional tasks to run the DTS package, so that there is one task for each management group. The command-line options for each task must be modified to target the OnePoint database in the corresponding source management group. Alternatively, you can create a script that calls the DTS job once for each management group. Whichever way you choose, it is important that the DTS job runs do not overlap.
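The scripted approach can be sketched roughly as follows. This is a minimal illustration in Python, not the accelerator's own script: the executable name `run_mom_dts.cmd` and the switch names are hypothetical placeholders, and you would substitute the real command line from the scheduled task that MOM Reporting setup created. Only the database names OnePoint and SystemCenterReporting come from the text above.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class SourceGroup:
    """One management group whose OnePoint database feeds the warehouse."""
    name: str
    onepoint_server: str


def build_dts_command(group: SourceGroup, warehouse_server: str) -> List[str]:
    # HYPOTHETICAL executable and switch names, used only for illustration.
    # Copy the real command line from the existing scheduled task instead.
    return [
        "run_mom_dts.cmd",
        f"/source_server:{group.onepoint_server}",
        "/source_db:OnePoint",
        f"/warehouse_server:{warehouse_server}",
        "/warehouse_db:SystemCenterReporting",
    ]


def _run_blocking(command: List[str]) -> int:
    # subprocess.run waits for the process to exit before returning.
    return subprocess.run(command).returncode


def run_transfers(
    groups: Sequence[SourceGroup],
    warehouse_server: str,
    runner: Callable[[List[str]], int] = _run_blocking,
) -> Dict[str, int]:
    """Run one transfer per group, strictly one after another.

    Each call to runner blocks until that transfer exits, so two runs
    never overlap. A failed run is recorded and the next group still runs.
    """
    results: Dict[str, int] = {}
    for group in groups:
        results[group.name] = runner(build_dts_command(group, warehouse_server))
    return results
```

Because the loop waits for each run to finish, a single scheduled task that calls such a script avoids the overlap problem without any need to estimate how long each transfer takes. The returned exit codes are only a convenience; the event logged on the server remains the record of each run.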

When the DTS job transfers data from the OnePoint database to the data warehouse, an event is logged on the server running the DTS job. This event records the success or failure of the DTS run and the location of any problems that were encountered.

MOM data warehouses can grow to be quite large, and a data warehouse holding data from multiple management groups will potentially be much larger. In any scenario, it is important to ensure data warehouse grooming is running as expected, and this matters even more on a consolidated warehouse. The data warehouse server also needs sufficient free disk space and enough capacity to handle the load of multiple DTS runs and any reporting tasks. The MMGR Solution Accelerator can help you quickly configure your MOM servers and your data warehouses to run as expected.

Notification Workflow Solution Accelerator

Notifications are an essential part of today’s enterprise monitoring solutions. When an event occurs that requires immediate attention (such as a power outage or network failure), the responsible administrators must be notified to limit the impact of the situation. MOM 2005 includes ample notification services. When more granular control is required, however, the configuration can prove challenging. The Notification Workflow Solution Accelerator includes guidance, as well as the necessary add-on for MOM 2005. The add-on, which is based on SQL Server Notification Services (SSNS) with a Web front end, provides more sophisticated and targeted enterprise notifications.

Consider an enterprise that has deployed MOM 2005 to monitor Exchange, Active Directory®, Systems Management Server, and some other services. This enterprise requires that alerts go only to the owners of the affected service. When Exchange needs attention, for example, only the Exchange administrators are informed. Now assume that the person or persons responsible for a given server change based on the time of day. Using the Notification Workflow add-on (see Figure 3), you can configure a single user to be notified only for a single server running a particular set of rules during a specific time of day.

Figure 3 Notification Workflow User Interface

Notification Workflow is quite flexible and allows you to set up a variety of detailed scenarios. You can, for instance, configure users who should be notified when any problem occurs on a group of computers for a given rule set, or you can specify administrators who should be notified regardless of where a problem occurs. If a notification sent to a server owner is not addressed within a certain amount of time, Notification Workflow can also escalate the notification to another group of administrators. This ability was not included in the initial release, but has been added to the next release. The new release also adds features that make the Notification Workflow add-on more flexible and easier to map to varied MOM environments. The new release was not publicly available at the time of writing, but it is expected to be available by the time this article is published.
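The kind of scoping described in this section (by computer, by rule set, and by time of day, with escalation after a time limit) can be illustrated with a short Python sketch. This is not how the add-on is implemented, since the add-on is built on SSNS; the `Subscription` class, its field names, and the helper functions are all hypothetical and exist only to show the matching logic.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Subscription:
    """Hypothetical notification subscription for one user."""
    user: str
    start: time                        # start of the on-call window
    end: time                          # end of the window
    computer: Optional[str] = None     # None means any computer
    rule_group: Optional[str] = None   # None means any rule set

    def covers(self, at: time) -> bool:
        if self.start == self.end:
            return True                # treated here as "all day"
        if self.start < self.end:
            return self.start <= at < self.end
        # The window wraps past midnight, for example 18:00 to 08:00.
        return at >= self.start or at < self.end

    def matches(self, computer: str, rule_group: str, at: time) -> bool:
        return (
            self.computer in (None, computer)
            and self.rule_group in (None, rule_group)
            and self.covers(at)
        )


def recipients(
    subs: Sequence[Subscription], computer: str, rule_group: str, at: time
) -> List[str]:
    """Return the users to notify for an alert raised at the given time."""
    return [s.user for s in subs if s.matches(computer, rule_group, at)]


def should_escalate(minutes_unaddressed: int, limit_minutes: int) -> bool:
    """Escalate once an alert has gone unaddressed for the full limit."""
    return minutes_unaddressed >= limit_minutes
```

With a day-shift and a night-shift subscription for the same Exchange server, plus one unscoped subscription for a duty manager, an alert at 09:30 would reach the day-shift administrator and the duty manager, while the same alert at 23:00 would reach the night-shift administrator instead.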

Autoticketing Solution Accelerator

In MOM 2005, all information regarding monitoring status, notifications, reporting, and more is retained within the database for the management group. It is often helpful to join multiple management groups into a hierarchical structure for centralized monitoring and reporting. As mentioned earlier, the top-level management group is known as the destination group, while the child management groups are known as source groups. Management groups must be linked to facilitate the flow of information between different systems. This is made possible by using methods available in the Microsoft Connector Framework (MCF) and the MOM SDK.

Figure 4 Configuration Management Database

Linking of MOM 2005 management groups is done with the MOM-to-MOM Product Connector. Using MCF methods, this connector allows information to flow between different management groups. The connector may be configured to detect if an alert is triggered on a source management group and move the alert forward to the destination management group—ultimately, the alert will exist in both databases. If an administrator at the source management group addresses the problem and flags the alert as resolved, the MCF will update the information for this alert in the destination management group. In the same way, if an administrator in the destination management group resolves the alert, this change will be reflected in the source management group.
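The two-way behavior described above can be modeled in a few lines of Python. This is a toy in-memory model meant only to make the synchronization rule concrete; it does not use or imitate the MCF API, and the class names, method names, and state strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ManagementGroup:
    """Toy stand-in for a management group and its alert table."""
    name: str
    alerts: Dict[str, str] = field(default_factory=dict)  # alert id -> state


class AlertLink:
    """Hypothetical model of a source-to-destination connector."""

    def __init__(self, source: ManagementGroup, destination: ManagementGroup):
        self.source = source
        self.destination = destination

    def raise_alert(self, alert_id: str) -> None:
        # The alert is created at the source and forwarded, so it ends up
        # in both databases.
        self.source.alerts[alert_id] = "New"
        self.destination.alerts[alert_id] = "New"

    def resolve(self, alert_id: str, resolved_in: ManagementGroup) -> None:
        # A resolution made on either side is mirrored to the other side.
        if alert_id not in resolved_in.alerts:
            raise KeyError(f"{alert_id} is unknown in {resolved_in.name}")
        for group in (self.source, self.destination):
            group.alerts[alert_id] = "Resolved"
```

A one-way forwarding connector would keep only `raise_alert`; it is the mirrored `resolve` step, working in both directions, that makes the link two-way.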

This information sharing is useful in many different scenarios: integrating MOM into environments with third-party monitoring systems, forwarding the information generated by MOM 2005 to helpdesk systems to facilitate automatic generation of trouble tickets, and so on. There are multiple third-party connectors available for such connectivity.

Figure 5 Workflow

In some cases, however, either a connector is not available for a third-party system or a custom connector solution is needed. For developing a connection between MOM and third-party helpdesk ticket management systems, Microsoft offers the Autoticketing Solution Accelerator. Because the connectivity requirements will differ among systems, the guidance provided by ASA is fairly general. Some specific examples are given for a solution using Siebel Enterprise Application Integration, but the topics may apply in many third-party scenarios.

Development of an autoticketing solution is challenging for many reasons.

Data Sufficiency: MOM 2005 is not a ticketing system, so it can be difficult to mine and share the information needed to create a complete ticket.

Data Transfer Enablement: While the SDK provides full documentation of the methods provided by the MCF, development of a connector is not a trivial task.

Data Transfer Management: Managing and synchronizing communications among different systems is a tough job. Connectors can enable either one-way forwarding or two-way communication. The former is easier. The architecture for building a connector is shown in Figure 4.

As you can see in Figure 5, the typical autoticketing workflow is somewhat complex. ASA provides significant insight into the skill sets and information needed to build a custom solution that shares information between a ticketing system and MOM.
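The data sufficiency problem is easy to see in a small example. The Python sketch below maps an alert onto a ticket and reports which required ticket fields the alert could not fill. All field names on both sides are hypothetical; a real helpdesk system defines its own schema, and a real connector would read alerts through the MCF rather than from a dictionary.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL ticket schema, used only for illustration.
REQUIRED_TICKET_FIELDS = ("summary", "affected_system", "severity", "owner_queue")


def alert_to_ticket(
    alert: Dict[str, str],
    queue_by_rule_group: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Map an alert onto ticket fields and list the fields left empty."""
    ticket = {
        "summary": alert.get("name", ""),
        "affected_system": alert.get("computer", ""),
        "severity": alert.get("severity", ""),
        # The alert itself carries no owner, so the queue is looked up
        # from locally maintained data keyed by rule group.
        "owner_queue": queue_by_rule_group.get(alert.get("rule_group", ""), ""),
    }
    missing = [name for name in REQUIRED_TICKET_FIELDS if not ticket[name]]
    return ticket, missing
```

A connector built along these lines could hold back or flag any ticket whose `missing` list is not empty, rather than opening an incomplete ticket in the helpdesk system.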

Alert Tuning Solution Accelerator

One of the key features of MOM 2005 is the array of available management packs. These are collections of rules, reports, data views, and more that describe the management requirements for particular technologies. Management packs are available from Microsoft as well as from third-party vendors and include solutions for both software and hardware management. Management packs that are provided by Microsoft have been written by the teams responsible for development of the specific technology. The catalog of available management packs and connectors for MOM 2005 can be found at Management Pack and Utilities Catalog.

When a management pack is installed, some rules are enabled and some disabled by default. The default configuration of each pack is designed to be immediately useful in most environments. However, since networks differ greatly in design and topology, the default configuration may not be best for your environment.

In addition, some rules in a management pack require environment-specific configuration to be effective. Examples include the Active Directory management pack, where script timings—among other things—must be adjusted to match the expected Active Directory replication latency of an environment. The Exchange management pack also requires similar configuration.

Because management packs contain many rules and often require tuning and adjustment to function optimally, it is not uncommon to see a large number of warning and error alerts, or events indicating problems, when you first install a management pack. This is especially true if you have installed several packs simultaneously. In this situation, commonly called an alert storm, there may not be an actual problem despite the warnings.

The Alert Tuning Solution Accelerator provides specific guidance for evaluating, testing, and deploying a management pack into a production environment while ensuring that the rules enabled and the configurations specified are relevant to the environment and will not cause false alarms.
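When evaluating a management pack, a simple first step is to see which rules generate the most alerts during the test period. The Python sketch below ranks rules by alert count. It assumes you have already exported the alerts as pairs of rule name and computer name; that export format is an assumption made for this example and not something the accelerator prescribes.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def noisiest_rules(
    alerts: Iterable[Tuple[str, str]], top: int = 3
) -> List[Tuple[str, int]]:
    """Rank rules by how many alerts each one raised.

    Each item in alerts is a (rule_name, computer) pair, for example
    rows exported from an alert view during the test period.
    """
    counts = Counter(rule for rule, _computer in alerts)
    return counts.most_common(top)
```

The rules at the top of the list are the first candidates for review: each one either points to a real problem or needs a threshold, schedule, or enabled state adjusted for the environment.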

Figure 6 details the proposed testing stages for each management pack. In general this approach includes the following steps.

Figure 6 Alert Tuning Workflow

Evaluate the Management Pack: Review the rules included with the management pack to determine which are appropriate for the environment and how they should be implemented.

Stress Test the Management Pack: Deploy the management pack in an isolated test environment and deliver more stress than would be expected in production. One example of this would be to force management pack scripts to run every minute instead of the more realistic schedule chosen for the real rollout.

Implement a Small-Scale Production Pilot: Identify select production machines as candidates for management pack testing and join them to the pilot management group. These machines will then run the rules associated with MOM production servers and also those from the pilot group.

Deploy the Refined Management Pack: Keep in mind that the total test time from initial evaluation to implementation in the production environment will vary based on the management pack under consideration.

Service Continuity Solution Accelerator

IT environments require reliable, fault-tolerant system monitoring. Assuring that effective monitoring continues in the event of a localized outage is increasingly challenging as systems become more geographically dispersed. Fortunately, MOM 2005 is built to ensure fault tolerance and reliability. The Service Continuity Solution Accelerator extends the inherent capabilities of MOM 2005, presenting a strategy to address fault tolerance with MOM 2005 both within a homogeneous monitoring environment and across disparate locations.

MOM 2005 addresses automatic failover and monitoring continuity through the following features:

  • Ability to design multiple management groups
  • Ability to configure multiple management servers per management group
  • Ability to install the MOM database on a clustered server
  • Ability to associate MOM agents with multiple management groups

Management groups create an infrastructure that can be deployed based on geography, monitoring targets, business units, and so on. Management groups consist of management servers and a database server for the group. You can install up to 10 management servers in a single management group. Agents that are members of the management group are able to communicate with any management server in that group. If the assigned management server goes offline, the agent automatically fails over to one of the remaining management servers with no interruption in monitoring.
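The failover rule for agents can be pictured with a short Python sketch. It illustrates the selection logic only and is not the agent's actual algorithm; the function name, the server names, and the `is_reachable` check are hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_management_server(
    assigned: str,
    others: Sequence[str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the assigned server if it answers, else the first peer that does."""
    for server in (assigned, *others):
        if is_reachable(server):
            return server
    return None  # no management server in the group is reachable
```

The assigned server is always tried first, so an agent would return to it once it is back online, and the other servers in the management group serve only as fallbacks.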

Management groups are designed to be linked together using the Microsoft Connector Framework (MCF) and the MOM-to-MOM Product Connector for a centralized view into the management infrastructure. As mentioned, MOM agents can be associated with one or more management groups—this is often done to address fault tolerance needs, but can also be used to have the agent report to multiple management groups based on function.

Only a single MOM database is permitted per management group. Because the database is the configuration repository for the management group, its availability is critical. You can install this database on a clustered SQL Server, which allows for failover and ensures database availability.

The SCSA extends service monitoring to include geographic failover. As an example, the SCSA presents a monitoring scenario with four geographically separated sites—one site dedicated as a hot-failover site. In this configuration it is possible to provide continuous monitoring even when operations at a particular site are disrupted. The monitoring load can be shifted from normal production machines to the failover machines located at the failover site.

The SCSA discusses this scenario and its configuration in detail using the sample infrastructure presented in Figure 7. Of course, this functionality can be applied to an environment with as few as two sites. The four-site scenario is simply an example.

Figure 7 Four-Site Sample Configuration

General requirements to configure such an environment include the following:

  • Management servers from each management group that are located both at the primary production site and at the hot-failover site
  • Two database servers for each management group, one located at the local production site and one located at the hot-failover site
  • Database synchronization between the production and hot-failover servers

The SCSA includes a DTS package that is responsible for scheduled, routine synchronization of data between the production MOM database and the hot standby MOM database server. The SCSA also includes tools for removing database constraints and creating jobs on the standby MOM database server.

The standby database server should be a complete replica of the production MOM database server. Should a failover occur, manual intervention is required to facilitate the failover and to bring the standby server to a state ready for production. The process involves adding constraints and triggers back to the standby database, cleaning up the standby database to remove any duplicates from data replication, and adjusting management servers to use the standby database server.

Agent failover to use the backup management servers will take place automatically. Since agents maintain data in local queue files, no data should be lost during failover. Instead, the data will accumulate on the agents pending resumption of database activities.
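The reason queued data survives a failover can be shown with a small Python model of a local queue. This is a conceptual sketch only: MOM agents use queue files on disk, whereas this hypothetical `AgentQueue` class keeps items in memory to show the principle that an item is removed only after it has been sent successfully.

```python
from collections import deque
from typing import Callable, Deque


class AgentQueue:
    """Hold monitoring data locally until a management server accepts it."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()

    def record(self, item: str) -> None:
        self._pending.append(item)

    def flush(self, send: Callable[[str], bool]) -> int:
        """Send items oldest first and stop at the first failure.

        An item leaves the queue only after send reports success, so
        nothing is dropped while the back end is unavailable.
        """
        sent = 0
        while self._pending and send(self._pending[0]):
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

While the back end is unavailable, every flush attempt sends nothing and the queue simply grows. Once activity resumes, the backlog drains in its original order.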

Of course, it is not possible for every potential configuration of the SCSA to be tested in advance. It is crucial that you test each configuration you anticipate using, and that testing should include a test failover to the hot-failover site to verify service availability.

In addition, the SCSA has not been tested to address continuity of MOM with other applications, such as third-party connectors and extensions or customization through MCF or the MOM SDK.

Steve Rachui is a Manageability Support Escalation Engineer in the Product Support Services group at Microsoft. He has supported MOM since its introduction. Steve can be reached at steverac@microsoft.com.

© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.