Integrating MOM into Your Existing Infrastructure
At a Glance:
- Plan a highly available MOM 2005 deployment
- Integrate MOM with existing systems
- Configure escalation for MOM notifications
As a central administrator, you want a single point of view into your whole IT infrastructure, and you need it to be available at all times. A solid IT monitoring solution must be highly available, and it must be able to integrate with existing systems. With Microsoft® Operations Manager (MOM) 2005, you can achieve a high degree of availability and integration for the monitoring solution. In this article, I will detail some of the methods you can use to meet these two requirements.
For a highly available deployment of MOM 2005, you must ensure that each and every component of MOM is in some way fault tolerant. You may have read about the Service Continuity Solution Accelerator for MOM in the article "Five Solution Accelerators to Lend MOM a Helping Hand". This is a valuable tool, but it deals primarily with disaster recovery and multiple management groups. In this article, I focus on how to provide high availability for a single management group. Figure 1 illustrates the architecture I'll describe.
Figure 1 MOM High-Availability Architecture
Deploying MOM Databases
At the heart of all MOM functionality is the operational database called OnePoint. This is where all data is written to and read from. If this database crashes, all MOM functionality becomes unavailable and results in the following problems:
- Agent data from management servers is not written to the database.
- Data is not transmitted to and from applications integrated with MOM.
- MOM consoles cannot display content.
The first point is the most critical. When the OnePoint database is down, new alerts can't be generated. If you rely on notifications—e-mail or SMS alerts, for example—you may not even notice the database is down since alert rules will not run (they are, in fact, plain SQL triggers).
If the database is unavailable, the management server will store incoming agent data (including alerts, events, and performance counter values) on the local hard disk by default, up to 30MB. And when the database is back up and running, the locally cached data will be sent to the database. But this is by no means fault tolerance. For high availability of MOM databases, you should install them on a SQL cluster.
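The store-and-forward behavior described above can be modeled in a few lines. The following Python sketch is an illustration only, not MOM code: the class name, the method names, and the choice to reject new data once the cap is reached are all assumptions made for the example. Only the 30MB default comes from the text.

```python
from collections import deque


class BoundedSpool:
    """Hypothetical model of a management server's local cache (not MOM code)."""

    def __init__(self, limit_bytes=30 * 1024 * 1024):  # 30MB default from the article
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self.items = deque()
        self.dropped = 0

    def store(self, payload: bytes) -> bool:
        """Queue payload while the database is down; refuse it once the cap is hit."""
        if self.used_bytes + len(payload) > self.limit_bytes:
            self.dropped += 1  # assumption: overflow is rejected, not overwritten
            return False
        self.items.append(payload)
        self.used_bytes += len(payload)
        return True

    def flush(self, write) -> int:
        """Replay cached items in arrival order through write(), emptying the spool."""
        sent = 0
        while self.items:
            payload = self.items.popleft()
            write(payload)
            self.used_bytes -= len(payload)
            sent += 1
        return sent
```

The model makes the limitation concrete: once the cap is reached during a long outage, further data has nowhere to go, which is why a local cache is a buffer rather than fault tolerance.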
There has always been some confusion around MOM 2005 Service Pack 1 (SP1) and support for clustering databases. The operational (OnePoint) database is supported on active/passive and active/active SQL clusters. With the second option, you need to use the momcreatedb.exe tool to install the database. The historical reporting database (SystemCenterReporting) is only supported on active/passive SQL clusters. With the active/passive option, there can only be one instance of a MOM database on the SQL cluster. Therefore, you can have more than one operational database on a single SQL cluster (simply by installing them in different instances), whereas you can have only one reporting database on a SQL cluster. For step-by-step instructions for installing MOM databases on clusters, see the Deployment Guide.
Hardware for clustering solutions is much more expensive than the hardware for standalone servers. Of course, you can use existing SQL Server™ instances to install the MOM databases provided these SQL clusters have enough capacity in terms of both storage and performance. To measure current use of the SQL Server instances, use the Performance Monitor and the System Center Capacity Planner. In addition, you can use the MOM 2005 Sizer Tool to roughly forecast load and storage minimum requirements for the computer running SQL Server.
It is essential to install the OnePoint database on a cluster to achieve high availability. The SystemCenterReporting database can also be clustered, but this is not as critical since its downtime only leaves reports inaccessible. Of course, if you rely heavily on reports, clustering the reporting database may be critical to your operations.
It's worth noting that SQL Server 2005 provides greater stability and availability than SQL Server 2000. Once you've installed three hotfixes, MOM 2005 SP1 can be run on SQL Server 2005. I've detailed how to set up this configuration on my blog.
Configuring Management Servers
MOM management servers are responsible for communicating with agents and consoles and for transferring data to and from the OnePoint database. They also provide the point of entry for connecting to third-party platforms through the MOM Connector Framework Web service.
To provide fault tolerance for your management server, you simply install more than 1, but fewer than 10, management servers in the management group. The agents installed in the management group will automatically switch to another management server if the primary MOM server goes down. If all the management servers in the group are unreachable, the agent will locally cache up to 3MB of data, by default.
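As a rough illustration of the failover idea, the Python sketch below picks the first reachable management server from an ordered list. It is a simplified model, not the agent's actual algorithm: the function name, the reachability callback, and the assumption that secondaries are tried in list order are all hypothetical. The server names are borrowed from the registry example in this article.

```python
def pick_management_server(servers, is_reachable):
    """Return the first reachable server in priority order, or None if all are down."""
    for server in servers:
        if is_reachable(server):
            return server
    return None  # the agent would fall back to its local cache here


servers = ["MOM1.vs.local", "MOM2.vs.local"]  # primary first, then secondaries
down = {"MOM1.vs.local"}                      # simulate a primary outage
print(pick_management_server(servers, lambda s: s not in down))  # MOM2.vs.local
```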
The MOM administrator console only shows the primary management server of an agent. You can verify or modify all the agents' management servers in the registry:
[HKEY_LOCAL_MACHINE\SOFTWARE\Mission Critical Software\OnePoint\Configurations\MG1\Operations\Agent\Consolidators]
"Consolidator 1 Host"="MOM1.vs.local"
"Consolidator 1 AD Name"="MOM1.VS.LOCAL"
"Consolidator 2 AD Name"="MOM2.VS.LOCAL"
...
You can also do this via Add/Remove Programs (on the Advanced button) for the MOM 2005 agent software, as shown in Figure 2. When the agent is running in control level full, you can only set the primary management server (through discovery rules). All the other management servers in the management group are automatically assigned to that agent as secondary. Any changes made in the registry or through Add/Remove Programs will be overwritten. When the agent is running in control level none, you can specify exactly which management servers are secondary in the registry and those values will not be overwritten.
Figure 2a Microsoft Operations Manager 2005 Agent Configuration and Setup
Figure 2b Microsoft Operations Manager 2005 Agent Configuration and Setup
You should distribute the load on management servers evenly (proportional to each management server's capacity). The primary management server is defined by the discovery rule. All other management servers in the management group are applied as secondary to the agent. If you want to modify the primary management server, you need to change the original discovery rule or the ManualMC.txt file. Thus it is not a good idea to delete your original discovery rules.
If you have a lot of servers to monitor with MOM, you may want to use regular expressions in your discovery rules. Tools like Ultrapico's Expresso, used for creating and testing regular expressions, can come in handy when developing those rules.
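Before putting a pattern into a discovery rule, it helps to test it against a list of real computer names. The sketch below does this with Python's re module. The naming convention (site code, role, two-digit number) and the computer names are invented for the example, and the regular expression dialect accepted by MOM discovery rules may differ from Python's, so treat the pattern as a starting point to verify in MOM itself.

```python
import re

# Hypothetical naming convention: site code, role, two-digit number.
pattern = re.compile(r"^WAW-(SQL|DC)\d{2}$", re.IGNORECASE)

candidates = ["WAW-SQL01", "waw-dc12", "WAW-WEB01", "KRK-SQL01", "WAW-SQL1"]
matched = [name for name in candidates if pattern.match(name)]
print(matched)  # ['WAW-SQL01', 'waw-dc12']
```

Listing the names that should not match (here a Web server, another site, and a one-digit name) is as useful as listing those that should, because an overly broad rule assigns the wrong agents to a management server.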
You might be tempted to install management servers on virtual machines. This is acceptable as long as the capacity is evaluated properly. It depends on the number of reporting agents and deployed management packs. Generally, the biggest obstacle is the single processor inside the guest operating system in Virtual Server 2005 and Virtual Server 2005 R2.
Management server operations are processor intensive, so it is a good idea to install the management server on a dual processor machine when the number of reporting agents exceeds 50 (assuming about 10-15 management packs, including major ones like Active Directory® and SQL). You definitely need a dual processor server for more than 100 agents.
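The rule of thumb above can be written down as a small helper for planning spreadsheets or scripts. This Python sketch encodes only the two thresholds given in the text, under the stated assumption of about 10-15 management packs. The function name and the wording of the return values are made up for the example, and the "single processor may suffice" case is an inference rather than a statement from the text.

```python
def processor_recommendation(agent_count: int) -> str:
    """Map a reporting-agent count to the article's processor rule of thumb."""
    if agent_count > 100:
        return "dual processor required"
    if agent_count > 50:
        return "dual processor recommended"
    return "single processor may suffice"  # inferred, not stated in the article


for agents in (40, 75, 150):
    print(agents, processor_recommendation(agents))
```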
Integrating MOM 2005 into Your Infrastructure
To get the most out of a highly available monitoring solution, it must be capable of integrating into your existing environment. Take trouble-ticketing/service desk systems, for example. Many organizations, especially large enterprises, rely heavily on these systems for reporting and tracking IT problems and resolutions. Synchronizing MOM with the trouble-ticketing system adds some important enhancements:
- A single console lets you manage problems related to servers.
- Service desk can be used as an escalation mechanism for MOM.
- Built-in service desk reporting can be used as an enhancement to MOM reporting services.
MOM has a Software Development Kit you can use to develop connectors. The MOM SDK exposes several interfaces that are available for development:
- Operations Database SQL Views
- Reporting Database SQL Views
- Windows Management Instrumentation (WMI) classes
- .NET Framework classes:
  - Microsoft.EnterpriseManagement.Mom namespace
  - Microsoft.EnterpriseManagement.Mom.Runtime namespace for developing managed code responses
  - MCF (Microsoft.EnterpriseManagement.Mom.Connector and Microsoft.EnterpriseManagement.Mom.Connector.V2)
- MOM Connector Framework (MCF) Web service
The last component is available as an install option on MOM management servers. This is the recommended method for developing connectors and will be the main SDK access method in the next version of MOM. The SDK contains a sample connector (based on MCF) to an XML file emulating a ticketing application. This gives a simple example of how to utilize the MOM APIs.
Using the MOM Connector Framework
The current version of MCF is a Web service with 16 methods (shown in Figure 3). Six of these methods are used most often: Setup, Initialize, GetData, AckData, UpdateAlerts, and Uninitialize. Setup is used to identify the connector in MOM with a unique registration ID. Initialize prepares the Web service for communicating with the connector. GetData retrieves alerts from MOM. AckData is used to acknowledge that alerts from MOM were received. UpdateAlerts updates existing alerts in MOM with information from the external system. And Uninitialize clears most of the settings and prepares for new settings or a complete service stop. You can call these methods from managed code, and these six methods alone are enough to build a full connector that transfers data to and from MOM.
Figure 4 shows the snippets of code you can use to take advantage of these methods in order to retrieve and update data in MOM. The best practice is to develop connectors as a Windows NT® service. This ensures greater stability and the service runs in the background, polling the MCF Web service at a given interval for new or updated alerts. You can specify any value for the interval that meets your needs. I have successfully tested intervals of as little as 10 seconds in production environments with over 200 agents. This setting depends on the performance of the SQL Server where the OnePoint database resides.
Set up the connector
Dim connector As CompName.ConnectorServiceV2 'CompName is the Web reference to MCF Web service (ConnectorServiceV2.wsdl)
Dim registrationid As System.Guid
Dim flag As DataChanges
registrationid = connector.Setup(info, resolutionstate) 'set up the connector and retrieve its GUID
flag = DataChanges.NewAlerts Or DataChanges.UpdatedAlerts 'set the flag to get new and updated alerts
connector.Initialize(registrationid, flag) 'initialize the connector

Dim getdataalerts As sendtoSystem.CompName.Data = connector.GetData(registrationid, flag, 100)
Dim newalerts() As sendtoSystem.CompName.Alert = getdataalerts.NewAlerts
Dim newalert As sendtoSystem.CompName.Alert
Dim updatealerts() As sendtoSystem.CompName.Alert = getdataalerts.UpdatedAlerts
Dim updatealert As sendtoSystem.CompName.Alert

For Each newalert In newalerts
    Dim ack(0) As Guid 'one-element array (the VB upper bound is 0)
    ack(0) = newalert.AckId
    connector.AckData(registrationid, ack)
    Dim comp As String
    comp = newalert.ComputerName.ToString
Next

Dim AlertUpdate As New sendtoSystem.CompName.AlertUpdate
Dim AlertUpdateTable(0) As sendtoSystem.CompName.AlertUpdate
MakeUpdate(AlertUpdate) 'internal procedure to prepare for update (define what fields are updated)
AlertUpdate.CustomField2 = "PENDING_FORWARD_TO_SYSTEM" 'the introduced change
AlertUpdate.AlertId = updatealert.AlertId
AlertUpdateTable(0) = AlertUpdate
connector.UpdateAlerts(registrationid, AlertUpdateTable)
...
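To see the whole call sequence in one place without a MOM installation, the following Python sketch models one polling pass against an in-memory stand-in for the Web service. Only the six method names mirror MCF. The FakeMcf class, its data structures, the simplified signatures, and the poll_once helper are hypothetical, and the real service's parameters and return types differ.

```python
import uuid


class FakeMcf:
    """In-memory stand-in for the MCF Web service; only the method names are MCF's."""

    def __init__(self, alerts):
        self.pending = {uuid.uuid4(): alert for alert in alerts}  # ack ID -> alert
        self.updates = []
        self.registration_id = None

    def setup(self):
        return uuid.uuid4()  # registration ID that identifies the connector

    def initialize(self, registration_id):
        self.registration_id = registration_id

    def get_data(self, registration_id, max_count):
        return list(self.pending.items())[:max_count]

    def ack_data(self, registration_id, ack_ids):
        for ack_id in ack_ids:
            self.pending.pop(ack_id, None)  # acknowledged alerts are not resent

    def update_alerts(self, registration_id, updates):
        self.updates.extend(updates)

    def uninitialize(self, registration_id):
        self.registration_id = None


def poll_once(mcf, registration_id, forward, batch_size=100):
    """One pass: fetch a batch, forward each alert, acknowledge it, write back a marker."""
    batch = mcf.get_data(registration_id, batch_size)
    for ack_id, alert in batch:
        ticket = forward(alert)  # hand the alert to the external system
        mcf.ack_data(registration_id, [ack_id])
        mcf.update_alerts(registration_id, [{"alert": alert, "CustomField2": ticket}])
    return len(batch)


mcf = FakeMcf(["disk full", "service stopped"])
reg = mcf.setup()
mcf.initialize(reg)
handled = poll_once(mcf, reg, forward=lambda alert: "TICKET:" + alert)
mcf.uninitialize(reg)
print(handled)  # 2
```

In a service, poll_once would run in a loop with a sleep equal to the polling interval. This sketch forwards each alert before acknowledging it, so a failed forward leaves the alert pending for the next pass. That ordering is a design choice of the example, not something the article prescribes.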
This is only a general approach to developing MOM connectors. It is a complex issue that requires detailed planning, consideration of business requirements, and analysis of available access methods (to the connected system).
The complexity involved can vary greatly. Developing your own connector may be a good approach if your requirements dictate a relatively simple design. For a simple connector, you may be able to avoid any complex coding, basing your design on MOM alert rules and management server scripts. These scripts can use the Management Class Libraries as they run locally on the MOM server. Or you may need a custom connector if there are no MOM connectors available for your specific system. If the connector has more functionality, then developing a Windows NT service that calls MOM Connector Framework is the best approach. Otherwise, you may want to check to see if a MOM connector has already been developed for the system in question.
You may want to implement some kind of escalation mechanism inside MOM. The easiest way to achieve this is to use MOM alert custom fields and rules. Figure 5 demonstrates how to utilize the WMI classes to check whether alerts may need to be escalated. The following code can be adapted as a script in an alert rule response to mark alerts with given escalation time (through a parameter):
Set objAlert = ScriptContext.Alert
paramEscalation = ScriptContext.Parameters.Get("EscalationThreshold")
objAlert.SetCustomField 4, paramEscalation
Set objWMI = GetObject("winmgmts:\\.\root\mom") 'get the alerts from WMI
'return alerts matching conditions for escalation
Set colAlerts = objWMI.ExecQuery("Select * from MSFT_Alert where Severity >=50 and ResolutionState = 0 and CustomField4 > 0")
For Each objAlert in colAlerts 'enumerate through the alerts
    'take action when alert time raised is over the threshold stored in Custom Field number 4
    If CInt(DateDiff("n",ConvertDateFromUTC(objAlert.TimeRaised),Now)) > CInt(objAlert.CustomField4) Then
        'Take action accordingly to the requirements for escalation, e.g. run executable file
        wshshell.CurrentDirectory = WorkDir
        Result = wshshell.Run(CmdLine, 0, TRUE)
        If Result <> 0 Then
            CreateEvent(CompName) 'log a MOM event on failure
        End If
    End If
Next
When an alert remains unattended for a length of time that exceeds the specified escalation time, a particular action can be taken. The administrator can indicate what action to take (fire an external command, send a new alert, change the alert that caused the escalation, and so on). The interval for running the script in Figure 5 depends on the requirements, but it should not be less than five minutes in a large MOM deployment. Many thanks to Brian Wren, who has published a management pack that fully implements exactly this escalation mechanism.
The escalation process can be layered with different levels depending on elapsed time. This can be implemented by expanding the solution described here to the next custom fields defining further escalation levels.
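The layered variant comes down to comparing an alert's age against an ordered list of thresholds. The Python sketch below shows that comparison in isolation. It is a model of the logic only: the function name and the threshold values are invented for the example, and in MOM the thresholds would live in alert custom fields, as described above.

```python
from datetime import datetime, timedelta


def escalation_level(time_raised, now, thresholds_minutes):
    """Return how many escalation thresholds the alert's age has exceeded (0 = none)."""
    age_minutes = (now - time_raised).total_seconds() / 60
    return sum(1 for limit in sorted(thresholds_minutes) if age_minutes > limit)


raised = datetime(2006, 5, 1, 9, 0)
now = raised + timedelta(minutes=45)
print(escalation_level(raised, now, [15, 30, 60]))  # 2
```

A level of 2 here means the first two thresholds (15 and 30 minutes) have passed but the third has not. As in the Figure 5 script, the comparison is strictly greater than, so an alert exactly at a threshold is not yet escalated.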
The next version of MOM, named System Center Operations Manager 2007, is expected to be released in late 2006. It will natively implement notification escalation mechanisms, with the ability to base escalation on a notion of "alert aging." The administrator will be able to define criteria that are used to generate a notification when an alert "ages" beyond the specified threshold.
Andrzej Lipka is an Infrastructure Consultant in Microsoft Services in Poland. He works in management and operations, specializing in MOM, SMS, and Active Directory. He can be contacted through his blog at blogs.technet.com/alipka or by e-mail at firstname.lastname@example.org.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.