Key Monitoring Scenarios

Applies To: Operations Manager 2007 R2

The Operations Manager Management Pack monitors the availability, configuration, performance, and security of the Operations Manager agents, services, workflow, and database. The management pack monitors the following key scenarios:

  • Active Directory integration

  • Agent health and remediation

  • Agent management and recovery

  • Core System: secure storage—password expiration

  • The health of the Health Service for agents and servers

  • The health of the Operations Manager database

  • Data volume by management pack

  • Agent version and architecture mismatch

  • CPU utilization by agents and related processes

  • Routine database maintenance

  • Duplicate relationships between agents and management servers

The following table describes these monitoring scenarios.

Scenario Description

Active Directory integration

This scenario monitors the LDAP module for Active Directory integration agent assignment.

Agent health and remediation

This scenario checks agents for out-of-date configurations from the perspective of the agent. When the configuration is at the warning or critical level, the monitor logs the event. The Health Service rolls up the health of the agents and alerts you when the configured threshold of agents is out of date.

Agent management and recovery

The Health Service Watcher checks the heartbeat of agents and alerts you when a heartbeat fails. You can disable the monitor for managed clients or for a management server. By using the management pack, you can perform the following actions:

  • Repair the agent by automatically reinstalling it

  • Repair the agent by manually reinstalling it

  • Check the Health Service Windows Service State

  • Query the service state and the configuration

  • Ping the computer by using Internet Control Message Protocol

  • Recover and diagnose the agent by using the Automatic Agent Management Account Run As profile

  • Remotely enable and restart the health service

  • Remotely restart the health service

Core System: secure storage—password expiration

This scenario checks the secure storage's public key and configuration. The monitor alerts you about password expirations and configuration errors of Run As accounts.

The health of the Health Service for agents and servers

This scenario includes the following monitors for the Health Service:

  • Monitoring Host Handle Count Threshold. When consecutive samples of the Handle Count counter for the MonitoringHost.exe process exceed the configured threshold, the monitor changes state. The default threshold for this monitor is 6,000 for agents and 10,000 for management servers.

  • Monitoring Host Private Bytes Threshold. When consecutive samples of the Private Bytes counter for the MonitoringHost.exe process exceed the configured threshold, the monitor changes state. The default threshold for this monitor is 300 MB for agents and 1,500 MB for management servers.

  • Health Service Handle Count Threshold. When consecutive samples of the Handle Count counter for the HealthService.exe process exceed the configured threshold, the monitor changes state. The default threshold for this monitor is 6,000 for agents and 10,000 for management servers.

  • Health Service Private Bytes Threshold. When consecutive samples of the Private Bytes counter for the HealthService.exe process exceed the configured threshold, the monitor changes state. The default threshold for this monitor is 300 MB for agents and 1,500 MB for management servers.

    Note

    The monitors above roll up the worst of their combined states to an aggregate monitor named Health Service State, which in turn has a recovery associated with its error state. The recovery, which is enabled for agents by default and disabled for management servers, restarts the Health Service on the system where excessive memory utilization has been detected.

    To determine whether the thresholds for these monitors should be increased, first disable the recovery for the Health Service State aggregate monitor so that the service is not restarted while you are establishing a baseline. Use Perfmon to observe or collect the performance counters for the agents over a 24-hour period of regular activity. Review the collected data and determine the typical maximum value. If necessary, apply overrides to the applicable monitors with values appropriate for your environment. Finally, remove the override that disabled recovery for the Health Service State aggregate monitor.

    To change the thresholds, apply overrides for specific groups to the monitors, targeting the Agent class.

  • Action Account Configuration State. This monitor checks the configuration state of the action account and alerts you to errors.

  • System Rules Loaded State. This monitor checks that the rules are loaded. If an aggregation of the rules is not loaded, the monitor alerts you.
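
The consecutive-sample behavior shared by the four threshold monitors above can be illustrated with a minimal Python sketch. This is not the management pack's implementation: the function name, the sample values, and the choice of three consecutive samples are assumptions made for illustration. Only the default handle-count thresholds come from the text.

```python
from typing import Iterable

# Default handle-count thresholds quoted in the text above.
HANDLE_COUNT_THRESHOLDS = {"agent": 6000, "management_server": 10000}


def exceeds_consecutively(samples: Iterable[float], threshold: float,
                          required: int) -> bool:
    """Return True once `required` samples in a row are above `threshold`."""
    run = 0
    for value in samples:
        # A sample at or below the threshold resets the run.
        run = run + 1 if value > threshold else 0
        if run >= required:
            return True
    return False


# Hypothetical handle-count samples for MonitoringHost.exe on an agent.
samples = [5800, 6100, 6200, 5900, 6300, 6400, 6500]
limit = HANDLE_COUNT_THRESHOLDS["agent"]

# `required=3` is an assumed value; the text does not state the sample count.
print(exceeds_consecutively(samples, limit, required=3))
print(exceeds_consecutively(samples[:4], limit, required=3))
```

A single spike above the threshold does not trip the check in this sketch; only an unbroken run of high samples does, which is the reason for requiring consecutive samples.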

The health of the Operations Manager database

This scenario monitors the free space threshold of the Operations Manager database and alerts you if the monitor is in a warning or critical state. If the Operations Manager database runs out of space, the monitoring of other components and services can be interrupted.

Analyzing data volume

This scenario provides you with data that you can use to tune management packs more effectively. This management pack provides reports that enable you to analyze the amount of data produced by the management packs in your environment. When you run the Data Volume by Management Pack report, you can view the data volume for each management pack. You can then click any of the count cells to open the Data Volume by Workflow and Instance report, which breaks the volume down in more detail. Together, the two reports help you identify the management packs and workflows that produce the largest amount of data, which you can then evaluate to determine whether tuning would be useful.

Monitoring the version and architecture of the Operations Manager agent

  • This scenario checks the installed agents and sends an alert when a 32-bit agent is installed on a 64-bit operating system. Running a 32-bit agent on a 64-bit operating system will produce unreliable results and is not a supported configuration.

  • This management pack enables you to check whether all installed agents are a specific version or newer. You can configure the agent version by using overrides for the “Agent Version Monitor” monitor. By default, the monitor checks for version 6.0.7221.0, which is the agent provided with Operations Manager 2007 R2, and generates an alert for any agent that is an earlier version. For best performance, stability, and functionality, agents should be upgraded to the most recent version.
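
As an illustration of the version check, the following Python sketch compares dotted version strings numerically. It is not the monitor's actual logic: the function names and the sample agent versions are hypothetical, and only the default minimum version 6.0.7221.0 comes from the text.

```python
from typing import Tuple

# Default minimum version named in the text (Operations Manager 2007 R2 agent).
MINIMUM_VERSION = "6.0.7221.0"


def parse_version(text: str) -> Tuple[int, ...]:
    """Split a dotted version string into integers for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def is_outdated(agent_version: str, minimum: str = MINIMUM_VERSION) -> bool:
    """Return True when the agent version is earlier than the minimum."""
    return parse_version(agent_version) < parse_version(minimum)


# Hypothetical agent versions for illustration.
for version in ["6.0.6278.0", "6.0.7221.0", "6.0.10000.0"]:
    print(version, "outdated" if is_outdated(version) else "ok")
```

Comparing the parts as integers matters because a plain string comparison would rank 6.0.10000.0 below 6.0.7221.0.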

Monitoring Operations Manager agent CPU utilization

This scenario monitors CPU utilization by agents and related processes, and generates an alert when CPU utilization exceeds a specified threshold for a specified number of consecutive samples. Excessive CPU utilization by the agent over a period of time is a symptom that something is not operating properly. This scenario adds the “Agent Performance” view.

Monitoring routine database maintenance

This scenario monitors whether routine database maintenance, such as partitioning and grooming, is completed in a timely manner. Incomplete or failed maintenance can result in performance problems and database free space alerts. The “Partitioning and grooming has completed recently” monitor runs a script that checks whether the partitioning and grooming workflows completed successfully within a specified time period. By default, this monitor sets a warning state when database maintenance has not succeeded in the past 48 hours.
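
The time-window comparison described above can be sketched in Python as follows. This is only an illustration, not the script that the monitor runs: the function name, the state labels, and the timestamps are assumptions, and only the 48-hour default comes from the text.

```python
from datetime import datetime, timedelta
from typing import Optional

# Default window from the text: warn when no success in the past 48 hours.
DEFAULT_MAX_AGE = timedelta(hours=48)


def maintenance_state(last_success: Optional[datetime], now: datetime,
                      max_age: timedelta = DEFAULT_MAX_AGE) -> str:
    """Return "warning" when maintenance has not succeeded within max_age."""
    if last_success is None or now - last_success > max_age:
        return "warning"
    return "healthy"


# Hypothetical timestamps for illustration.
now = datetime(2010, 1, 10, 12, 0)
print(maintenance_state(datetime(2010, 1, 9, 3, 0), now))  # 33 hours ago
print(maintenance_state(datetime(2010, 1, 7, 3, 0), now))  # 81 hours ago
```

Treating a missing success timestamp as a warning is a design choice of this sketch, so that maintenance that has never succeeded is not reported as healthy.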

Monitoring duplicate relationships between agents and management servers

This scenario monitors for duplicate relationships between agents and management servers. When such duplicates exist, data becomes corrupted and the configuration service stops generating configuration for the entire management group. The “Relationships between Agents and Management Servers Monitor” monitor detects potential problems with the Operations Manager database by checking for corrupted records of relationships between agents and management servers, and it generates an alert when it finds them. You can run a task in the alert’s product knowledge that repairs the database. You can also configure automatic recovery on this monitor.
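
To make the idea of a duplicate relationship concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a duplicate means the same agent and management server pair recorded more than once. The data model, names, and function are hypothetical and do not reflect the actual database schema or the monitor's query.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed shape of a relationship record: (agent name, management server name).
Relationship = Tuple[str, str]


def find_duplicate_relationships(rows: Iterable[Relationship]) -> List[Relationship]:
    """Return each agent/server pair that appears more than once."""
    counts = Counter(rows)
    return sorted(pair for pair, seen in counts.items() if seen > 1)


# Hypothetical relationship records for illustration.
rows = [
    ("agent01", "ms01"),
    ("agent02", "ms01"),
    ("agent01", "ms01"),  # repeated pair
    ("agent03", "ms02"),
]
print(find_duplicate_relationships(rows))
```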