Sudden Increases in Resource Usage

Under certain conditions, sudden increases (spikes) in resource usage are normal on all MOM components. These spikes are caused by activities that take place during normal MOM operations, and they can create temporary bottlenecks that increase alert latency. Any optimizing activity should factor in these increases in resource usage. It is recommended that you:

  • Distinguish spikes in resource usage from ongoing resource utilization issues.

  • Use performance counters and MOM reports to identify the cause and frequency of the spikes.

Typically, surges in resource usage are not considered an ongoing performance issue. As part of your optimizing activities, you can implement processes to minimize the impact of these spikes.
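As an illustration of separating spikes from ongoing utilization issues, the following minimal Python sketch classifies runs of high readings from a performance counter by their length. The function names, the 80 percent threshold, and the six-sample cutoff are assumptions chosen for the example; they are not MOM settings, and suitable values depend on your sampling interval and environment.

```python
from typing import List, Tuple


def find_high_runs(samples: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of consecutive samples above threshold."""
    runs = []
    start = None
    for i, value in enumerate(samples):
        if value > threshold:
            if start is None:
                start = i  # a new high run begins here
        elif start is not None:
            runs.append((start, i - start))  # the run ended at the previous sample
            start = None
    if start is not None:
        runs.append((start, len(samples) - start))  # run continues to the last sample
    return runs


def classify_runs(
    samples: List[float],
    threshold: float = 80.0,      # assumed utilization threshold, in percent
    sustained_samples: int = 6,   # assumed run length that counts as "sustained"
) -> List[Tuple[int, int, str]]:
    """Label each high run as a short 'spike' or a 'sustained' utilization issue."""
    return [
        (start, length, "sustained" if length >= sustained_samples else "spike")
        for start, length in find_high_runs(samples, threshold)
    ]
```

Runs labeled "spike" that recur at regular times are candidates for the known causes described in this section; runs labeled "sustained" are more likely to need further investigation.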

Known Causes of Increased Resource Usage in MOM

The following situations are known to cause a sudden increase in resource usage on the MOM system.

Performance data bursts

Most management packs collect performance data for 15 minutes, and all of the agents send this data in bursts. In a management group with a large number of agents, these data transmission bursts saturate the database server disk and back up the Management Server queue until all of the performance data has been inserted. This temporarily increases alert latency.
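To get a rough sense of how long a burst might keep the queue backed up, the arithmetic can be sketched as follows. This is only a back-of-the-envelope estimate: the function names are hypothetical, and the agent count, counters per agent, and insert rate are assumed inputs that you would need to measure in your own management group.

```python
def burst_rows(agents: int, counters_per_agent: int, samples_per_counter: int = 1) -> int:
    """Estimate how many performance data rows arrive in a single burst."""
    return agents * counters_per_agent * samples_per_counter


def drain_seconds(rows: int, insert_rate_per_second: float) -> float:
    """Estimate how long the database needs to insert the burst at a given rate."""
    if insert_rate_per_second <= 0:
        raise ValueError("insert_rate_per_second must be positive")
    return rows / insert_rate_per_second


# Assumed example figures: 2,000 agents, 50 counters each,
# and a database that sustains 2,000 inserts per second.
rows = burst_rows(agents=2000, counters_per_agent=50)
print(rows, drain_seconds(rows, insert_rate_per_second=2000))
```

The longer the estimated drain time, the longer alerts that arrive during the burst are likely to wait in the Management Server queue.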

Service discovery data from all the agents

CPU utilization can increase noticeably when a large number of agents simultaneously send service discovery data to the Management Server. Depending on the volume of the data, the server queue can fill, which contributes to alert latency.

This situation usually happens when:

  • Service discovery is run after a new Management Pack is installed and targeted to a large number of agents.

  • The service discovery script is run after a service discovery instance or attribute has changed for a large number of agents.

SQL jobs

The re-index job, which runs every Sunday at 3 A.M., causes heavy utilization of the database server disk. This can cause some alert latency while the job runs, which is usually 20 to 30 minutes.

Note

Other jobs, including the grooming, update database, and Data Transformation Services (DTS) jobs, do not contribute significantly to alert latency.
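When reviewing alert latency, it can help to check whether a delayed alert coincided with the re-index job. The following Python sketch tests whether a timestamp falls within the job's expected window. It assumes the Sunday 3 A.M. schedule described above, uses the upper end of the typical duration (30 minutes), and assumes the timestamp is in the database server's local time; the function name is hypothetical.

```python
from datetime import datetime, timedelta

REINDEX_WEEKDAY = 6                           # datetime.weekday(): Monday is 0, Sunday is 6
REINDEX_START_HOUR = 3                        # 3 A.M., per the schedule described above
REINDEX_MAX_DURATION = timedelta(minutes=30)  # assumed upper bound of the usual run time


def in_reindex_window(timestamp: datetime) -> bool:
    """Return True if the timestamp falls inside the expected re-index window."""
    if timestamp.weekday() != REINDEX_WEEKDAY:
        return False
    window_start = timestamp.replace(
        hour=REINDEX_START_HOUR, minute=0, second=0, microsecond=0
    )
    return window_start <= timestamp < window_start + REINDEX_MAX_DURATION
```

If the job's schedule or typical duration differs in your environment, adjust the constants accordingly.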