Defining Rules

發行項
05/13/2011

適用於: System Center Operations Manager 2007

After monitors and health state are defined for each class in the service model of a System Center Operations Manager 2007 management pack, rules can be defined.

Alerting Rules

Alerting rules are used either when a monitor cannot reasonably be defined for a particular issue or when the issue does not relate to the health state of a particular class. The most common reason that a monitor cannot be defined is that there is no reasonable means of identifying that the issue was corrected. This is required to define the healthy state of the monitor. One strategy to address this challenge is using an alerting rule instead of a monitor.

Instead of creating an alerting rule, a monitor that uses a timer reset can be considered. We do not recommend a manual reset, especially targeted at a class that has multiple instances, because of the requirement that a user must always intervene to return the monitor to a healthy state. A timer reset monitor still influences the health state of the target object and will automatically reset to a healthy state if a user does not manually reset it in a specified time.

The other use of alert rules is to identify issues that do not relate to a health state. For example, the application might perform a data replication regularly. An alert should be created if the transfer fails, although the application is still fully functional. This issue is best identified through an alerting rule instead of a monitor.

A strategy that can be used to identify alert rules is to analyze all events created by the application that were not used in a monitor. These may be Windows events or may be provided in any of the event data sources previously discussed. Such events will typically identify errors in the application. Any events not used by a monitor are candidates for an alerting rule.

Minimizing Noise

A strategy of just creating an alerting rule for each event created by the application without analysis is not a best practice because excessive alerts can represent a source of noise in the Operations Manager environment with minimal value. Before you create any alerting rule, consideration should be given to whether the issue that is identified by the event is significant enough to create an alert. If the issue is minor, the event may just be collected for later analysis.

Analysis should also be performed to determine whether an event represents a unique issue not identified by another rule or monitor. A single application issue may have several symptoms generating multiple kinds of error events. There is minimal value in creating multiple alerts from the same root problem. Such noise can create too much work for the user and obscure alerts that identify other issues.

Collection Rules

Collection rules augment health monitoring by collecting information for reporting and analysis in the Operations console. The most common collection rules will be those collecting performance data. This may be any Windows performance counters that are provided by the application or could be response times and other numeric data that is created by monitoring scripts.

The first data that should be considered for collection is any performance data that is used in monitors. The monitor sets health state based on comparing the numeric data to a threshold, but it does not store the data for analysis. Having trending and historical data to support the monitor will typically be useful for the end-user. Additional performance data that might be available should be defined taking into consideration storage requirements. If performance data is of minimal value in daily monitoring then it may not be worth the space that is required for its storage.

另請參閱

概念

Alert Rules
Collection Rules

Defining Rules

Alerting Rules

Minimizing Noise

Collection Rules

另請參閱

概念

其他資源