Health Monitor Concepts

Understanding data groups, data collectors, thresholds, and actions is fundamental to using monitors in Application Center, especially if you are planning to extend monitoring. This section describes these concepts and provides examples of their use.

Data Groups

Data groups are analogous to directories in the file system. They contain and organize data collectors and other data groups, aggregate their statuses and statistics, and can optionally associate an action with the resulting status. These actions fire whenever the status of the data group meets the configured criteria.

Note

The data group aggregates the status of the data collectors and the data groups that it contains. This means that the group status becomes Critical whenever a data collector or data group within it becomes Critical. The group status stays Critical until every contained data collector and data group returns to OK (or Warning). When the status of every collector is OK, the status of the group is OK.

For example, the Online / Offline Monitors data group in Synchronized Monitors (Application Center) calls the Take Server Offline action when its group status becomes Critical. In turn, the group status becomes Critical whenever a data collector within it becomes Critical, and it stays Critical until the status of every data collector is OK (or Warning). When the status of every collector is OK, the group status is OK, and the group calls the Bring Server Online action.

Caution   Because the member is taken offline whenever the status of any data collector or data group in the Online / Offline Monitors data group is Critical, use caution when adding data collectors or data groups to this data group.
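
The Online / Offline Monitors behavior described above can be sketched as a small state machine. The following Python model is illustrative only: the function name replay_group_status and the status strings are assumptions made for this sketch, not part of Health Monitor, and the sketch assumes that a Warning status fires neither action.

```python
def replay_group_status(statuses):
    """Return the actions fired as the group status moves through `statuses`.

    Hypothetical model: Critical takes the member offline once, and the
    next OK status brings it back online. Warning fires nothing (assumption).
    """
    offline = False
    fired = []
    for status in statuses:
        if status == "Critical" and not offline:
            fired.append("Take Server Offline")
            offline = True
        elif status == "OK" and offline:
            fired.append("Bring Server Online")
            offline = False
    return fired


print(replay_group_status(["OK", "Critical", "Critical", "Warning", "OK"]))
# ['Take Server Offline', 'Bring Server Online']
```

In this model, the repeated Critical reading does not take the member offline a second time, because the member is already offline.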

Data Collectors

Data collectors are objects that collect metrics for a variety of objects, such as the Inetinfo process. You can display these statistics in Health Monitor and set a threshold for a specific metric. When the threshold criteria are met, the data collector's status changes to the value specified in the threshold. You associate actions with data collectors in the same way that you associate them with data groups.

Data collectors can gather statistics for:

  • Performance counters.

  • Services.

  • Processes.

  • Windows events.

  • COM+ applications.

  • The HTTP Monitor.

  • A TCP/IP port.

  • Pinging a remote server.

  • WMI events and instances.

The data collector aggregates the status of the thresholds within it. If a data collector has two thresholds with the same criteria (for example, #Instances of Replication Event > 0), where one threshold is set to Status = Critical and the other is set to Status = OK, the Critical status takes precedence even though the criteria are met for both.
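
The precedence rule can be illustrated with a short Python sketch. It is a simplified model, not Health Monitor code: the SEVERITY ordering, the collector_status function, and the use of OK when no threshold criteria are met are assumptions made for the illustration.

```python
# Assumed severity ordering, used only for this illustration.
SEVERITY = {"OK": 0, "Warning": 1, "Critical": 2}


def collector_status(value, thresholds):
    """Return the most severe status among the thresholds whose criteria are met.

    `thresholds` is a list of (predicate, status) pairs (hypothetical shape).
    If no criteria are met, the sketch assumes the collector is OK.
    """
    met = [status for predicate, status in thresholds if predicate(value)]
    return max(met, key=SEVERITY.__getitem__, default="OK")


# Two thresholds with the same criteria but different statuses.
replication_thresholds = [
    (lambda instances: instances > 0, "Critical"),
    (lambda instances: instances > 0, "OK"),
]

print(collector_status(3, replication_thresholds))  # Critical
print(collector_status(0, replication_thresholds))  # OK
```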

For example, the System Monitors data group contains the LogicalDisk, Memory, and Processor data collectors. If their status is set to LogicalDisk = Warning, Memory = OK, and Processor = Critical, the group status for System Monitors is Critical. If the Processor collector then becomes OK and no other changes occur, the group status changes to Warning.
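
The System Monitors example amounts to a "worst status wins" rollup. A minimal Python sketch follows, assuming a group is represented as a dictionary whose values are either status strings (collectors) or nested dictionaries (contained groups). This representation and the group_status function are hypothetical.

```python
# Assumed severity ordering, used only for this illustration.
SEVERITY = {"OK": 0, "Warning": 1, "Critical": 2}


def group_status(group):
    """Return the most severe status found in a group, including nested groups."""
    worst = "OK"
    for child in group.values():
        # A nested dict models a contained data group; a string models a collector.
        status = group_status(child) if isinstance(child, dict) else child
        if SEVERITY[status] > SEVERITY[worst]:
            worst = status
    return worst


system_monitors = {"LogicalDisk": "Warning", "Memory": "OK", "Processor": "Critical"}
print(group_status(system_monitors))  # Critical

system_monitors["Processor"] = "OK"
print(group_status(system_monitors))  # Warning
```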

Use the Requires manual reset to return to Ok status check box to configure whether the status of the collector returns to OK automatically when all thresholds are below their criteria. If you select this check box, you must take explicit action to return the collector's status to OK, such as creating a threshold that does this when the value is below the Critical level, or clicking Reset & check now on the pop-up menu for the data collector. Application Center leaves this check box cleared for the default monitors, so their status automatically returns to OK when the value goes below the threshold.
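
The effect of the check box can be modeled as a latch. The Python class below is a sketch under stated assumptions: the Collector class, its evaluate and reset methods, and the single Critical threshold are invented for illustration and do not correspond to any Health Monitor interface.

```python
class Collector:
    """Hypothetical model of a data collector with one Critical threshold."""

    def __init__(self, requires_manual_reset=False):
        self.requires_manual_reset = requires_manual_reset
        self.status = "OK"

    def evaluate(self, threshold_met):
        """Update the status after one collection cycle."""
        if threshold_met:
            self.status = "Critical"
        elif not self.requires_manual_reset:
            # Check box cleared: the status returns to OK automatically.
            self.status = "OK"

    def reset(self):
        """Models an explicit reset, such as Reset & check now."""
        self.status = "OK"


latched = Collector(requires_manual_reset=True)
latched.evaluate(True)
latched.evaluate(False)
print(latched.status)  # Critical
latched.reset()
print(latched.status)  # OK
```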

Statistics

Besides using data collectors to fire actions and to aggregate and display status, you can use them to collect statistics for an object. To configure the collected statistics, in the data_collector Properties dialog box, click the Detail tab, and then in the Properties list, clear the check box next to the property. You can collect statistics without providing an associated threshold.

For example, the Application Center Monitors\Cluster Service data collector collects statistics on the Started, Status, and State properties of the ACCluster Service, but only attaches a threshold for Started. To view all three of these statistics and the statistics for the thresholds, click the collector, and then in the details pane, click the Statistics tab.

Thresholds

A threshold compares its configured criteria with the value of the specific metric collected by the data collector. When the comparison is satisfied, the threshold changes the data collector's status.

Alerts

Alerts are Health Monitor events that are fired when the status of a data collector changes. An alert is fired both when the status changes and when it changes back to its original value. For example, if the Application Center Cluster Service is shut down, the Cluster Service data collector goes to Critical and an alert is fired. When the service is started again, the data collector returns to OK and another alert is fired.
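
The Cluster Service example can be expressed as simple change detection over a sequence of status readings. This Python sketch is illustrative: the alerts_for function and the tuple format are assumptions, and real alerts carry more information than a pair of statuses.

```python
def alerts_for(statuses, initial="OK"):
    """Return one (old_status, new_status) pair for every status change."""
    fired = []
    previous = initial
    for current in statuses:
        if current != previous:
            fired.append((previous, current))
        previous = current
    return fired


# The service is shut down, stays down for one more reading, then restarts.
print(alerts_for(["OK", "Critical", "Critical", "OK"]))
# [('OK', 'Critical'), ('Critical', 'OK')]
```

In this model, a repeated reading of the same status produces no additional alert. Only the two transitions do.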

Alerts can be viewed in the lower part of the details pane in the Health Monitor snap-in. To view an alert, in the Health Monitor console tree, click the data collector or data group. When you select a data collector, only the alerts for that collector are displayed. When you select a data group, all of the alerts for all the group's data collectors are displayed. To see more information about an alert, right-click the alert. A dialog box similar to Windows 2000 Event Log Properties appears.

Note   Alerts are also fired as WMI events and logged by Application Center Events and Performance Logging.

Actions

Actions can be called by either data collectors or data groups, or, in some cases, by both. The same action can be called from any collector or group and any collector or group can call any number of actions. Actions also fire on a per-call basis. This means that if you have an e-mail action associated with the firing of an Application Center event that fires often, such as a replication event, a separate e-mail message is sent for each instance of the event.
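
To make the per-call behavior concrete, the following Python sketch counts the action calls produced by repeated instances of one event. The dispatch function and the bindings dictionary are hypothetical structures chosen for the illustration, not a Health Monitor API.

```python
def dispatch(event_instances, bindings):
    """Return one (source, action) call per event instance per bound action."""
    calls = []
    for source in event_instances:
        for action in bindings.get(source, []):
            calls.append((source, action))
    return calls


# One source bound to two actions (assumed names), with three event instances.
bindings = {"Replication Event": ["E-mail Action", "Text Log Action"]}
calls = dispatch(["Replication Event"] * 3, bindings)

emails = [call for call in calls if call[1] == "E-mail Action"]
print(len(calls))   # 6
print(len(emails))  # 3
```

Three instances of the event produce three separate e-mail calls, which is why a frequently firing event can generate a large volume of messages.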

The following actions are available:

  • Command-line action—executes the specified command-line .exe, using the specified parameters.

  • E-mail action—sends an e-mail message to the party specified in the Create Cluster Wizard, the Add New Member Wizard, or the To property in the action Details dialog box. The message text and the CC and BCC parties are also specified in the action Details dialog box.

    Note   Application Center does not provide a default e-mail address or SMTP server name. If you do not provide them explicitly when creating a cluster or in the action Properties dialog box in Health Monitor, the e-mail action fails.

  • Text log action—logs to the specified text file. Use a different log file for each data collector and action; for example, use Websitenotavailable.log to log instances for the HTTP Monitor Failed data collector.

  • Windows event log action—fires an event to the Windows 2000 Event Log (applications) with the specified parameters.

    Note   Application Center creates a special data collector to monitor the failure of actions. If an action monitored by this data collector fails, an event is fired and logged by Application Center Events and Performance Logging. The Health Monitor Action Failure Monitor is located in the Application Center Log Monitors data group under the Application Center Monitors group.

  • Script action—executes the specified VBScript or JScript from the specified location. You can specify a timeout parameter.

    Note   To run WSH scripts, you must use Cscript.exe or Wscript.exe from the command line.

  • Application Center default actions—the following actions are created by Application Center by default:

    • Text Log Action—log to Offline.log. Records instances of members being taken offline by a monitor.

    • Text Log Action—log to Websitefailures.log. Records instances of failures of the default Web site https://127.0.0.1 (localhost).

    • Command Line Action—take member offline. Takes the member offline when the Online/Offline Monitors data group status is Critical.

    • Command Line Action—bring member online. Brings the member online when the Online/Offline Monitors data group status is OK.

    • E-mail Action—send an e-mail message to an administrator. Sends an e-mail message to the system administrator when any one of several thresholds is crossed.

      Note   Application Center does not provide a default e-mail address or SMTP server name. If you do not provide them explicitly when creating a cluster or in the action Properties dialog box in Health Monitor, the e-mail action fails.
