Monitor an Application's Health

Applies To: Operations Manager 2007 R2

This topic describes a monitor that uses a script to evaluate an application’s health.

Executing a Script by Using a Monitor

This monitor determines the application’s health by running a script and evaluating the value that the script writes to StdErr. The health state changes if a warning or critical state is detected, and an alert is raised when the state is critical.

So far, we’ve taken steps to discover the application by identifying the existence of the script, to validate the script, and to take a corrective action if the script is invalid. The preliminary steps are all in place, and it is now appropriate to run the script to determine application health.

Start by defining the monitor for the sample application’s health. The discovered Microsoft.SCX.Sample.Application class is the target and, as in the Check MD5 Script monitor, the parent monitor is System.Health.AvailabilityState. The alert settings reference the alert message that is defined in the Presentation and Language Packs sections of the management pack. The monitor raises an alert on an error state and resolves the alert automatically when the monitor leaves that state.

<UnitMonitor ID="Microsoft.SCX.Authoring.Guide.GetSampleAppHealth.Monitor" Accessibility="Public" Enabled="true" Target="Microsoft.SCX.Sample.Application" ParentMonitorID="SystemHealth!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="Microsoft.SCX.Authoring.Guide.RunScript.MonitorType" ConfirmDelivery="false">
        <Category>PerformanceHealth</Category>
        <AlertSettings AlertMessage="Microsoft.SCX.Authoring.Guide.RunScript.AlertMessage">
          <AlertOnState>Error</AlertOnState>
          <AutoResolve>true</AutoResolve>
          <AlertPriority>Normal</AlertPriority>
          <AlertSeverity>Error</AlertSeverity>
        </AlertSettings>
        <OperationalStates>
          <OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning" />
          <OperationalState ID="Error" MonitorTypeStateID="Error" HealthState="Error" />
          <OperationalState ID="OK" MonitorTypeStateID="OK" HealthState="Success" />
        </OperationalStates>
        <Configuration>
          <TargetSystem>$Target/Host/Property[Type="Unix!Microsoft.Unix.Computer"]/NetworkName$</TargetSystem>
          <Command>sh /tmp/SampleAppHealth.sh</Command>
          <Interval>30</Interval>
        </Configuration>
      </UnitMonitor>

As with the previous script monitor, all the attributes necessary for this monitor are defined within the configuration, and the modifiable attributes are defined in overrideable parameters.

The monitor type contains a scheduler that determines when the monitor runs and a probe action that executes the script. $Config/Command$ takes the value that is passed in from Microsoft.SCX.Authoring.Guide.GetSampleAppHealth.Monitor and is declared as overrideable in the OverrideableParameters subsection. There are also three condition detections and three regular detections. Each condition is the check that is performed in the corresponding regular detection. As in discovery, the modules in a regular detection run from the deepest-level node to the top-level node: the scheduler triggers the probe action, the probe action runs the script, and the condition compares the value in StdErr with the value expected for that health state (0 for OK, 1 for warning, 2 for error).

  
      <UnitMonitorType ID="Microsoft.SCX.Authoring.Guide.RunScript.MonitorType" Accessibility="Public">
        <MonitorTypeStates>
          <MonitorTypeState ID="Warning" NoDetection="false" />
          <MonitorTypeState ID="Error" NoDetection="false" />
          <MonitorTypeState ID="OK" NoDetection="false" />
        </MonitorTypeStates>
        <Configuration>
          <xsd:element name="TargetSystem" type="xsd:string" />
          <xsd:element name="Command" type="xsd:string" />
          <xsd:element name="Interval" type="xsd:unsignedInt" />
        </Configuration>
        <OverrideableParameters>
          <OverrideableParameter ID="Command" Selector="$Config/Command$" ParameterType="string" />
          <OverrideableParameter ID="Interval" Selector="$Config/Interval$" ParameterType="int" />
        </OverrideableParameters>
        <MonitorImplementation>
          <MemberModules>
            <DataSource ID="Scheduler" TypeID="System!System.Scheduler">
              <Scheduler>
                <SimpleReccuringSchedule>
                  <Interval Unit="Seconds">$Config/Interval$</Interval>
                  <SyncTime />
                </SimpleReccuringSchedule>
                <ExcludeDates />
              </Scheduler>
            </DataSource>
            <ProbeAction ID="RunScript" TypeID="Unix!Microsoft.Unix.WSMan.Invoke.ProbeAction">
              <TargetSystem>$Config/TargetSystem$</TargetSystem>
              <Uri>https://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx</Uri>
              <Selector />
              <InvokeAction>ExecuteCommand</InvokeAction>
              <Input><![CDATA[ <p:ExecuteCommand_INPUT xmlns:p="https://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem"><p:command>$Config/Command$</p:command><p:timeout>10</p:timeout></p:ExecuteCommand_INPUT> ]]></Input>
            </ProbeAction>
            <ConditionDetection ID="CDOK" TypeID="System!System.ExpressionFilter">
              <Expression>
                <SimpleExpression>
                  <ValueExpression>
                    <XPathQuery Type="Double">//*[local-name()="StdErr"]</XPathQuery>
                  </ValueExpression>
                  <Operator>Equal</Operator>
                  <ValueExpression>
                    <Value Type="Double">0</Value>
                  </ValueExpression>
                </SimpleExpression>
              </Expression>
            </ConditionDetection>
            <ConditionDetection ID="CDWarning" TypeID="System!System.ExpressionFilter">
              <Expression>
                <SimpleExpression>
                  <ValueExpression>
                    <XPathQuery Type="Double">//*[local-name()="StdErr"]</XPathQuery>
                  </ValueExpression>
                  <Operator>Equal</Operator>
                  <ValueExpression>
                    <Value Type="Double">1</Value>
                  </ValueExpression>
                </SimpleExpression>
              </Expression>
            </ConditionDetection>
            <ConditionDetection ID="CDError" TypeID="System!System.ExpressionFilter">
              <Expression>
                <SimpleExpression>
                  <ValueExpression>
                    <XPathQuery Type="Double">//*[local-name()="StdErr"]</XPathQuery>
                  </ValueExpression>
                  <Operator>Equal</Operator>
                  <ValueExpression>
                    <Value Type="Double">2</Value>
                  </ValueExpression>
                </SimpleExpression>
              </Expression>
            </ConditionDetection>
          </MemberModules>
          <RegularDetections>
            <RegularDetection MonitorTypeStateID="OK">
              <Node ID="CDOK">
                <Node ID="RunScript">
                  <Node ID="Scheduler" />
                </Node>
              </Node>
            </RegularDetection>
            <RegularDetection MonitorTypeStateID="Warning">
              <Node ID="CDWarning">
                <Node ID="RunScript">
                  <Node ID="Scheduler" />
                </Node>
              </Node>
            </RegularDetection>
            <RegularDetection MonitorTypeStateID="Error">
              <Node ID="CDError">
                <Node ID="RunScript">
                  <Node ID="Scheduler" />
                </Node>
              </Node>
            </RegularDetection>
          </RegularDetections>
        </MonitorImplementation>
      </UnitMonitorType>

Add a string resource to the presentation section.

  <StringResource ID="Microsoft.SCX.Authoring.Guide.RunScript.AlertMessage" />

Add display string information to the Language Packs section. In most cases, the description can be left empty. The alert message is the exception: the Operations console reports an error if its description is blank.

        <DisplayString ElementID="Microsoft.SCX.Authoring.Guide.GetSampleAppHealth.Monitor">
          <Name>Sample App Health Monitor</Name>
          <Description />
        </DisplayString>
        <DisplayString ElementID="Microsoft.SCX.Authoring.Guide.GetSampleAppHealth.Monitor" SubElementID="Error">
          <Name>Error</Name>
          <Description />
        </DisplayString>
        <DisplayString ElementID="Microsoft.SCX.Authoring.Guide.GetSampleAppHealth.Monitor" SubElementID="OK">
          <Name>OK</Name>
          <Description />
        </DisplayString>
        <DisplayString ElementID="Microsoft.SCX.Authoring.Guide.GetSampleAppHealth.Monitor" SubElementID="Warning">
          <Name>Warning</Name>
          <Description />
        </DisplayString>
        <DisplayString ElementID="Microsoft.SCX.Authoring.Guide.RunScript.AlertMessage">
          <Name>Sample App Health Alert</Name>
          <Description>Application is not healthy</Description>
        </DisplayString>
        

Again, observe that two overrideable parameters are defined: the command and the probe execution interval. The monitor type defines a scheduler that determines when the monitor runs, a probe action that runs the script, and the condition and regular detections. In this case, the monitor examines StdErr to retrieve the result of the probe action. Because there are three possible results, the monitor type needs a regular detection for each one.

The state of the health monitor is updated based on the evaluation results. When the state changes, the health status is updated, and an alert is generated if the monitor enters the error state.

In the Microsoft.SCX.Authoring.Guide.xml file that you created earlier, add the preceding XML code to the sections noted.

Save the Microsoft.SCX.Authoring.Guide.xml file, and then import the updated management pack into the management group as described in Required Management Pack Definitions.

Monitor Health Results

To make it easier to observe the health monitor, turn off the Microsoft.SCX.Authoring.Guide.CheckMd5Script.Monitor monitor by setting its Enabled attribute to false. You can then modify SampleAppHealth.sh without generating additional alerts.

Observe Health Results

  1. Modify SampleAppHealth.sh so that it reports a critical state by writing 2 to StdErr: echo 2 1>&2

  2. In the Operations console, go to the Monitoring node.

  3. Click Active Alerts.

  4. Wait for the configured interval to elapse, which with the sample settings is no more than 30 seconds. An alert should appear in the list of active alerts.

  5. Double-click Sample App Health Alert to view the properties dialog box. The severity should be reported as critical.

Next, change SampleAppHealth.sh so that it writes 1 to StdErr (echo 1 1>&2), which reports a warning. Return to the Operations console and observe the Active Alerts node. The alert for Sample App Health should disappear, because the monitor resolves the alert automatically when it leaves the error state. Open a Diagram View of the monitored computer and observe that the Sample Application is reporting a warning. Based on the alert settings, this monitor generates an alert on an error state but not on a warning state.
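
If a warning state should also raise an alert, the alert settings of the unit monitor could be adjusted along the following lines. This is a sketch and not part of the sample management pack. It assumes that the Warning value of AlertOnState raises the alert for both warning and error states, and that MatchMonitorHealth makes the alert severity follow the monitor state.

```xml
<!-- Sketch: would replace the AlertSettings element of
     Microsoft.SCX.Authoring.Guide.GetSampleAppHealth.Monitor.
     Assumption: alerting on warnings is wanted, which the sample does not do. -->
<AlertSettings AlertMessage="Microsoft.SCX.Authoring.Guide.RunScript.AlertMessage">
  <!-- Warning: alert when the monitor is in a warning or an error state -->
  <AlertOnState>Warning</AlertOnState>
  <AutoResolve>true</AutoResolve>
  <AlertPriority>Normal</AlertPriority>
  <!-- MatchMonitorHealth: alert severity follows the monitor health state -->
  <AlertSeverity>MatchMonitorHealth</AlertSeverity>
</AlertSettings>
```

With the original settings left in place, only the error state produces an alert, which is the behavior the steps in this topic demonstrate.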

In the management pack, enable the Microsoft.SCX.Authoring.Guide.CheckMd5Script.Monitor by setting the Enabled attribute to true. Increment the version number, save, and import the management pack. Changes to SampleAppHealth.sh are then reflected in the health state and, at the same time, the Check MD5 Script monitor automatically restores SampleAppHealth.sh.

Add a Task

Depending on the configuration interval, it might be useful to have a task to get an immediate report on the Health of the monitored application. The probe action, invoke action, and input are identical to the Get Sample App Health monitor.

      <Task ID="Microsoft.SCX.Authoring.Guide.GetSampleAppHealth.Task" Accessibility="Internal" Enabled="true" Target="Unix!Microsoft.Unix.Computer" Timeout="300" Remotable="true">
        <Category>Maintenance</Category>
        <ProbeAction ID="RunScript" TypeID="Unix!Microsoft.Unix.WSMan.Invoke.ProbeAction">
          <TargetSystem>$Target/Property[Type="Unix!Microsoft.Unix.Computer"]/PrincipalName$</TargetSystem>
          <Uri>https://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx</Uri>
          <Selector />
          <InvokeAction>ExecuteCommand</InvokeAction>
          <Input><![CDATA[ <p:ExecuteCommand_INPUT xmlns:p="https://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem"><p:command>sh /tmp/SampleAppHealth.sh</p:command><p:timeout>10</p:timeout></p:ExecuteCommand_INPUT> ]]></Input>
        </ProbeAction>
      </Task>

In the Microsoft.SCX.Authoring.Guide.xml file that was created earlier, insert the task in the tasks subsection under the monitoring section.
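
As a rough guide to placement, the fragment below shows the task inside a Tasks element under Monitoring, together with an optional display string that gives the task a readable name in the Operations console. The display name and description are assumptions chosen for illustration, the task body is abbreviated to a comment, and the element order reflects the usual management pack layout and is not confirmed by this guide.

```xml
<!-- Monitoring section. Assumption: Tasks is placed before Monitors. -->
<Monitoring>
  <Tasks>
    <!-- Place the complete Microsoft.SCX.Authoring.Guide.GetSampleAppHealth.Task element here -->
  </Tasks>
</Monitoring>

<!-- Language Packs section: optional display string for the task (hypothetical text) -->
<DisplayString ElementID="Microsoft.SCX.Authoring.Guide.GetSampleAppHealth.Task">
  <Name>Get Sample App Health</Name>
  <Description>Runs SampleAppHealth.sh on the monitored computer and returns its output</Description>
</DisplayString>
```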

Save the Microsoft.SCX.Authoring.Guide.xml file and import the updated management pack into the management group as described in Required Components of every management pack.

Monitor Customization

Customization of this monitor is similar to the Check MD5 Script Monitor. The Command and Interval values in the <Configuration> element of the <UnitMonitor> can be modified. The interval, which is specified in seconds, should be increased from 30 seconds to a larger value. Depending on the importance of the application, an interval of five to 30 minutes (300 to 1800 seconds) is more appropriate.
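
Because Interval is declared as an overrideable parameter, the value can also be changed with an override instead of editing the monitor. The fragment below is a minimal sketch of such an override for the Overrides subsection of the Monitoring section. The override ID is hypothetical, and the value of 300 assumes that the interval is expressed in seconds, as the Unit attribute of the scheduler indicates.

```xml
<Overrides>
  <!-- Hypothetical override ID. Context is the class to which the override applies. -->
  <MonitorConfigurationOverride ID="Microsoft.SCX.Authoring.Guide.GetSampleAppHealth.Interval.Override"
                                Context="Microsoft.SCX.Sample.Application"
                                Enforced="false"
                                Monitor="Microsoft.SCX.Authoring.Guide.GetSampleAppHealth.Monitor"
                                Parameter="Interval">
    <!-- 300 seconds = five minutes -->
    <Value>300</Value>
  </MonitorConfigurationOverride>
</Overrides>
```

The Parameter attribute must match the ID of an OverrideableParameter in the monitor type, so the same pattern would apply to Command.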

To customize this monitor to run a script for a different application, create a new discovery if one is needed, define a new unit monitor whose target matches the new discovery, and then update the command to use the new script. It might also be appropriate to monitor the script file itself, as discussed earlier.
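
A monitor for a different application might look like the sketch below, which reuses Microsoft.SCX.Authoring.Guide.RunScript.MonitorType. The Contoso identifiers, the script path, and the 300-second interval are hypothetical. The TargetSystem expression assumes that the new class is hosted by Microsoft.Unix.Computer in the same way as the sample application.

```xml
<!-- Hypothetical monitor for another application; all Contoso names are placeholders -->
<UnitMonitor ID="Contoso.OtherApp.GetHealth.Monitor" Accessibility="Public" Enabled="true" Target="Contoso.OtherApp.Application" ParentMonitorID="SystemHealth!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="Microsoft.SCX.Authoring.Guide.RunScript.MonitorType" ConfirmDelivery="false">
  <Category>PerformanceHealth</Category>
  <AlertSettings AlertMessage="Contoso.OtherApp.GetHealth.AlertMessage">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning" />
    <OperationalState ID="Error" MonitorTypeStateID="Error" HealthState="Error" />
    <OperationalState ID="OK" MonitorTypeStateID="OK" HealthState="Success" />
  </OperationalStates>
  <Configuration>
    <TargetSystem>$Target/Host/Property[Type="Unix!Microsoft.Unix.Computer"]/NetworkName$</TargetSystem>
    <!-- The script must write 0, 1, or 2 to StdErr, as the monitor type expects -->
    <Command>sh /opt/otherapp/health.sh</Command>
    <Interval>300</Interval>
  </Configuration>
</UnitMonitor>
```

A monitor defined this way would also need its own string resource and display strings, following the pattern shown earlier for the sample application.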