Use a Script to Monitor an Application

Applies To: Operations Manager 2007 R2

This topic describes monitoring and how to implement it in your sample management pack.

Monitoring

When an application is discovered, Operations Manager loads the appropriate monitors, tasks, and alerts into the Operations Manager agent that is deployed on the managed computer. Each monitor checks on some aspect of an application or component. Every monitor defines a set of states and can only be in a single state at any time. Typically, a monitor has a unique workflow for each state that the monitor type declares. The workflows for a monitor type define how the state will be set when the monitor is reset by the user or by a recovery task. They also define the action to take on first initialization.

For performance reasons, it is not advisable to run all monitors on every system within a management group. Monitoring activity should be targeted by using discovery that allows for the identification of appropriate monitoring targets.

The first monitor created in this sample management pack checks a script that is used for health monitoring. The monitor validates that the file is uncorrupted and that it is the correct version.

Sample Health Script

The script that is used in this sample is a Bash script. Bash is the default shell on most Linux-based systems and can be run on most UNIX operating systems. The script, SampleAppHealth.sh, which simulates application availability, is quite simple and illustrates the process of collecting system information. In many cases, UNIX and Linux administrators have existing scripts that collect data, and they can easily customize this monitor to support an existing script.

The script is as follows:

#!/bin/bash

# Change the value to:
# 0 - Health
# 1 - Warning
# 2 - Critical

echo 0 1>&2

Only the first and last lines are code. Every line between them is a comment that lists the health level values. To make this script illustrate each of the conditions, change the first numeral on the last line (in this case “0”) to the level that you want. In the redirection, 1 represents standard output, which is usually the console, and 2 represents standard error. The code echo 0 1>&2 takes the value written to standard output and redirects it to standard error.
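The effect of the redirection can be seen with a small sketch. The report_health function below is a hypothetical stand-in for SampleAppHealth.sh and is not part of the management pack; it only illustrates that the value ends up on standard error rather than standard output.

```shell
#!/bin/bash
# Hypothetical stand-in for SampleAppHealth.sh: writes a health code to standard error.
report_health() {
    # echo writes to standard output (1); 1>&2 redirects that output to standard error (2).
    echo "$1" 1>&2
}

# Capturing standard output alone yields nothing, because the value went to standard error.
stdout_only=$(report_health 0 2>/dev/null)

# Redirecting standard error into the capture (2>&1) recovers the value.
stderr_value=$(report_health 0 2>&1)

echo "stdout: '${stdout_only}'"   # prints: stdout: ''
echo "stderr: '${stderr_value}'"  # prints: stderr: '0'
```

A caller that reads only standard output would therefore see an empty result from this script.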

Create a Monitor

This monitor checks the MD5 hash value of the script to ensure that it is the most up-to-date version. MD5, also known as Message-Digest algorithm 5, is a widely used hash function that takes an arbitrary block of data and returns a fixed-size, 128-bit string, or checksum. MD5 is commonly used to check the integrity of files. This monitor targets the Microsoft.SCX.Sample.Application that was previously discovered. The ParentMonitorID attribute relates this monitor to the system-defined monitor that reports overall System Health. When this monitor reports an error, the status is reflected in the overall health for that system.
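As a rough illustration of these properties, the following sketch hashes inputs of different sizes with md5sum. The md5_of helper and the sample strings are assumptions made for this example only; they are not part of the management pack.

```shell
#!/bin/bash
# Hypothetical helper: prints the MD5 digest of a string as hexadecimal text.
md5_of() {
    printf '%s' "$1" | md5sum | cut -d' ' -f1
}

short_digest=$(md5_of "echo 0 1>&2")
long_input=$(head -c 1000 /dev/zero | tr '\0' 'x')   # 1000-character input
long_digest=$(md5_of "$long_input")

# Both digests are 32 hexadecimal characters (128 bits), regardless of input size.
echo "short input digest length: ${#short_digest}"
echo "long input digest length:  ${#long_digest}"

# Changing a single character produces a different digest.
if [ "$(md5_of "echo 0 1>&2")" != "$(md5_of "echo 1 1>&2")" ]; then
    echo "digests differ after a one-character change"
fi
```

This sensitivity to small changes is what lets the monitor detect any edit to the script.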

<OperationalStates> defines all the states for this monitor. This monitor has two states, Error and OK. <AlertSettings> describes the attributes for an alert. <AlertOnState> defines which of the monitor states produces an alert. When <AutoResolve> is set to true, the alert is resolved automatically when the monitor returns to a healthy state, without requiring administrative intervention. Corrective action can be defined in a recovery task named Microsoft.SCX.Authoring.Guide.CheckMd5Script.Recovery. If no recovery task is defined, no corrective action is taken. Information about how to create a recovery task is in the Create a Recovery Task topic.

<AlertSettings> includes an AlertMessage attribute, which is discussed in further detail in the Presentation and Language section later in this topic.

Finally, the command that identifies the MD5 hash is defined in <Command>, the hash value for the correct version of the script is defined in <Md5>, and the <Interval> is set to 30 seconds. Every time that the script is modified, the value for Md5 must be updated and a new management pack must be imported into Operations Manager.

   <Monitors>
     <UnitMonitor ID="Microsoft.SCX.Authoring.Guide.CheckMd5Script.Monitor" Accessibility="Public" Enabled="true" Target="Microsoft.SCX.Sample.Application" ParentMonitorID="SystemHealth!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="Microsoft.SCX.Authoring.Guide.CheckMD5Script.MonitorType" ConfirmDelivery="false">
        <Category>PerformanceHealth</Category>
        <AlertSettings AlertMessage="Microsoft.SCX.Authoring.Guide.CheckMd5Script.AlertMessage">
          <AlertOnState>Error</AlertOnState>
          <AutoResolve>true</AutoResolve>
          <AlertPriority>Normal</AlertPriority>
          <AlertSeverity>Error</AlertSeverity>
          <AlertParameters>
            <AlertParameter1>$Data/Context/Md5$</AlertParameter1>
          </AlertParameters>
        </AlertSettings>
        <OperationalStates>
          <OperationalState ID="Error" MonitorTypeStateID="Error" HealthState="Error" />
          <OperationalState ID="OK" MonitorTypeStateID="OK" HealthState="Success" />
        </OperationalStates>
        <Configuration>
          <TargetSystem>$Target/Host/Property[Type="Unix!Microsoft.Unix.Computer"]/NetworkName$</TargetSystem>
          <Command>md5sum /tmp/SampleAppHealth.sh</Command>
          <Md5>4f00fdbe0f3b89d4046f5d98152a1cf6</Md5>
          <Interval>30</Interval>
        </Configuration>
     </UnitMonitor>
   </Monitors>

In the Microsoft.SCX.Authoring.Guide.xml file, replace the <Monitors /> subsection under Monitoring with the preceding XML.

Define the Unit Monitor Type

As in the <DataSourceModuleType> of the previously defined discovery, all the attributes necessary for this monitor are defined within configuration and the modifiable attributes are defined in OverrideableParameters.

The <MonitorImplementation> is almost identical in layout to the <ModuleImplementation> described for the <DataSourceModuleType>. Again, there is a <Scheduler> to determine when to run the monitor and a <ProbeAction> to execute the monitor. This <ProbeAction> is formatted similarly to the one in the discovery’s <DataSourceModuleType>. The $Config/Command$ variable takes the value passed from Microsoft.SCX.Authoring.Guide.CheckMd5Script.Monitor; Command is declared as overrideable in the <OverrideableParameters> subsection. There are also two condition detections and two regular detections, one for each monitor state. Each condition detection is the check performed in the corresponding regular detection. As in discovery, the nodes of a regular detection are executed from the deepest-level node to the top-level node: the scheduler fires, the probe action obtains the hash value of the script, and the condition detection verifies whether the output contains the expected hash value.

When you modify and save the SampleAppHealth.sh script, determine the new MD5 hash value and then update the Md5 value by changing the overrideable parameter on the monitor in the Operations console. If you do not do this, the monitor changes to the Error state and Operations Manager generates an alert.
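One way to obtain the new value is to keep only the hash field of the md5sum output. The following is a minimal sketch; the current_md5 helper is a name chosen for this example, and the sketch hashes a temporary stand-in file so that it can run anywhere. On the managed computer, the path would be /tmp/SampleAppHealth.sh.

```shell
#!/bin/bash
# Hypothetical helper: prints only the hash field of md5sum's output for a file.
current_md5() {
    md5sum "$1" | cut -d' ' -f1
}

# Temporary stand-in for the modified script (assumed content, for illustration only).
sample=$(mktemp)
printf '#!/bin/bash\necho 1 1>&2\n' > "$sample"

# The printed value is what would be entered as the Md5 override.
echo "New Md5 value: $(current_md5 "$sample")"
rm -f "$sample"
```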

    <MonitorTypes>
      <UnitMonitorType ID="Microsoft.SCX.Authoring.Guide.CheckMD5Script.MonitorType" Accessibility="Public">
        <MonitorTypeStates>
          <MonitorTypeState ID="Error" NoDetection="false" />
          <MonitorTypeState ID="OK" NoDetection="false" />
        </MonitorTypeStates>
        <Configuration>
          <xsd:element name="TargetSystem" type="xsd:string" />
          <xsd:element name="Command" type="xsd:string" />
          <xsd:element name="Md5" type="xsd:string" />
          <xsd:element name="Interval" type="xsd:unsignedInt" />
        </Configuration>
        <OverrideableParameters>
          <OverrideableParameter ID="Command" Selector="$Config/Command$" ParameterType="string" />
          <OverrideableParameter ID="Md5" Selector="$Config/Md5$" ParameterType="string" />
          <OverrideableParameter ID="Interval" Selector="$Config/Interval$" ParameterType="int" />
        </OverrideableParameters>
        <MonitorImplementation>
          <MemberModules>
            <DataSource ID="Scheduler" TypeID="System!System.Scheduler">
              <Scheduler>
                <SimpleReccuringSchedule>
                  <Interval Unit="Seconds">$Config/Interval$</Interval>
                  <SyncTime />
                </SimpleReccuringSchedule>
                <ExcludeDates />
              </Scheduler>
            </DataSource>
            <ProbeAction ID="RunScript" TypeID="Unix!Microsoft.Unix.WSMan.Invoke.ProbeAction">
              <TargetSystem>$Config/TargetSystem$</TargetSystem>
              <Uri>http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx</Uri>
              <Selector />
              <InvokeAction>ExecuteCommand</InvokeAction>
              <Input><![CDATA[ <p:ExecuteCommand_INPUT xmlns:p="http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem"><p:command>$Config/Command$</p:command><p:timeout>10</p:timeout></p:ExecuteCommand_INPUT>]]></Input>
            </ProbeAction>
            <ConditionDetection ID="CDOK" TypeID="System!System.ExpressionFilter">
              <Expression>
                <RegExExpression>
                  <ValueExpression>
                    <XPathQuery Type="Double">//*[local-name()="StdOut"]</XPathQuery>
                  </ValueExpression>
                  <Operator>ContainsSubstring</Operator>
                  <Pattern>$Config/Md5$</Pattern>
                </RegExExpression>
              </Expression>
            </ConditionDetection>
            <ConditionDetection ID="CDError" TypeID="System!System.ExpressionFilter">
              <Expression>
                <RegExExpression>
                  <ValueExpression>
                    <XPathQuery Type="Double">//*[local-name()="StdOut"]</XPathQuery>
                  </ValueExpression>
                  <Operator>DoesNotContainSubstring</Operator>
                  <Pattern>$Config/Md5$</Pattern>
                </RegExExpression>
              </Expression>
            </ConditionDetection>
          </MemberModules>
          <RegularDetections>
            <RegularDetection MonitorTypeStateID="OK">
              <Node ID="CDOK">
                <Node ID="RunScript">
                  <Node ID="Scheduler" />
                </Node>
              </Node>
            </RegularDetection>
            <RegularDetection MonitorTypeStateID="Error">
              <Node ID="CDError">
                <Node ID="RunScript">
                  <Node ID="Scheduler" />
                </Node>
              </Node>
            </RegularDetection>
          </RegularDetections>
        </MonitorImplementation>
      </UnitMonitorType>
    </MonitorTypes>

In the Microsoft.SCX.Authoring.Guide.xml file that you created earlier, replace the <MonitorTypes /> subsection with the preceding XML.
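The two condition detections in the monitor type amount to a substring test on the command's standard output. The following shell sketch approximates that logic outside Operations Manager. The detect_state function is hypothetical and is not how the agent evaluates the expressions internally; it only mirrors the ContainsSubstring and DoesNotContainSubstring checks.

```shell
#!/bin/bash
# Hypothetical helper: maps command output to a monitor state
# in the way the CDOK and CDError condition detections do.
detect_state() {
    local stdout="$1" expected_md5="$2"
    case "$stdout" in
        *"$expected_md5"*) echo "OK" ;;     # ContainsSubstring
        *)                 echo "Error" ;;  # DoesNotContainSubstring
    esac
}

expected="4f00fdbe0f3b89d4046f5d98152a1cf6"

# md5sum prints the hash followed by the file name, so a substring match is sufficient.
detect_state "${expected}  /tmp/SampleAppHealth.sh" "$expected"                          # prints: OK
detect_state "00000000000000000000000000000000  /tmp/SampleAppHealth.sh" "$expected"    # prints: Error
```

Because the two checks are exact opposites, exactly one regular detection matches on each run, and the monitor is always in one of its two states.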

Presentation and Language

Until now, this guide has not addressed the labels that the Operations console uses for folder and script objects. The presentation and language sections contain this information within the management pack. The presentation section describes how strings and content will be presented. This information can include fonts, sort order, color, and any attribute associated with the display of data. At a minimum, for a monitor to run, all string resource references must be defined. For this monitor, the string resource Microsoft.SCX.Authoring.Guide.CheckMd5Script.AlertMessage must be defined.

<Presentation>
    <StringResources>
      <StringResource ID="Microsoft.SCX.Authoring.Guide.CheckMd5Script.AlertMessage" />
    </StringResources>
</Presentation>

In the Microsoft.SCX.Authoring.Guide.xml file, replace the <Presentation /> section with the preceding XML.

All strings that are referenced by a string resource are defined in a language pack. Language packs allow for multiple language support, although for this guide, only English strings are provided. If a display string is not defined, the default behavior is to display nothing.

  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <DisplayString ElementID="Microsoft.SCX.Authoring.Guide.CheckMd5Script.AlertMessage">
          <Name>Sample App MD5 Hash Alert</Name>
          <Description>MD5 hash value of script does not match latest version.</Description>
        </DisplayString>
        <DisplayString ElementID="Microsoft.SCX.Authoring.Guide.CheckMd5Script.Monitor">
          <Name>Sample App MD5 Hash Monitor</Name>
          <Description />
        </DisplayString>
        <DisplayString ElementID="Microsoft.SCX.Authoring.Guide.CheckMd5Script.Monitor" SubElementID="Error">
          <Name>Error</Name>
          <Description />
        </DisplayString>
        <DisplayString ElementID="Microsoft.SCX.Authoring.Guide.CheckMd5Script.Monitor" SubElementID="OK">
          <Name>OK</Name>
          <Description />
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>

In the Microsoft.SCX.Authoring.Guide.xml file, replace the <LanguagePacks/> section with the preceding XML.

Observing the Monitor

If this monitor fails, a state change will occur. This state change will trigger the execution of a recovery task, as described in the Create a Recovery Task topic.

To observe the monitor in Operations Manager, increment the version in the manifest section, save the Microsoft.SCX.Authoring.Guide.xml file, and then import the management pack into the Operations Manager management group, as described in the Required Management Pack Definitions topic.

From the earlier discovery, the SampleAppHealth.sh script should still be in the /tmp folder on the UNIX-based or Linux-based server. If not, follow the instructions for putting the scripts on the UNIX-based or Linux-based computer that are found in the Enable Application Discovery topic.

Observe System Health from Diagram View

  1. From the Monitoring Node, select Unix/Linux Servers.

  2. Right-click the server name.

  3. Select Open: Diagram View.

  4. A new view is opened.

The health of the computer is determined by the active monitors that are targeted at the system: File System, Network, Operating System, and now Microsoft.SCX.Sample.Application. Because the monitor has been added, the application now has health data to report.

Observe Health Explorer

  1. Right-click Microsoft.SCX.Sample.Application in the Diagram window.

  2. Select Health Explorer.

The Entity Health has a green circle with a check mark next to it. This view shows overall health. Expand the Health Explorer tree to view the Sample App MD5 Hash Monitor, and observe that the circle next to the monitor also has a green check mark.

Modify the SampleAppHealth.sh script in the /tmp directory

  1. Edit the file: add an extra character to the script, and then save the file.

  2. Return to the Health Explorer window.

  3. Wait 30 seconds, and then press F5.

Notice that there is a red circle with an X next to the monitor and that the Entity Health also shows a red circle with an X. The computer is now in an unhealthy state.

It is possible to manually replace the monitored script with a correct version; however, it is also possible to create a task to initiate the update on demand and to create a recovery task to restore a computer from an unhealthy state automatically. The next topics cover the method of creating both a task and a recovery task.

Monitor Customization

Every monitor requires the following:

  • A target, typically a discovered computer or network device

  • A monitor definition

  • An associated unit-monitor type

  • String resource definitions and literal strings

At a minimum, you can customize this sample monitor by changing the configuration values in the unit monitor for Microsoft.SCX.Authoring.Guide.CheckMd5Script.Monitor. Of the three values, Command and Md5 are strings, and Interval is an unsigned integer that is expressed in seconds. A different validation method or file location can be inserted in Command. Any update to the script requires that the Md5 value be updated. The interval can be changed to something more appropriate for a production environment. Usually, scripts do not change frequently, so an interval of either 24 hours (86,400 seconds) or one week (604,800 seconds) is appropriate, depending on the required frequency of script updates.
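Because the interval is expressed in seconds, a quick calculation helps when choosing a value. The hours_to_seconds helper below is a hypothetical convenience for this example and is not part of the management pack.

```shell
#!/bin/bash
# Hypothetical helper: converts a number of hours to the seconds value used by <Interval>.
hours_to_seconds() {
    echo $(( $1 * 60 * 60 ))
}

echo "Daily interval:  $(hours_to_seconds 24)"            # prints: Daily interval:  86400
echo "Weekly interval: $(hours_to_seconds $(( 24 * 7 )))" # prints: Weekly interval: 604800
```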

To customize this monitor to provide the same validation process for a new application, create a new discovery, define a new unit monitor with the target matching the new discovery, update the command to point to the new script, and update the Md5 value. Because the validation process is identical, you do not have to change the monitor type or any of the alert messages.