Work with Alerts

The Operator console is the primary interface for working with managed computers. Anyone using this console can obtain different types of information about the computers that they manage, resolve alerts, perform diagnostics, and run tasks against selected computers - within the boundaries of the console scope that they are using.

Web Console Notes

As noted in Chapter 2 of this guide, the Web console provides the following subset of Operator console views: Alerts, Computers, and Events. It does not provide the capability of running predefined tasks against a managed computer.

Another important difference between the consoles is view filtering. A Web console user can filter any of the views, but the filter settings are not retained after the user navigates away from the view.

You can configure the Web console to be Read-only by using the following procedure.

Configure Web console as Read-only

  1. On the server where the console is installed, open the %INSTALLDRIVE%\Program Files\Microsoft Operations Manager 2005\WebConsole\Web.config file in a text editor.

  2. Locate this tag: <appSettings>

  3. Remove the comment markers to enable <add key="Readonly" value="true" />.

  4. Save and close the file.

  5. Stop and restart the Microsoft Operations Manager 2005 Web console application in the Internet Information Services snap-in.
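
Steps 2 and 3 of this procedure can also be scripted. The following Python sketch is a minimal illustration and is not part of MOM. It assumes that the setting takes the standard ASP.NET form <add key="Readonly" value="true" /> inside <appSettings>, and the function name set_readonly is hypothetical. The standard-library parser discards XML comments, so back up Web.config before writing the result over it.

```python
import xml.etree.ElementTree as ET


def set_readonly(config_xml: str, readonly: bool = True) -> str:
    """Return config_xml with the Readonly app setting set (hypothetical helper)."""
    root = ET.fromstring(config_xml)  # the default parser drops comments
    app_settings = root.find("appSettings")
    if app_settings is None:
        app_settings = ET.SubElement(root, "appSettings")
    # Reuse an existing, uncommented Readonly entry if there is one.
    entry = None
    for candidate in app_settings.findall("add"):
        if candidate.get("key") == "Readonly":
            entry = candidate
            break
    if entry is None:
        entry = ET.SubElement(app_settings, "add", {"key": "Readonly"})
    entry.set("value", "true" if readonly else "false")
    return ET.tostring(root, encoding="unicode")


# Example input in which the setting is still commented out.
sample = (
    "<configuration><appSettings>"
    '<!-- <add key="Readonly" value="true" /> -->'
    "</appSettings></configuration>"
)
updated = set_readonly(sample)
```

The Web console application must still be stopped and restarted, as described in step 5, before the change takes effect.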

Operational data processing cycle

Managed computers continuously send data to the Management Server. Event, performance, alert, and discovery data originate on the managed computer. Although the internal processing of each type of data is different, the data flow is the same.

Figure 3.4 illustrates how an alert is handled and processed by an operator. In this example, a WMI event indicating high queue length on an Exchange server provides the starting point in the process.

Figure 3.4 Alert processing cycle

Referring to Figure 3.4:

  • The process described occurs regardless of how MOM is deployed. For example, communication between the DAS and the database is the same whether the MOM Database and MOM Management Server are installed on the same computer or on different computers.

  • Given the steps in the process, new information is displayed in the Operator console in near real time rather than true real time. The refresh rate, especially for events, is directly related to the size of the operational database and the refresh rate that is configured for the Operator console.

  • There are several points where latency can occur and where data transfer can be interrupted; namely: between the agent and the Management Server, and between the Management Server and the operational database. See also: Monitor MOM Components.

  • Latency and potential disruption in the data flow are important considerations when you configure for high service availability and tune performance.

    Important

        Under a high volume of events, occasionally a rule might generate two alerts in response to a single event. When this happens, simply resolve one of the alerts.

The Alerts View

This section covers the following aspects of working with an alert:

  • Obtaining information about an alert.

  • Setting the alert resolution state.

  • Adding comments to the Alert Details.

  • Using maintenance mode.

  • Running diagnostic tasks.

Service Level Exceptions

This is a subset of the Alerts view that is used to flag alerts that have exceeded a predefined service level for the computer that is being monitored. You can change these settings by opening the properties page for an alert view, and editing the settings. In order to change the default settings you have to create a custom service level exception.

To create a custom service level exception

  1. In the Alert View, click Service Level Exceptions.

  2. In the Results pane, right-click the alert displayed as a service level exception to open the Alert View Properties page. If no alert is displayed, right-click Service Level Exceptions and open the properties page.

  3. Click the Criteria tab to display the View description.

  4. The phrase that begins with "and that violated" will contain the phrase "default company" as an active link. Click the link to open the Service Level Exception properties page.

  5. Click the radio button beside Custom service level agreement to display a list of service level options.

  6. Each of the service level options in the list contains minute, hour, or day settings displayed as an active link. To change a setting, click the appropriate link to open the Service Level Agreement properties page.

  7. Change the setting and click OK to return to the Service Level Exception properties page.

  8. When you finish configuring the custom service level exception, click OK.
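
The idea behind a service level exception can be sketched in a few lines of Python. This is an illustrative model only: it assumes that a service level is the maximum time an alert may remain in a given resolution state. The limits, the "Acknowledged" state, and the function name are hypothetical and are not MOM defaults.

```python
from datetime import timedelta

# Hypothetical custom service level: maximum time allowed in each resolution state.
SERVICE_LEVEL = {
    "New": timedelta(minutes=10),
    "Acknowledged": timedelta(hours=4),
}


def is_service_level_exception(resolution_state: str, time_in_state: timedelta) -> bool:
    """Return True when an alert has stayed in its state longer than allowed."""
    limit = SERVICE_LEVEL.get(resolution_state)
    if limit is None:
        return False  # no limit is configured for this state
    return time_in_state > limit
```

In this model, an alert that has been in the New state for 15 minutes is flagged, and one that has been there for 5 minutes is not.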

View Alert summary

If the Alerts view is not active in the Results pane, click the Alerts navigation button. The columns in Table 3.7 are displayed by default for each alert.

Table 3.7 Columns displayed for an alert

Column name         Description
------------------  -----------
Severity            Indicates the severity of the alert, such as Service Unavailable or Success.
Maintenance Mode    Indicates whether the alert is in maintenance mode.
Domain              Specifies the domain to which the computer belongs.
Computer            Specifies the computer on which an agent generated the alert.
Time Last Modified  Specifies the date and time that the alert was last changed.
Resolution State    Indicates the status of the resolution process of the alert, such as New or Resolved. The resolution state indicates whether the resolution process has begun.
Time in State       Specifies the amount of time that the alert has been in the current resolution state.
Problem State       Indicates what problem state the alert is in.
Repeat Count        Specifies the number of identical duplicate alerts that this instance represents.
Name                Specifies the name of the rule that generated the alert.
Source              Indicates where the alert was generated, for example, from MOM, or a specific server.
Ticket Id           Specifies the ticket ID assigned to the alert.
Owner               Specifies the person responsible for tracking and resolving the alert.

Note

    The enabled columns only display data that is available. For example, if an Owner is not assigned to the alert, no information is displayed.

View Alert details

  • To view the details for an alert, click the alert in the Results pane.

  • After a specific alert is selected, the tabbed view, illustrated in Figure 3.3, is dynamically generated for the alert. The following tabs are provided. See also: Alert View Sample.

Properties

Describes the alert and provides additional details, such as the Alert Id and the rule that generated the alert. From this tab you can:

  • Copy all or some of the information and paste it into a text file.

  • Print the information.

  • Disable the rule that generated the alert.

To undertake any of the preceding tasks, right-click anywhere in the display area and pick the action that you want to perform.

Custom Properties

Enables the user to provide additional information about the alert, including:

  • The alert owner

  • The ticket ID

    Note

        This information can be generated programmatically by integrating a ticketing system with MOM 2005. For guidance on ticketing solutions, refer to the "Autoticketing Solution" described in Chapter 8 of this guide.

  • Five custom fields for adding information that other users in the IT support group can use.

Events

Provides the following summary information about the event that generated the alert: Type (Information, Error or Warning), Time, Source Computer, Provider Type, Provider Name, and Source.

To view more information about the event, right-click anywhere in the display area and pick View Events.

Product Knowledge

Displays the appropriate Management Pack knowledge for the alert.

To view the knowledge in the browser window, click the View button.

Company Knowledge

Depending on the console scope, enables the user to view, copy, print, or add to the company knowledge base.

If the user is a member of the MOM Authors or MOM Administrators groups, they can click Edit to open a text editor and create knowledge for the alert.

Note

    When changes are made to the company knowledge, these changes are not tracked in the alert history.

History

Displays summary information about the history of the alert, such as the management group it was created in and the notification group.

A user can add comments to the alert history by clicking the Append button to open the Alert History dialog box.

Alert view sample

The following sample is typical, and represents the type of information that you can obtain in the Details pane of an Alert.

Properties Tab

         
Description:
The host process host process for script responses (3036) will be restarted because it is using
20480 more bytes than its limit of 104857600. 
To adjust this limit, edit the Software\Mission Critical Software\OnePoint\MaxScriptHostPrivateBytes 
registry key. 
Management Group: MG2749 Name: The MOM Host process was consuming too much memory and will 
be terminated 
Severity: Error 
Resolution State: New 
Domain: SMX 
Computer: WOW406D 
Time of First Event: 11/23/2004 5:52:00 PM 
Time of Last Event: 11/23/2004 5:52:00 PM 
Alert latency: 0 sec 
Problem State: Investigate 
Repeat Count: 0 
Age:  
Source: Microsoft Operations Manager 
Alert Id: 618b8e08-7e14-4778-87f6-d4ed5eeea89e 
Rule (enabled): Microsoft Operations Manager\Operations Manager 2005\Agents on all MOM roles\The MOM Host 
process was consuming too much memory and will be terminated 

Product Knowledge Tab

         
 MOM OnlineManagement Pack 

The Action Account (MOMHost.exe) process was consuming too much RAM (physical) memory and was
restarted by MOM. 
The MOMHost.exe process is run under the agent Action Account and is used to gather information about, 
and perform actions on, the managed computer.
This restart might signify a problem with the managed computer, especially if the host process is restarted 
often, this might indicate a problem with the managed computer.

This could be caused by any of the following:
The amount of memory allotted to the process is too small and needs to be increased. 
The host process is running too many tasks or is gathering data form too many providers at one time. 
The host process is running scripts that are not freeing resources. 

To troubleshoot and fix this problem:
1. Make sure that the managed computer is not low on resources.
2. If the managed computer rarely uses more than 70% of its RAM memory, you can increase the amount 
of memory allotted to the MOMHost.exe process.
To increase or decrease the amount of memory allotted to the MOMHost.exe process:
In Regedit.exe (or some similar Registry editor), change the following registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Mission Critical Software\OnePoint
MaxDefaultHostPrivateBytes REG_DWORD <bytes>
NOTE - the default setting for this key value is 0x6400000 (100MB).
3. Continue to monitor the process by looking for this alert. If you see this alert for the host process on a 
specific computer and you have already increased the memory allocation, consider enabling tracing 
for the computer.
To enable or disable tracing for a specific agent:
In Regedit.exe (or some similar Registry editor), change the following registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Mission Critical Software
TraceLevel REG_DWORD = 1 - 6
    -1 = disabled (default)
    0-2 = error level tracing only
    3-5  = error and warning level tracing only
    6 = error, warning and information level tracing 
NOTE - Setting the registry key value to 4 or higher will affect the performance of the MOM Service 
on the managed computer.
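
The numbers in this sample are consistent with each other: the default of 0x6400000 is 104857600 bytes, which is the limit quoted in the alert description. The following Python sketch shows the conversion in case you need to work out a different value. The function name is illustrative, and 150 MB is an arbitrary example, not a recommendation.

```python
MIB = 1024 * 1024  # bytes per megabyte, as implied by the 100 MB default


def limit_as_dword(megabytes: int) -> str:
    """Format a memory limit given in MB as a hexadecimal REG_DWORD value."""
    value = megabytes * MIB
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in a REG_DWORD")
    return hex(value)


default_limit = 0x6400000          # default quoted in the product knowledge
example_limit = limit_as_dword(150)  # arbitrary example value
```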

Set Alert Resolution State

When an alert is first received, its Resolution State is automatically set to New. Support staff can change this state, as appropriate.

Set alert resolution state

  1. In the Results pane, click the alert that you want to set a resolution state for.

    Tip

        If there are multiple alerts that originate from a single computer, you can bulk-select the alerts and set a resolution state for all of them.

  2. Right-click and then pick Set Alert Resolution State.

  3. In the list that is provided, click the state that you want to assign to the alert.

Note

    Some alerts will automatically be resolved when the alert state changes, or might get removed from the operational database during database grooming.

Use Maintenance Mode

Maintenance mode provides a means of stopping the insertion of alerts in the operational database. This mode does not take the computer that is generating alerts offline; maintenance mode only instructs the Management Server to set all new, incoming alerts from the computer to Resolved. As a result, the new alerts are not included in health calculations, and responses (such as paging or e-mailing) are not run on the Management Server.

However, in MOM 2005, the implementation of maintenance mode allows an error condition. This occurs when the MOM engine fails to recognize that a machine has been placed into maintenance mode. Alerts are generated and auto-resolved, but alert responses such as paging and e-mail notifications are still run. Also, computer state views become inaccurate in the Operator console. The problem can occur, in part, because of issues associated with the Configuration polling interval, the timing of MOM-related SQL jobs, and the manner in which MOM updates the Operator console for computers placed in maintenance mode. These issues have been addressed in MOM Service Pack 1; upgrading MOM with SP1 is strongly recommended.

A workaround for this problem exists for MOM 2005, and best practices to minimize this issue apply to both MOM 2005 and MOM 2005 SP1. The user must commit configuration changes immediately after the computer enters or exits maintenance mode. This should be done from the Administrator console or using the SDK.

Put a computer in maintenance mode

  1. In the Results pane, click the alert for the computer that you want to put in maintenance mode.

  2. Right-click the alert that you select and pick Put Computer in Maintenance Mode to open the Maintenance Mode property page.

  3. You can provide a reason for putting the computer in maintenance mode, adjust the time the computer is in maintenance mode (the default is 20 minutes), or you can specify an ending date and time for maintenance mode.

    Note

        It is recommended that you do not use a time interval of less than 5 minutes for maintenance mode. Due to timing cycles, the Management Server can keep a computer in maintenance mode for a minimum of 5 minutes.

  4. Click OK to close the property page and put the computer in maintenance mode.

  5. Recommended practice is to commit configuration changes when a computer enters maintenance mode and to commit configuration changes when a computer exits maintenance mode. For more detail, see the following section, "MOM 2005 Maintenance Mode Issues."

    Tip

        The Microsoft Operations Manager 2005 SDK contains a sample that shows how to put a computer in maintenance mode programmatically.

MOM 2005 Maintenance Mode Issues

Following are the problems that can occur in MOM 2005 maintenance mode:

  • When an operator puts a computer into maintenance mode and doesn't immediately commit configuration changes, the MOM engine doesn't recognize the change until the next configuration poll; the polling interval is five minutes by default. During the maintenance mode period, although MOM generates and auto-resolves any alerts from this computer, it also runs alert responses such as paging and e-mailing, if configured to do so, as if the computer were still being monitored. This is undesirable.

  • If a computer is powered down during maintenance mode or the MOM service is stopped and remains down after exiting that mode, or if the MOM service on the client doesn't generate a successful heartbeat after exiting maintenance mode, the MOM engine does not update the heartbeat state displayed in the Operator console with the current operational state of the computer. As a result, an operator could see a green state icon for that computer, the same state it was in prior to entering maintenance, instead of a red state icon representing its actual operational state.

  • If a computer is powered down or becomes unavailable before entering maintenance mode, an alert is appropriately generated. However, if the computer remains down after exiting maintenance mode, no second alert is generated.

  • Similar to the preceding issue, if the user resolves the initial alert prior to entering maintenance mode (turning the state view icon of that computer green) and then the computer exits maintenance mode still down or unavailable, a new alert will not be generated nor will the Operator console update the state view icon from green to red.

MOM 2005 SP1 Maintenance Mode Issues

Service Pack 1 for MOM 2005 has addressed some of the issues discussed in the preceding section.

As in MOM 2005, when an operator puts a computer into maintenance mode, the MOM engine doesn't actually recognize this change until the configuration polling interval has passed. However, some program responses have been changed, specifically the following:

  • Event alert responses, such as paging, e-mailing, and so on, are no longer generated during the time a computer is placed in maintenance mode after configuration changes have been committed.

  • If a computer is powered down during maintenance mode and remains down after exiting that mode, or if the MOM service on the client doesn't generate a successful heartbeat after exiting maintenance mode, MOM will now generate an alert. As a result, the Operator console will be updated to display the current operational state of the computer.

  • If a computer is powered down or becomes unavailable before entering maintenance mode, an alert is appropriately generated. However, if the computer remains down after exiting maintenance mode, no second alert is generated.

  • Similar to the preceding issue, if the user resolves the initial alert prior to entering maintenance mode (turning the state view icon of that computer green) and then the computer exits maintenance mode still down or unavailable, a new alert will not be generated nor will the Operator console update the state view icon from green to red.

New Behavior in MOM 2005 SP1

While a managed computer is in maintenance mode and its MOM service is running, the Management Server does not update the last heartbeat time value, even though the agent continues to send heartbeats.

Best Practice for Using Maintenance Mode

The best way to avoid this error condition is to commit configuration changes whenever a computer is placed in maintenance mode and whenever it exits maintenance mode. This causes the MOM engine to recognize the mode of the computer in a timely fashion. The Operator console will then display the correct operation state icon.

The configuration polling interval is user-configurable. A longer interval increases the risk that MOM fails to recognize that a computer is in maintenance mode, and so makes the problems described more likely.

Computers are placed in maintenance mode by using the Operator console. However, the commit configuration changes task cannot be performed at the Operator console. If a computer is placed in maintenance mode, use the Administrator console to commit configuration changes. Remember to also perform the commit configuration changes task when the computer exits maintenance mode.

If you are using the MOM SDK to place a computer in maintenance mode, remember to commit configuration changes in the same script, as well as when the computer is scheduled to exit maintenance mode.
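
The effect of committing configuration changes can be illustrated with a simplified timing model. The Python sketch below is an assumption-laden illustration, not MOM code: it assumes that an uncommitted change is picked up at the next configuration poll, and that a committed change is picked up with no delay. The function and parameter names are hypothetical.

```python
def unrecognized_minutes(
    minutes_since_last_poll: float,
    polling_interval: float = 5.0,  # default configuration polling interval
    window_length: float = 20.0,    # default maintenance mode duration
    committed: bool = False,
) -> float:
    """Minutes of a maintenance window during which responses may still run."""
    if committed:
        return 0.0  # assumption: a commit is recognized without delay
    if not 0 <= minutes_since_last_poll <= polling_interval:
        raise ValueError("minutes_since_last_poll must lie within one polling interval")
    wait_for_next_poll = polling_interval - minutes_since_last_poll
    # The exposure cannot be longer than the maintenance window itself.
    return min(wait_for_next_poll, window_length)
```

Under these assumptions, a 30-minute polling interval can leave the whole of a default 20-minute maintenance window unrecognized, which is why a commit at entry and exit matters more as the interval grows.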

Run tasks

The tasks that are provided in the Operator console enable an operator, depending on their console scope, to run preliminary diagnostics to determine the cause of a problem. Table 3.8 summarizes all of the tasks that are provided with MOM 2005.

The availability of a task to an Operator console user is determined by:

  • The console scope that they are using.

  • The computer group filter that they are using.

Run a task

  • In the Tasks pane, click the task name or right-click the task name and pick Run.

Table 3.8 Available tasks in Operator console

Name                        Description
--------------------------  -----------
Computer Management         Opens the Computer Management snap-in on a specified computer.
Event Viewer                Opens the Event Viewer for a specified computer.
IP Configuration            Runs the ipconfig command against a specified computer.
Ping                        Runs the ping command against a specified computer.
Remote Desktop              Opens a Remote Desktop session to a specified computer.
Start MOM 2005 Service      Starts the local MOM service.
Stop MOM 2005 Service       Stops the local MOM service.
Test end-to-end monitoring  Creates an event on a managed computer to test the end-to-end monitoring of the MOM system.

Note

    Tasks that are not available to the current scope will either have the Run option grayed out, or else nothing happens when you click the task name.

Tasks that require a higher level of privilege will display an "Access is denied" error message when you run them. In some cases, you may have to look at the Task Status view to obtain this information.

Notes on other Views

The Alerts view may be the primary view used by IT support staff, but the other views provide a means for isolating a problem, as well as meeting the information requirements of different users. The following table adds to the information already provided in Chapter 2 of this guide.

Table 3.9 Summary of Operator console views

View                  Personalize  Link to other views  Enable/disable maintenance mode  Comments
--------------------  -----------  -------------------  -------------------------------  --------
State                 Y            Y                    Y                                Aggregates information about alerts and associated entities to display the state (health) of a computer group. See: State Icons, State Alert, State Rollup
Events                Y            Y                    Y                                See: Time Filtering.
Performance           Y            N                    N                                See: Performance data view
Computers and Groups  Y            Y                    Y
Diagram               N            Y - Computer groups  N                                See: Diagram View

State Icons

When the agent heartbeat for a computer has a Service Unavailable error, the state icons for every other role associated with that computer (for example, Exchange Server and Active Directory) are suspect. They are displayed as gray line icons that are otherwise identical to the full-color ones. For example, the gray circle-x is interpreted as follows: the last known state for this role is critical error, but because the agent is either not sending heartbeats or is flagged as service unavailable, the data for the role is suspect.

Until the MOM agent is up again, and heart-beating normally, the gray versions of the state icons will remain. When the agent is OK again, the icons will return to the colored versions. The logic is that, since the agent performs the communication, if it is down, information that it communicates is also suspect.
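
The display rule described above can be summarized in a short Python sketch. It is an illustration of the logic only; the function name and the text labels are hypothetical and do not correspond to anything in the MOM SDK.

```python
from typing import Dict


def render_icons(role_states: Dict[str, str], agent_heartbeat_ok: bool) -> Dict[str, str]:
    """Describe how each role's state icon is drawn for one computer."""
    # The last known state is kept; only the rendering changes when the
    # agent is not heart-beating, because its data is then suspect.
    style = "color" if agent_heartbeat_ok else "gray"
    return {role: f"{state} ({style})" for role, state in role_states.items()}
```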

State Alert

MOM 2005 provides an alert named the state alert. This alert has two problem state values: Active and Inactive. Each of these states handles rule response processing differently.

For example:

When % Processor time crosses a specified threshold, an alert is created with a problem state of Active, and any specified responses are run. If the counter drops below the threshold, another alert with a problem state of Inactive is created; however, none of the responses specified for the rule are run.
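
The example above can be modeled as follows. This Python sketch is a simplified illustration rather than MOM's rule engine: it assumes that a value above the threshold counts as crossing it and that a value at or below the threshold counts as dropping back. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def state_alerts(samples: Iterable[float], threshold: float) -> List[Tuple[float, str, bool]]:
    """Return (sample, problem_state, responses_run) for each threshold crossing."""
    alerts = []
    active = False
    for value in samples:
        if value > threshold and not active:
            active = True
            alerts.append((value, "Active", True))     # responses are run
        elif value <= threshold and active:
            active = False
            alerts.append((value, "Inactive", False))  # responses are not run
    return alerts


# % Processor Time samples checked against a threshold of 90.
history = state_alerts([50, 95, 97, 60, 99], threshold=90)
```

Samples that stay on the same side of the threshold, such as 97 in this series, produce no new alert.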

State Rollup

The state of a computer group is based on a roll-up policy, which can be configured by MOM authors using the State Roll-up Policy tab of the Computer Group property sheet.

Authors have three possible roll-up policies that they can define for their computer groups. These include:

  • Most Severe of any Server

    This policy indicates that the state of the computer group will be equal to the most severe state of any one of the members of the computer group.

  • Most Severe of the Healthiest X % of Servers

    This policy indicates that the state of the computer group will be equal to the most severe state among the specified percentage of the healthiest servers.

    Example: A computer group with 10 members has a policy set to 50%. If 5 have Warning states, and 5 have Service Unavailable states, then the state of the computer group would be Warning.

  • Least Severe of any Server

    This policy indicates that the state of the computer group will be equal to the least severe state of any one of the members of the computer group.
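
The three policies can be expressed compactly in Python. The sketch below is illustrative only: the severity ordering, the policy identifiers, and the rounding rule (round up, with at least one server) are assumptions made for the example, not documented MOM behavior.

```python
import math
from typing import Sequence

# Assumed ordering, least to most severe, for illustration only.
SEVERITY_ORDER = ["Success", "Warning", "Error", "Critical Error", "Service Unavailable"]


def rollup(states: Sequence[str], policy: str, percent: float = 100.0) -> str:
    """Compute the state of a computer group from the states of its members."""
    if not states:
        raise ValueError("computer group has no members")
    ranked = sorted(states, key=SEVERITY_ORDER.index)  # healthiest first
    if policy == "most_severe":
        return ranked[-1]
    if policy == "least_severe":
        return ranked[0]
    if policy == "most_severe_of_healthiest":
        # Take the healthiest X percent, then report the worst state among them.
        count = max(1, math.ceil(len(ranked) * percent / 100.0))
        return ranked[count - 1]
    raise ValueError(f"unknown policy: {policy}")


# The example from the text: 10 members, policy set to 50%.
group = ["Warning"] * 5 + ["Service Unavailable"] * 5
group_state = rollup(group, "most_severe_of_healthiest", percent=50)
```

With the same group, the Most Severe policy yields Service Unavailable, and the Least Severe policy yields Warning.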

Important

    At times, the state view in the Operator console gets out of synchronization with the database. Some of the reasons for this are:

  • Queues get full (because a block of data from an agent is inserted into the server queue at the same time, and is likely to be processed at the same time).

  • The MOM server goes down, causing the agents to fail over. (One server might have the red alerts for an agent; another might get the green alerts. Because the server was rebooted, alerts get inserted out of order.)

  • The operational database is unavailable.

Note

The best work-around is to resolve the alert.

Time Filtering

Time filtering is a mechanism for determining how many days' worth of information you want to see in the Results pane for the Alerts and Events views. The default setting is seven days, but you may want to consider changing this because:

  • In the case of alerts, the number of active alerts may appear to be higher than it actually is.

  • In the case of events, which generate more data than alerts, viewing response time is affected by the number of days of data that must be retrieved from the database and displayed in the console.

To change the time filter

  1. On the Menu and toolbar, click the Edit view time filter button to open the View Date and Time Filter property page. (This button is labeled "F" in Figure 3.3).

  2. By default, Alert and Event data is displayed for the last seven days.

    • You can change the number of days by typing in a lower value. You can also use the list box to select hours, minutes or seconds.

    • Another option is to specify a time range. To do so, click the radio button beside Within the time range, and set the After or Before date and time.

  3. When you finish configuring the time filter, click OK.
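
The two filter options in step 2 amount to the following selections over item timestamps. This Python sketch only illustrates the logic; the function names are hypothetical, and inclusive boundaries are an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence


def within_last(
    times: Sequence[datetime],
    now: datetime,
    window: timedelta = timedelta(days=7),  # default filter: last seven days
) -> List[datetime]:
    """Keep timestamps that fall within the window ending at now."""
    return [t for t in times if now - window <= t <= now]


def within_range(
    times: Sequence[datetime],
    after: Optional[datetime] = None,
    before: Optional[datetime] = None,
) -> List[datetime]:
    """Keep timestamps on or after `after` and on or before `before`."""
    return [
        t for t in times
        if (after is None or t >= after) and (before is None or t <= before)
    ]
```

Narrowing the window, for example to timedelta(hours=2), reduces the amount of data that has to be retrieved and displayed.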

Performance data view

Rather than selecting a computer, picking counters, and then drawing a graph, you can use the Performance Data view to identify specific counters for a computer.

Use the following procedure to create this view. When you are finished, save it in All My Views or Public Views.

Create performance data view

  1. Click the My Views navigation button.

  2. In the Navigation pane, right-click My Views, click New and then select Performance Data View.

  3. In the Create View - Performance Data View dialog, identify the type of performance data view that you want to create.

  4. When you select an item (step 3), the corresponding View description: area displays the description with hyperlinks that you will use later. Click Next to continue.

  5. Click the box beside each type of performance data that you want to include (for example, for specified counter, measured on specified computer). When you select an item, a hyperlink is displayed in the corresponding View description (click the underlined value to edit): input area.

  6. Click each hyperlink to open a dialog box and provide the required information. Click Next to continue.

  7. Type a View name and Description for the view, and then click Finish.

Tip

Expand the Performance Views navigation tree to include Agent Performance. You can use the Performance Data views that are already constructed as a model for creating your own views.

Diagram View

The diagram view provides an ideal visual representation, complete with state indicators, of a MOM computer group. You can use the Group: list in the Menu and toolbar to diagram specific computer groups that are provided for the console scope that you are using.

If more than one object is shown on the screen, you can arrange the layout by clicking an object and dragging it to a new location. If you want to reset the diagram layout to the default layout, click the Relayout diagram button in the Menu and toolbar area of the console.

Exporting the View

You can export the diagram view and save it as a Visio drawing (.vdx) file.

Export the current diagram

  1. With Diagram as the active view, click the Export to Microsoft Visio button in the Menu and toolbar area of the console. This opens the Save diagram as a Visio .VDX file property page.

  2. Navigate to the location where you want to save the file, provide a filename, and then click Save.

Background Images

Background images are not provided for the diagram view. In order to add a background image, you must be a member of MOM Administrators, and must provide the image. The recommended image size is 640 x 480 pixels. Image quality and distortion will vary depending on how much you zoom in or out.

Note

    A management group can only have one image displayed for it.

Add background image

  1. Open the Operator console as a member of the MOM Administrators group.

  2. Click the Diagram navigation button.

  3. Right-click anywhere on the diagram and click Diagram View Properties to open the properties page for the view.

  4. In Diagram View Properties, click the Diagram Settings tab.

  5. Click the Background Images button to open the Diagram Background Images property page.

  6. Click Add to locate and specify the image that you want to add.

  7. After you finish adding images, you can use any of the selected images as a background image for the diagram view.