Process 3: Continuous Monitoring

Figure 5. Continuous monitoring

Activities: Continuous Monitoring

The third process in SMC begins once the monitoring tools being used are in place. When an event occurs, a notification is received either by a dedicated SMC group or by a related group that has SMC responsibilities. After analysis, the event is either solved or escalated to a higher level for resolution.

This process involves the following activities:

  • Receive notification.
  • Analyze the event.
  • Resolve or escalate the event.
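The three activities can be pictured as a simple pipeline in which each notification is received, analyzed, and then either solved or escalated. The following Python sketch is illustrative only: the `Event` and `Outcome` records, the function names, and the idea of looking up a fix by event description are assumptions made for this example, not part of MOF or of any specific monitoring tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Event:
    # Illustrative fields only; real monitoring tools define their own schema.
    source: str       # CI or IT service that raised the event
    description: str  # text used here to look up a known fix
    severity: int     # assumption: higher number means more urgent


# Hypothetical type: a documented fix that returns True if it cleared the event.
Fix = Callable[[Event], bool]


@dataclass
class Outcome:
    event: Event
    status: str       # "solved" or "escalated"


def receive_notification(raw: Dict[str, str]) -> Event:
    """Activity 1: turn an incoming notification into an Event record."""
    return Event(source=raw["source"],
                 description=raw["description"],
                 severity=int(raw.get("severity", "1")))


def analyze_event(event: Event, known_fixes: Dict[str, Fix]) -> Optional[Fix]:
    """Activity 2: look for a documented fix for this kind of event."""
    return known_fixes.get(event.description)


def solve_or_escalate(event: Event, fix: Optional[Fix]) -> Outcome:
    """Activity 3: apply the known fix; escalate if there is none or it fails."""
    if fix is not None and fix(event):
        return Outcome(event, "solved")
    return Outcome(event, "escalated")


def handle(raw: Dict[str, str], known_fixes: Dict[str, Fix]) -> Outcome:
    """Run the three activities in order for one notification."""
    event = receive_notification(raw)
    fix = analyze_event(event, known_fixes)
    return solve_or_escalate(event, fix)


if __name__ == "__main__":
    # Hypothetical knowledge base: pretend the "disk full" clean-up always succeeds.
    fixes: Dict[str, Fix] = {"disk full": lambda e: True}
    print(handle({"source": "web01", "description": "disk full"}, fixes).status)     # prints "solved"
    print(handle({"source": "db01", "description": "service down"}, fixes).status)  # prints "escalated"
```

In practice, the lookup in `analyze_event` would draw on the inputs listed for event analysis, such as open problems, open incidents, open changes, and information about event resolution, rather than on a dictionary.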

The following table describes these activities in greater detail.

Table 6. Activities and Considerations for Continuous Monitoring

Activities

Considerations

Receive notification

Key questions:

  • Who should receive alerts?
  • Do incoming alerts require 24/7 support and, if so, who should handle them?
  • Is there a dedicated SMC group, or is monitoring handled by other departments, such as the Service Desk or Operations?
  • Is there a need for correlating events? Correlating events gives an end-to-end view of related events and makes troubleshooting easier.
  • Have events historically been treated as incidents, and has the incident management process been used to analyze and resolve them?
  • Is there a connector between the monitoring system and the Service Desk tools, or will alerts be transferred manually?
  • Do other departments or resources work on a given problem?
  • Are automated solutions applied?
  • Can alerts automatically be solved and closed?
  • How are alerts communicated to groups (via pager, text message, monitoring console, e-mail)?

Inputs:

  • IT services configured in the monitoring tool
  • Role descriptions
  • SMC policies and procedures
  • Notifications

Outputs:

  • Incident information
  • Event information
  • Alert information

Best practice:

  • If something needs immediate attention, ensure that there is a way to prioritize it.

Analyze event

Key questions:

  • Who is primarily responsible for event analysis?
  • Who is responsible for “noise” reduction, that is, for clearing out false events that should be removed from view?
  • Is a known problem causing the event?
  • Is there clear, easily accessible information available about possible solutions?
  • Is the event description understandable?
  • Have there been other alerts about the same problem?
  • Can certain manual tasks help solve the problem?
  • Does any tool used by the Service Desk contain procedures for handling this incident?
  • Are any changes planned for the IT service or for its configuration items (CIs)?
  • Is the event actionable? Is it valid?
  • Can the alert be tuned? Alert tuning is the adjustment of a service monitoring tool to lower the level of alert noise and reduce the number of false alerts.
  • Is the impact to the IT service clearly understood and communicated in the SMC tool?

Inputs:

  • Information about event resolution
  • Description of the event
  • Open problems
  • Open incidents
  • Open changes
  • Information from other teams

Outputs:

  • Event is solved
  • Event is escalated as an incident with raised severity, possibly with a transfer to another team

Best practice:

  • Ensure that all alerts are understandable, relevant, and up to date.

Resolve or escalate event

Key questions:

  • Who has authority to escalate events?
  • Who receives the escalated event?
  • How can we ensure that the receiver takes ownership of the event? If the receiver can’t, is there an alternate individual or team to call upon?
  • Which events should be subject to 24/7 escalation?
  • Was the event resolved by using a knowledge base, product knowledge, or another approach?
  • Should the alert threshold be tuned or updated?

Inputs:

  • Updated knowledge about alerts
  • Input for tuning the alerts
  • Additional error details about the alert, for further troubleshooting
  • Description of previous activities (if the problem is not solved)

Outputs:

  • Escalated alerts
  • Solved alerts

Best practice:

  • Encourage each individual on the alert escalation chain to provide input and knowledge. 
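The escalation questions in the table, namely who receives the escalated event, how to ensure that the receiver takes ownership, and whether there is an alternate individual or team to call upon, can be pictured as walking an ordered escalation chain. The Python sketch below is a minimal illustration under stated assumptions: the `escalate` function, the `EscalationResult` record, and the `accepts` callback (a stand-in for however a receiver acknowledges ownership) are hypothetical and do not come from MOF or from any particular monitoring product.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class EscalationResult:
    alert_id: str
    owner: Optional[str]                            # who accepted ownership, or None
    tried: List[str] = field(default_factory=list)  # who was contacted, in order


def escalate(alert_id: str,
             chain: List[str],
             accepts: Callable[[str, str], bool]) -> EscalationResult:
    """Walk an escalation chain until someone takes ownership of the alert.

    chain   -- ordered list of individuals or teams; each later entry is the
               alternate to call on if the earlier ones cannot take the alert.
    accepts -- hypothetical callback standing in for an acknowledgment
               (for example, a reply to a page or a ticket assignment).
    """
    tried: List[str] = []
    for receiver in chain:
        tried.append(receiver)
        if accepts(receiver, alert_id):
            return EscalationResult(alert_id, receiver, tried)
    # Nobody accepted: the alert is still unowned and needs manual follow-up.
    return EscalationResult(alert_id, None, tried)


if __name__ == "__main__":
    # Hypothetical availability; "Database team" is an invented example name.
    available = {"Service Desk": False, "Operations": True}
    result = escalate("ALERT-42",
                      ["Service Desk", "Operations", "Database team"],
                      lambda receiver, _alert: available.get(receiver, False))
    print(result.owner)  # prints Operations
    print(result.tried)  # prints ['Service Desk', 'Operations']
```

Returning the list of contacted receivers along with the owner keeps a record of previous escalation activity, which the table lists as an input when a problem is not solved.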

This accelerator is part of a larger series of tools and guidance from Solution Accelerators.
