The third process in SMC, continuous monitoring, begins once the monitoring tools are in place. When an event occurs, a notification is sent either to a dedicated SMC group or to a related group that has SMC responsibilities. The receiving group analyzes the event and then either resolves it or escalates it to a higher support level for resolution.
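As a rough illustration of the flow described above, the following Python sketch models one pass through continuous monitoring. An event arrives and is checked against a knowledge base of known fixes. The receiving group then either resolves it or escalates it as an incident. The `Event` class, the `KNOWN_FIXES` lookup, and `handle_event` are hypothetical names introduced only for this example. A real SMC tool would draw on its own knowledge base and Service Desk integration.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    RESOLVED = "resolved"
    ESCALATED = "escalated"


@dataclass
class Event:
    source: str       # CI or IT service that raised the event (hypothetical field)
    description: str  # text shown in the monitoring console


# Hypothetical knowledge base mapping event descriptions to documented fixes.
KNOWN_FIXES = {
    "disk space low": "Run the log cleanup procedure",
}


def handle_event(event: Event) -> tuple[Outcome, str]:
    """Analyze an event, then resolve it or escalate it as an incident."""
    fix = KNOWN_FIXES.get(event.description.lower())
    if fix is not None:
        # A documented fix exists: the receiving group resolves the event itself.
        return Outcome.RESOLVED, fix
    # No known fix: escalate, passing the event description along so the
    # next support level does not have to start troubleshooting from scratch.
    return Outcome.ESCALATED, f"Incident raised for {event.source}: {event.description}"
```

For example, `handle_event(Event("web01", "Disk space low"))` returns the documented fix with `Outcome.RESOLVED`. An event with no known fix comes back as `Outcome.ESCALATED`, together with a description the next team can act on.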
This process involves the following activities:
Receive the event notification.
Analyze the event.
Resolve or escalate the event.
The following table describes these activities in greater detail.
Table 6. Activities and Considerations for Continuous Monitoring
Activity: Receive the event notification
Key questions:
Who should receive alerts?
Do incoming alerts require 24/7 support and, if so, who should handle them?
Is there a dedicated SMC group, or is monitoring handled by other departments, such as the Service Desk or Operations?
Is there a need for correlating events? Correlating events provides an end-to-end view of related events and makes troubleshooting easier.
Have events historically been treated as incidents, and has the incident management process been used to analyze and resolve them?
Is there a connector between the monitoring system and the Service Desk tools, or will alerts be transferred manually?
Do other departments or resources work on a given problem?
Are automated solutions applied?
Can alerts be resolved and closed automatically?
How are alerts communicated to groups (for example, via pager, text message, monitoring console, or e-mail)?
Inputs:
IT services configured in the monitoring tool
SMC policies and procedures
Best practice:
If something needs immediate attention, ensure that there is a way to prioritize it.
Activity: Analyze the event
Key questions:
Who is primarily responsible for event analysis?
Who is responsible for “noise” reduction, that is, clearing out events that are not real and should be removed from view?
Is a known problem causing the event?
Is clear, easily accessible information about possible solutions available?
Is the event description understandable?
Have there been other alerts about the same problem?
Can certain manual tasks help solve the problem?
Does any tool used by the Service Desk contain procedures that cover this incident?
Are any changes planned for the IT service or for its CIs?
Is the event actionable? Is it valid?
Can the alert be tuned? Alert tuning is the adjustment of a monitoring tool to lower the level of alert noise and reduce the number of false alerts.
Is the impact on the IT service clearly understood and communicated in the SMC tool?
Inputs:
Information about event resolution
Description of the event
Information from other teams
Outputs:
Event is resolved
Event is escalated as an incident and its severity raised, possibly with transfer to another team
Best practice:
Ensure that all alerts are understandable, relevant, and up to date.
Activity: Resolve or escalate the event
Key questions:
Who has authority to escalate events?
Who receives the escalated event?
How can we ensure that the receiver takes ownership of the event? If the receiver cannot, is there an alternate individual or team to call on?
Which events should be subject to 24/7 escalation?
Was the event resolved by using a knowledge base, product knowledge, or another approach?
Should the alert threshold be tuned or updated?
Outputs:
Updated knowledge about alerts
Input for tuning the alerts
Additional error description of the alert for further troubleshooting
Description of previous activities (if the problem is not solved)
Best practice:
Encourage each individual in the alert escalation chain to contribute input and knowledge.
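Several of the questions in Table 6 can be illustrated with simple key-based grouping. These include correlating events, clearing out noise, spotting repeated alerts about the same problem, and prioritizing what needs immediate attention. The sketch below is an assumption-laden example, not a full correlation engine. It treats alerts with the same service and error code as duplicates when they arrive within a time window. It then ranks the resulting groups by a hypothetical priority value, where 1 is most urgent. The `Alert` fields, the `correlate` function, and the 300-second window are illustrative choices only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    timestamp: float  # seconds since epoch
    service: str      # IT service the alert belongs to
    error: str        # error code or short description
    priority: int     # 1 = most urgent (hypothetical scale)


def correlate(alerts: list[Alert], window: float = 300.0) -> list[dict]:
    """Collapse alerts with the same service and error that arrive within
    `window` seconds of the previous one, then sort groups by priority."""
    groups: list[dict] = []
    open_groups: dict[tuple[str, str], dict] = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.service, alert.error)
        group = open_groups.get(key)
        if group is not None and alert.timestamp - group["last_seen"] <= window:
            # Repeat of an alert already in view: count it instead of showing it again.
            group["count"] += 1
            group["last_seen"] = alert.timestamp
            group["priority"] = min(group["priority"], alert.priority)
        else:
            # First occurrence, or the previous burst has gone quiet: start a new group.
            group = {"service": alert.service, "error": alert.error,
                     "count": 1, "first_seen": alert.timestamp,
                     "last_seen": alert.timestamp, "priority": alert.priority}
            open_groups[key] = group
            groups.append(group)
    # Most urgent first; among equal priorities, the noisiest group first.
    return sorted(groups, key=lambda g: (g["priority"], -g["count"]))
```

Repeated alerts about the same mail-service error that arrive within a few minutes collapse into one group with a count, so the console shows one line instead of many. Real correlation across related CIs requires service-model knowledge that this sketch does not attempt. An example is tying a database outage to the application errors it causes.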
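The escalation questions in Table 6 can be expressed as a small routing rule. They ask who receives the event, what happens when the receiver cannot take ownership, and which events warrant 24/7 escalation. This is a minimal sketch built on several assumptions. It uses a hypothetical ordered escalation chain in which each tier names a primary owner and an alternate. It also assumes business hours of 08:00 to 18:00, Monday to Friday. The `Tier` class, the `next_owner` function, and these hours are illustrative and do not come from SMC itself.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Tier:
    name: str       # e.g. "Service Desk", "Operations" (hypothetical names)
    primary: str    # who is expected to take ownership at this tier
    alternate: str  # called on if the primary cannot take ownership


def next_owner(chain: list[Tier], current_level: int, primary_available: bool,
               around_the_clock: bool, now: datetime) -> Optional[str]:
    """Return who should own an escalated event, or None to hold it
    until business hours."""
    # Assumed business hours: Monday-Friday, 08:00-18:00.
    business_hours = now.weekday() < 5 and 8 <= now.hour < 18
    if not business_hours and not around_the_clock:
        return None  # not a 24/7 event: queue it for the next business day
    # Move one tier up the chain; the top tier keeps the event.
    level = min(current_level + 1, len(chain) - 1)
    tier = chain[level]
    # Ownership must land somewhere: fall back to the tier's alternate.
    return tier.primary if primary_available else tier.alternate
```

Outside business hours, an event not flagged for 24/7 escalation returns `None` and waits for the next business day. A flagged event goes straight to the next tier's on-call owner, or to that tier's alternate if the owner is unavailable.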
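Alert tuning, raised in Table 6 as a way to cut false alerts, often means adjusting more than the threshold value itself. It can also mean changing how long the threshold must be exceeded before an alert fires. The following sketch assumes a simple numeric metric. It applies a hypothetical rule that fires only after the threshold has been exceeded for a set number of consecutive samples. The `alerts_fired` function and its parameters are illustrative names, not part of any particular monitoring tool.

```python
def alerts_fired(samples: list[float], threshold: float, consecutive: int = 1) -> int:
    """Count alerts raised for a metric series. An alert fires when the metric
    has exceeded `threshold` for `consecutive` samples in a row; it re-arms
    only after the metric drops back to or below the threshold."""
    fired = 0
    streak = 0  # number of consecutive samples above the threshold
    for value in samples:
        if value > threshold:
            streak += 1
            if streak == consecutive:
                fired += 1  # fire once per sustained breach
        else:
            streak = 0  # breach ended: re-arm the alert
    return fired
```

Take `samples = [95, 40, 96, 42, 97, 98, 99, 41]` with a threshold of 90. Requiring one sample fires three alerts. Requiring three consecutive samples fires only one, for the sustained breach, and the two isolated spikes are treated as noise. The trade-off is slower detection, so the consecutive-sample count is itself a tuning decision to revisit whenever the alert threshold is reviewed.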