Published: April 25, 2008
The Problem Management SMF provides guidance to help IT professionals resolve complex problems that may be beyond the scope of Incident Resolution requests, which are described in the Customer Service SMF. An incident is any event that is not part of the standard operation of a service and that causes, or may cause, an interruption to, or a reduction in, the quality of service. Problem Management involves:
- Recording incident, operations, and event data about a problem within an IT service or system.
- When justified, researching the problem to identify its root cause.
- Developing workarounds, reactive fixes, or proactive fixes for the problem.
Problem Management should begin at the start of a service’s lifecycle and should be applied to all aspects of IT—including application development, server building, desktop deployment, user training, and service operation. As more problems are discovered, recorded, researched, and resolved, IT will experience fewer failures. If Problem Management is performed during the period when a service is envisioned, planned, designed, built, and stabilized, the service will be deployed into productive use with fewer failures and higher customer satisfaction.
Problem Management SMF Role Types
The primary team accountability that applies to the Problem Management SMF is the Support Accountability. The role types within that accountability and their primary activities within this SMF are displayed in the following table.
Table 1. Support Accountability and Its Attendant Role Types
Role Type | Responsibilities | Role in This SMF
Customer Service Representative | Handles calls; has first contact with user, registers call, categorizes it, determines supportability, and dispatches call |
Incident Resolver | Diagnoses; investigates; resolves | Watches for evidence of problems; passes on incident information to Problem Manager
Incident Coordinator | Responsible for incident from beginning to end; owns quality control | Watches for evidence of problems; passes on incident information to Problem Manager
Problem Analyst | Investigates and diagnoses | Finds underlying root causes of the incidents
Problem Manager | Identifies problems from the incident list | Prevents future incidents
Customer Service Manager | Accountable for goals of Support; covers incidents and problems |
Goals of Problem Management
The primary goal of Problem Management is to reduce the occurrence of failures in IT services. Its secondary goals are to generate data and lessons that IT can use to provide feedback during the IT lifecycle and to help drive the development of more stable solutions.
Table 2. Outcomes and Measures of the Problem Management SMF Goals
Outcomes | Measures
Problems affecting infrastructure and service are identified and assigned an owner. | The number of unassigned problems is reduced, and the number of problems assigned to an owner is increased.
Steps are identified and taken to reduce the impact of incidents and problems. | The number of incidents and problems that occur is reduced, and the impact of those that still occur is lessened.
Root cause is identified for problems, and activity is initiated to establish workarounds or permanent solutions to identified problems. | The number of workarounds and permanent solutions to identified problems is increased.
Trend analysis is used to predict future problems and enable prioritization of problems. | More problems are resolved earlier or avoided entirely.
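The trend analysis mentioned in Table 2 can start as a plain frequency count over incident records. The following Python sketch is illustrative only: the `candidate_problems` function, the `(category, date)` record shape, the `min_count` threshold, and the sample incidents are all assumptions made for this example, not part of the SMF. It ranks recurring incident categories so that the most frequent ones can be considered first; it does not forecast anything.

```python
from collections import Counter


def candidate_problems(incidents, min_count=3):
    """Rank incident categories that recur at least min_count times.

    `incidents` is an iterable of (category, date) pairs. This record
    shape is a hypothetical one chosen for the sketch.
    """
    counts = Counter(category for category, _ in incidents)
    # most_common() sorts by descending frequency, giving a simple priority order.
    return [(cat, n) for cat, n in counts.most_common() if n >= min_count]


# Made-up sample data, for illustration only.
incidents = [
    ("email", "2008-04-01"),
    ("printing", "2008-04-01"),
    ("email", "2008-04-02"),
    ("vpn", "2008-04-03"),
    ("email", "2008-04-03"),
    ("printing", "2008-04-04"),
]

ranked = candidate_problems(incidents, min_count=2)
print(ranked)
```

Categories that clear the threshold are only candidates. Whether any of them justifies root cause research remains a judgment for the Problem Manager.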
Key Terms
The following table contains definitions of key terms found in this guide.
Table 3. Key Terms
Term | Definition
Problem | A scenario describing symptoms that have occurred in an IT service or system that threatens its availability or reliability
Error | A fault, bug, or behavior issue in an IT service or system
Known error | An error that has been observed and documented
Root cause | The specific reason that most directly contributes to the occurrence of an error
Known error database | A subsection of the knowledge base or overall configuration management system (CMS) that stores known errors and their associated root causes, workarounds, and fixes
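To make the relationship between these terms concrete, the following Python sketch models a known error database as a small in-memory store. It is a minimal illustration under stated assumptions: the `KnownError` and `KnownErrorDatabase` classes, their field names, and the sample entry are hypothetical and are not defined by the SMF. A real known error database would live in the knowledge base or CMS.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class KnownError:
    """An error that has been observed and documented (hypothetical layout)."""
    error_id: str
    description: str
    root_cause: Optional[str] = None  # filled in once research identifies it
    workarounds: List[str] = field(default_factory=list)
    fixes: List[str] = field(default_factory=list)


class KnownErrorDatabase:
    """Hypothetical in-memory store keyed by error ID."""

    def __init__(self) -> None:
        self._entries: Dict[str, KnownError] = {}

    def record(self, entry: KnownError) -> None:
        # Recording under an existing ID replaces the earlier entry.
        self._entries[entry.error_id] = entry

    def search(self, keyword: str) -> List[KnownError]:
        """Return entries whose description contains the keyword (case-insensitive)."""
        needle = keyword.lower()
        return [e for e in self._entries.values() if needle in e.description.lower()]


# Made-up sample entry, for illustration only.
kedb = KnownErrorDatabase()
kedb.record(KnownError(
    error_id="KE-001",
    description="Mail client fails to connect after password change",
    root_cause="Cached credentials are not refreshed",
    workarounds=["Clear the cached credentials and sign in again"],
))

matches = kedb.search("mail client")
print([m.error_id for m in matches])
```

Keeping the root cause, workarounds, and fixes on the same entry mirrors the definition in Table 3, so that someone who matches an incident to a known error can see the documented workaround at once.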