Problem Management Service Management Function Overview

Published: April 25, 2008


The Problem Management SMF provides guidance to help IT professionals resolve complex problems that may be beyond the scope of Incident Resolution requests, which are described in the Customer Service SMF. An incident is any event that is not part of the standard operation of a service and that causes, or may cause, an interruption to, or a reduction in, the quality of service. Problem Management involves:

  • Recording incident, operations, and event data about a problem within an IT service or system.
  • When justified, researching the problem to identify its root cause.
  • Developing workarounds, reactive fixes, or proactive fixes for the problem.

Problem Management should begin at the start of a service’s lifecycle and should be applied to all aspects of IT—including application development, server building, desktop deployment, user training, and service operation. As more problems are discovered, recorded, researched, and resolved, IT will experience fewer failures. If Problem Management is performed during the period when a service is envisioned, planned, designed, built, and stabilized, the service will be deployed into productive use with fewer failures and higher customer satisfaction.

Problem Management SMF Role Types

The primary team accountability that applies to the Problem Management SMF is the Support Accountability. The role types within that accountability and their primary activities within this SMF are displayed in the following table.

Table 1. Support Accountability and Its Attendant Role Types

Customer Service Representative
  Responsibilities:
    • Handles calls
    • Has first contact with user, registers call, categorizes it, determines supportability, and dispatches call
  Role in This SMF:
    • Helps the customer

Incident Resolver
  Responsibilities:
    • Diagnoses
    • Investigates
    • Resolves
  Role in This SMF:
    • Watches for evidence of problems
    • Passes on incident information to Problem Manager

Incident Coordinator
  Responsibilities:
    • Responsible for incident from beginning to end
    • Owns quality control
  Role in This SMF:
    • Watches for evidence of problems
    • Passes on incident information to Problem Manager

Problem Analyst
  Responsibilities:
    • Investigates and diagnoses
  Role in This SMF:
    • Finds underlying root causes of the incidents

Problem Manager
  Responsibilities:
    • Identifies problems from the incident list
  Role in This SMF:
    • Prevents future incidents

Customer Service Manager
  Responsibilities:
    • Accountable for goals of Support
    • Covers incidents and problems
  Role in This SMF:
    • Oversight
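The Problem Manager's activity of identifying problems from the incident list can be illustrated with a small Python sketch. It assumes each incident carries a service name and a symptom category (hypothetical fields, not defined by this SMF) and treats any service/symptom pair that recurs at least a chosen number of times as a candidate problem. The threshold of 3 is an arbitrary illustrative default; which patterns justify opening a problem remains a judgment call for the Problem Manager.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical minimal incident record; field names are illustrative."""
    incident_id: str
    service: str
    symptom: str  # category assigned when the call was registered


def find_candidate_problems(incidents, threshold=3):
    """Return ((service, symptom), count) pairs seen at least `threshold` times.

    Results are ordered from most to least frequent, so the most
    common recurring patterns are reviewed first.
    """
    counts = Counter((i.service, i.symptom) for i in incidents)
    return [(signature, n) for signature, n in counts.most_common() if n >= threshold]
```

Grouping on an exact (service, symptom) pair is the simplest possible signature; a real incident list would likely need consistent categorization by the Customer Service Representatives before counts like these are meaningful.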

Goals of Problem Management

The primary goal of Problem Management is to reduce the occurrence of failures in IT services. Its secondary goals are to generate data and lessons that IT can use to provide feedback during the IT lifecycle and to help drive the development of more stable solutions.

Table 2. Outcomes and Measures of the Problem Management SMF Goals

Outcome: Problems affecting infrastructure and service are identified and assigned an owner.
Measure: The number of unassigned problems is reduced, and the number of problems assigned to an owner is increased.

Outcome: Steps are identified and taken to reduce the impact of incidents and problems.
Measure: The number of incidents and problems that occur is reduced, and the impact of those that still occur is lessened.

Outcome: Root cause is identified for problems, and activity is initiated to establish workarounds or permanent solutions to identified problems.
Measure: The number of workarounds and permanent solutions to identified problems is increased.

Outcome: Trend analysis is used to predict future problems and enable prioritization of problems.
Measure: More problems are resolved earlier or avoided entirely.
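Several of the measures in Table 2 are stated as changes over time (reduced, increased). One way to make them concrete, sketched here in Python under the assumption of a simple hypothetical problem record, is to count problems in two point-in-time snapshots and compare the counts. The field names are illustrative; this SMF does not prescribe a data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Problem:
    """Hypothetical problem record; only the fields the measures need."""
    problem_id: str
    owner: Optional[str] = None          # None means the problem is unassigned
    has_workaround: bool = False
    has_permanent_solution: bool = False


def problem_measures(problems):
    """Count unassigned, assigned, and remedied problems in one snapshot."""
    return {
        "unassigned": sum(1 for p in problems if p.owner is None),
        "assigned": sum(1 for p in problems if p.owner is not None),
        "with_workaround_or_solution": sum(
            1 for p in problems if p.has_workaround or p.has_permanent_solution
        ),
    }


def trend(before, after):
    """Change in each measure between two snapshots (after minus before)."""
    return {name: after[name] - before[name] for name in before}
```

A negative change in "unassigned" and positive changes in the other two counts would correspond to the direction the first and third measures describe.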

Key Terms

The following table contains definitions of key terms found in this guide.

Table 3. Key Terms

Problem: A scenario describing symptoms that have occurred in an IT service or system, and that threatens its availability or reliability

Error: A fault, bug, or behavior issue in an IT service or system

Known error: An error that has been observed and documented

Root cause: The specific reason that most directly contributes to the occurrence of an error

Known error database: A subsection of the knowledge base or overall configuration management system (CMS) that stores known errors and their associated root causes, workarounds, and fixes
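The relationship among known errors, root causes, workarounds, and fixes in a known error database can be sketched as a simple in-memory structure. This Python example is illustrative only: the class, field, and method names are hypothetical, and a real known error database would live in the knowledge base or CMS rather than in a dictionary. The fix types mirror the workarounds, reactive fixes, and proactive fixes listed at the start of this overview.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class FixType(Enum):
    """Kinds of remedy named in this SMF."""
    WORKAROUND = "workaround"
    REACTIVE_FIX = "reactive fix"
    PROACTIVE_FIX = "proactive fix"


@dataclass
class KnownError:
    """Hypothetical known error record: an error that has been observed and documented."""
    error_id: str
    description: str
    root_cause: Optional[str] = None  # filled in once root cause analysis succeeds
    symptoms: set[str] = field(default_factory=set)
    fixes: list[tuple[FixType, str]] = field(default_factory=list)


class KnownErrorDatabase:
    """Minimal in-memory stand-in for a known error database."""

    def __init__(self) -> None:
        self._errors: dict[str, KnownError] = {}

    def record(self, known_error: KnownError) -> None:
        # Recording the same error_id again replaces the earlier entry.
        self._errors[known_error.error_id] = known_error

    def match_symptom(self, symptom: str) -> list[KnownError]:
        """Return known errors whose documented symptoms include `symptom`."""
        return [e for e in self._errors.values() if symptom in e.symptoms]

    def workarounds_for(self, error_id: str) -> list[str]:
        """Return the workaround descriptions recorded for one known error."""
        return [text for kind, text in self._errors[error_id].fixes
                if kind is FixType.WORKAROUND]
```

Looking up by symptom reflects how an Incident Resolver might check whether a new incident matches an already documented error before escalating it to Problem Management.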

This accelerator is part of a larger series of tools and guidance from Solution Accelerators.
