Process 1: Define Service Monitoring Requirements

04/27/2008

Figure 3. Define service monitoring requirements

Activities: Define Service Monitoring Requirements

Before introducing a new service into the IT environment, the SMC team needs to determine what is required to monitor the health of the service. The SMC team works with those who will release the new service and with those responsible for operating it after its release to the production environment. Together they identify needs and dependencies and break the service down into steps so that it can be monitored accurately. This information is used to create a health model, which defines whether a system is healthy (that is, operating within normal conditions) or has failed or degraded. This model becomes the basis for the system events and instrumentation on which monitoring and automated recovery are built.
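As an illustration only, the idea of a health model can be sketched in a few lines of Python. The class names, the three states, and the "worst open problem wins" rule are assumptions made for this sketch; the guidance itself does not prescribe a data structure or a tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Health(Enum):
    """Hypothetical health states, ordered from best to worst."""
    HEALTHY = 0
    DEGRADED = 1
    FAILED = 2


@dataclass
class ComponentHealth:
    """Tracks the open problems of one monitored component (assumed design)."""
    name: str
    # Maps an open problem id to the health state that problem implies.
    open_problems: dict = field(default_factory=dict)

    def raise_problem(self, problem_id: str, state: Health) -> None:
        """Record a failure or degradation event."""
        self.open_problems[problem_id] = state

    def clear_problem(self, problem_id: str) -> None:
        """Record the matching recovery event; unknown ids are ignored."""
        self.open_problems.pop(problem_id, None)

    @property
    def state(self) -> Health:
        """The worst open problem decides the state; no problems means healthy."""
        if not self.open_problems:
            return Health.HEALTHY
        return max(self.open_problems.values(), key=lambda s: s.value)
```

In this sketch every failure event has a matching recovery event, so the component returns to a healthy state only when all of its open problems have been cleared.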

This process includes the following activities:

Define the IT service to be monitored.
Prepare the service component health model.
Review the reliability requirements.

The following table describes these activities in greater detail.

Table 4. Activities and Considerations for Defining Monitoring Requirements

Activities

Considerations

Define IT service to be monitored

Key questions:

Is this a new service or an extension of an existing one?
What does the service do?
What are the service’s technology components and their dependencies?
Who are the users?
How important is the service to the business?
How dependent is the business on this service?
How is this service dependent on or related to other IT services?
Are any service level requirements in place?

Inputs:

Configuration description from the configuration management system (CMS). For more information about the CMS, see the Change and Configuration SMF.
Functional requirements for the IT service
Operations requirements
Non-functional requirements for the IT service
Operations plan
Service Catalog
SLAs, OLAs, underpinning contracts (UCs)—if nothing exists, use key questions from this process. For more information about SLAs, OLAs, and UCs, see the Business/IT Alignment SMF.
Forward Schedule of Change (FSC). See the Deploy SMF for more information.

Outputs:

IT service descriptions:
- Technical (technologies and dependencies)
- Organizational (groups dependent on the service)

Best practices:

Understand the service’s importance to the business.
Document the service end-to-end to ensure that it is monitored as a whole—not just as a group of components.
To maximize availability, document and understand the service’s dependencies on other services.
List all stakeholders of a specific service.
Create a set of basic key performance indicators (KPIs) for all IT services so that basic measurements and comparisons can be done among all IT services.
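One possible way to record the technical and organizational service descriptions, and to check a service's dependencies on other services end to end, is sketched below. The field names, the sample catalog, and the service names are hypothetical and are not taken from the CMS or the Service Catalog.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    """Assumed record layout for one IT service description."""
    name: str
    technologies: list = field(default_factory=list)      # technical view
    depends_on: list = field(default_factory=list)        # names of other services
    dependent_groups: list = field(default_factory=list)  # organizational view


def all_dependencies(name, catalog):
    """Return every service that `name` depends on, directly or indirectly."""
    seen = set()
    stack = list(catalog[name].depends_on)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Services missing from the catalog are reported but not expanded.
        if current in catalog:
            stack.extend(catalog[current].depends_on)
    # A circular dependency must not list the service as its own dependency.
    seen.discard(name)
    return seen


# Hypothetical sample data for illustration.
catalog = {
    "email": ServiceDescription(
        "email",
        technologies=["mail server"],
        depends_on=["directory", "storage"],
        dependent_groups=["Sales", "Support"],
    ),
    "directory": ServiceDescription("directory", depends_on=["dns"]),
    "dns": ServiceDescription("dns"),
}

print(sorted(all_dependencies("email", catalog)))
```

Listing indirect dependencies in this way helps show whether a service is documented as a whole rather than as a group of isolated components.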

Prepare service component health model

Key questions:

Which configuration items (CIs) make up the service? How are they related?
Should the system monitor for specific failure scenarios?
For each failure event, is there also a way to determine when the failure has stopped or has been fixed?
Which of these scenarios are related to availability? Configuration? Performance? Security?
Is the CI dependent on other CIs with which it communicates?
Which events have an impact on a CI’s availability (for example, a service stoppage)?
Which events have an impact on a CI’s performance (for example, a CPU has insufficient capacity)?
Which events have an impact on a CI’s configuration (for example, a service pack has not been installed)?
Which events have an impact on a CI’s security (for example, access denied)?
Does the severity of the event match the impact on the CI? How are events categorized?
Can sub-components and dependencies be defined so that the failure explanation is more precise?
Are there any mission-critical dependencies on other CIs, such as operating systems, hardware, network, or SAN?
Is there a way to pre-define whether or not the CI is healthy?
Does the event message explain clearly what the problem is? Does it offer a solution?
Can any events or scenarios cause event storms? An event storm is a high volume of events that is logged in a monitoring database and overloads the database administrator console. How can the IT team avoid event storms?
Has the SMC team created instrumentation guidelines for the application or infrastructure configuration?
Does the service use clustering? Mirroring?

Inputs:

Events grouped by health model definition
Relationships to other CIs

Outputs:

Alert and event definitions for all CIs
Relationships to other CIs and how these affect each other
A service model that defines all CIs for the application and their relationship to other CIs
A complete health model that includes an error description for each CI and troubleshooting hints for every type of CI alert
A definition of availability for each CI, expressed through the health model
Reporting needs for IT services

Best practices:

The monitoring team and the development team should agree on standards for items such as CI definition, the preferred way of instrumenting the application, logging format design, performance counters, synthetic transactions, and reporting.
Develop the monitoring definition while the service itself is being developed—this way, the definition will be ready to implement when the service is released.
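The key questions about event categories and event storms can be made concrete with a small sketch. It assumes one simple mitigation, which is to forward the first event for a given CI and event id and suppress repeats inside a time window. The class names, the 60-second default, and the suppression rule are assumptions for illustration, not a feature of any particular monitoring product.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The four event categories named in the key questions."""
    AVAILABILITY = "availability"
    PERFORMANCE = "performance"
    CONFIGURATION = "configuration"
    SECURITY = "security"


@dataclass(frozen=True)
class Event:
    ci: str            # configuration item that raised the event
    event_id: str      # hypothetical identifier, for example "disk_full"
    category: Category
    timestamp: float   # seconds since an arbitrary start


class StormFilter:
    """Forwards the first event per (CI, event id); suppresses repeats in a window."""

    def __init__(self, window_seconds: float = 60.0):
        self.window_seconds = window_seconds
        self._last_forwarded = {}  # (ci, event_id) -> timestamp of last forwarded event
        self.suppressed = 0        # count of events held back

    def accept(self, event: Event) -> bool:
        """Return True if the event should reach the console, False if suppressed."""
        key = (event.ci, event.event_id)
        last = self._last_forwarded.get(key)
        if last is not None and event.timestamp - last < self.window_seconds:
            self.suppressed += 1
            return False
        self._last_forwarded[key] = event.timestamp
        return True
```

Keeping a count of suppressed events, as this sketch does, lets operators see that a storm occurred without flooding the console with every repeat.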

Review reliability requirements

Key questions:

Is service monitoring done internally or externally?
Are team members trained in SMC for the new services?
Is the IT service documented?
What are the monitoring requirements from the Support group?
What are the monitoring requirements from the Release group?

Inputs:

Requirements, data, and KPIs from other IT functions, including availability, capacity, problem management, incident management, and service continuity. See the Reliability SMF for more information.
IT organizational diagrams
Current SMC job descriptions. For more information about accountabilities and role types, see the Team SMF.

Outputs:

SMC process document
Organizational structure that supports the entire SMC process

Best practices:

Understand and document the information required from other SMFs, in terms of reports and statistics.
Understand the relationships between SMC and other IT functions and processes.

This accelerator is part of a larger series of tools and guidance from Solution Accelerators.
