Reliability Overview

Published: April 25, 2008   |   Updated: October 10, 2008


A reliable service or system is dependable, requires minimal maintenance, performs without interruption, and allows users to quickly access the resources they need. These characteristics must hold not only under business-as-usual conditions but also during times of business change and growth and during unexpected events. Ensuring reliability involves three high-level processes:

  • Planning. Gathering and translating business requirements into IT measures
  • Implementation. Building the various plans and ensuring that they can meet expectations
  • Monitoring and Improvement. Proactively monitoring and managing the plans and making necessary adjustments

Many outputs of the Reliability SMF, such as the availability plan, capacity plan, data security plan, and monitoring plan, provide input into the activities described in the Business/IT Alignment SMF.

Reliability SMF Role Types

The primary Team SMF accountability that applies to the Reliability SMF is the Architecture Accountability. The role types within that accountability and their primary activities within this SMF are displayed in the following table. The accountable role for Reliability is the Architecture Manager role type.

Table 1. Architecture Accountability and Its Attendant Role Types

Role Type | Role in This SMF

Architecture Manager

  • Accountable for ensuring creation and maintenance of the architecture plan
  • Uses reliability requirements to provide a roadmap that supports the design process and ensures reliability

Reliability Manager


  • Ensures current state meets reliability requirements
  • Looks at future directions and solutions to propose across infrastructure
  • Designs future state
  • Facilitates reliable solutions

Goals of Reliability

The Reliability SMF ensures that service capacity, service availability, service continuity, data integrity, and confidentiality are aligned to the business needs in a cost-effective manner.

Table 2. Outcomes and Measures of the Reliability SMF Goals

Outcomes | Measures

IT capacity aligned to business needs

  • Proactive capacity plan
  • No capacity-related service disruptions
  • Procurement/purchasing plan developed and adhered to

Services available to users when needed

  • Proactive, cost-justified availability plan
  • Reduction in service failures
  • Minimized service disruption from anticipated failures

Critical business services available during significant failures

  • IT disaster recovery aligned to business continuity plan
  • Tested, trusted recovery plan supported by the business

Data integrity and confidentiality maintained

  • Data classified and managed according to business policy
  • No exceptions to data handling and integrity requirements

Key Terms

The following table contains definitions of key terms found in this guide.

Table 3. Key Terms

Term | Definition

Availability management

The process of managing a service or application so that it is accessible when users need it. Availability is typically measured as a percentage of uptime; downtime refers to periods when the system is unavailable.

Business continuity planning

The process for planning and practicing IT’s response to a disaster or disruptive event. These activities span the organization; beyond just IT, continuity planning affects Finance, Operations, and Human Resources (HR) functions.

Capacity management

In the context of IT, capacity refers to the processing or performance capability of a service or system. Capacity management is the process used to ensure that current and future business IT needs are met in a cost-effective manner. This process is made up of three sub-processes: business, service, and resource capacity management.

IT service continuity management

The process of assessing and managing IT risks that can significantly affect the delivery of services to the business.
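
The availability definition in Table 3 describes availability as a percentage of uptime. As a rough illustration only, the arithmetic might be sketched as follows. The function name, the choice of minutes as the unit, and the sample figures are all assumptions made for this example; they are not part of the Reliability SMF.

```python
def availability_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Return uptime as a percentage of the measurement period.

    Hypothetical helper for illustration; the unit (minutes) is an assumption.
    """
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    uptime_minutes = total_minutes - downtime_minutes
    return 100.0 * uptime_minutes / total_minutes


# Hypothetical figures: a 30-day month with 43.2 minutes of downtime.
minutes_in_month = 30 * 24 * 60  # 43,200 minutes in the period
print(round(availability_percent(minutes_in_month, 43.2), 3))
```

With these sample figures the result is approximately 99.9 percent. A real availability plan would also need to state whether planned maintenance windows count as downtime, which this sketch does not address.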