Process 3: Monitoring and Improving Plans

10/09/2008

Published: April 25, 2008 | Updated: October 10, 2008


Figure 6. Monitoring and improving plans

Activities: Monitoring and Improving Plans

The third process of Reliability Management is monitoring and improving plans, an ongoing procedure that ensures that the first two processes have been followed, that metrics are reported on, that exceptions to targets are tracked, and that improvements are fed back into the Plan phase. Proper monitoring ensures that either the original objectives are being achieved or steps are being taken to improve reliability or adjust business expectations. This process includes the following activities:

Monitor service reliability.
Report and analyze trends in service reliability.
Review reliability.

Business requirements and technology are both subject to frequent change. This iterative review and reporting function helps keep actual service delivery aligned with business requirements, and keeps the reliability functions up to date and relevant.

The following table describes these activities in more detail.

Table 6. Activities and Considerations for Monitoring and Improving Plans

Activities	Considerations
Monitor	Key questions:
	  Do we have adequate information to review and report on service reliability?
	  Do we have an accurate, end-to-end view of the user experience?
	  Is the information that we gather useful to the business? Is it used? Is any necessary information missing?
	  Do monitoring plans, monitoring locations, or thresholds need to be adjusted?
	Inputs:
	  Monitoring plan
	  Data feeds or reports from monitoring systems. For more information about monitoring systems, see the Service Monitoring and Control SMF.
	Outputs:
	  Reliability-specific metrics
	  Management dashboard
	Best practices:
	  If monitoring tools create tickets automatically, define the alert thresholds carefully so that irrelevant alerts do not flood the Incident Management tool.
	  Automate the data collection and service monitoring tasks as much as possible, using existing tools and mechanisms where appropriate.
Report and analyze trends	Key questions:
	  Are we consistently delivering an available and reliable service?
	  In the event of an outage, was the service recovered in accordance with the service-level targets? If not, why not?
	  Does monitoring data correlate with user experiences?
	  Were any events, thresholds, triggers, or service degradations missed that should have been captured?
	  Who needs to know how this service is performing?
	  What reports should be generated, and when? Weekly, monthly, quarterly, annually?
	  Are there any unexpected trends, good or bad, that should be investigated? Who should be responsible for doing this?
	  How can this information best be used to improve service reliability and customer satisfaction?
	  Who is responsible for ensuring that the service improvement initiatives are recorded, evaluated, and acted upon?
	Inputs:
	  Service availability reports
	  Incident mean-time-to-repair reports (see the Service Monitoring and Control SMF)
	  Problem management trend reports (see the Problem Management SMF)
	  Original business requirements and service-level targets
	Outputs:
	  Reliability report
	  Recommendations for improvement or technical evaluation, requirement review, improved or reduced service-level commitments
	Best practices:
	  Define the reporting requirements early in the process by working with the relevant stakeholders to identify their needs.
	  Develop a common repository for raw metrics information to simplify extraction and data mining.
	  Provide management dashboards for demonstrating and reporting service reliability metrics.
	  Automate data collection and manipulation as much as possible, making it easy for the stakeholders to retrieve reports when they want or need them.
Review reliability	Key questions:
	  Have reliability objectives been achieved?
	  What needs to change?
	  How can reliability be improved?
	  Is there a business justification for these improvements? Will the business support the improvements with budget approval and stakeholder involvement?
	  Can the cost of delivering on the reliability requirements be reduced with new technologies, further automation, or improved processes?
	Inputs:
	  Reliability report
	  Service availability reports
	  Original business requirements
	  Improvement recommendations
	Outputs:
	  Information and reports for business review (can be part of the Service Alignment Management Review)
	  Prioritized and approved list of improvement recommendations
	  RFC to implement improvement activities
	  Archived data storage for trend analysis
	Best practices:
	  Schedule and plan for regular reliability reviews as part of the Operational Health Management Review. Ensure that participants in that review know what information they need to contribute and that they have sufficient time and resources to collect and analyze this information.
	  Structure reviews with agendas, minutes, and clearly identified and assigned actions.
	  Involve and engage the Account Manager role in the review process. This role should help with prioritizing improvement initiatives, securing budget, and communicating progress back to the business. Much of the reporting output will be useful to this role as it reports to the business on SLA performance.
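
The reporting activity above relies on service availability and mean-time-to-repair figures being compared against service-level targets. As a minimal sketch of how such a report might be automated, the following Python example computes both metrics from a list of outage records and lists any missed targets. The Outage record, the function names, and the target parameters are all hypothetical and are not part of this guidance or of any monitoring product. The sketch also assumes that outages do not overlap in time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """One service outage. The field names are illustrative assumptions."""
    start: datetime
    end: datetime


def availability_pct(outages, period_start, period_end):
    """Percentage of the reporting period during which the service was up.

    Outages are clipped to the reporting period and assumed not to overlap.
    """
    period = (period_end - period_start).total_seconds()
    down = sum(
        (min(o.end, period_end) - max(o.start, period_start)).total_seconds()
        for o in outages
        if o.end > period_start and o.start < period_end
    )
    return 100.0 * (period - down) / period


def mean_time_to_repair(outages):
    """Average outage duration, or None when there were no outages."""
    if not outages:
        return None
    total = sum((o.end - o.start for o in outages), timedelta())
    return total / len(outages)


def reliability_exceptions(outages, period_start, period_end,
                           availability_target_pct, recovery_target):
    """Return human-readable descriptions of every missed target."""
    exceptions = []
    avail = availability_pct(outages, period_start, period_end)
    if avail < availability_target_pct:
        exceptions.append(
            f"availability {avail:.2f}% is below target "
            f"{availability_target_pct:.2f}%"
        )
    for o in outages:
        duration = o.end - o.start
        if duration > recovery_target:
            exceptions.append(
                f"outage starting {o.start.isoformat()} took {duration} "
                f"to recover (target {recovery_target})"
            )
    return exceptions
```

The list returned by reliability_exceptions could feed the reliability report as the set of exceptions to investigate, while the raw availability and mean-time-to-repair values could be stored in the common metrics repository for later trend analysis.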
