Published: April 25, 2008 | Updated: October 10, 2008
Figure 6. Monitoring and improving plans
Activities: Monitoring and Improving Plans
The third process of Reliability Management is monitoring and improving plans, an ongoing procedure that ensures that the first two processes have been followed, that metrics are reported on, that exceptions to targets are tracked, and that improvements are fed back into the Plan phase. Proper monitoring ensures that either the original objectives are being achieved or steps are being taken to improve reliability or adjust business expectations. This process includes the following activities:
Monitor service reliability.
Report and analyze trends in service reliability.
Business requirements and technology both change frequently. This iterative review and reporting function helps keep actual service delivery aligned with business requirements, and keeps the reliability functions up to date and relevant.
The following table describes these activities in more detail.
Table 6. Activities and Considerations for Monitoring and Improving Plans
Monitor service reliability
Key questions:
Do we have adequate information to review and report on service reliability?
Do we have an accurate, end-to-end view of the user experience?
Is the information that we gather useful to the business? Is it used? Is any necessary information missing?
Do monitoring plans, monitoring locations, and/or thresholds need to be adjusted?
Best practices:
If monitoring tools create tickets automatically, define the alert thresholds carefully so that irrelevant alerts do not flood the Incident Management tool.
Automate the data collection and service monitoring tasks as much as possible, using existing tools and mechanisms where appropriate.
Report and analyze trends
Key questions:
Are we consistently delivering an available and reliable service?
In the event of an outage, was the service recovered in accordance with the service level targets? If not, why not?
Does the monitoring data correlate with what users actually experienced? Were any events, thresholds, triggers, or service degradation missed that should have been captured?
Who needs to know how this service is performing? What reports should be generated, and when? Weekly, monthly, quarterly, annually?
Are there any unexpected trends, good or bad, that should be investigated? Who should be responsible for doing this?
How can this information best be used to improve service reliability and customer satisfaction? Who is responsible for ensuring that the service improvement initiatives are recorded, evaluated, and acted upon?
Outputs:
Prioritized and approved list of improvement recommendations
RFC to implement improvement activities
Archived data storage for trend analysis
Best practices:
Schedule and plan for regular reliability reviews as part of the Operational Health Management Review. Ensure that participants in that review know what information they need to contribute and that they have sufficient time and resources to collect and analyze this information. Structure each review with an agenda, minutes, and clearly identified and assigned actions.
Involve and engage the Account Manager role in the review process. This role should help with prioritization of improvement initiatives, securing budget, and communicating progress back to the business. Much of the reporting output will be useful to the Account Manager when reporting to the business on SLA performance.
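Several of the considerations above can be checked mechanically. The following Python sketch is one possible illustration, not part of the guidance itself. It shows how archived outage records might be used to report availability and to flag recoveries that missed a target, and how an alert rule might require a sustained breach before a ticket is opened so that the Incident Management tool is not flooded. The names Outage, availability_percent, recovery_breaches, and should_open_ticket are hypothetical, as are the recovery target and the "three consecutive samples" rule. The sketch also assumes that outages do not overlap and that they fall entirely within the reporting period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """One recorded service outage (hypothetical record layout)."""
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def availability_percent(outages, period_start, period_end):
    """Percentage of the reporting period during which the service was up.

    Assumes outages do not overlap and lie inside the period.
    """
    period = (period_end - period_start).total_seconds()
    down = sum(o.duration.total_seconds() for o in outages)
    return 100.0 * (period - down) / period


def recovery_breaches(outages, recovery_target):
    """Return the outages that took longer to recover than the target."""
    return [o for o in outages if o.duration > recovery_target]


def should_open_ticket(samples, threshold, sustained=3):
    """Open a ticket only if the last `sustained` samples all exceed the threshold.

    A single spike is ignored, which keeps transient noise from
    creating tickets automatically.
    """
    recent = samples[-sustained:]
    return len(recent) == sustained and all(s > threshold for s in recent)
```

For example, calling availability_percent with a month of archived Outage records gives a figure that can be compared with the service level target in the regular reliability review, while recovery_breaches lists the outages that prompt the "If not, why not?" question. The appropriate value for sustained depends on the sampling interval of the monitoring tool and would need to be tuned for each service.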