Microsoft Availability Reporting Management Pack Guide

This document explains how to set up your system for availability reporting. It also provides information about the various reports that can be generated and what they contain.

The Microsoft Availability Reporting Management Pack for Microsoft Operations Manager 2005 collects and analyzes data from the event logs of your servers and then generates configurable reports that you can view and customize to suit the needs of your organization. You can use these reports to identify the causes for planned and unplanned downtime and take preemptive actions to decrease downtime in the future.

The Microsoft Availability Reporting Management Pack for Microsoft Operations Manager 2005 only facilitates the availability reports and does not provide the following features normally found in a Management Pack:

  • Tasks

  • Views

  • Alerts

  • Topology Diagrams

  • Notifications

About Availability Reports

Availability reports calculate many metrics of server availability and provide this data in several configurable datasets. A standard report is configured to display the resulting data. Each configuration includes tables or graphs with information about the availability and reliability of servers in your environment. You can enter parameters, such as specific periods of time, to filter the data presented.

Note

Availability is defined as the probability that applications, services, or systems are functioning properly at any given point in time. Systems with high availability have minimal downtime, whether planned or unplanned. Reliability is the likelihood of an application, service, or system continuing to function over a given period of time and under specified conditions.

Availability reports provide the following advantages:

  • Determine whether your servers are meeting their availability and reliability objectives.

  • Filter reports to track trends by viewing information collected over a specific length of time, such as over a period of months or years.

  • Filter reports to view information on a specific subset of your servers—for example, servers running a particular operating system such as Microsoft Windows 2000 or Microsoft Windows Server 2003.

  • Identify the best and worst performing computers for a particular area. For example, you can identify which servers have the most or least number of shutdowns or which servers suffer the most or least number of system failures.

  • Identify problem areas, such as a particular application or operating system version that stops responding.

  • View and analyze information gathered by Shutdown Event Tracker, if you run servers in a Windows Server 2003 environment. Shutdown Event Tracker is designed to track planned and unplanned server shutdowns, and it includes annotated reasons for the shutdowns.

  • Gather information on operating systems and major Microsoft applications.

Availability reports provide detailed and specific availability and reliability information for several Microsoft operating systems and applications, including the following:

  • Windows NT4 Server SP4 and later

  • Microsoft Exchange Server 2003

  • Microsoft SQL Server 2000 and later

  • Microsoft Active Directory service for Windows Server 2003 and higher

  • Microsoft Internet Information Services 6.0

Note

It can take some time for the system to generate your reports.

For applications not currently supported, Availability Reporting provides basic availability and reliability statistics at the operating system level only. In such cases, Availability Reporting measures whether the operating system is running but does not measure whether applications are running.

Availability Reporting Management Pack Deployment and Configuration

The topics in this section will enable you to perform the major steps required for deploying and configuring the Microsoft Availability Reporting Management Pack for Microsoft Operations Manager 2005.

How to Extract the Availability Reporting Management Pack Files

Before you install the Availability Reporting Management Pack components, you must download and extract the contents of the Microsoft Operations Manager 2005 Availability Reporting Management Pack.msi file. This file only packages and distributes the Availability Reporting files and documentation; it does not install the Availability Reporting components. The .msi file contains the following:

  • Availability Reporting.akm   The management pack file.

  • Microsoft Availability Reporting Report.xml   SQL Server Reporting Services reports.

  • MOM-MRAS.msi   The setup for the Availability Reporting SQL Reporting Server components.

  • MRAS_MP_UI.msi   The setup for the Availability Reporting MMC snap-in.

  • Microsoft Availability Reporting MP Guide.doc   Documentation for the management pack.

  • EULA.rtf   The license agreement for the management pack.

Procedures
To extract the downloaded Availability Reporting Management Pack files
  1. Run the Microsoft Operations Manager 2005 Availability Reporting Management Pack.msi file.

  2. Follow the steps in the wizard to extract the setup files and documentation to a folder that you specify.

Security

You must run the .msi file under an administrator account by using the RunAs command or by logging in as a local administrator.

How to Import the Availability Reporting Management Pack and Reports

You import the Microsoft Availability Reporting Management Pack .akm file and reports using the MOM Administrator console as you would any other Management Pack in MOM 2005. For more details about importing management packs, see Chapter 6, “Deploying Management Packs,” in the Microsoft Operations Manager 2005 Deployment Guide.

Procedures
To import a Management Pack by using the MOM Administrator console
  1. In the MOM Administrator console, click Management Packs.

  2. In the details pane, click Import/Export Management Packs.

  3. On the Import or Export Management Packs page, click Import Management Packs and/or Reports.

  4. On the Select a Folder and Choose Import Type page, click Browse.

  5. Navigate to the folder where you extracted the Availability Reporting components, and then click OK. For more information about extracting the files, see "How to Extract the Availability Reporting Management Pack Files".

  6. On the Select Management Packs page, select the Management Packs that you want to import, select an import option, and then click Next.

  7. To complete and close the wizard, click Finish.

Security

The script used by the Management Pack must read the registry of the local computer. Registry hives have permissions (ACLs) set on them individually, and therefore the permissions required by the agent’s Action Account depend on these registry permissions. Most registry hives allow read access for local users.

The minimum privileges documented in the MOM 2005 Security Guide will be sufficient, unless one or more of these registry hives has more restrictive settings on them.

How to Install the Availability Reporting Management Pack Reporting Components

The MOM-MRAS.msi file installs the MOM reporting database back-end tables and the DTS job for creating availability back-end tables used to create and display Availability reports.

Procedures
To install the Reliability Analysis Reporting components
  1. Browse to the folder where you extracted the Availability Reporting components. For more information about extracting the files, see "How to Extract the Availability Reporting Management Pack Files."

  2. Run the MOM-MRAS.msi setup on the MOM reporting server computer with the SystemCenterReporting database.

  3. On the SQL Server Information wizard page, specify the name and instance for the SQL Server that hosts the MOM reporting database. Click Next.

  4. On the User Information for Scheduled Task page, specify an account for the scheduled task that runs the Availability Reporting component, which updates the availability data in the MOM reporting database.

  5. Follow the remaining wizard pages to choose an installation location and complete the setup.

Security

You must run the .msi file under an administrator account by using the RunAs command or by logging in as a local administrator.

How to Install and Configure the Availability Reporting MMC Snap-In

You can install the Availability Reporting Console MMC snap-in on any computer that has a connection to the MOM Reporting Server. This snap-in is used to create and configure Availability reports.

Procedures
To install the Availability Reporting MMC snap-in files
  1. Browse to the folder where you extracted the Availability Reporting components. For more information about extracting the files, see "How to Extract the Availability Reporting Management Pack Files".

  2. Run the MRAS_MP_UI.msi file and follow the instructions in the setup wizard.

To configure the Availability Reporting MMC console
  1. Either open an existing MMC console (*.msc file) or create a new one by clicking Run on the Start menu and typing mmc in the Run window.

  2. On the File menu, select Add/Remove Snap-in.

  3. In the Add/Remove Snap-in dialog, on the Standalone tab, click Add.

  4. In the Add Standalone Snap-in dialog, select the Microsoft Availability Reporting Console snap-in and click Add.

  5. To save the console to the Administrative Tools folder, click File, then Save As. Type a name, such as Availability Reporting.msc, that will allow other users to easily identify the console, and then click Save.

  6. If this is the last snap-in you are adding to the console, click Close.

Security

You must run the .msi file under an administrator account by using the RunAs command or by logging in as a local administrator.

You must be a member of the MOM ReportPublishers group to use the console.

How to Connect the Availability Reporting Console to the Reporting Server

Before creating any reports, you must connect the Availability Reporting MMC console to the SQL Server computer that hosts the MOM Reporting Database (SystemCenterReporting).

Procedures
To connect the Availability Reporting console to the MOM Reporting Server
  1. Open the Availability Reporting MMC console (Start/Programs/Administrative Tools/Availability Reporting.msc).

  2. In the console, right-click the root node (Microsoft Reliability and Analysis Reporting Service) and select Connect To Database.

  3. In the Connect to Machine dialog, enter the DNS name of the MOM Reporting Server in the text box and click Connect.

How to Add Computer Groups to a Rule Group

The computer groups for the applications that you want to monitor must be installed manually to provide availability data for the applications that the management pack supports. The MOM Availability Reporting Management Pack does not include these computer groups, to prevent them from overwriting existing computer groups.

By default, the Timezone Information group targets all MOM-managed Windows servers in your organization. The time zone information is needed to correctly convert the times of Windows shutdown and restart events for availability reporting.
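As a rough illustration of why the time zone information matters, the following Python sketch converts a locally stamped shutdown time to UTC. The timestamp and the UTC offset are hypothetical, the fixed offset ignores daylight saving time, and the management pack's own conversion logic is not documented here.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_time: datetime, utc_offset_hours: float) -> datetime:
    """Attach a fixed UTC offset to a naive local timestamp and convert it to UTC."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return local_time.replace(tzinfo=local_zone).astimezone(timezone.utc)


# Hypothetical shutdown event logged at 23:30 local time on a server at UTC-8.
shutdown_local = datetime(2005, 2, 28, 23, 30)
shutdown_utc = to_utc(shutdown_local, -8)

# The same event falls in February locally but in March once normalized to UTC.
print(shutdown_local.month, shutdown_utc.month)
```

Without the offset, an event like this one could be attributed to the wrong day or month when data from servers in different time zones is combined.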

To collect the data that is used to generate availability reports, you must add computer groups to the Microsoft Availability Reporting Management Pack for Microsoft Operations Manager 2005 rule group.

Procedures
To add computer groups to the rule group
  1. In the Administrator console, in the navigation pane, expand Management Packs, and then expand Rule Groups.

  2. Right-click the Microsoft Availability Reporting Management Pack for Microsoft Operations Manager 2005 rule group, and then click Associate with Computer Group.

  3. In the Rule Groups Properties - Microsoft Availability Reporting Management Pack for Microsoft Operations Manager 2005 dialog box, click Computer Groups.

  4. In the Computer Groups dialog box, click Add.

  5. In the list, click Microsoft Windows Servers, and then click OK.

  6. Click Add, click Windows Server 2003 Domain Controllers, and then click OK.

  7. Click Add, click Windows 2000 Domain Controllers, and then click OK.

  8. Click Add, click Microsoft Exchange Server 2003, and then click OK.

  9. Click Add, click Microsoft Exchange Server 2000, and then click OK.

  10. Click Add, click Microsoft SQL Server 2000, and then click OK.

    Note

    To ensure that availability reports generate correctly, do not associate computer groups to the child rule groups under the Microsoft Availability Reporting Management Pack for Microsoft Operations Manager 2005 rule group.

How to Create and Configure Availability Reports

The SQL Server computer must have the Availability Reporting MP reporting components installed. Once the Availability Reporting Console is connected, you use the console to create reports or edit the configuration for existing reports.

Note

Availability reporting can generate reports only when it captures the necessary events. This can result in generating an availability report only after the restart of a server or an application.

Procedures
To create an Availability report
  1. In the Availability Reporting MMC console, right-click the node (Microsoft Reliability and Analysis Reporting Service) and select Create New Report.

  2. In the New Report dialog, enter the configuration settings for the tabs as described in "Report Configuration Settings by Dialog Box Tab".

To modify an existing Availability report
  1. In the Availability Reporting MMC console, right-click the report in the results pane and click Properties.

  2. In the Modify Existing Report dialog, enter the configuration settings for the tabs as described in "Report Configuration Settings by Dialog Box Tab".

Report Configuration Settings by Dialog Box Tab

This section provides information about the various configuration settings for an Availability report. You can use this information to either create a new report or modify an existing report.

General Tab

Name—Specify the name for the report you are creating.

Application—Select one of the provided applications for which you want to create an Availability Report.

Application Version—Select a provided application version to report on.

Note

To make the version information for Microsoft SQL Server available in this drop-down list, before you use this tool you must restart the MSSQLSERVER service on the SQL Server computers you want to monitor. This enables you to create reports that compare availability for different versions of the Windows operating system.

Operating System Version—From the drop-down list, click the operating system version that you want to report upon.

Time Range—Specify the units of time over which you want the report data to cover.

Number of days/months—Specify the number of units of time over which you want the report data to cover. This parameter is not available with the year-to-date time unit.

Domains Tab

Use this tab to specify the domains that you want to report on. You can specify either particular domains or all domains (default). Only data for computers in these domains will be available for reports, but data for all domains available to the MOM Management Server will be collected.

Computer Groups Tab

Use this tab to specify the computer groups that match the reports you want to create. You can specify either individual computer groups or all computer groups (default). Only data for computers in these computer groups will be available for reports, but data for all computer groups available to the MOM Management Server will be collected.

Schedule Tab

Use this tab to specify how often and when you want to create the report. You can schedule report creation weekly or monthly.

Note

This setting does not affect when the data is collected by the Availability Reporting MP but only when and how often the report is created.

Do not schedule the Windows scheduled task created by MOM-MRAS.msi to run at the same time as the MOM DTS transfer job.

Information Provided by Availability Reports

Availability Reports contain many tables and graphs that combine multiple collections. The order in which these elements are described in this section roughly mirrors the order in which they appear in the reports.

Note

The Microsoft Availability Reporting Management Pack for Microsoft Operations Manager 2005 collects attribute information once a day for all the computers it monitors and collects events from those computers as they happen. The SystemCenterDTSPackageTask DTS job copies data from the MOM operational database to the MOM Reporting Database every night. Then the ReliabiltyAnalysisReporting DTS job finds the availability data in the MOM Reporting Database and copies it to tables created by the Availability Reporting setup. Therefore, some data might not be available for Availability reports until after the next nightly DTS job has completed.

Availability, Reliability, and Selected Failure Data Summary Table

For the server set being analyzed, the following summary statistics for reliability and availability are included in this table:

  • Mean Time to Shutdown. The average uptime duration between system or application shutdowns; longer is better.

  • Shutdown Frequency. The number of system (or application) shutdowns per server-year; lower is better.

  • Mean Downtime. The average downtime duration for a system or application; shorter is better.

  • Counts and Frequencies for Operating System Failures. The count and frequency of kernel-mode failures. For more information about operating system failures and failure analysis, visit the Microsoft Online Crash Analysis site at https://go.microsoft.com/fwlink/?LinkId=51457. Online Crash Analysis (OCA) can automatically analyze individual operating system failures and can recommend specific steps to identify the root cause of the problem or to avoid the problem in the future.

  • Counts and Frequencies for Application Exceptions. The count and frequency of failures in user-mode applications. For more information about failures, visit the Microsoft Online Crash Analysis site at https://go.microsoft.com/fwlink/?LinkId=51457.

  • Availability. The percentage of time a system or application is in a running state for a specified period of time; higher is better. Near-perfect availability is considered to be 99.999%, with a downtime of approximately five minutes per server per year.
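The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calculation the reports actually perform: the function name and input data are hypothetical, a server-year is taken to be 8,760 hours of observed time (uptime plus downtime), and every downtime period is assumed to follow exactly one shutdown.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: one server-year is 8,760 observed hours


def summarize(uptimes_hours, downtimes_hours):
    """Return summary metrics from non-empty lists of uptime and downtime durations (hours)."""
    total_up = sum(uptimes_hours)
    total_down = sum(downtimes_hours)
    shutdowns = len(downtimes_hours)  # assumption: one shutdown per downtime period
    observed = total_up + total_down
    return {
        "mean_time_to_shutdown_h": total_up / len(uptimes_hours),
        "shutdowns_per_server_year": shutdowns / (observed / HOURS_PER_YEAR),
        "mean_downtime_h": total_down / shutdowns,
        "availability_pct": 100 * total_up / observed,
    }


# Hypothetical server observed for a quarter of a year: two uptime periods,
# each ended by a shutdown, followed by outages of 1.5 and 0.5 hours.
stats = summarize(uptimes_hours=[1000, 1188], downtimes_hours=[1.5, 0.5])
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical case, two shutdowns in a quarter of a server-year scale to a frequency of eight shutdowns per server-year.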

Shutdown Event Tracker Graph

This graph provides user-supplied information about suspected causes for shutdowns. This information is based on Windows Server 2003 Shutdown Event Tracker, which provides a way for users to document reasons for computer shutdowns and restarts and that helps create a comprehensive picture of an organization’s system downtime environment. Shutdown Event Tracker is enabled by default and supported on all Windows Server 2003 operating systems.

The Shutdown Event Tracker graph header and legend section displays information only for computers running the Windows Server 2003 operating system. The legend displays the number of computers running Windows Server 2003, the number of shutdowns for the servers, and the percentage of server shutdowns that have a corresponding Shutdown Event Tracker annotation.

For example, a report could be run on 100 servers having a total of 200 shutdowns. The breakdown by operating system for the servers is 80 servers running Windows Server 2003 with a corresponding 50 shutdowns and 20 servers running Windows 2000 with a corresponding 150 shutdowns. Of the 50 shutdowns on the Windows Server 2003 servers, there are 45 shutdowns with corresponding Shutdown Event Tracker annotations. In this example, the information displayed in the legend would be as follows: number of servers equals 80, total shutdowns equals 50, and percentage annotated equals 90%.
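The arithmetic in this example can be checked with a short Python sketch. The record layout and the function name are hypothetical; only the numbers come from the example above.

```python
def tracker_legend(groups):
    """Summarize Shutdown Event Tracker coverage for Windows Server 2003 machines only."""
    ws2003 = [g for g in groups if g["os"] == "Windows Server 2003"]
    servers = sum(g["servers"] for g in ws2003)
    shutdowns = sum(g["shutdowns"] for g in ws2003)
    annotated = sum(g["annotated"] for g in ws2003)
    return {
        "servers": servers,
        "shutdowns": shutdowns,
        # Guard against division by zero when no shutdowns were recorded.
        "percent_annotated": 100 * annotated / shutdowns if shutdowns else 0.0,
    }


# The 100 servers and 200 shutdowns from the example, split by operating system.
groups = [
    {"os": "Windows Server 2003", "servers": 80, "shutdowns": 50, "annotated": 45},
    {"os": "Windows 2000", "servers": 20, "shutdowns": 150, "annotated": 0},
]
legend = tracker_legend(groups)
print(legend)
```

The Windows 2000 servers and their 150 shutdowns drop out of the legend entirely, which is why the totals shown are 80 and 50 rather than 100 and 200.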

System-Generated Shutdown Events Table

This table lists known system-generated reasons for operating system shutdowns. It contains the event information for the cause of the shutdown, the count for each cause, a description of the cause, and the number of causes that correspond to customer-annotated information that appears in the Shutdown Event Tracker graph.

User-Defined Exclusions Table

This table provides information about the effects of the various exclusions applied to the data during report generation and reflects the exclusions specified in the computer exclusions inputs fields. Exclusions should be used to incorporate business rules into the reports. For example, if a datacenter administrator leaves computers out of production for 24 hours following every service pack installation, it would be appropriate to apply a 24-hour initial run-time exclusion to that datacenter’s report.

Shutdown Event Tracker Graph

This graph provides a list of servers whose run time and events have been excluded from the report. All or a portion of a server’s data can be excluded from the report. Exclusion can occur for the following reasons:

  • Events are missing from the event log.

  • Events with incorrect time stamps are in the event log.

  • The server did not have events collected during the selected period of time because the server was not part of the collection list, the server was deactivated, or some other issue prevented data collection from this server.

  • The server is specified (selected) in a group but not in the report application model.

Version-Based Availability and Reliability Summary Table

This table contains summary statistics for the reliability and availability of the set of servers in the report. The statistics are grouped by application release version and include the following:

  • Mean Time to Shutdown. The average uptime duration between system or application shutdowns; longer is better.

  • Shutdown Frequency. The number of system (or application) shutdowns per server-year; lower is better.

  • Mean Downtime. The average downtime duration for a system or application; shorter is better.

  • Availability. The percentage of time that a system or application is in a running state; higher is better. For example, 99.999% availability is approximately 5 minutes of downtime per server per year.

It is important to consider the run time of the operating system or application when analyzing the availability and reliability statistics. As a rule, six months of run time is necessary for these statistics to be meaningful.
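The 99.999% figure quoted in the Availability item can be verified with simple arithmetic. The Python sketch below converts an availability percentage into the downtime it allows per server per year, assuming a 365-day year; the function name is illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the downtime per server per year implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.1f} minutes per year")
```

For 99.999% the result is about 5.3 minutes, which matches the "approximately 5 minutes" stated above.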

Version-Based Failure Data Table

This table contains summary statistics for reliability and availability of the set of servers in the report. The statistics are grouped by operating system release version and include the following:

  • Mean Time to Shutdown. The average uptime duration between system or application shutdowns; longer is better.

  • Shutdown Frequency. The number of system (or application) shutdowns per server-year; lower is better.

  • Mean Downtime. The average downtime duration for a system or application; shorter is better.

  • Counts and Frequencies for Operating System Crashes. The count and frequency of failures in kernel-mode. For more information about operating system failures and failure analysis, visit the Microsoft Online Crash Analysis site at https://go.microsoft.com/fwlink/?LinkId=51457. Online Crash Analysis (OCA) can automatically analyze individual operating system failures and can recommend specific steps to identify the root cause of the problem or to avoid the problem in the future.

  • Counts and Frequencies for Application Exceptions. The count and frequency of failures in user-mode applications. For more information about failures, visit the Microsoft Online Crash Analysis site at https://go.microsoft.com/fwlink/?LinkId=51457.

  • Availability. The percentage of time that a system or application is in a running state; higher is better. For example, 99.999% availability is approximately 5 minutes of downtime per server per year.

It is important to consider the run time of the operating system or application when analyzing the availability and reliability statistics. As a rule, six months of run time is necessary for these statistics to be meaningful.

Monthly Shutdowns, Downtime, and Failure Data Table

This table lists monthly counts of shutdowns, duration of downtime, application failures, and operating system failures. The following statistics are included:

  • Counts and Frequencies for Operating System Crashes. The count and frequency of failures in kernel-mode. For more information about operating system failures and failure analysis, visit the Microsoft Online Crash Analysis site at https://go.microsoft.com/fwlink/?LinkId=51457. Online Crash Analysis (OCA) can automatically analyze individual operating system failures and can recommend specific steps to identify the root cause of the problem or to avoid the problem in the future.

  • Counts and Frequencies for Application Exceptions. The count and frequency of failures in user-mode applications. For more information about failures, visit the Microsoft Online Crash Analysis site at https://go.microsoft.com/fwlink/?LinkId=51457.
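As a minimal sketch of how monthly counts like these could be derived from individual event times, the following Python groups timestamps by calendar month. The event times and the function name are hypothetical, and the report's own aggregation may differ.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(event_times):
    """Count events per calendar month, keyed as 'YYYY-MM' in chronological order."""
    counts = Counter(t.strftime("%Y-%m") for t in event_times)
    return dict(sorted(counts.items()))


# Hypothetical operating system crash times for one server set.
crash_times = [
    datetime(2005, 1, 14, 3, 10),
    datetime(2005, 1, 30, 22, 5),
    datetime(2005, 3, 2, 9, 45),
]
crashes_by_month = monthly_counts(crash_times)
print(crashes_by_month)
```

Months without any events are simply absent from the result in this sketch.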

Top Server Mean Time to Shutdown Table

This table shows the average uptime duration between system or application shutdowns; longer is better. The table shows the individual instances having the longest mean time to shutdown statistics, based on release version. Servers with a metric of “not applicable” cannot be given a priority order; therefore, these computers are placed at the bottom of the list. Other metrics can be calculated for individual servers. It is recommended that you investigate the other report tables to determine the complete picture of the reliability and availability of each individual server.
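The ordering rule described here, with "not applicable" servers always at the bottom, can be sketched in Python as follows. The server names and values are hypothetical, and None stands in for a "not applicable" metric.

```python
def rank_by_mean_time_to_shutdown(servers, top=True):
    """Order (name, hours) pairs by mean time to shutdown; None entries always go last."""
    known = [s for s in servers if s[1] is not None]
    not_applicable = [s for s in servers if s[1] is None]
    # Top list: longest first. Bottom list: shortest first.
    known.sort(key=lambda s: s[1], reverse=top)
    return known + not_applicable


servers = [("SRV-A", 310.0), ("SRV-B", None), ("SRV-C", 1450.5), ("SRV-D", 72.25)]
top_list = rank_by_mean_time_to_shutdown(servers, top=True)
bottom_list = rank_by_mean_time_to_shutdown(servers, top=False)
print([name for name, _ in top_list])
print([name for name, _ in bottom_list])
```

Keeping the "not applicable" entries out of the sort key avoids comparing None with numbers and keeps them last in both the top and bottom lists.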

Bottom Server Mean Time to Shutdown Table

This table shows the average amount of time until a system or application is shut down. The table shows the individual servers having the shortest mean time to shutdown statistics, based on release version. Servers with a metric of “not applicable” cannot be given a priority order; therefore, these computers are placed at the bottom of the list. Other metrics can be calculated for individual servers. It is recommended that you investigate the other report tables to determine the complete picture of the reliability and availability of each individual server.

Longest Individual Uptimes Table

This table lists the longest individual server uptimes during the reporting period. When the Hours of Operation filter is used, the uptime duration reflects only the uptime duration that occurred during selected hours of operation.
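One plausible reading of the Hours of Operation filter is that each uptime is clipped to a daily operating window before its duration is measured. The Python sketch below illustrates that reading only; the 08:00 to 18:00 window, the function name, and the timestamps are assumptions, and the window is assumed to apply every day and not to cross midnight.

```python
from datetime import datetime, time, timedelta


def hours_within_operation(start, end, open_at=time(8, 0), close_at=time(18, 0)):
    """Return the hours of the interval [start, end) that fall inside the daily window."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        window_start = datetime.combine(day, open_at)
        window_end = datetime.combine(day, close_at)
        overlap_start = max(start, window_start)
        overlap_end = min(end, window_end)
        if overlap_end > overlap_start:
            total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total.total_seconds() / 3600


# Hypothetical uptime of 46 wall-clock hours spanning three calendar days.
uptime_start = datetime(2005, 6, 1, 12, 0)
uptime_end = datetime(2005, 6, 3, 10, 0)
filtered_hours = hours_within_operation(uptime_start, uptime_end)
print(filtered_hours)
```

Under these assumptions the 46-hour uptime counts as 18 hours: 6 on the first day, 10 on the second, and 2 on the third.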

Top Server Mean Downtime Table

This table shows the average amount of time a system or application spends in a stopped state; shorter is better. The table shows the individual servers having the shortest mean downtime statistics, based on release version. Servers with a metric of “not applicable” cannot be given a priority order; therefore, these computers are placed at the bottom of the list. Other metrics can be calculated for individual servers. It is recommended that you investigate the other report tables to determine the complete picture of the reliability and availability of each individual server.

Bottom Server Mean Downtime Table

This table shows the average amount of time a system or application spends in a stopped state; shorter is better. The table shows the individual servers having the longest mean downtime statistics, based on release version. Servers with a metric of “not applicable” cannot be given a priority order; therefore, these computers are placed at the bottom of the list. Other metrics can be calculated for individual servers. It is recommended that you investigate the other report tables to determine the complete picture of the reliability and availability of each individual server.

Outage Distribution Graph

This graph shows the distribution of server outages by outage duration. It shows the percentage of total outages that have a duration less than or equal to the time interval in which they occur. Using this graph, you can determine whether all the outages are occurring for approximately the same amount of time or whether the majority of the outages are occurring for a short period of time with a few long outages.

For example, if 80% of the outages are at an outage duration of 10 minutes, 90% of the outages are at an outage duration of 50 minutes, and 95% of the outages are at an outage duration of 60 minutes, this means that 80% of the individual outages lasted 10 minutes or less, 90% lasted 50 minutes or less, and 95% lasted 60 minutes or less.

In addition, this means that 80% of the individual outages were between zero and 10 minutes in duration, 10% of the individual outages were between 10 and 50 minutes in duration, and 5% of the individual outages were between 50 and 60 minutes in duration.
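The relationship between the cumulative percentages and the per-interval percentages can be reproduced with a short Python sketch. The outage durations below are hypothetical values chosen to match the 80%, 90%, and 95% figures in the example.

```python
def cumulative_outage_percentages(durations_min, thresholds_min):
    """For each threshold, return the percentage of outages lasting that long or less."""
    total = len(durations_min)
    return [100 * sum(d <= t for d in durations_min) / total for t in thresholds_min]


def per_interval_percentages(cumulative):
    """Turn cumulative percentages into the share of outages inside each interval."""
    previous = 0.0
    shares = []
    for value in cumulative:
        shares.append(value - previous)
        previous = value
    return shares


# Twenty hypothetical outages (minutes): sixteen short, three medium, one long.
durations = [5] * 16 + [30, 45] + [55] + [240]
cumulative = cumulative_outage_percentages(durations, thresholds_min=[10, 50, 60])
shares = per_interval_percentages(cumulative)
print(cumulative, shares)
```

The single 240-minute outage accounts for the remaining 5% beyond the 60-minute mark, the kind of long outage the graph is meant to expose.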

Longest Individual Outages Table

This table shows the longest individual server outages that occurred during the reporting period. This table can be used in conjunction with the Outage Distribution graph to identify long downtime periods that are increasing the overall mean downtime of the system and therefore should be investigated. When the Hours of Operation filter is used, the downtime duration reflects only the downtime duration that occurred during selected hours of operation.

Top Server Operating System Crash Frequency Table

This table lists the number of operating system failures or stop errors per server-year; lower is better. The table shows the individual servers having the lowest frequency of operating system failures or stop errors, based on release version.

Bottom Server Operating System Crash Frequency Table

This table shows the number of operating system failures or stop errors per server-year; lower is better. The table shows the individual servers having the highest frequency of operating system failures or stop errors, based on release version.

Operating System Crash Legend Table

This table contains brief descriptions and counts for each operating system failure or stop error. In Stop Code, a 32-bit hexadecimal number associated with the cause of the kernel-mode failure is shown. In Description, a descriptive title associated with the stop-code number is shown. For more information, see the Microsoft Online Crash Analysis site at https://go.microsoft.com/fwlink/?LinkId=51457. Windows Online Crash Analysis (OCA) can automatically analyze individual failures and can recommend specific steps to identify the root cause of the problem or to avoid the problem in the future.
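To illustrate the shape of such a legend, the Python sketch below formats stop codes as 32-bit hexadecimal numbers and counts their occurrences. The observed codes are hypothetical, and the description lookup contains only two well-known stop codes; it is not the list that the report uses.

```python
from collections import Counter

# Two well-known stop codes; a real legend would cover many more.
DESCRIPTIONS = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
}


def crash_legend(stop_codes):
    """Return (formatted code, description, count) rows, most frequent first."""
    counts = Counter(stop_codes)
    return [
        (f"0x{code:08X}", DESCRIPTIONS.get(code, "Unknown"), count)
        for code, count in counts.most_common()
    ]


# Hypothetical stop codes collected during a reporting period.
observed = [0x50, 0x0A, 0x50, 0x50]
legend_rows = crash_legend(observed)
for row in legend_rows:
    print(row)
```

The `08X` format pads each code to eight hexadecimal digits, which is the conventional way a 32-bit stop code is written.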

Operating System Crash Details Table

This table provides details about the individual failures that occurred during the reporting period. The table contains the server name, operating system version and service pack, stop code for the operating system failure, and date and time of the operating system failure. In Stop Code, a 32-bit hexadecimal number associated with the cause of the kernel-mode failure is shown. For more information, see the Microsoft Online Crash Analysis site at https://go.microsoft.com/fwlink/?LinkId=51457. Online Crash Analysis (OCA) can automatically analyze individual failures and can recommend specific steps to identify the root cause of the problem or to avoid the problem in the future.

Top Server Application Exception Frequency Table

This table provides the number of application exceptions per server-year; lower is better. The table shows the individual servers having the lowest frequency of application failures, based on release version.

Bottom Server Application Exception Frequency Table

This table provides the number of application exceptions per server-year; lower is better. The table shows the individual servers having the highest frequency of application failures, based on release version.

Application Exception Counts by Module Table

This table shows the individual user-mode modules having the highest failure counts. Check with the software vendor to see whether there are any patches or updates for modules having unusually high application failure counts. Also, verify that these modules are configured correctly and that there is enough disk space and memory capacity for these modules to function properly.

Application Exception Details Table

This table contains information about the individual application failures that occurred during the reporting period. The table contains the server name, operating system version, service pack number, name of the module that had the failure, and date and time of the failure. For more information about failures, visit the Microsoft Online Crash Analysis site at https://go.microsoft.com/fwlink/?LinkId=51457.

Top Server Availability Table

This table provides the percentage of time that a system or application is in a running state; higher is better. For example, 99.999% availability is approximately 5 minutes of downtime per server per year. The table shows the individual servers having the highest availability, based on release version. Servers with a metric of “not applicable” cannot be given a priority order; therefore, these computers are placed at the bottom of the list. Other metrics can be calculated for individual servers. It is recommended that you investigate the other report tables to determine the complete picture of the reliability and availability of each individual server.
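The relationship between an availability percentage and yearly downtime is simple arithmetic. The following sketch assumes a 365-day year; the function name is hypothetical and is not part of the Management Pack.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_minutes_per_year(availability_percent):
    """Convert an availability percentage into expected downtime per year."""
    unavailable_fraction = (100 - availability_percent) / 100
    return unavailable_fraction * MINUTES_PER_YEAR


for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% availability is about {minutes:.2f} minutes of downtime per year")
```

For 99.999% availability, this gives roughly 5.26 minutes per server per year, which matches the "approximately 5 minutes" figure quoted above.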

Bottom Server Availability Table

This table provides the percentage of time that a system or application is in a running state; higher is better. For example, 99.999% availability is approximately 5 minutes of downtime per server per year. The table shows the individual servers having the lowest availability, based on release version. Servers with a metric of “not applicable” cannot be given a priority order; therefore, these computers are placed at the bottom of the list. Other metrics can be calculated for individual servers. It is recommended that you investigate the other report tables to determine the complete picture of the reliability and availability of each individual server.
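The ordering rule described for the Top and Bottom Server Availability tables, where servers with a metric of "not applicable" always appear at the bottom of the list, can be sketched as follows. This is an illustration only: the server names are hypothetical, and None is used here as an assumed stand-in for "not applicable".

```python
def rank_by_availability(servers, highest_first=True):
    """Order (name, availability) pairs; unrated servers always go last.

    An availability of None represents "not applicable" and cannot be
    given a priority order, so those entries are appended after the
    rated servers regardless of the sort direction.
    """
    rated = [s for s in servers if s[1] is not None]
    unrated = [s for s in servers if s[1] is None]
    rated.sort(key=lambda s: s[1], reverse=highest_first)
    return rated + unrated


# Hypothetical data for illustration.
servers = [("SRV-A", 99.95), ("SRV-B", None), ("SRV-C", 99.999), ("SRV-D", 98.7)]

top = rank_by_availability(servers, highest_first=True)
bottom = rank_by_availability(servers, highest_first=False)
print([name for name, _ in top])
print([name for name, _ in bottom])
```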

Availability and Reliability Metrics

After Availability Reporting analyzes the data collected from your server event logs, it presents the information in reports that you can view. Your system administrator or other authorized personnel can filter data to create reports that are relevant to your organization’s needs. Each report includes tables or graphs with information about the availability and reliability of servers.

Table 1 lists and defines the availability and reliability metrics that appear in Availability Reporting reports.

Table 1   Definitions of metrics provided in availability reports

Metric

Definition

Availability

The percentage of time that a system or application is in a running state; higher is better. For example, 99.999% availability is approximately five minutes of downtime per server per year.

Application exception

A user-mode exception in an application that is caught and reported to the event log by the fault-logging application. These types of failures usually do not result in operating system downtime.

OS Crash (Stop error)

A kernel-mode exception in Windows. Stop errors always result in operating system downtime.

Shutdowns

The number of times the system or application entered a stopped state.

Shutdown Frequency

The number of system (or application) shutdowns divided by the total run time for a system (or application); lower is better.

Mean Time to Shutdown

The average uptime duration between system or application shutdowns; longer is better.

Mean Downtime

The mean duration of all downtimes; shorter is better.
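To show how the metrics in Table 1 relate to one another, the following sketch derives them from a list of uptime and downtime durations for a single server. It rests on several assumptions that are not stated in this guide: each downtime period counts as one shutdown, shutdown frequency is expressed per year of run time, and mean time to shutdown is total uptime divided by the number of shutdowns. The names are hypothetical, and the Management Pack may calculate these values differently.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

HOURS_PER_YEAR = 365 * 24  # assumption: 8,760-hour year


@dataclass
class Metrics:
    availability_percent: float
    shutdowns: int
    shutdown_frequency_per_year: Optional[float]
    mean_time_to_shutdown_hours: Optional[float]
    mean_downtime_hours: Optional[float]


def compute_metrics(uptimes_hours, downtimes_hours):
    """Derive the Table 1 metrics from uptime and downtime durations."""
    total_up = sum(uptimes_hours)
    total_down = sum(downtimes_hours)
    shutdowns = len(downtimes_hours)  # assumption: one shutdown per downtime
    has_data = shutdowns > 0 and total_up > 0
    return Metrics(
        availability_percent=100 * total_up / (total_up + total_down),
        shutdowns=shutdowns,
        # Shutdowns divided by total run time, scaled to one year.
        shutdown_frequency_per_year=(
            shutdowns / (total_up / HOURS_PER_YEAR) if has_data else None
        ),
        # Average uptime between shutdowns.
        mean_time_to_shutdown_hours=total_up / shutdowns if has_data else None,
        # Mean duration of all downtimes.
        mean_downtime_hours=mean(downtimes_hours) if shutdowns else None,
    )


# Hypothetical server: three uptime periods separated by two downtimes.
m = compute_metrics([700.0, 1000.0, 490.0], [4.0, 6.0])
print(m)
```

When a server has no recorded shutdowns, the sketch returns None for the shutdown-based metrics, which is analogous to a metric of "not applicable".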

Technical Reference: Availability Reporting Management Pack

This section provides reference data for this Management Pack, including attributes collected, associated computer groups, scripts used, event rules, and reference information about the report configuration property page settings.

Computer Attributes, Computer Groups, Scripts

This section provides reference data for this Management Pack, including attributes collected, associated computer groups, scripts used, and event rules.

Computer Attributes

The Availability Reporting Management Pack collects the following attributes for computers. These attributes are registry keys from the Timezone Information registry group in the Windows registry.

  • DayLightBias

  • ActiveTimeBias

  • DayLightName

  • Time Zone Bias

  • StandardBias

  • Bias

  • Standardname

Computer Groups

The Microsoft Availability Reporting Management Pack for Microsoft Operations Manager 2005 includes the Timezone Information computer group.

Scripts

The Microsoft Availability Reporting Management Pack for Microsoft Operations Manager 2005 includes only the Availability Reporting Service Discovery script. This script attempts to detect a 6008 event (dirty shutdown). The binary part of this event on the local computer contains the time of the shutdown; the binary part is extracted and collected by the Management Pack. Time zone information is required to correctly convert this shutdown time according to the time zone of the server. The script discovers the attributes used in availability reporting and is associated with the Run Availability Reporting Service Discovery rule in the Availability Reporting rule group. The script has no parameters. The Run Availability Reporting Service Discovery rule is a timed-event rule and, by default, runs every day at 00:23.
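The following sketch illustrates why the time zone attributes matter when interpreting a shutdown time. It is not the Management Pack's script, whose logic is not shown in this guide, and it does not decode the binary part of the 6008 event. It assumes the shutdown time is already available as a local time and relies on the Windows convention that the bias is expressed in minutes such that UTC = local time + bias. Because the registry stores the bias as an unsigned 32-bit value, negative biases (time zones ahead of UTC) must first be reinterpreted as signed. The function names and sample values are hypothetical.

```python
from datetime import datetime, timedelta


def to_signed32(value):
    """Reinterpret an unsigned 32-bit registry value as a signed integer."""
    return value - 2**32 if value >= 2**31 else value


def local_to_utc(local_time, active_time_bias_minutes):
    """Convert a server-local time to UTC using UTC = local time + bias."""
    return local_time + timedelta(minutes=active_time_bias_minutes)


# Hypothetical dirty-shutdown time recorded in the server's local time.
shutdown_local = datetime(2005, 6, 15, 22, 30)

# A bias of 420 minutes means the server is 7 hours behind UTC.
utc_west = local_to_utc(shutdown_local, 420)

# 0xFFFFFFC4 is how a bias of -60 minutes (1 hour ahead of UTC) is stored.
utc_east = local_to_utc(shutdown_local, to_signed32(0xFFFFFFC4))

print(utc_west.isoformat())
print(utc_east.isoformat())
```

Converting every shutdown time to a common reference in this way allows downtime from servers in different time zones to be compared within the same reporting period.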