Establishing a Service Level Agreement

 

After you consider the impact of downtime on your organization and decide on the level of uptime that you want to achieve in your messaging environment, you are ready to establish a service level agreement (SLA). SLA requirements determine how components such as storage, clustering, and backup and recovery factor into the design of your Exchange organization.

When assessing SLAs, you should begin by identifying the hours of regular operation and the expectations regarding planned downtime. You should then determine your company's expectations regarding availability, performance, and recoverability, including message delivery time, percentage of server uptime, amount of storage required per user, and amount of time to recover an Exchange database.
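To make an uptime percentage concrete, it can help to translate it into a downtime budget for the hours of regular operation. The following is a minimal sketch in Python. The function name `allowed_downtime_hours`, its parameters, and the sample targets are illustrative assumptions and do not come from this guide.

```python
def allowed_downtime_hours(uptime_percent, service_hours_per_week=168.0, weeks=52.0):
    """Return the unplanned-downtime budget, in hours, for the given period.

    uptime_percent: availability target, such as 99.9
    service_hours_per_week: hours of regular operation covered by the SLA
        (168 means 24 hours a day, 7 days a week)
    weeks: length of the measurement period in weeks
    """
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_service_hours = service_hours_per_week * weeks
    return total_service_hours * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Hypothetical targets, shown only to compare their downtime budgets.
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% uptime allows about {hours:.2f} hours of downtime per year")
```

Passing a smaller `service_hours_per_week` (for example, 60 for a business-hours SLA) shows how the same percentage yields a much smaller downtime budget when planned downtime falls outside the hours of regular operation.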

In addition, you should identify the estimated cost of unplanned downtime so that you can design the proper amount of fault tolerance into your messaging system.
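One simple way to estimate the cost of unplanned downtime is to multiply the outage length by the number of affected users and an hourly cost per user, and then add any one-time costs. The sketch below assumes that model. The function names and every figure in the example are hypothetical, and your organization may need a more detailed cost model.

```python
def outage_cost(outage_hours, affected_users, cost_per_user_hour, fixed_cost=0.0):
    """Estimate the cost of a single unplanned outage.

    All inputs are estimates that your organization supplies:
    outage_hours: how long the messaging service is unavailable
    affected_users: number of users who cannot work normally
    cost_per_user_hour: lost productivity or revenue per user per hour
    fixed_cost: one-time costs, such as emergency vendor support
    """
    if min(outage_hours, affected_users, cost_per_user_hour, fixed_cost) < 0:
        raise ValueError("inputs must not be negative")
    return outage_hours * affected_users * cost_per_user_hour + fixed_cost


def expected_annual_cost(outages_per_year, cost_per_outage):
    """Expected yearly cost of unplanned downtime."""
    return outages_per_year * cost_per_outage


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    per_outage = outage_cost(outage_hours=4, affected_users=500,
                             cost_per_user_hour=20.0, fixed_cost=1000.0)
    yearly = expected_annual_cost(outages_per_year=2, cost_per_outage=per_outage)
    print(f"Estimated cost per outage: {per_outage:,.2f}")
    print(f"Expected annual cost: {yearly:,.2f}")
```

Comparing the expected annual cost with the cost of additional fault tolerance gives a rough basis for deciding how much redundancy is justified.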

Features in Exchange 2003 and Windows Server 2003 may affect how you design your organization to meet SLAs. For example, the Volume Shadow Copy service and the Exchange recovery storage group feature may allow you to challenge the limits that were previously imposed by your SLAs. For information about how you can implement these features to significantly reduce the time it takes to restore Exchange databases, see "SAN-Based Snapshot Backups" in the Exchange Server 2003 Disaster Recovery Planning Guide.

The following table lists some of the categories and specific elements you may want to include in your SLAs.

Categories and elements in a typical enterprise-level SLA

SLA categories Examples of SLA elements

Hours of Operation

  • Hours that the messaging service is available to users

  • Hours reserved for planned downtime (maintenance)

  • Amount of advance notice for network changes or other changes that may affect users

Service Availability

  • Percentage of time Exchange services are running

  • Percentage of time mailbox stores are mounted

  • Percentage of time that domain controller services are running

System Performance

  • Number of internal users that the messaging system concurrently supports

  • Number of remotely connected users that the messaging system concurrently supports

  • Number of messaging transactions that are supported per unit of time

  • Acceptable level of performance, such as latency experienced by users

Disaster Recovery

  • Amount of time allowed for recovery of each failure type, such as individual database failure, mailbox server failure, domain controller failure, and site failure

  • Amount of time it takes to provide a backup mail system so users can send and receive e-mail messages without accessing historical data (called Messaging Dial Tone)

  • Amount of time it takes to recover data to the point of failure

Help Desk/Support

  • Specific methods that users can use to contact the help desk

  • Help desk response time for various classes of problems

  • Help desk procedures for issue escalation

Other

  • Amount of storage required per user

  • Number of users who require special features, such as remote access to the messaging system
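The Service Availability elements in the preceding table are usually reported as a percentage of scheduled service time. The following sketch shows one way to compute that figure from outage records, assuming that planned maintenance windows are excluded from scheduled time. That convention, the function name `measured_availability`, and the sample dates are assumptions for illustration; your SLA should state which convention applies.

```python
from datetime import datetime, timedelta


def measured_availability(period_start, period_end, planned_windows, unplanned_outages):
    """Return availability as a percentage of scheduled service time.

    Each window or outage is a (start, end) tuple of datetime objects.
    Assumes the intervals do not overlap one another and fall inside the period.
    """
    def total(intervals):
        # Add up the durations of all intervals.
        return sum((end - start for start, end in intervals), timedelta())

    # Scheduled time excludes planned maintenance windows.
    scheduled = (period_end - period_start) - total(planned_windows)
    if scheduled <= timedelta():
        raise ValueError("no scheduled service time in the period")
    downtime = total(unplanned_outages)
    return 100.0 * ((scheduled - downtime) / scheduled)


if __name__ == "__main__":
    # Hypothetical 30-day reporting period with one maintenance window
    # and one unplanned outage.
    start, end = datetime(2004, 6, 1), datetime(2004, 7, 1)
    planned = [(datetime(2004, 6, 12, 22), datetime(2004, 6, 13, 2))]
    outages = [(datetime(2004, 6, 20, 9), datetime(2004, 6, 20, 11))]
    print(f"Availability: {measured_availability(start, end, planned, outages):.3f}%")
```

The same function can be applied separately to each availability element, such as Exchange services, mailbox stores, and domain controller services, by passing the outage records for that component.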

Including a variety of performance measures in your SLAs helps ensure that you are meeting the specific performance requirements of your users. For example, if there is high latency or low available bandwidth between clients and mailbox servers, users and system administrators would view the performance level differently: users would consider the performance to be poor, while system administrators would consider it to be acceptable. For this reason, it is important that you monitor disk I/O latency levels.

Note

For each SLA element, you must also determine the specific performance benchmarks that you will use to measure performance in conjunction with availability objectives. In addition, you must determine how frequently you will provide statistics to IT management and other management.
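As an illustration of pairing each SLA element with a benchmark, the sketch below records a target for each element and compares measured values against it to produce rows for a periodic report. The class `SlaBenchmark`, the function `build_report`, the element names, and all target values are hypothetical and are shown only to suggest one possible structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaBenchmark:
    element: str            # SLA element being measured
    target: float           # agreed benchmark value
    unit: str               # unit of measurement, such as "%" or "hours"
    higher_is_better: bool  # True for uptime, False for latency or recovery time

    def is_met(self, measured: float) -> bool:
        """Compare a measured value with the target."""
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


def build_report(benchmarks, measurements):
    """Return (element, measured, target, unit, status) rows for a periodic report."""
    rows = []
    for b in benchmarks:
        measured = measurements.get(b.element)
        if measured is None:
            status = "NO DATA"
        else:
            status = "MET" if b.is_met(measured) else "MISSED"
        rows.append((b.element, measured, b.target, b.unit, status))
    return rows


if __name__ == "__main__":
    # Hypothetical benchmarks and measurements for one reporting period.
    benchmarks = [
        SlaBenchmark("Mailbox stores mounted", 99.9, "%", True),
        SlaBenchmark("Database recovery time", 4.0, "hours", False),
        SlaBenchmark("Help desk response time", 30.0, "minutes", False),
    ]
    measurements = {"Mailbox stores mounted": 99.95, "Database recovery time": 5.5}
    for row in build_report(benchmarks, measurements):
        print(row)
```

Reporting elements that have no measurement as "NO DATA" rather than omitting them makes gaps in monitoring visible to management.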

Establishing Service Level Agreements with Your Vendors

Many businesses that place importance on high availability rely on third-party vendors to achieve their availability goals. In these cases, a highly available messaging system depends on services from outside hardware and software vendors, and unresponsive vendors or poorly trained vendor staff can reduce the availability of the messaging system.

It is important that you negotiate an SLA with each of your major vendors. Establishing SLAs with your vendors helps guarantee that your messaging system performs to specifications, supports required growth, and is available to a given standard. The absence of an SLA can significantly increase the length of time the messaging system is unavailable.

Important

Make sure that your staff is aware of the terms of each SLA. For example, many hardware vendor SLAs contain clauses that allow only support personnel from the vendor or certified staff members of your organization to open the server casing. Failure to comply can result in a violation of the SLA and potential nullification of any vendor warranties or liabilities.

In addition to establishing SLAs with your major vendors, you should periodically test escalation procedures by conducting support-request drills. To confirm that you have the most recent contact information, make sure that you also test pagers and phone trees.