Quantifying Availability and Scalability Requirements

 

When quantifying the level of availability you want to achieve, compare the costs of your current information technology (IT) environment (including the actual costs of outages) with the costs of implementing high availability solutions. The costs of these solutions include training for your staff as well as facilities costs, such as new hardware. After you calculate the costs, IT managers can use these numbers to make business decisions (not just technical decisions) about your high availability solution.
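The cost comparison described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and every figure in the usage lines (hourly outage cost, downtime hours, yearly solution cost) are hypothetical placeholders, not values from this guide. Substitute numbers measured in your own organization.

```python
def yearly_outage_cost(downtime_hours: float, cost_per_hour: float) -> float:
    """Estimated yearly cost of outages: hours of downtime times cost per hour."""
    return downtime_hours * cost_per_hour


def net_yearly_benefit(
    current_downtime_hours: float,
    improved_downtime_hours: float,
    cost_per_hour: float,
    solution_cost_per_year: float,
) -> float:
    """Outage cost avoided by the solution minus what the solution costs per year.

    A positive result suggests the solution pays for itself; a negative
    result suggests the availability goal may be stricter than the
    business case supports.
    """
    avoided = yearly_outage_cost(
        current_downtime_hours - improved_downtime_hours, cost_per_hour
    )
    return avoided - solution_cost_per_year


# Hypothetical inputs: 87.6 hours of downtime today, 8.76 hours after the
# change, 5,000 per hour of outage, 250,000 per year for the solution
# (hardware, training, facilities).
benefit = net_yearly_benefit(87.6, 8.76, 5_000, 250_000)
print(f"Net yearly benefit: {benefit:,.0f}")
```

A model this simple ignores costs that are hard to price, such as lost customer confidence, so treat the result as one input to the business decision rather than the decision itself.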

Setting high availability goals is the responsibility of many parties, and these goals must be appropriate to all stakeholders. You must evaluate the impact of setting high availability goals on messaging administrators, business users, and customers. For example, although executive management and end users may want 99.999 percent availability, messaging system administrators must make clear the cost of achieving such strict availability goals.

After deciding how you will measure availability in your organization, it is important that you routinely monitor your system to verify that you are meeting your availability requirements. For information about monitoring tools that can help you measure the availability of your services and systems, see Implementing Software Monitoring and Error-Detection Tools.

Determining Availability Requirements

Understanding Availability explained how availability can be expressed numerically as the percentage of time that a service is available for use (for example, 99.9 percent service availability). Understanding Availability also discussed how, when determining your availability percentage, you must consider the context of the service and the organization that uses that service. For example, if a public folder store on a server that hosts non-critical public folders is unavailable, productivity may not be affected. However, if a mailbox store on a server that hosts mission-critical mailboxes or public folders is unavailable, productivity may be affected immediately.

In an organization that is operational 24x7x365 (24 hours a day/seven days a week/365 days a year), systems that are 99 percent available will be unavailable, on average, about 87.6 hours (roughly 3.65 days) every year. Moreover, that downtime can occur at unpredictable times, possibly when it is least affordable. It is important to understand that an availability level of 99 percent could prove costly to your business.
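The arithmetic behind these figures is straightforward, and a short Python sketch makes it easy to check other availability levels. The function name is illustrative, and the calculation assumes a non-leap year of 8,760 hours.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year


def yearly_downtime_hours(availability_percent: float) -> float:
    """Expected hours of downtime per year for a service that runs 24x7x365."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Print the yearly downtime budget for two through five nines.
for level in (99.0, 99.9, 99.99, 99.999):
    hours = yearly_downtime_hours(level)
    print(f"{level}% -> {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

At 99 percent, the function returns about 87.6 hours. At 99.999 percent, it returns a little over five minutes.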

Instead, the percentage of uptime you should strive for is some variation of 99.x percent—with an ultimate goal of five nines, or 99.999 percent. For a single server in your organization, three nines (99.9 percent) is an achievable level of availability. Achieving five nines (99.999 percent) is unrealistic for a single server because this level of availability allows for approximately five minutes of downtime per calendar year. However, by implementing fault tolerant clusters with automatic failover capabilities, four nines (99.99 percent) is achievable. It is even possible to achieve five nines if you also implement fault tolerant measures, such as server-class hardware, advanced storage solutions, and service redundancy.
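To see why redundancy raises the achievable level, consider a deliberately idealized model, sketched below in Python. It assumes that server failures are fully independent and that failover is instantaneous. Neither assumption holds in practice, and the function name is illustrative. Treat the result as a theoretical upper bound; real clusters fall short of it because failover takes time and because shared components (storage, network, power) can fail together.

```python
def redundant_availability(node_availability: float, nodes: int) -> float:
    """Upper-bound availability of `nodes` redundant servers.

    Idealized assumptions: failures are independent and failover is
    instantaneous, so the service is down only when every node is down
    at the same moment.
    """
    if not 0 <= node_availability <= 1:
        raise ValueError("node_availability must be a fraction between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return 1 - (1 - node_availability) ** nodes


# One server at three nines versus two redundant servers at three nines each.
single = redundant_availability(0.999, 1)
pair = redundant_availability(0.999, 2)
print(f"single server: {single:.6f}, redundant pair (upper bound): {pair:.6f}")
```

The gap between this upper bound and the levels cited above reflects the real-world factors that the model leaves out.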

For information about the steps required to achieve these availability levels, including the implementation of fault tolerant hardware and server clustering, see Making Your Exchange 2003 Organization Fault Tolerant.

Setting availability goals is a complex process. To help you in this task, consider the following information as you are setting your goals.

Note

To further assist you in setting availability goals, answer the questions in Questions to Consider When Developing Availability and Scalability Goals.

Downtime and availability percentage considerations

Because you can schedule planned system outages to occur at a time that least impacts productivity, planned downtime is frequently treated differently from unplanned downtime. Whether you should factor planned downtime into the availability equation depends on your business needs. If you measure only unplanned outages that occur during scheduled business hours, a goal of three or four nines (99.9 percent or 99.99 percent) requires less investment than the same goal for full-time availability, which must include both planned and unplanned system outages. For more information about 8-hour versus 24-hour availability levels, see the "Availability percentages and yearly downtime" table in Understanding Availability, Reliability, and Scalability.

Even minimal scheduled downtime (for example, 2 hours a month, or 24 hours a year) reduces availability to 99.73 percent. You can increase availability to 99.93 percent by reducing scheduled downtime to 30 minutes a month, or 6 hours a year. Moreover, if you use your primary messaging server only for production purposes and perform database backups, health checks, and other tasks on secondary servers that have copies of the same data, the chances of achieving 99.99 percent availability or higher increase.
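The percentages above follow directly from the ratio of scheduled downtime to total service hours. The Python sketch below reproduces them. The function and parameter names are illustrative, and the default of 8,760 service hours assumes 24-hour operation in a non-leap year; pass a smaller value to model availability measured only during business hours.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year


def availability_with_scheduled_downtime(
    downtime_hours_per_month: float,
    service_hours_per_year: float = HOURS_PER_YEAR,
) -> float:
    """Availability percentage when the only downtime is scheduled maintenance."""
    yearly_downtime = downtime_hours_per_month * 12
    if yearly_downtime > service_hours_per_year:
        raise ValueError("downtime exceeds the service hours in a year")
    return 100 * (1 - yearly_downtime / service_hours_per_year)


# The two scenarios from the text: 2 hours a month and 30 minutes a month.
for monthly_hours in (2, 0.5):
    pct = availability_with_scheduled_downtime(monthly_hours)
    print(f"{monthly_hours} h/month -> {pct:.2f}% availability")
```

Because this counts scheduled downtime only, any unplanned outage lowers the figure further.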

Maintenance considerations

To determine the best high availability solution, you must understand when your users need the messaging system. For example, if there are times when the messaging system is not heavily used or is not used at all, you can perform maintenance operations (such as security patch updates or disk defragmentation processes) during these times at a reduced cost. However, if you have users in different time zones, be sure to consider their usage times when planning a maintenance schedule.

Recovery considerations

When setting high availability goals, you must determine if you want to recover your Exchange databases to the exact point of failure, if you want to recover quickly, or both. Your decision is a critical factor in determining your server redundancy solution. Specifically, you must determine if a solution that results in lost data is inconvenient, damaging, or catastrophic.

For more information about selecting a recovery solution, see the Exchange Server 2003 Disaster Recovery Planning Guide.

Determining Scalability Requirements

When planning for high availability, determining scalability requirements provides your organization with a certain amount of flexibility in the future. However, because scalability is based on future needs (for example, larger messaging volumes and increased disk space), it can be difficult to quantify. As a result, planning for scalability requires a certain amount of estimation and prediction. To help you determine the scalability requirements for your organization, consider the following information.

Hardware considerations

If your hardware budget is sufficient, you can purchase hardware at regular intervals to add to your existing deployment. (The amount of hardware you purchase depends on the exact increase in demand.) If you have budget limitations, you can purchase servers that can be enhanced later (for example, by adding RAM or CPUs).

Growth considerations

Researching your organization's past growth patterns can help determine how demand on your IT system may grow. However, as business technology becomes more complex, and reliance on that technology increases every year, you must consider other factors as well. If you anticipate growth, realize that some aspects of your organization may grow at different rates. For example, you may require more Web servers than print servers over a certain period of time. For some servers, scaling up (for example, additional CPU power) may be sufficient to handle an increase in network traffic. In other cases, the most practical scaling solution may be to scale out (for example, add more servers).
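One simple way to turn past growth patterns into a planning estimate is a compound-growth projection, sketched below in Python. The model assumes a constant annual growth rate, which is a simplification. The function names and the figures in the usage lines (mailbox count, capacity, growth rate) are hypothetical and should be replaced with values from your own trend data.

```python
import math


def projected_demand(current: float, annual_growth_rate: float, years: float) -> float:
    """Demand after `years`, assuming a constant compound annual growth rate."""
    return current * (1 + annual_growth_rate) ** years


def years_until_capacity(
    current: float, capacity: float, annual_growth_rate: float
) -> float:
    """Years until demand reaches `capacity` under constant compound growth."""
    if current >= capacity:
        return 0.0  # already at or over capacity
    if annual_growth_rate <= 0:
        return math.inf  # demand never reaches capacity without growth
    return math.log(capacity / current) / math.log(1 + annual_growth_rate)


# Hypothetical example: 4,000 mailboxes today, capacity for 6,000,
# and 15 percent growth per year.
print(f"Mailboxes in 3 years: {projected_demand(4_000, 0.15, 3):,.0f}")
print(f"Years until capacity: {years_until_capacity(4_000, 6_000, 0.15):.1f}")
```

Running the projection separately for each server role helps show which parts of the deployment need to scale first, because different roles may grow at different rates.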

For more information about scalability, see "Defining Scalability" in Understanding Availability, Reliability, and Scalability.

For information about monitoring your messaging system for the purpose of analyzing long-term trends, see "Monitoring for Long-Term Trend Analysis" in Monitoring Strategies.

Testing considerations

Re-create your Exchange 2003 deployment as accurately as possible in a test environment, either manually or using tools such as Exchange Server Load Simulator 2003 (LoadSim), Exchange Stress and Performance (ESP), and Jetstress. These tools allow you to test the workload capacities of different areas in your Exchange organization. Observing your messaging system under such circumstances can help you formulate scaling priorities. To download these tools, see the Downloads for Exchange Server 2003 Web site. For information about pilot testing, see "Laboratory Testing and Pilot Deployments" in System-Level Fault Tolerant Measures.

Monitoring considerations

After you deploy Exchange 2003, use software monitoring tools to alert you when certain components are near or at capacity. Specifically, tools such as Performance Monitor (which monitors performance levels and system capacity) and programs such as Microsoft Operations Manager can help you decide when to implement a scaling solution. For more information about monitoring performance levels, see Implementing Software Monitoring and Error-Detection Tools.