Modeling Principles for Sizing and Capacity Planning

The first principle of modeling for size or capacity is that you cannot determine appropriate hardware resource requirements without first establishing a measurement standard for the use of those resources. Put another way, you cannot tell when a server is performing below optimal levels, or outside the bounds of its SLAs, unless you know how to measure optimal performance. The measurement standard is the baseline chart or log of acceptable server performance described in the last section of this chapter.

Several books describe modeling principles and formulas in great detail. Because the SMS site database is a Microsoft SQL Server database, many of the same principles and techniques used to determine appropriate hardware requirements for computers running SQL Server can be applied to your SMS component servers. For more information about modeling principles and server sizing formulas that can be applied effectively to SMS component servers, see Chapters 8-11 in the Microsoft SQL Server 2000 Performance Tuning Technical Reference, available from Microsoft Press.

Among the many performance objects that you can track, three are the most useful for determining server size and capacity:

  • CPU utilization

  • Disk utilization

  • Memory (RAM) utilization

The key principle in achieving realistic and acceptable server performance is to avoid running the server at maximum hardware resource utilization on a regular basis. You must establish acceptable thresholds for hardware resource utilization to provide a reserve capacity for peak utilization periods.
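As a rough illustration of how you might track these counters against thresholds, the following Python sketch samples CPU, disk, and memory utilization with the third-party psutil package and flags any counter above an example threshold. The 75 percent CPU and 85 percent disk values echo the sections that follow; the 85 percent memory value and the disk path are assumptions chosen only for this example, not recommendations from this guide.

    # Minimal sketch: sample the three primary counters and flag any that
    # exceed an example threshold. Requires the third-party psutil package.
    import psutil

    THRESHOLDS = {
        "cpu_percent": 75.0,     # acceptable CPU utilization ceiling (see CPU section)
        "disk_percent": 85.0,    # acceptable disk space utilization ceiling (see disk section)
        "memory_percent": 85.0,  # assumed memory utilization ceiling for this example
    }

    def sample_utilization(disk_path="/"):
        """Return current CPU, disk space, and memory utilization percentages."""
        # On Windows, pass a drive such as "C:\\" for disk_path.
        return {
            "cpu_percent": psutil.cpu_percent(interval=1),
            "disk_percent": psutil.disk_usage(disk_path).percent,
            "memory_percent": psutil.virtual_memory().percent,
        }

    def over_threshold(sample):
        """Return only the counters whose value exceeds its threshold."""
        return {name: value for name, value in sample.items()
                if value > THRESHOLDS[name]}

    if __name__ == "__main__":
        sample = sample_utilization()
        for name, value in sample.items():
            print(f"{name}: {value:.1f}%")
        exceeded = over_threshold(sample)
        if exceeded:
            print("Reserve capacity exceeded:", ", ".join(sorted(exceeded)))

In practice you would compare these samples against the baseline described earlier in this chapter rather than against fixed numbers.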

CPU Utilization Thresholds

The relationship among CPU utilization, queue lengths, and response time is an important consideration when you are developing a sizing model for SMS component servers. There is a direct correlation between CPU utilization and queue lengths that affects the performance of a system. Generally, shorter queue lengths indicate better CPU performance. For most server configurations, queue lengths grow linearly until the processor reaches about 75 percent utilization, as illustrated in Figure 9.1.

Figure 9.1 Exponential queue length growth versus CPU utilization

At that point, queue length growth becomes exponential and rises quickly. This is referred to in some books as the asymptotic point. Whether your SLA stipulates 75 percent as a reasonable threshold between acceptable CPU utilization and a problem zone is up to you. As administrator, you are in the best position to understand the nuances of your own networking and server environment.

Assume that your threshold for acceptable CPU utilization is 75 percent. You should expect CPU utilization to exceed 75 percent occasionally for short periods of time. However, the longer or more often such periods occur, the more likely it is that queue lengths and response times will be adversely affected.
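To see why queue lengths climb so steeply past the knee, consider a simple single-server (M/M/1) queueing approximation, in which the average number of requests in the system is roughly u / (1 - u) for utilization u. This model is an illustrative assumption, not a formula taken from this guide.

    # Minimal sketch of queue growth under a single-server (M/M/1)
    # approximation: average requests in the system = u / (1 - u),
    # where u is utilization expressed as a fraction.

    def average_requests_in_system(utilization):
        """Average number of requests in the system at a given utilization (0-1)."""
        if not 0.0 <= utilization < 1.0:
            raise ValueError("utilization must be at least 0 and less than 1")
        return utilization / (1.0 - utilization)

    if __name__ == "__main__":
        for percent in (25, 50, 75, 85, 90, 95):
            queued = average_requests_in_system(percent / 100.0)
            print(f"{percent:>3}% utilization -> ~{queued:.1f} requests in the system")

Run as written, the sketch shows roughly 3 requests in the system at 75 percent utilization, rising to about 19 at 95 percent, which mirrors the sharp rise past the asymptotic point.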

Disk Utilization Thresholds

Similarly, disk utilization tends to take an exponential turn at about 85 percent. For example, a 9 GB disk should not store more than about 7.6 GB of data (85 percent of its capacity) at any given time. This allows for growth and helps keep response times at a reasonable level. By the same principle, if a disk can service 70 read/write requests per second, a steady-state arrival rate of more than about 60 requests per second (70 × 0.85 = 59.5) indicates that the read/write capability of the disk is inadequate.
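The following sketch simply restates the 85 percent rule of thumb as arithmetic, using the 9 GB and 70 requests-per-second figures from the example above; the helper names are illustrative only.

    # Minimal sketch: apply the 85 percent disk threshold to the figures
    # used in the example above.

    DISK_THRESHOLD = 0.85

    def usable_capacity_gb(total_gb, threshold=DISK_THRESHOLD):
        """Space that can be filled while preserving reserve capacity."""
        return total_gb * threshold

    def sustainable_request_rate(rated_requests_per_sec, threshold=DISK_THRESHOLD):
        """Steady-state request rate the disk can absorb within the threshold."""
        return rated_requests_per_sec * threshold

    if __name__ == "__main__":
        print(f"9 GB disk -> store no more than {usable_capacity_gb(9):.2f} GB")                     # ~7.65 GB
        print(f"70 req/s  -> sustain no more than {sustainable_request_rate(70):.1f} requests/sec")  # ~59.5 req/s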

Memory Utilization Thresholds

Page faults occur when data cannot be found in memory and must be retrieved from disk. The operating system issues a page fault interrupt when a needed page is not in the process's working set in main memory, and the interrupt prevents further processing until the required page has been retrieved from disk. Response time for memory is measured in microseconds (millionths of a second), whereas response time for disks is measured in milliseconds (thousandths of a second), so pages are retrieved from memory approximately 1,000 times faster than from disk. Consequently, to reduce the number of page faults that occur, increase the amount of memory in the computer.
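One way to see the cost of paging is to estimate the effective time per page reference as a weighted average of memory and disk service times. The formula and the fault rates below are illustrative assumptions; only the order-of-magnitude service times (about 1 microsecond for memory and about 1 millisecond for disk) come from the paragraph above.

    # Minimal sketch: effective time per page reference as a weighted average
    # of memory and disk service times. Fault rates are assumptions chosen
    # only for illustration.

    MEMORY_TIME_US = 1.0    # roughly 1 microsecond per in-memory page reference
    DISK_TIME_US = 1000.0   # roughly 1 millisecond (1,000 microseconds) per page fault

    def effective_access_time_us(fault_rate):
        """Average time per page reference for a given page fault rate (0-1)."""
        return (1.0 - fault_rate) * MEMORY_TIME_US + fault_rate * DISK_TIME_US

    if __name__ == "__main__":
        for fault_rate in (0.0, 0.001, 0.01, 0.05):
            avg = effective_access_time_us(fault_rate)
            print(f"fault rate {fault_rate:.3f} -> ~{avg:.1f} microseconds per reference")

Even a 1 percent fault rate raises the average cost of a reference by roughly a factor of ten, which is why adding memory, and thereby lowering the fault rate, is the usual remedy.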
