Chapter 10 - Working with Performance Counters

08/31/2009

This chapter focuses on using the Microsoft Application Center 2000 (Application Center) performance monitoring feature; however, because performance monitoring and capacity planning are such complex and inter-related topics, it was necessary to provide additional background information. As a result, this chapter is divided into four major parts that:

Provide an overview of performance tuning and capacity planning.
Present high-level guidelines for performance testing and tuning.
Illustrate how to work with the default Application Center performance counters and create new ones.
Provide performance-monitoring examples by using different cluster topologies.

One dictionary defines performance as "a manner of functioning: the manner in which something or somebody functions, operates, or behaves," which nicely describes what computer system performance refers to. Unfortunately, because there are so many variables, monitoring and tuning performance isn't as easy as defining it.

A colleague once described a computer system as a series of bottlenecks in motion. In order to achieve high levels of performance on a cluster, you have to be able to identify and resolve bottlenecks—which are not isolated, but are inter-related and constantly shifting depending on what the individual cluster members and their applications are doing at any given time.

There are numerous elements to consider when dealing with performance, but whether you're dealing with a single server or several servers, you can divide these elements into the following broad categories:

The hardware
The applications (including Microsoft Internet Information Services 5.0 [IIS] and Active Server Pages [ASP] runtime)
The database
The network
The operating system (as manifested through the processor, memory, and disks)

These are the parts of a cluster environment that you have to deal with when trying to achieve and maintain high levels of performance on an Application Center cluster.

Note Although the focus of this chapter isn't on system health, remember that this element can influence the overall performance picture, and in many cases, poor performance can be a good indicator of a failing system component.

Before dealing with the specifics of performance monitoring and tuning, let's examine the different aspects of performance, including an overview of performance management and the different perspectives on performance and performance goals that you have to consider when undertaking performance tuning on a system.

Performance Management

When dealing with performance management, we're talking about the continuous process of evaluating a server to determine whether or not it can deliver the level of performance that's required, which is to say, the server's ability to handle a certain load of concurrent users. Performance management is closely linked to capacity planning; the difference is that performance management involves tuning the current system so that it can perform better, thereby enabling it to support more users. Capacity planning, on the other hand, focuses on how many users a site can support and how to scale the site so it can support more users.

While confronting the myriad of elements that make up a production system, you also have to balance the goals and priorities of two viewpoints—that of the user and that of the administrator. Although it often seems like users and administrators have conflicting views, both want the same things from a system. They want the system to provide good performance, they want the applications to work, and they want the site to be up all the time. It's really a matter of perspective; users and administrators simply have slightly different ways of viewing system goals and interpreting performance.

The User's Perspective

For most users, performance equates to speed—the perceived response time of the system they're using. When they activate a hyperlink and the requested page is retrieved and displayed quickly—typically in less than 10 seconds—their perception of performance is favorable. (It's interesting to note that it's not uncommon for a user to think that a page takes longer to retrieve and display than it actually does.)

From a user's perspective, the definition of performance and the primary goal of performance tuning is the same—make it fast. This speed-based viewpoint encompasses the following:

Initialization
Shut down
Page retrieval and rendering
Reasonable time-outs

The Administrator's Perspective

From an administrator's viewpoint, performance is a measure of how system resources are utilized by all the running programs. The scope of resource usage ranges from the lowest level program (drivers, for example) up to and including the applications that are hosted on a server.

In terms of performance tuning, the administrator's primary goal is to make the system satisfy client requests quickly and without errors or interrupts. The secondary tuning goals are:

Conserving bandwidth
Conserving CPU resources and RAM utilization

Note An indirect goal is eliminating or reducing Help Desk calls, a goal that is usually achieved by meeting the direct goals.

Unlike the user, who deals primarily with perception, the administrator can quantify resource utilization through the collection, observation, and analysis of performance data (see Figure 10.1).

You can use performance data to:

Observe changes and trends in resource usage and workload distribution.
Quantify the relationship between the workload and its effect on system resources.
Test configuration changes or tuning efforts by monitoring the results.

Regardless of the perspective you take, you have to approach tuning systematically and employ a methodology for implementing and testing system configuration changes.

The Business Perspective

The business perspective also plays a significant role in performance management. In this context, someone has to determine how much hardware is required, how to make provisions for peak loads, how to balance out spikes with low overall load, and how to determine or satisfy service-level agreements.

It's often necessary to make price and performance trade-offs—it may be too expensive to have enough servers for maintaining low processor utilization at all times, so low average utilization with spikes becomes acceptable.

An Overview of Performance Tuning

Performance tuning is the main activity associated with performance management. Reduced to its most basic level, tuning consists of finding and eliminating bottlenecks. A bottleneck is a condition that occurs, and is revealed, when a piece of hardware or software in a server approaches the limits of its capacity.

Before starting the performance tuning cycle illustrated in Figure 10.1, you have to do some preparatory work that establishes the framework for ongoing performance tuning activities. You should:

Identify constraints—A site's business case determines priorities, which in turn establish boundaries. Constraints, such as maintainability and budget limits, are factors that cannot be altered in search of higher performance. You have to focus performance work on factors that are not constrained.
Specify the load—This involves determining what services the site's clients require and the level of demand for those services. The most common metrics for specifying load are the number of clients, client think time (the delay between when a client receives one reply and when it submits the next request), and load distribution (steady or fluctuating, average, and peak load).
Set performance goals—Performance goals have to be explicit, which involves identifying the metrics that will be used for tuning as well as their corresponding benchmark values. Total system throughput and response time are two common metrics that are used to measure performance. After identifying the performance metrics, you have to establish quantifiable and reasonable benchmark values for each one.

Note Because performance and capacity are so closely related, the constraints, load, and goals that you identify are also applicable to capacity planning.

After establishing the boundaries and expectations for performance tuning, you can begin the tuning cycle, which is an iterative series of controlled performance experiments.

The Tuning Cycle

The four phases of the tuning cycle shown in Figure 10.1 are repeated until you achieve the performance goals that you established prior to starting the tuning process. Let's examine each phase, starting with Collecting.


Figure 10.1 The performance tuning cycle

Collecting

The Collecting phase is the starting point of any tuning exercise. During this phase you're simply gathering data with the collection of performance counters that you've chosen for a specific part of the system. These counters could be for the network, the server, or the back-end database.

Regardless of what part of the system you're tuning, you require a baseline against which to measure performance changes. You need to establish a pattern of system behavior when the system is idling as well as when specific tasks are executed (for example, adding a member to the cluster and synchronizing it to the controller). Therefore, your first data-gathering pass is used to establish a baseline set of values for the system's behavior. The baseline establishes the typical counter values that you'd expect to see when the system is behaving satisfactorily.

Note Baseline performance is a subjective standard—you have to set a baseline that's appropriate for your work environment and that best reflects your system's workload and service demands.
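
As a rough illustration of what a baseline can look like once the data is collected, the following Python sketch reduces a set of logged counter samples to a mean and standard deviation and flags later readings that stray far from them. The function names, the sample values, and the three-standard-deviation tolerance are illustrative assumptions; they are not part of Application Center or System Monitor.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize samples logged while the system behaves satisfactorily."""
    return {"mean": mean(samples), "stdev": stdev(samples)}

def deviates(baseline, value, tolerance=3.0):
    """Return True if value lies more than `tolerance` standard deviations
    from the baseline mean (the tolerance is an assumed cutoff)."""
    return abs(value - baseline["mean"]) > tolerance * baseline["stdev"]

# Hypothetical Processor\% Processor Time samples from an idling server.
idle_cpu = [4.0, 6.0, 5.0, 7.0, 3.0]
baseline = build_baseline(idle_cpu)
```

A separate baseline per counter and per activity (idle, synchronization, and so on) keeps the comparison meaningful, because the expected values differ for each.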

Once you've established your baseline, you can apply load to the system by using a tool such as Web Application Stress (WAS) to simulate user load.

Note Test scripts should try to mimic typical client usage patterns on your system and will use the load factors that you established earlier: number of concurrent connections, think time, and load distribution.

Analyzing

After you've collected the performance data that you require for tuning the part of the system that you're working on, you need to analyze the data to determine where the bottleneck is. Remember, a performance number is only an indicator—it doesn't necessarily identify the actual bottleneck because a performance problem can be traced back to multiple sources. It's also not uncommon for problems in one system component to be the result of problems in another component (a memory shortage is the best example of this; it's indicated by increased disk and processor use).

The following points, taken from the Microsoft Windows 2000 Resource Kit, provide guidelines for interpreting counter values and eliminating false or misleading data that might cause you to set inappropriate target values for tuning.

Monitoring processes of the same name—Watch for unusually large values for one instance and not the other. Sometimes, the System Monitor misrepresents data for separate instances of processes of the same name by reporting the combined values of the instances as the value of a single instance. You can work around this by tracking processes by the process identifier.
Monitoring several threads—When you are monitoring several threads and one of them stops, the data for one thread might appear to be reported for another. This is because of the way threads are numbered. You can get around this by including the thread identifiers of the process's threads in your log or display. Use the Thread\Thread ID counter for this purpose.
Intermittent spikes in data values—Don't give too much weight to occasional spikes in data. These might be due to the startup of a process and are not an accurate reflection of counter values for that process over time. Counters that average, in particular, can cause the effect of spikes to linger over time.
Monitoring over an extended period of time—We recommend using graphs instead of reports or histograms because the latter views only show the last values and averages. As a result, you might not get an accurate picture of values when you're looking for spikes.
Excluding start-up events—Unless you have a specific reason for including start-up events in your data, exclude these events because the temporarily high values they produce tend to skew overall performance results.
Zero values or missing data—Investigate all occurrences of zero values or missing data. These can hamper your ability to establish a meaningful baseline.
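
Two of the guidelines above, excluding start-up events and investigating zero or missing values, lend themselves to a small preprocessing step before a baseline is computed. The sketch below is one possible way to do that in Python; the function name, the use of None to mark missing data, and the sample log are assumptions made for illustration.

```python
def prepare_samples(samples, startup_count=0):
    """Drop start-up samples and separate zero or missing readings.

    samples: counter values in collection order; None marks missing data.
    startup_count: number of leading samples attributed to start-up events.
    Returns (usable, suspect_indexes); indexes refer to positions in `samples`.
    """
    usable, suspect = [], []
    for index, value in enumerate(samples):
        if index < startup_count:
            continue  # start-up values tend to skew overall results
        if value is None or value == 0:
            suspect.append(index)  # investigate before trusting the baseline
        else:
            usable.append(value)
    return usable, suspect

# Hypothetical log: two start-up spikes, one zero, one missing reading.
log = [95.0, 88.0, 12.0, 0, 14.0, None, 13.0]
usable, suspect = prepare_samples(log, startup_count=2)
```

Keeping the suspect indexes, rather than silently discarding those readings, makes it possible to go back and find out why the counter reported nothing.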

Configuring

After you've collected your data and completed the analysis of the results, you can determine which part of the system is the best candidate for a configuration change and implement this change.

The cardinal rule for implementing changes is to implement only one configuration change at a time. A problem that appears to be related to a single component might be the result of bottlenecks involving several components. For this reason it's important to address problems individually. If you make multiple changes simultaneously, it may be impossible to accurately assess the impact of each change.

Testing

After implementing a configuration change, you'll have to complete the appropriate level of testing to determine the impact of the change on the system that you're tuning. At this point, it's a matter of determining whether or not the change:

Improved performance. Did the change improve performance, and if so, by how much?
Degraded performance. Did the change cause a bottleneck somewhere else?
Had no impact on performance. Did the change have any noticeable impact at all on performance?

If you're lucky and performance improves to the level you anticipated, you can quit. If not, you have to step through the tuning cycle again.
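
The three outcomes listed above can be expressed as a simple comparison of one metric measured before and after a single configuration change. The following sketch assumes a higher-is-better metric such as requests per second and treats changes within a 2 percent band as measurement noise; both the function name and the noise band are hypothetical choices, not values from this chapter.

```python
def classify_change(before, after, noise=0.02):
    """Compare a higher-is-better metric before and after one change.

    Returns (verdict, relative_change). Changes smaller than `noise`
    (an assumed band) are reported as having no impact.
    """
    change = (after - before) / before
    if change > noise:
        return "improved", change
    if change < -noise:
        return "degraded", change
    return "no impact", change

verdict, change = classify_change(before=100.0, after=120.0)
```

For a lower-is-better metric such as response time, the same comparison applies with the arguments swapped. A degraded or unchanged result is a cue to check whether the bottleneck has simply moved to another component.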

Testing do's

Check the correctness and performance of the application that you're using for testing by looking for memory leaks and inordinate delays in response to client requests.
Ensure that all the tests are working correctly.
Make sure that all the tests can be repeated by using the same transaction mix and the same clients generating the same load. (See "The Web Application Stress Tool," later in this chapter.)
Document changes and results.

Tip You can obtain the monitoring results of your testing from monitoring log files—which can be exported to Microsoft Excel—and the Event log.

Before—and during—performance tuning, we recommend that you use the following two resources for performance monitoring and tuning: the Microsoft Windows 2000 Server Resource Kit and Appendix C, "The Art and Science of Web Server Tuning with Internet Information Services 5.0." These two resources provide most of the information that you will need to successfully tune your network, servers, and Web servers.

Microsoft Windows 2000 Server Resource Kit: Server Operations Guide

The performance monitoring section of the Server Operations Guide contains the following topics and subtopics:

"Overview of Performance Monitoring":
    "Performance Monitoring Concepts"
    "Monitoring Tools"
    "Starting Your Monitoring Routine"
    "Analyzing Monitoring Results"
    "Investigating Bottlenecks"
    "Troubleshooting Problems with Performance Tools"
    "Specific Monitoring Scenarios"
    "Monitoring Legacy Applications"
    "Integrating the System Monitor Control into Office and Other Applications"
"Evaluating Memory and Cache Usage":
    "Overview of Memory Monitoring"
    "Determining the Amount of Installed Memory"
    "Understanding Memory and the File System Cache"
    "Optimizing Your Memory Configuration"
    "Establishing a Baseline for Memory"
    "Investigating Memory Problems"
"Analyzing Processor Activity":
    "Overview of Processor Monitoring and Analysis"
    "Establishing a Baseline for Processor Performance"
    "Recognizing a Processor Bottleneck"
    "Processes in a Bottleneck"
    "Threads in a Bottleneck"
    "Advanced Topic: Changing Thread Priority to Improve Performance"
    "Eliminating a Processor Bottleneck"
"Examining and Tuning Disk Performance":
    "Disk Monitoring Concepts"
    "Configuring the Disk and File System for Performance"
    "Working with Disk Counters"
    "Establishing a Baseline for Disk Usage"
    "Investigating Disk Performance Problems"
    "Resolving Disk Bottlenecks"
    "Evaluating Cache and Disk Usage by Applications"
"Monitoring Network Performance":
    "Introduction to Network Performance Analysis"
    "Tools for Monitoring Network Performance"
    "Resolving Network Bottlenecks"
"Measuring Multiprocessor Activity":
    "Overview of SMP Performance and Monitoring"
    "Monitoring Activity on Multiprocessor Systems"
    "Optimizing and Tuning Multiprocessor Installations"
    "Application Design and Multiprocessor Performance"
    "Network Load Balancing and Scaling"

Appendix C, "The Art and Science of Web Server Tuning with Internet Information Services 5.0"

This white paper, written by George Reilly at Microsoft, is arguably the best single resource for tuning a server that's running IIS. This document covers the following topics and subtopics:

"Why Tune Your Web Servers?"
"What to Tune"
    "Monitoring Your Hardware"
    "Security"
    "Monitoring Your Web Applications"
    "Tuning Your Web Applications"
    "Tools to Monitor and Test Server Performance"
    "Features and Settings in Windows 2000 and IIS 5.0"
    "Tuning and Troubleshooting Suggestions"
"Testing, Piloting, and Going Live"

This white paper also provides a wealth of information, such as optimal IIS metabase and registry settings, Windows 2000 optimization tips, and ASP caching. Because this document is updated on a regular basis, you should check for the most recent version at https://www.microsoft.com/technet/iis/.

An Overview of Capacity Planning

In order to realize the full potential of a site, you have to satisfy the demands of your users, which typically consist of quality of service, quality of content, and speedy access to the site's content and services. (The latter is, for most of your users, the key contributing factor to a positive user experience.) Capacity planning is the process of determining the most cost-efficient method of increasing a Web site's performance and scalability, while at the same time predicting the point at which a resource will cause a bottleneck on the Web site.

The starting point for capacity planning is determining a site's capacity, which is determined by:

The number of users it can handle before performance falls off
The server's ability to handle increased load, either due to an increased number of users or increased content complexity
The nature of the site's content, which is to say, the complexity of its applications

Note Capacity is influenced indirectly by performance; a well-tuned site can increase capacity by making better use of existing resources and, in some cases, freeing up resources. At some point, regardless of how well tuned your site is, the site cannot handle more traffic without degrading performance. This is the point at which you either have to scale up by upgrading or replacing the existing servers or scale out by increasing the size of your cluster.

Ideally, you will have done some capacity planning that establishes acceptable performance benchmarks and resource usage limits, and you will have either scaled or have a plan in place to scale your system before performance degrades.

The key factors for successful capacity planning are:

Understanding the nature of the site's content. Different types of content (for example, static HTML pages and ASP pages) have a different—and often dramatic—impact on system resources. Your capacity planning has to take into account how the existing content types affect capacity, as well as how a change in the content mix could affect resource usage.
Understanding the site's users. You have to be able to understand site usage patterns in order to predict traffic growth and accommodate short-term usage spikes.

Once again, you have to gather baseline data before you can determine when and how to increase the capacity of your system.

We recommend that you use the "Capacity Planning" white paper (Microsoft TechNet) as a guide for your capacity planning activities. The Microsoft Internet Information Server Resource Kit (Microsoft Press, 1998) for IIS 4.0 and the Microsoft Internet Information Services Resource Guide (Microsoft Press, 2000) for IIS 5.0 also provide useful information about capacity planning for Web sites.

White Paper: "Capacity Planning"

The "Capacity Planning" white paper, produced by Microsoft Enterprise Services, is available from the TechNet Web site at the following URL: https://www.microsoft.com/technet/archive/itsolutions/ecommerce/default.mspx.

"Capacity Planning" is a part of a series about applying Microsoft Enterprise Services frameworks to e-commerce solutions, and although it deals with capacity planning for sites running Windows 2000, IIS, Microsoft Site Server version 3.0, and Microsoft SQL Server version 7.0, its methodology is not limited to these products or this particular business solution.

The following topics and subtopics are covered in the white paper:

"Introduction to Capacity Planning"
    The introduction covers the why, when, and how of capacity planning and introduces the capacity planning equation:
    Number of supported users = Hardware capacity / Load on hardware per user
"Analyzing Your Site"
    "Dynamic Content Analysis"
    "Site Server 3.0 Commerce Edition"
    "Transaction Cost Analysis"
    "Predicting Site Traffic"
    "Analyzing a Typical User"
    "Acceptable Operation Parameters"
    "A Detailed Test Methodology"
    "User Cost Calculations" (for CPU, memory, disk, and network)
"Deriving Site Capacity"
    In this section, you'll learn how to calculate hardware needs and how to plan site topology scalability, both vertically and horizontally.
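
To make the capacity planning equation from the white paper's introduction concrete, the sketch below applies it to each resource separately and reports the smallest result as the limit for the site. Treating the minimum across resources as the site's capacity is an assumption made for this illustration, and the capacity and per-user figures are invented; real values would come from the user cost calculations that the white paper describes.

```python
def supported_users(capacity, load_per_user):
    """Apply users = capacity / per-user load to each resource.

    Both arguments map a resource name to a number in the same unit.
    Returns (users, limiting_resource, per_resource_results).
    """
    per_resource = {
        name: capacity[name] / load_per_user[name] for name in capacity
    }
    limiting = min(per_resource, key=per_resource.get)
    return per_resource[limiting], limiting, per_resource

# Hypothetical figures: CPU in MHz, network in Kbps.
capacity = {"cpu": 1600.0, "network": 100000.0}
load_per_user = {"cpu": 2.0, "network": 50.0}
users, limiting, detail = supported_users(capacity, load_per_user)
```

With these made-up numbers the processor supports fewer users than the network does, so the processor is the resource that would be scaled first.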

Testing and Tuning the Infrastructure

When monitoring and tuning your infrastructure, you have to remember that the various elements must be treated as a whole as well as individually. Before delving into these elements, it's necessary to understand the two primary performance metrics that are used for both performance tuning and capacity analysis—throughput and response time.

Throughput

Throughput is a measure that describes the rate at which a server can process requests. The higher the throughput is, the better your servers can accommodate spikes in the load.

Typically, throughput is expressed in terms of requests per second or requests per day. Because of browser behavior (for example, a single HTML page request might be coupled with separate requests for embedded images or frames), it can help to think of throughput in terms of page hits.

Note Some administrators estimate throughput on their sites by dividing the number of clients by their think time, or the number of seconds that a user takes to read a page before clicking a link. Using this approach, if you had 1000 users with an average think time of 10 seconds, the throughput would be 100 requests per second. However, throughput is really a function of how quickly requests arrive at the server and how quickly the server can respond to these requests.
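
The estimate described in the note above is simple enough to sketch in a few lines of Python. The function names are hypothetical, and the result is only the rough approximation that the note describes, not a measurement of what the server can actually sustain.

```python
def estimated_throughput(clients, think_time_seconds):
    """Estimate requests per second as clients divided by think time."""
    if think_time_seconds <= 0:
        raise ValueError("think time must be positive")
    return clients / think_time_seconds

def per_day(requests_per_second):
    """Convert a sustained per-second rate to requests per day."""
    return requests_per_second * 24 * 60 * 60

# The example from the note: 1000 users with a 10-second think time.
rate = estimated_throughput(1000, 10)
```

Comparing this estimate with the measured counter values under load shows how far the arrival-rate assumption is from the server's real behavior.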

The following factors can diminish throughput:

Bandwidth
Page size
Application complexity

You can measure throughput by using two performance counters that provide instant and historical values. For static HTML pages, you should use Web Service(_Total)\Get Requests/sec, and for ASP pages, you should use Active Server Pages\Requests/sec. After obtaining baseline data, you should apply stress to your server to determine how the increased load affects throughput and system resources.

Response Time

The two determining factors in response time are latency, which is the time it takes a request to cross the network and move through the server request queue, and request execution time.

Network latency is the measure of how long a data packet takes to travel between two points. In today's network environments, there are many factors that have an impact on latency, including network congestion, link quality and bandwidth, the physical distance between the two points, and the hop count between the two points.

Don't forget that network latency also affects the time it takes for a request to return from the server to the client.

Note Even in a hypothetical network scenario with zero latency, a request can still spend time in a server queue (request queue time) before it is processed. The number of outstanding requests in the server queue determines this queue time; typically, server queue length is proportional to the server load.

Two important response-time measures are the time-to-first-byte (TTFB) and the time-to-last-byte (TTLB) values. These values are provided whenever you run test scripts by using the WAS tool, which is documented in "The Web Application Stress Tool," later in this chapter.

Note TTFB and TTLB are calculated by using the time that a page is first requested and the times that the first and last bytes of data are received on the client.
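
The calculation in the preceding note can be sketched as follows. This is not how the WAS tool is implemented; it is a minimal Python illustration that assumes the response arrives as an iterable of byte chunks and that the moment the function is called stands in for the moment the page is requested. The function name and the injectable clock parameter are assumptions made so the logic can be exercised without a network.

```python
import time

def time_response(chunks, clock=time.perf_counter):
    """Return (ttfb, ttlb) in seconds for a response read chunk by chunk.

    chunks: iterable of bytes objects as they arrive on the client.
    clock: callable returning the current time in seconds.
    """
    start = clock()  # stands in for the time the page is requested
    ttfb = None
    ttlb = None
    for chunk in chunks:
        if not chunk:
            continue  # ignore empty reads
        now = clock()
        if ttfb is None:
            ttfb = now - start  # first byte received
        ttlb = now - start      # updated until the last byte arrives
    return ttfb, ttlb
```

In practice the chunks would come from a socket or HTTP client; passing a fake clock, as in a unit test, makes the arithmetic easy to verify.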

The second factor in response time is request execution time. In addition to adding to response time, long execution times contribute to throughput degradation. Tuning your applications is the primary method for reducing execution time and improving throughput, which is covered in "Testing and Tuning Applications," later in this chapter.

Now let's examine the counters that you can use for measuring network and server performance.

The Network and Server

The Server Operations Guide in the Microsoft Windows 2000 Server Resource Kit identifies numerous counters that you can use to monitor your system's hardware resources. The biggest challenge that you're going to face is determining which resources to monitor and which counters are appropriate for each resource.

However, you can use the suggested thresholds for the selected counters in Table 10.1 as a guideline for evaluating server performance. If your system consistently reports these values, it's quite likely that a bottleneck exists on the system and you should take the appropriate steps—tune or upgrade the affected resource.

Once you understand the baseline or average values for your site, you can then use Microsoft Health Monitor 2.1 to track deviations and alert you to potential problems.

Table 10.1 Suggested Counter Thresholds for a Server

Resource	Object/Counter	Threshold	Comments
Disk	PhysicalDisk\% Disk Time	90%
Disk	PhysicalDisk\Disk Reads/sec, PhysicalDisk\Disk	Depends on manufacturer's specifications	Check the disk's specified transfer rate to verify that the logged rate doesn't exceed specifications.(1)
Disk	PhysicalDisk\Current Disk Queue Length	Number of spindles plus 2	This is an instantaneous counter; observe its value over several intervals. For an average over time, use PhysicalDisk\Avg. Disk Queue Length.
Memory	Memory\Available Bytes	Less than 4 MB	Research memory usage, and then add memory, if needed.
Memory	Memory\Pages/sec	20	Research paging activity, the activity that occurs when data is swapped out of memory and stored on disk when memory is low.
Network	Network Segment\% Net utilization	Depends on network type	For Ethernet networks, the recommended threshold is 30 percent.
Paging File	Paging File\% Usage	99%	Review this value in conjunction with Available Bytes and Pages/sec to understand paging activity on your system.
Processor	Processor\% Processor Time	85%	Isolate the process that is using a high percentage of processor time. Upgrade to a faster processor, or install an additional processor.
Processor	Processor\Interrupts/sec	Depends on the processor	A dramatic increase in this counter without a corresponding increase in system activity indicates a hardware problem. Identify the network adapter that is causing the interrupts.
Server	Server\Bytes Total/sec		If the sum of Bytes Total/sec for all servers is roughly equal to the maximum transfer rate of your network, you might need to segment the network.
Server	Server\Work Item Shortages	3	If this value reaches the threshold, consider tuning InitWorkItems or MaxWorkItems in the registry.
Server	Server\Pool Paged Peak	Amount of physical RAM	This value indicates the maximum paging file size and the amount of physical memory.
Server	Server Work Queues\Queue Length	4	If this value reaches the threshold, there may be a processor bottleneck. This is an instantaneous counter; observe it over several intervals.
Multiple Processors	System\Processor Queue Length	2	This is an instantaneous counter; observe it over several intervals.

(1) To monitor Logical and Physical Disk object counters, you have to activate them first by typing diskperf -yv at the command prompt. They will be enabled after you restart the system.

Note Deciding whether or not server performance is acceptable is, of course, highly subjective and should reflect the baseline values that you establish for your own environment.
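
One way to read "consistently reports these values" is that a counter crosses its threshold in most of the sampled intervals. The sketch below encodes a few of the numeric thresholds from Table 10.1 and flags counters on that basis. The 80 percent cutoff for "consistently", the function name, and the sample data are assumptions for illustration; the thresholds themselves should be replaced by values that fit your own baseline.

```python
# (threshold, direction) pairs taken from Table 10.1.
# "above" means high values are a concern; "below" means low values are.
THRESHOLDS = {
    r"PhysicalDisk\% Disk Time": (90.0, "above"),
    r"Memory\Available Bytes": (4 * 1024 * 1024, "below"),
    r"Memory\Pages/sec": (20.0, "above"),
    r"Processor\% Processor Time": (85.0, "above"),
    r"System\Processor Queue Length": (2.0, "above"),
}

def consistent_breaches(samples_by_counter, min_fraction=0.8):
    """Return counters that cross their threshold in at least
    min_fraction of the sampled intervals (an assumed cutoff)."""
    flagged = []
    for counter, samples in samples_by_counter.items():
        limit, direction = THRESHOLDS[counter]
        if direction == "above":
            hits = sum(1 for value in samples if value >= limit)
        else:
            hits = sum(1 for value in samples if value < limit)
        if samples and hits / len(samples) >= min_fraction:
            flagged.append(counter)
    return flagged

# Hypothetical samples collected over five intervals.
observed = {
    r"Processor\% Processor Time": [91, 88, 95, 60, 90],
    r"Memory\Pages/sec": [5, 25, 3, 4, 6],
}
flagged = consistent_breaches(observed)
```

Here the single paging spike is ignored, while the processor counter is flagged because it stays at or above its threshold in four of five intervals.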

The Web Server

The main elements that you have to consider when tuning your Web servers are:

Memory
Processor capacity
Network capacity, latency, and bandwidth
Disks
Security features

The following sections, which provide guidelines for handling each of these elements, are taken from Appendix C, "The Art and Science of Web Server Tuning with Internet Information Services 5.0."

Caution Remember, in keeping with the Application Center homogenous server philosophy, virtually all of the IIS configuration settings on the cluster controller are replicated to every cluster member. Therefore, you can't maintain unique IIS settings for each member. When you're tuning IIS on the controller, you have to take a holistic approach and consider what impact the Web server settings on the controller will have on the rest of the cluster members.

Memory

Monitor memory first to ensure that your server has enough before moving on to other components. Because the IIS file cache is set up to use up to one-half of the available memory by default, the more memory you have, the larger the cache can be—up to its limit of 4 GB. Lack of memory is the number one performance bottleneck on Web sites.

Note Adding more memory doesn't guarantee that all your performance problems will be solved—you should also monitor how the IIS cache settings are affecting performance.

Table 10.2 summarizes the key memory counters.

Table 10.2 Memory Counters

Counter(s)	Comments
Memory:Available Bytes	Indicates available memory. At least 10 percent of memory should be available for peak use.
Memory:Page Faults/sec, Memory:Pages Input/sec, and Memory:Page Reads/sec	Use the first counter to determine the overall rate at which the system is handling hard and soft page faults. Memory:Pages Input/sec, which should be greater than or equal to Memory:Page Reads/sec, indicates the hard page fault rate. If these numbers are high, it's likely that too much memory is dedicated to the caches.
Memory: Cache Bytes, Internet Information Services Global: File Cache Hits %, Internet Information Services Global: File Cache Hits, and Internet Information Services Global: File Cache Flushes	Because IIS automatically trims the file system cache if it is running out of memory, you can use the Memory: Cache Bytes counter trend to monitor memory availability. Use the second counter, File Cache Hits %, to see how well IIS is using the file cache. On a site made up mostly of static files, this value should be 80 percent or higher. You can compare Internet Information Services Global: File Cache Hits and Internet Information Services Global: File Cache Flushes to determine whether objects are flushed too quickly (more often than they need to be) or too slowly (thus, wasting memory).
Page File Bytes: Total	Indicates the size of the paging file. The paging file on the system drive should be at least twice the size of physical memory. You can improve performance by striping the paging file across multiple disks.
Memory: Pool Paged Bytes, Memory:Pool Nonpaged Bytes, Process: Pool Paged Bytes:Inetinfo, Process: Pool Nonpaged Bytes:Inetinfo, Process: Pool Paged Bytes: dllhost#n, and Process: Pool Nonpaged Bytes: dllhost#n	Use these counters to monitor the pool space for all of the processes on the server as well as those used directly by IIS, either by the Inetinfo or Dllhost processes.
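
Several of the rules of thumb in Table 10.2 are numeric and can be checked mechanically. The following sketch is one hypothetical way to do so in Python; the function name, parameter names, and message wording are assumptions, and the 80 percent cache-hit guideline applies only to sites made up mostly of static files.

```python
def memory_findings(total_bytes, available_bytes,
                    pages_input, page_reads, cache_hit_pct):
    """Check a few Table 10.2 guidelines and return a list of findings.

    pages_input and page_reads are Memory:Pages Input/sec and
    Memory:Page Reads/sec; cache_hit_pct is File Cache Hits %.
    """
    findings = []
    if available_bytes < 0.10 * total_bytes:
        findings.append("less than 10 percent of memory available")
    if pages_input < page_reads:
        # Pages Input/sec should be >= Page Reads/sec, so recheck the data.
        findings.append("Pages Input/sec is below Page Reads/sec")
    if cache_hit_pct < 80.0:
        findings.append("file cache hit rate below 80 percent")
    return findings

MB = 1024 * 1024
report = memory_findings(512 * MB, 32 * MB, 40, 10, 92.0)
```

An empty list means none of these particular guidelines was violated; it does not rule out other memory problems.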

Tip Besides adding more memory, you can enhance memory performance by:

Improving data organization on the disk.
Implementing disk mirroring or striping.
Replacing Common Gateway Interface (CGI) applications with ISAPI or ASP applications.
Increasing paging file size.
Retiming the IIS file cache.
Eliminating unnecessary features.
Changing the balance of the file system cache to the IIS working set.

Processor Capacity

Bottlenecks occur in the processor when one or more processes consume practically all of the processor time. This forces process threads that are ready to be executed to wait in a queue. Adding other hardware, such as memory, disks, or network adapters, to overcome a processor bottleneck usually isn't effective and often makes the situation worse. For a site that hosts primarily static content, a two-processor computer is sufficient. For sites that host dynamic content, a four-processor system can handle the load.

Tip Before implementing a hardware change, such as adding another processor, rule out memory problems and then monitor the processor activity.

Table 10.3 summarizes the key processor counters.

Table 10.3 Processor Counters

Counter(s)	Comments
System: Processor Queue Length	Use to flag a bottleneck. If this counter has a sustained value of two or more threads, there is likely a bottleneck.
Processor: %Processor Time	Use to flag a bottleneck. A bottleneck is indicated by a high Processor: %Processor Time value and values that are well below capacity for the network adapter and disk I/O.
Thread: Context Switches/sec: Dllhost#n=>Thread#, Thread: Context Switches/sec: Inetinfo=>Thread#, and System: Context Switches/sec	Use to determine whether to increase the size of the thread pool.
Processor: Interrupts/sec and Processor: % DPC Time	Use to determine how much time the processor is spending on interrupts and deferred procedure calls (DPCs). Client requests can be a major source of each type of load on the processor.

If the counters in Table 10.3 indicate a processor bottleneck, you have to determine if the current workload is significantly CPU-intensive. If it is, it's unlikely that a single system will be able to keep up with processing requests, even if it has multiple CPUs. The only remedy in this scenario is to add another server.
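
The two bottleneck tests in Table 10.3 can be expressed as a short routine. This is a minimal sketch under stated assumptions: the 80 percent "busy" level, the 50 percent "quiet" level, and the rule that 90 percent of samples must meet a threshold to count as "sustained" are illustrative values we chose, not figures from this chapter. Only the queue length of two or more threads comes from the table.

```python
from statistics import mean


def sustained(samples, threshold, min_fraction=0.9):
    """True if at least min_fraction of the samples are at or above threshold."""
    if not samples:
        return False
    hits = sum(1 for value in samples if value >= threshold)
    return hits / len(samples) >= min_fraction


def processor_bottleneck(queue_lengths, cpu_pct, nic_pct, disk_pct,
                         busy=80.0, quiet=50.0):
    """Apply the two Table 10.3 tests to lists of sampled counter values."""
    # Test 1: System: Processor Queue Length sustained at two or more threads.
    queue_backed_up = sustained(queue_lengths, 2)
    # Test 2: high % Processor Time while the network adapter and disk
    # are well below capacity (busy and quiet levels are assumptions).
    cpu_bound = (sustained(cpu_pct, busy)
                 and mean(nic_pct) < quiet
                 and mean(disk_pct) < quiet)
    return queue_backed_up or cpu_bound


if __name__ == "__main__":
    print(processor_bottleneck([3, 2, 4, 2], [95, 90, 97, 92],
                               [10, 12, 9, 11], [5, 8, 6, 7]))
```

A positive result only flags a likely bottleneck; you still have to decide whether the workload is genuinely CPU-intensive before adding a server.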

Network Capacity, Latency, and Bandwidth

The time it takes for client requests to be satisfied by a server response—latency—is one of the largest limiting factors in a user's perception of system performance. This request-response cycle time is for the most part out of your direct control as a system administrator. For example, there's nothing you can do about a slow router on the network. Network bandwidth is the most likely source of a performance bottleneck on a site that's serving primarily static content. You can monitor the network and mitigate some of these issues by tuning your connection to the network and maximizing your effective bandwidth as best you can.

You can measure effective bandwidth by determining the rate at which your server sends and receives data. There are several performance counters that measure data transmission for the various network service components available on the server. These include counters for the Web, FTP, and SMTP services, the TCP object, the IP object, and the Network Interface object.

Table 10.4 summarizes the key network-related counters.

Table 10.4 Network-Related Counters

Counter(s)	Comments
Network Interface: Bytes Total/sec	Use to determine if your network connection is creating a bottleneck. Compare this counter to the total bandwidth of your network adapter. You should be using no more than 50 percent of the network adapter capacity.
Web Service: Maximum Connections and Web Service: Total Connection Attempts	If you are running other services that use the network connection, you should monitor these counters to ensure that the Web server can use as much of the connection as it needs.
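
Comparing Network Interface: Bytes Total/sec to adapter bandwidth requires a unit conversion, because the counter reports bytes and adapters are rated in bits per second. The following sketch shows the arithmetic; the function names and the sample values are illustrative assumptions.

```python
def adapter_utilization(bytes_total_per_sec, adapter_bits_per_sec):
    """Fraction of adapter capacity in use (0.0 through 1.0 and beyond)."""
    # The counter is in bytes per second; adapter ratings are in bits per second.
    return bytes_total_per_sec * 8 / adapter_bits_per_sec


def over_budget(bytes_total_per_sec, adapter_bits_per_sec, limit=0.50):
    """True if usage exceeds the 50 percent guideline from Table 10.4."""
    return adapter_utilization(bytes_total_per_sec, adapter_bits_per_sec) > limit


if __name__ == "__main__":
    # Hypothetical sample: 7,000,000 bytes/sec on a 100-Mbps adapter.
    print(adapter_utilization(7_000_000, 100_000_000))
    print(over_budget(7_000_000, 100_000_000))
```

In this sample, 7,000,000 bytes per second is 56 megabits per second, or 56 percent of a 100-Mbps adapter, which is over the guideline.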

Note Remember to check memory and processor usage. If these numbers are high, the network might not be the problem.

Disk Optimization

Generally speaking, if there is high disk activity other than logging, this means that other areas of your system need tuning. However, the type of site you host can have a significant impact on the frequency of disk seeks. For example, disk seeks are more frequent in the following situations:

There is a very large file set that's accessed randomly.
The files on the site tend to be very large.
A database is running on the same server, and clients are making dissimilar requests.
Intensive logging routines are running.

Table 10.5 summarizes the key disk-related counters.

Table 10.5 Disk-Related Counters

Counter(s)	Comments
Processor: %Processor Time, Network Interface Connection: Bytes Total/sec, and PhysicalDisk: %Disk Time	If all three of these counters have high values, the hard disk is not causing a bottleneck. However, if %Disk Time is high and the other two counters are low, the disk might be the bottleneck.

Security Overhead

There are performance costs associated with all security techniques. Because the Windows 2000 and IIS security services are integrated into several of the operating system services, you cannot monitor security features separately from these services. The best way to measure security overhead is to run tests against the Web server with the security feature turned off and then run them again with the security feature turned on. Make sure that you run these tests against a fixed server configuration with a fixed workload to ensure that the only variable is the security feature.
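
Once you have results from the two test runs, the overhead is the relative change in each metric. The sketch below assumes that you recorded the same metrics in both runs and that lower values are better for each of them; the metric names and figures are hypothetical.

```python
def overhead_pct(baseline, secured):
    """Percentage increase of the secured run over the baseline run."""
    return (secured - baseline) / baseline * 100.0


def compare_runs(baseline_run, secured_run):
    """Compute the overhead for every metric that appears in both runs."""
    return {name: overhead_pct(baseline_run[name], secured_run[name])
            for name in baseline_run if name in secured_run}


if __name__ == "__main__":
    # Hypothetical results: same server configuration, same workload.
    feature_off = {"cpu_pct": 40.0, "avg_response_ms": 120.0}
    feature_on = {"cpu_pct": 60.0, "avg_response_ms": 300.0}
    for metric, pct in compare_runs(feature_off, feature_on).items():
        print(metric, pct)
```

Because the comparison is only meaningful when the security feature is the sole variable, rerun both tests if anything else about the server or the workload changes.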

Table 10.6 summarizes the key security-related items to monitor.

Table 10.6 Security-Related Items

Counter	Comments
Processor Activity and the Processor Queue	Authentication, IP address checking, Secure Sockets Layer (SSL) protocol, and encryption schemes require significant processing. You will see increased processor activity (in privileged and user mode) and an increase in context switches and interrupts. If the processors aren't adequate for the load, you'll see queues develop.
Physical Memory	The system has to store and retrieve more user information. In addition, SSL uses long keys—up to 1024 bits—for encrypting and decrypting information.
Network Traffic	You will see an increase in network traffic between the Web server and the domain controller that is used for authenticating logon information and verifying IP addresses.
Latency and Delays	The most visible performance degradation is the result of encryption and decryption, both of which use a significant number of processor cycles. Downloading files from servers by using SSL can be anywhere from 10 to 100 times slower than from servers that are not using SSL.

Tuning and Troubleshooting Suggestions

If your investigations lead you to believe that you need to address specific hardware-related performance issues, consider the alternatives listed in Table 10.7, which are based on a single Web server scenario.

Table 10.7 Tuning and Troubleshooting Your Web Server

Suggestion	Comments
Upgrade to larger L2 caches.	If you need to add or upgrade processors, select processors with a large secondary (L2) cache. Server applications, such as IIS, benefit from large processor caches (2 MB or more if the cache is external, up to the maximum available if it is on the CPU).
Upgrade to faster CPUs.	Web applications, in particular, benefit from faster processors.
Set aggressive connection time-outs.	Aggressive time-outs help combat latency because open connections degrade performance. The default time-out setting in the metabase is 15 minutes.
Use expires headers.	Set expires headers on static and dynamic pages to allow content to be stored in the client's cache. This improves response time and reduces the load on the server as well as network traffic.
Enable ASP buffering.	Buffering allows all application output to be collected in a buffer before it's transmitted across the network, which cuts down on network response times. Although buffering reduces overall response time, it can create the impression that a page is slower and less interactive, because nothing reaches the browser until the buffer is sent. You can compensate for this by using Response.Flush. ASP buffering is enabled by default after a clean installation of Windows 2000, but it might not be enabled after an upgrade.
Lengthen connection queues and use HTTP keep-alives.	Longer connection queues reduce overhead by allowing the server to hold more pending connection requests. HTTP keep-alives maintain a client's connection to the server even after the initial request is complete. This feature reduces latency and CPU processing. Both of these techniques can help make better use of the available bandwidth.
Reduce file sizes.	Reduced file sizes generally improve performance. You can use compressed format for image files and limit the number of images and other large files. You can also reduce file sizes by tightening up HTML and ASP code, and by removing redundant blocks of code in ASP files.
Store log files on separate disks and remove nonessential information.	Disk writes for the separate log files that are maintained for each site can cause bottlenecks. Store these files on a separate partition or disk from your Web server. You can also avoid logging non-vital information. For example, you could place image files in a separate virtual directory and disable logging for that directory.
Use RAID and striping.	Use RAID and striped disk sets to improve disk access. Another option is using a controller with a large RAM cache. If the site uses frequent database access, make sure that the database is on a different server than the Web server.
Use CPU throttling, if necessary.	Use process accounting, which logs the CPU and other resources used by a Web site, to determine if process throttling should be implemented. Process throttling limits the amount of resources that a site can use. Both these features work for CGI applications and for applications that are run out of process. Take care to monitor your system carefully after implementing process throttling—it can backfire on you. Because the throttled Dllhost process runs at a lower priority, it won't respond quickly to requests from the Inetinfo process, which means that several I/O threads can be tied up, thereby degrading server responsiveness.

Testing and Tuning Applications

One of the benefits of running your Web servers in a cluster is that the impact of a slow-running application is alleviated. Unfortunately, this only serves to hide an application performance issue; it doesn't fix the problem.

Before you deploy an application to a production server, it should be tested, not only for bugs and memory leaks, but for performance as well.

Anticipating Application Load

To properly test an ASP application you have to determine what type of load is anticipated for the application. We recommend that you break this load down as follows:

Total number of unique application users—You can use the total of hits per month or, for more granularity, the total number of hits per hour.
Total number of concurrent users—You should base this number on peak time usage.
Peak request rate—You should determine how many pages need to be served per second in a worst-case scenario.
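
If you know the number of concurrent users but not the peak request rate, you can estimate one from the other. The sketch below uses the interactive response time law from queuing theory (a form of Little's law), which this chapter does not prescribe. It assumes that each user requests a page, waits for the response, and then pauses for a "think time" before the next request. The think time and response time values are hypothetical.

```python
def peak_request_rate(concurrent_users, think_time_sec, response_time_sec):
    """Estimate pages per second generated by a fixed group of active users."""
    # Each user completes one request per (think time + response time) cycle.
    cycle = think_time_sec + response_time_sec
    if cycle <= 0:
        raise ValueError("think time plus response time must be positive")
    return concurrent_users / cycle


if __name__ == "__main__":
    # Hypothetical peak: 500 concurrent users, 20-second think time,
    # half-second response time.
    print(round(peak_request_rate(500, 20.0, 0.5), 2))
```

Treat the result as a starting point for a test plan, not as a measurement; real users do not pace themselves evenly, so a worst-case rate should include some headroom.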

Determining the Total Number of Users

In a production environment, it may be difficult to determine the total number of users and concurrent users for the application. For Internet sites, you should:

Break down the IIS server logs to segment usage data.
Take a best guess at how much traffic the site is likely to attract.
Project a worst-case usage scenario.

If your site is primarily for intranet use, you should:

Break down the IIS server logs to segment usage data.
Try to determine who is using the site. Is it everyone or a selected group of users? Calculate how many computers are on the corporate network, and try to identify usage peaks.
Project a worst-case scenario.
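
Breaking down the IIS server logs can be scripted. The following sketch counts hits and distinct client addresses per hour from a log in W3C Extended format. It assumes that the date, time, and c-ip fields are enabled in the site's logging properties, and it treats a distinct client IP address as a rough stand-in for a user, which undercounts users who sit behind a shared proxy. The function name and the sample lines are illustrative.

```python
from collections import Counter, defaultdict


def summarize_w3c_log(lines):
    """Return {hour: (hits, distinct client IPs)} for W3C Extended log lines."""
    fields = []
    hits_per_hour = Counter()
    clients_per_hour = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The #Fields directive names the columns of the entries that follow.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if not fields or len(values) != len(fields):
            continue  # skip entries that cannot be matched to the field list
        record = dict(zip(fields, values))
        if not all(name in record for name in ("date", "time", "c-ip")):
            continue  # the required fields are not being logged
        hour = record["date"] + " " + record["time"][:2]
        hits_per_hour[hour] += 1
        clients_per_hour[hour].add(record["c-ip"])
    return {hour: (hits_per_hour[hour], len(clients_per_hour[hour]))
            for hour in hits_per_hour}


if __name__ == "__main__":
    sample = [
        "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
        "2001-03-05 14:02:11 10.0.0.5 GET /default.asp 200",
        "2001-03-05 14:40:00 10.0.0.6 GET /a.gif 200",
        "2001-03-05 15:00:01 10.0.0.7 GET /default.asp 200",
    ]
    for hour, (hits, clients) in sorted(summarize_w3c_log(sample).items()):
        print(hour, hits, clients)
```

To process a real log, pass an open file object instead of a list; the busiest hour in the result is a reasonable basis for the peak-time figures.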

Stress Test the Application

After you've established a context for testing, you can use the WAS tool to run test scripts against the server. This tool enables you to simulate different types and degrees of user load and collect performance data.

You can download the newest version of the tool from the WAS Tool site at https://www.microsoft.com/technet/archive/itsolutions/intranet/downloads/webstres.mspx. While you're at the site, you should also download the white paper "Web Application Stress Test and Data Analysis." Prepared by the Unisys Consulting Service, this paper documents the work they did for an enterprise customer who wanted them to assess and analyze the scalability and performance of a Web application that made extensive use of SQL Server 7.0 stored procedures. The customer's goals included determining the appropriate hardware platform for hosting the application, addressing potential performance bottlenecks, and estimating the typical response time that the application's users could expect.

We also recommend the following print-based resources for optimizing, testing, and tuning your ASP applications:

Reilly and Gibbs, Chapter 26, "Optimizing ASP Performance," Professional Active Server Pages 3.0, WROX Press, October 1999.
Appendix A, "ASP Best Practices," in the Internet Information Services 5.0 Resource Guide, Microsoft Press, 2000.

Table 10.8 lists several additional online resources that deal with Web application performance and tuning.

Table 10.8 Application Performance Resources

Title	Author	Location
15 ASP Tips to Improve Performance and Style		https://msdn.microsoft.com/workshop/server/asp/asptips.asp
Server Performance and Scalability Killers	George Reilly	https://msdn.microsoft.com/workshop/server/iis/tencom.asp
Maximizing the Performance of Your Active Server Pages	Nancy Winnick Cluts	https://msdn.microsoft.com/workshop/server/asp/maxperf.asp
Got Any Cache?	Nancy Winnick Cluts	https://msdn.microsoft.com/workshop/server/feature/cache.asp
Tips to Improve ASP Application Performance	Srinivasa Sivakumar	https://www.15seconds.com/issue/000106.htm
Timing the Execution Time of Your ASP Scripts	Mike Shaffer	https://www.4guysfromrolla.com/webtech/122799-1.shtml
Testing the Performance of Your Web Application	Matt Odhner	https://www.microsoft.com/technet/iis/wastip.asp
Improve the Performance of Your MDAC Application	Suresh Kannan	https://www.microsoft.com/data/impperf.htm

The Web Application Stress Tool

We used the Web Application Stress (WAS) tool extensively when creating and testing our sample clusters for functionality, performance, load balancing adjustment, and monitor testing. Because this tool realistically simulates multiple browsers requesting pages from a Web application, you can gather meaningful performance metrics.

You can create the scripts that the WAS tool uses in several ways: manually, by recording browser activity, by pointing to an IIS log file, by pointing to the content tree, or by importing a script. The many benefits of using Web Application Stress include:

Using multiple user names and passwords to gain access to test sites that use the most common forms of authentication and encryption including Distributed Password Authentication (DPA), NTLM, and SSL.
Support for dynamic cookies that maintain a relationship with the WAS clients, which enables realistic personalized test scenarios and session support.
Running a test script by using any number of clients, all of which can be controlled from a single centralized WAS manager.
Configurable bandwidth throttling to simulate modem throughput.
A custom query-string editor that allows you to save name-value pair combinations as templates and then use these templates across multiple tests.
Providing summary reports with extensive performance data, including percentiles that remove outliers. In addition to the performance data gathered by the test targets, WAS allows you to specify performance counters that you run against the targets, which can be used to provide a validity check on performance data.
Support for page groups, which allows you to logically group files and control script flow execution.
Configuration of time delays between requests (socket level) and script item requests, which enables you to produce exact time sequences for testing race conditions.

Figure 10.2 shows the configuration options that are available in WAS. (Note: The last option, which is not fully visible, is Name resolution. You can enable this option so that network lookups on remote clients are supported.)

Figure 10.2 The WAS tool configuration window

In addition to the configuration options shown in Figure 10.2, you can configure individual pages that are used in your test scripts. Table 10.9 summarizes the main configuration settings that you can use at the page level.

Table 10.9 Page-Level WAS Configuration Options

Setting	Description
HTTP Verb	Specify the GET, POST, HEAD, or PUT method for handling the page.
Querystring	Specify formatting; provide name, distribution, and value. Import ASP or HTML fields.
Post data	Specify custom POST data in text or binary format.
Custom headers	Use default header information or provide custom HTTP headers. Headers can be static or dynamic.
SSL	Enable SSL for a page.
Remote Data Services (RDS)	Enable Remote Data Services (RDS) and convert query to RDS format.

Figure 10.3 shows the WAS reporting interface and the sample report that was generated after we ran one of our test scripts against a test cluster consisting of two Web servers.

Figure 10.3 Performance data generated by the WAS tool

Note If you want to test loads for clients that are running the Microsoft Win32 API, download the Windows DNA Performance Kit Beta from https://www.microsoft.com/com/resources/windnaperf.asp.

Using WAS to Test NLB Web Clusters

Because a WAS stress test uses a small, limited set of client IP addresses and ports, the Network Load Balancing (NLB) assumption of wide distribution in client numbers is invalidated. As a result, you may observe uneven traffic across the cluster.

The following factors will affect the distribution of traffic in WAS testing for an NLB cluster:

Load balancing affinity—For best results, the cluster should be configured for No affinity. If Single IP or Class C affinity is used, be sure to use several WAS clients, with different Class C addresses in the latter case. No affinity is often the most practical choice.
The number of WAS clients—Each WAS client uses a single IP address for all HTTP connections. The more clients that are used, the more diversity there is in client IP numbers.

Note Adding multiple IP addresses to a single client will not affect WAS behavior because only one IP is ever used per computer.

At the socket level, WAS uses an implicit bind when making a request. This means that the operating system supplies the client IP address and port. Microsoft Windows NT behavior is to always provide the interface address from its routing table. This interface address is unique, so adding additional IP addresses to a network adapter does not provide more diversity to the WAS client address space.
HTTP keep-alives—When HTTP keep-alives are enabled, all items in a page group are requested over a single socket. Because this socket uses a common IP address and client port, NLB sends all requests in that page group to the same Web server. Disabling keep-alives forces a different client port for each item in the page group. This means that each item can be served from a different Web server.

Note With Single or Class C affinity, the keep-alive feature will not affect load balancing. Disabling keep-alives applies only to the No affinity setting.

Windows NT uses incremental local port numbers in the 1500 through 4000 range, looping back to 1500 after exceeding the upper boundary. This provides excellent diversity in port numbers; however, keep-alives must be avoided in order for large page groups to take advantage of this.
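
The effect of affinity and keep-alives on traffic distribution can be illustrated with a small simulation. The sketch below does not use the actual NLB algorithm. As a stand-in, it hashes the client key with CRC-32 and takes the result modulo the number of cluster members. What it does model is which parts of the client address feed the decision under each affinity setting, as described above. All names and addresses are hypothetical.

```python
import zlib
from collections import Counter


def pick_host(client_ip, client_port, host_count, affinity="none"):
    """Choose a cluster member for a connection (stand-in hash, not NLB's)."""
    if affinity == "none":
        key = "%s:%d" % (client_ip, client_port)    # IP address and port
    elif affinity == "single":
        key = client_ip                             # IP address only
    elif affinity == "classc":
        key = ".".join(client_ip.split(".")[:3])    # Class C portion only
    else:
        raise ValueError("unknown affinity: %s" % affinity)
    return zlib.crc32(key.encode("ascii")) % host_count


def distribution(connections, host_count, affinity):
    """Count how many (ip, port) connections each member receives."""
    return Counter(pick_host(ip, port, host_count, affinity)
                   for ip, port in connections)


if __name__ == "__main__":
    # One WAS client with keep-alives disabled: a new port for each request.
    one_client = [("10.0.0.5", 1500 + i) for i in range(100)]
    print("No affinity:", dict(distribution(one_client, 2, "none")))
    print("Single affinity:", dict(distribution(one_client, 2, "single")))
```

With Single affinity, every connection from the one client lands on the same member regardless of port, which is why several WAS clients are needed. With No affinity, only the changing port can spread the requests, which is why keep-alives have to be disabled.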

Performance Counters

As we noted in Chapter 7, "Monitoring," Application Center enables a default set of performance counters that are used to capture performance data on every cluster member and logs this data to the Application Center Events and Performance Logging database. As soon as you create a cluster on a server, or add a server to a cluster, counter logging is initiated and counter data is written to the local instance of the ACLog database.

Note The default counters are defined in the file Perflogconsumer.mof, which is used to create the Windows Management Instrumentation (WMI) counter instances that the Application Center Events and Performance Logging database uses. In turn, a WMI performance-logging consumer uses an agent to write counter information to the database. In order to display this counter list in the user interface, Application Center queries the database with a query component.

Each of the installed counters can be enabled for graphing on the performance chart that's available for the cluster or member nodes by using a Web page dialog that you can launch from any performance chart that's displayed in the details pane of the snap-in. (See "Enabling Counter Graphing," later in this chapter.)

The Default Performance Counters

The cross-section of counters selected as the Application Center default performance counters is listed in Table 10.10. Based on the feedback provided by Microsoft Consulting Services, product teams, early adopters, and beta testers, it was determined that these counters were the ones most likely to be used on a regular basis by system administrators. These counters should meet most of your normal operational performance monitoring requirements. You'll notice that most of these counters have already been identified in earlier sections of the chapter that dealt with monitoring the different aspects of a Web server environment. You can, of course, add counters, which we'll cover later in this section.

In addition to listing the Application Center default performance counters alphabetically by name, Table 10.10 also provides a short description for each counter, identifies the counter's unit of measurement, and identifies the scope of the data. Scope describes what the data represents: the present value, an accumulated value, an average, or data collected over a period of time.

Table 10.10 Application Center Performance Counters

Counter	Description	Units	Scope
Available Bytes (memory)	The amount of physical memory that is available to processes running on the computer. It is calculated by summing space on the Zeroed, Free, and Standby memory lists. This figure should be at least 5 percent of total memory at all times.(1)	Bytes	Present value
Bytes Total/sec (Web Service)	The sum of Bytes Sent/sec and Bytes Received/sec. This is the total rate of bytes that are transferred by the Web Service.	Integer	Data per time period
Connections active (TCP)	The number of times TCP connections have made a direct transition to the Syn-sent state from the Closed state.	Integer	Present value
Context Switches/sec (System)	This value can indicate excessive locking in code, perhaps creating a contention for resources. If too high, add another server or check with Microsoft for the latest patches.	Integer	Data per time period
Current Connections (Web Service)	The number of current client connections to the Web Service.	Integer	Present value
Current Disk Queue Length (physical disk)	The number of requests outstanding on the disk at the time the performance data is collected. It includes requests in service at the time of the reading. Multi-spindle disk devices can have multiple requests active at one time, but other concurrent requests are awaiting service. This counter might reflect a transitory high or low queue length, but if there is a sustained load on the disk drive, it is likely that this will be consistently high. Requests are experiencing delays proportional to the length of this queue minus the number of spindles on the disks.(2)	Integer	Present value
Errors per second (ASP)	The number of errors generated by ASP applications, per second.	Integer	Data per time period
Get Requests/sec (Web Service)	The number of HTTP requests that are using the GET method, per second. The GET method is the most common method used on the Web.	Integer	Data per time period
ISAPI extension requests/sec (Web Service)	The number of ISAPI extension requests that are simultaneously being processed by the Web Service, per second.	Integer	Data per time period
Page faults/sec (memory)	The number of times, per second, that the server reads the page file on the disk or from memory that is not assigned to the working set. Most CPUs can handle large numbers of page faults without consequence; however, if disk reads are high, there might be performance degradation.	Integer	Data per time period
Private Bytes (process: Inetinfo)	The number of bytes of memory that are taken up by a particular process (in this case, Inetinfo, which is part of IIS).	Bytes	Present value
% Privileged Time (CPU)	The percentage of non-idle processor time spent in privileged mode. (Privileged mode is a processing mode designed for operating system components and hardware-manipulating drivers. It allows direct access to hardware and all memory. The alternative, user mode, is a restricted processing mode designed for applications, environment subsystems, and integral subsystems. The operating system switches application threads to privileged mode to access operating system services). % Privileged Time includes time servicing interrupts and deferred procedure calls (DPCs). A high rate of privileged time might be attributable to a large number of interrupts that are being generated by a failing device. This counter displays the average busy time as a percentage of the sample time.	Percentage	Average of accumulated values
Processor Utilization (CPU)	The percentage of time that the processor is executing a non-idle thread. This counter was designed as a primary indicator of processor activity. It is calculated by measuring the time that the processor spends executing the thread of the Idle process in each sample interval, and subtracting that value from 100 percent. Processor bottlenecks are characterized by high Processor:% Processor Time numbers while the network adapter remains well below capacity.(3)	Percentage	Average of accumulated values
% User Time (CPU)	The percentage of non-idle processor time spent in user mode. (User mode is a restricted processing mode designed for applications, environment subsystems, and integral subsystems. The alternative, privileged mode, is designed for operating system components and allows direct access to hardware and all memory. The operating system switches application threads to privileged mode to access operating system services.) This counter displays the average busy time as a percentage of the sample time.	Percentage	Average of accumulated values
Request execution time (ASP)	The number of milliseconds that it took the most recent ASP request to complete.	Milliseconds	Last value
Requests per second (ASP)	The number of requests executed, per second.	Integer	Data per time period
Requests Queued (ASP)	The number of requests waiting for service from the queue. This number should be small, except during heavy traffic periods. Large numbers of queued requests indicate that there is a performance bottleneck somewhere in your server.	Integer	Present value
Request wait time (ASP)	The amount of time that the most recent ASP request was waiting in the queue.	Milliseconds	Last value
Total Server Memory (SQL Server: Memory Manager)	The total amount of dynamic memory the server is currently consuming.	Bytes	Present value

(1) This value should be greater than 20 MB.

(2) This difference should average less than 2 for good performance.

(3) Processor utilization does occasionally peak at fairly high levels, but this level should not be sustained for a long period.
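
Footnote 2 describes a calculation that is easy to get backwards, so a small worked example may help. The sketch below subtracts the number of spindles from each Current Disk Queue Length sample and checks whether the average difference stays below 2. The function name and the sample values are illustrative assumptions.

```python
from statistics import mean


def disk_queue_ok(queue_samples, spindles, limit=2.0):
    """True if (queue length - spindles) averages below the footnote 2 limit."""
    # Requests that exceed the number of spindles are the ones kept waiting.
    waiting = [sample - spindles for sample in queue_samples]
    return mean(waiting) < limit


if __name__ == "__main__":
    # Hypothetical four-spindle array sampled four times.
    print(disk_queue_ok([3, 4, 5, 4], spindles=4))
    print(disk_queue_ok([8, 9, 7, 8], spindles=4))
```

In the first sample the differences average 0, which is within the guideline; in the second they average 4, which suggests that requests are being delayed.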

The System Test Team's Favorite Counters

The Application Center System Test team identified the following counters as their favorites for isolating performance bottlenecks and identifying memory leaks:

Active Server Pages: Requests/sec
Active Server Pages: Errors/sec
Active Server Pages: Transactions/Sec
Distributed Transactions Coordinator: Response Time -- Average
Distributed Transactions Coordinator: Transactions/sec
Memory: Available MBytes
Network Interface: Bytes Total/sec
Processor: %Processor time

You can obtain a current list of the installed counters on a server running Application Center by using one of several techniques. The first method, of course, is via the Application Center user interface:

In the Application Center snap-in, in the performance chart view, click Add.

The Add a Counter dialog box, which displays all the counters that are currently installed on the system, appears.

Note It is possible to get two different counter lists depending on the way you query for them. The Add a Counter dialog box queries data from the Application Center Events and Performance Logging database; all other methods query the WMI repository. If there are counters that are not enabled for logging, the two lists will differ: the database can still list a counter whose data collection has been stopped, which is why you can retrieve old data for counters that are no longer being collected.

The second method involves using the WMI Tester (Wbemtest.exe) or WMI Common Information Model (CIM) Studio (CIM Studio) and running one of these against the member from which you want to obtain counter information. Follow these steps:

Connect to the namespace, root\MicrosoftApplicationCenter.
Enumerate the instances of the class MicrosoftAC_CapacityCounterConfig.

Finally, for the third method, you can run the Counters.vbs script that's provided on the Application Center CD. In addition to obtaining a list of the installed counters, you can use this script to "delete" a counter. In the context of the Counters.vbs script, "delete" means to stop collecting data from the counter. It does not remove the counter from the ACLog database.

Caution You should be extremely cautious when writing any scripts that access ACLog and remove counters. If done incorrectly, you can easily affect data integrity and corrupt the database.

To run this script:

In Windows 2000, open a command prompt.
At the command prompt, type Counters.vbs and then press ENTER.

Run without parameters, the script displays help for the two parameters that are available, /list and /delete. Use the /list parameter to list the installed counters and the /delete parameter, accompanied by a counter name enclosed in quotation marks, to delete the specified counter.

Here is the Counters.vbs script:

' Counters.vbs
' Lists the installed Application Center counters or stops collecting one.
Const WMI_NAMESPACE = "root\MicrosoftApplicationCenter"
Const COUNTER_CLASS = "MicrosoftAC_CapacityCounterConfig"

strComputerName = "."    ' "." connects to the local computer
Set args = WScript.Arguments
cmd = ""
If args.Count > 0 Then
    cmd = args(0)
End If

Select Case cmd
    Case "/list"
        listCounters
    Case "/delete"
        ' /delete needs a counter name as the second argument
        If args.Count > 1 Then
            deleteCounter args(1)
        Else
            showHelp
        End If
    Case Else
        showHelp
End Select

Sub e(str)
    WScript.Echo str
End Sub

'
' Display help if the script is run without valid parameters
'
Sub showHelp()
    e "/list to display installed counters"
    e "/delete <counter name> to stop collecting a counter"
End Sub

'
' Connect to the Application Center namespace, impersonating the caller
'
Function connectToNamespace()
    Set wbemLocator = CreateObject("WbemScripting.SWbemLocator")
    ' Set the impersonation level before connecting so that it takes effect
    wbemLocator.Security_.ImpersonationLevel = 3    ' 3 = impersonate
    Set connectToNamespace = wbemLocator.ConnectServer(strComputerName, WMI_NAMESPACE)
End Function

'
' List the counters
'
Sub listCounters()
    Set wbemService = connectToNamespace()
    Set counterInstances = wbemService.InstancesOf(COUNTER_CLASS)
    For Each counter In counterInstances
        e counter.Name
    Next
End Sub

'
' Stop logging data from the specified counter
'
Sub deleteCounter(counterName)
    Set wbemService = connectToNamespace()
    wbemService.Delete COUNTER_CLASS & ".Name=""" & counterName & """"
    e "Deleted counter: " & counterName
End Sub

Adding Additional Performance Counters

If the counters that are provided don't completely meet your monitoring requirements, you can load additional counters into the Application Center namespace. Creating new counters isn't difficult; however, you should determine whether or not new counters are needed to meet an ongoing operational requirement.

When to Create New Counters

We recommend that you only create new cluster-wide counters if you intend to gather data on an ongoing basis with the intention of accumulating historical data for reporting and planning purposes. In this case, you would create the counter on the cluster controller so that the updated counter collection is replicated to all the cluster members the next time there's a full synchronization—which you can force manually after you create the new counter(s).

In situations where you require additional monitoring capability for a short period of time, such as performance tuning on a single member, you can add performance counters to that member. Remember to take the member out of the synchronization loop before creating the new counter so that the local counter collection isn't overwritten by the counter definitions on the controller. After you've finished collecting performance data, you can bring the member back into the synchronization loop; the next time a full synchronization occurs, the counter collection will be restored to its original state. If a new counter is added on a member, you need to connect directly to that member—in the Connect to server dialog box, click Manage this server only—in order to see the counter on the member. If you don't do this, you will see only the counter list for the cluster controller.

An alternative to creating a new counter is to use the available operating system tools, such as Performance Monitor and Network Monitor, to perform in-depth monitoring of the server in question. With these tools, you can log the data you need for ongoing analysis without changing the structure of the ACLog database, and it is generally easier to isolate the information you require for tuning the server or an application.

Creating a New Counter

Creating a new counter is accomplished by writing a counter definition and saving it as a MOF file or by modifying the sample counters file that's provided on the Application Center CD.

The following code illustrates a typical counter definition that defines a counter for the Application Center namespace:

// Specifies the WMI namespace for the instance 
#pragma namespace("\\root\\MicrosoftApplicationCenter") 
// 
// Counter consumer class definition 
// 
instance of MicrosoftAC_CapacityCounterConfig 
{ 
Name = "CPU 0 Interrupts/sec"; 
CounterPath = "\\Processor(0)\\Interrupts/sec"; 
CounterType = 1; 
Units = ""; 
AggregationMethod = 1; 
ClusterAggregation = 1; 
DefaultScale = 0; 
};

After you run Mofcomp against this script, the new counter is created as an instance of the MicrosoftAC_CapacityCounterConfig class. After the performance log consumer retrieves this information and logs it, a stored procedure detects the counter identifier and then writes an entry to the counter metadata table. Data integrity is enforced through this process.

Let's analyze the preceding sample in more detail and then create a new counter definition that defines a new counter for the Application Center counter collection.

The required properties for a counter are as follows:

Name—The counter name, which is used to identify the counter in the Application Center Events and Performance Logging database and the Application Center user interface. The name must be unique among all the counters that are being logged.
CounterPath—The counter path, which must be specified by using Performance Data Helper (PDH) syntax with English or the default system names: \\PerfObject( ParentInstance/ObjectInstance#InstanceIndex )\\Counter
CounterType—The counter type is 1 by default. This is an internal property. Do not change it.
Units—A string value that is used to specify the units of the counter that are displayed in the user interface.
AggregationMethod—Specifies the aggregation method that will be used to do server-wide rollup calculations. AggregationMethod determines how counter values are rolled up from one time interval to another, for example, from two hours to one day. You should not aggregate any counter that collects state, such as On or Off.

The following values can be used to specify an aggregation method for the counter:
- 0 = None—no aggregation is used; the existing value is rolled up.
- 1 = Average—when the counter value is rolled up on the server from one interval to another, the source values are averaged.
- 2 = Sum—all the values for the recording period are totaled.
- 3 = Last—the last value recorded by the counter is used.
- 4 = Min—the minimum value for the recording period is used.
- 5 = Max—the maximum value for the recording period is used.
ClusterAggregation—Specifies the method that is used to roll up server values to provide a cluster-wide aggregated value.

Warning Do not use the Min or Max aggregation methods for ClusterAggregation when a counter specifies Sum—a cumulative counter—for server aggregation. The results are not useful, very unpredictable, and not supported. In addition, a ClusterAggregation value of 0 indicates no aggregation. As a result, this counter will not be displayed in the cluster-wide view. An example of this is Thread\ID Process. ID Process is the unique identifier for this process; ID Process numbers are reused, so they only identify a process for the lifetime of that process.
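The server-level rollup that AggregationMethod describes can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the product's implementation: the function name roll_up and the sample values are hypothetical, and method 0 (None) is deliberately left out because its exact behavior isn't spelled out here.

```python
from statistics import mean

# Codes follow the AggregationMethod values listed above (0 = None is omitted).
AVERAGE, SUM, LAST, MIN, MAX = 1, 2, 3, 4, 5

def roll_up(samples, method):
    """Collapse the values from one interval into a single value for a coarser interval."""
    if not samples:
        raise ValueError("no samples to roll up")
    if method == AVERAGE:
        return mean(samples)
    if method == SUM:
        return sum(samples)
    if method == LAST:
        return samples[-1]
    if method == MIN:
        return min(samples)
    if method == MAX:
        return max(samples)
    raise ValueError(f"unsupported aggregation method: {method}")

# Hypothetical two-hour values that make up one day for a processor counter.
two_hour_values = [20, 35, 50, 65, 80, 70, 55, 40, 30, 25, 20, 10]

daily_average = roll_up(two_hour_values, AVERAGE)
daily_peak = roll_up(two_hour_values, MAX)
print(round(daily_average, 2), daily_peak)
```

Rolling the same twelve values up with Average and with Max gives very different daily figures, which is why the method should match what the counter measures.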

Let's say, for example, that we want to add two counters to verify whether there is a potential processor bottleneck caused by a client request. The two counters are Processor:Interrupts/sec and Processor:% DPC Time. The first counter tells us how often the processor is servicing hardware interrupts, and the second tells us what percentage of processor time is spent on deferred procedure calls.

The easiest way to obtain the counter information that is required for the counter definition is as follows:

On the server, in the Microsoft Management Console (MMC), open the Performance Monitor snap-in, and then click Plus.

The Add Counters dialog box appears.
Click the down arrow to the right of the Performance object box, and then click Processor, which is the object that you want to monitor.
Scroll down the list of counters for the object, and select the one that you want to use.

Figure 10.4 shows the Performance snap-in with the % DPC Time counter selected for the Processor object. Note also that the _Total instance is selected by default.

Figure 10.4 The Performance snap-in and the Add Counters dialog box

Using the information provided in the Add Counters dialog box, we can start building our MOF file to add the new counters. For the counter path, we have:

\\PerfObject = Processor
(ParentInstance/ObjectInstance#InstanceIndex) = _Total
\\Counter = % DPC Time
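To make the assembly of these parts concrete, here is a minimal Python sketch that joins them into a counter path and then doubles the backslashes for use inside a MOF string. The helper names build_counter_path and escape_for_mof are hypothetical, and the counter name is spelled with a space, as in the Processor:% DPC Time reference earlier.

```python
def build_counter_path(perf_object, counter, instance=None):
    """Join the parts of a PDH-style counter path; the instance part is optional."""
    instance_part = f"({instance})" if instance else ""
    return f"\\{perf_object}{instance_part}\\{counter}"

def escape_for_mof(path):
    """Double each backslash, because backslash is the escape character in MOF strings."""
    return path.replace("\\", "\\\\")

path = build_counter_path("Processor", "% DPC Time", instance="_Total")
mof_value = escape_for_mof(path)

print(path)       # the path as Performance Monitor would show it
print(mof_value)  # the same path as it is written in a counter definition
```

The second printed form is what goes between the quotation marks of the CounterPath property.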

The next code sample contains our new counter definition for the % DPC Time counter:

// Specifies the WMI namespace for the instance
#pragma namespace("\\root\\MicrosoftApplicationCenter")
//
// DPC counter consumer class definition
//
instance of MicrosoftAC_CapacityCounterConfig
{
Name = "Processor DPC Time";
CounterPath = "\\Processor(_Total)\\% DPC Time";
CounterType = 1;
// % DPC Time is reported as a percentage of processor time
Units = "%";
//
// Use averaging for cluster aggregation because summing this value
// across the cluster does not provide meaningful results
//
AggregationMethod = 1;
ClusterAggregation = 1;
DefaultScale = 0;
};

We can repeat the preceding steps to obtain information about the Interrupts/sec counter so that we can add it to the preceding code. When all of the necessary coding is finished, we'll save the file (as a text file with a .mof file name extension) on the server where we want to add the counter. Next, we'll open a command prompt and run Mofcomp against the file to add it to the Application Center counter collection. Finally, to verify that the counters were successfully added, we'll run Counters.vbs /list from the command line to obtain a list of the currently active counters. This list verifies that the WMI class instances were successfully stored in the WMI repository. To verify that the counter is available for logging in the Performance view, open the Add a Counter dialog box, and then confirm that the counter name is listed. If the counter isn't listed, check the Event view to see if any error events were generated from running Mofcomp to add the counter.
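When several counters are being defined at once, the MOF text can be generated rather than typed by hand. The following Python sketch assumes the same property layout as the counter definitions shown earlier; the display names, the Units strings, and the choice of averaging for both counters are illustrative assumptions, not values taken from the product.

```python
NAMESPACE = "\\root\\MicrosoftApplicationCenter"

MOF_INSTANCE = """instance of MicrosoftAC_CapacityCounterConfig
{{
Name = "{name}";
CounterPath = "{path}";
CounterType = 1;
Units = "{units}";
AggregationMethod = {aggregation};
ClusterAggregation = {cluster_aggregation};
DefaultScale = 0;
}};
"""

def mof_escape(text):
    """Double each backslash for use inside a MOF string literal."""
    return text.replace("\\", "\\\\")

def render_counter(name, path, units, aggregation=1, cluster_aggregation=1):
    """Render one counter definition; 1 = Average for both aggregation levels."""
    return MOF_INSTANCE.format(
        name=name,
        path=mof_escape(path),
        units=units,
        aggregation=aggregation,
        cluster_aggregation=cluster_aggregation,
    )

def render_mof(counters):
    """Render the namespace pragma followed by one instance per counter."""
    header = '#pragma namespace("{}")\n'.format(mof_escape(NAMESPACE))
    return header + "\n".join(render_counter(**counter) for counter in counters)

# Hypothetical display names and units for the two processor counters.
counters = [
    {"name": "Processor DPC Time",
     "path": "\\Processor(_Total)\\% DPC Time", "units": "%"},
    {"name": "Processor Interrupts/sec",
     "path": "\\Processor(_Total)\\Interrupts/sec", "units": "Interrupts/sec"},
]

mof_text = render_mof(counters)
print(mof_text)
```

The printed text would then be saved with a .mof extension and compiled with Mofcomp as described above.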

Note You should add new counters on the cluster controller. Because counters are a replicated property, any new counter information is replicated to all the cluster members. In addition, the list of cluster-wide counters that is displayed in the Application Center snap-in is retrieved from the controller.

Enabling Counter Graphing

Through the Application Center user interface, you can enable counter graphing on a per-member basis or across the cluster. This provides flexibility in managing your members, particularly when some, such as ACDW802AS in the test cluster we set up, do not have the same performance capabilities as the other members.

The steps in enabling counter graphing in a performance chart are as follows:

In the console tree (on a member or the controller), click membername to display its status page in the details pane. In addition to member status, the details pane also displays an area where counter graphs are plotted.
Click Add to activate the Add a Counter dialog box.
In the Counters list, click the counter(s) you want, and then click Add.
Click Close when you've finished adding counters.

Figure 10.5 illustrates the user interface for enabling a counter.

Cluster-wide performance graphs are displayed when you select the cluster node view. Server counter graphs are automatically rolled up to the cluster view—in accordance with the counter aggregation settings—when the same counter is enabled on every member. (See Figure 10.6, later in this chapter, for an illustration of cluster-wide counter displays.)

Figure 10.5 Using the Add a counter dialog box to enable graphing for a counter

Performance Monitoring Samples

This collection of samples is provided to illustrate how, with a minimal collection of performance counters, you can monitor a cluster and its members. It also demonstrates how you can test a cluster and its applications by applying a load to the cluster with the WAS tool.

Before proceeding further with our monitoring examples, there are two items that need to be highlighted: the test configuration we're using for our examples and the counter graphs.

Cluster Test Configuration

It's important to note that the test server configurations we used for working with cluster scenarios in this book are not representative of typical production servers. You should not infer any performance expectations from these tests.

If you examine the configuration summary provided in Table 10.11, you'll see that our test servers are by no means capable of delivering the same levels of performance as the servers that most of you use in a production environment. Keep this in mind when looking at the performance results provided later in this chapter. View these results as conceptual illustrations in the context of our test computers; don't use the results as performance metrics for your own equipment.

Table 10.11 Computer Configurations Used in Test Clusters

Server name	Cluster type and role	CPU	Memory	Bus Speed
ACDW516AS	Web, controller	1xP6-550	256 MB	66 MHz
ACDW802AS	Web, member	1xP6-366	256 MB	66 MHz
ACDW518AS	Web, member	1xP6-550	256 MB	66 MHz
ACDW522AS	COM+, controller	1xP6-366	256 MB	66 MHz
ACDW811AS	COM+, member	1xP6-233	256 MB	66 MHz
ACDW822AS	Web, stager	1xP6-266	128 MB	66 MHz

Counter Graphs

When you're graphing different counters, you should be aware of how the individual counter values are rolled up at the server and cluster level. Table 10.12 lists the default counters and the aggregation method that is used at the server and cluster levels. Figure 10.6 illustrates how three counters (Processor Utilization, Web Service GET Requests per second, and ASP Requests per second) are rolled up to the cluster level.

Table 10.12 Counter Aggregation at the Server and Cluster Levels

Counter name	Server aggregation	Cluster aggregation
ASP Errors per second	Average	Sum
ASP Requests Queued	Average	Sum
ASP Requests Queued	Max value	Max value
ASP Requests per second	Average	Sum
ASP Request Execution Time	Average	Average
ASP Request Wait Time	Average	Average
Memory Available Bytes	Average	Average
Memory Page Faults per second	Average	Average
Physical Disk Queue Length	Average	Sum
Inetinfo Private Bytes	Average	Average
Processor Utilization	Average	Average
Processor User Time	Average	Average
Processor Privileged Time	Average	Average
Log Database Total Memory	Average	Average
Context Switches per second	Average	Average
TCP Connections Established	Average	Sum
Web Service Current Connections	Average	Sum
Web Service GET Requests per second	Average	Sum
Web Service Bytes Total per second	Average	Sum
Web Service ISAPI Requests per second	Average	Sum

In Figure 10.6, the graph plots are denoted as follows:

A: Processor Utilization
B: Web Service GET Requests per second
C: ASP Requests per second

Note This labeling convention is used for all the sample performance graphs in the balance of this chapter.

Figure 10.6 Performance graph for a Web cluster with two load-balanced nodes

Referring to Figure 10.6, note that the Processor Utilization is averaged, whereas Web Service GET Requests and ASP Requests are summed.
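The difference between the two cluster aggregation methods is easy to see with numbers. The following Python sketch uses made-up member values (they are not measurements from our test cluster) and a hypothetical cluster_view helper. It also reflects the rule that a counter is rolled up only when it is enabled on every member.

```python
from statistics import mean

# Cluster aggregation for the three counters, as listed in Table 10.12.
CLUSTER_AGGREGATION = {
    "Processor Utilization": mean,
    "Web Service GET Requests per second": sum,
    "ASP Requests per second": sum,
}

# Hypothetical current values for a two-node Web cluster.
member_values = {
    "ACDW516AS": {
        "Processor Utilization": 60.0,
        "Web Service GET Requests per second": 40.0,
        "ASP Requests per second": 25.0,
    },
    "ACDW802AS": {
        "Processor Utilization": 70.0,
        "Web Service GET Requests per second": 30.0,
        "ASP Requests per second": 20.0,
    },
}

def cluster_view(values_by_member, aggregation):
    """Roll member values up to one cluster value per counter."""
    result = {}
    for counter, combine in aggregation.items():
        per_member = list(values_by_member.values())
        # Skip counters that are not enabled on every member.
        if all(counter in values for values in per_member):
            result[counter] = combine([values[counter] for values in per_member])
    return result

cluster = cluster_view(member_values, CLUSTER_AGGREGATION)
for counter, value in cluster.items():
    print(counter, value)
```

With these values the cluster shows 65 percent processor utilization (an average) but 70 GET requests and 45 ASP requests per second (totals).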

Note Application Center performance charts exhibit the same behavior as the Windows 2000 Performance Monitor. The values plotted on the counter graph can appear out of synchronization with the numeric values (for example, Min, Max, and Average) that appear below the graph. This is because the graph uses the values for the specified period (for example, 15 minutes), but the numeric display uses all of the values that are accumulated during the session. The session begins when the Application Center snap-in is first activated.
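A small worked example shows why the two sets of numbers can disagree. This Python sketch assumes one sample per minute over a hypothetical 60-minute session; the sample values are invented, and the 15-minute window matches the period mentioned in the note.

```python
def summarize(values):
    """Return the Min, Max, and Average figures for a list of samples."""
    return {"min": min(values), "max": max(values), "avg": sum(values) / len(values)}

# Hypothetical session: a busy first 10 minutes, then 35 quiet minutes,
# then a steady 15 minutes that is all the graph still shows.
session = [90] * 10 + [40] * 35 + [50] * 15

graph_window = session[-15:]              # what a 15-minute graph plots
graph_stats = summarize(graph_window)
session_stats = summarize(session)        # what the numeric display accumulates

print("graph:  ", graph_stats)
print("session:", session_stats)
```

The graph is flat at 50, while the session figures still report a maximum of 90 and a minimum of 40 from samples that have already scrolled out of view.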

Let's move on and work with some performance monitoring examples that employ the servers and applications that we described in Chapter 8, "Creating Clusters and Deploying Applications."

The Base Environment

Our test environment uses the same application, clusters, and members that we described in Chapter 8, "Creating Clusters and Deploying Applications." We started testing by using the following configuration, and as we tested, we changed this topology by scaling out the front-end and back-end clusters. Only a few of the performance graphs produced by our testing are shown in this section. However, the entire collection of performance graphs for the various cluster topologies is included in Appendix E, "Sample Performance Charts."

Initial Topology and Cluster Configuration

The following cluster topology was used for performance testing:

A front-end Web cluster (RKWebCluster) that consists of a single member, the cluster controller, ACDW516AS
A back-end COM+ application cluster (RKCOMCluster) that also consists of one member, ACDW522AS, the cluster controller

The Web cluster was configured as follows:

Web request forwarding disabled
NLB client affinity set to custom (none)
HTTP keep-alives disabled
Load balancing weight equal for all members

Application

We used the Pre-Flight Check application for testing and distributed it across two tiers, as described in Chapter 8, "Creating Clusters and Deploying Applications." The HTML and ASP pages are hosted on the Web tier, and the COM+ applications, AC_PF_VB and AC_PF_VC, are hosted on the COM+ application tier. Component Load Balancing (CLB) was enabled by configuring the AC_PF_VB and AC_PF_VC components to support dynamic load balancing, and ACDW522AS was identified as the member for handling component requests.

Performance Counters

Before applying a test load to the controller, we added three counters to the performance graph for the controller: Processor Utilization, Web Service GET Requests/second, and ASP Requests/second. We also added the Processor Utilization counter to the performance chart for the COM+ server. Because we're not doing any in-depth performance tuning or capacity planning, these counters are sufficient to give us a good indication of cluster performance under load and illustrate the effect of scaling out a cluster and adjusting server load balancing weights.

WAS Configuration

We used the ACPreflight script, which is included on the Resource Kit CD, for our tests and retained the script's default settings for HTTP verbs, page groups, users, and cookies.

Four WAS clients were used for testing, and the following settings were changed from their default configurations:

Stress level (threads)—88
Use random delay—0 to 1500 milliseconds
Suspend and Warmup—5 minutes
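These settings put a rough ceiling on the load that the clients can generate. The arithmetic below is a back-of-the-envelope Python sketch under three assumptions that are not stated in the configuration itself: the stress level applies to each client, the random delay is spread evenly across its range, and server response time is ignored.

```python
clients = 4
threads_per_client = 88               # assumption: the stress level is per client
min_delay_ms, max_delay_ms = 0, 1500  # random delay range

total_threads = clients * threads_per_client

# Assumption: delays are uniformly distributed, so the mean is the midpoint.
mean_delay_s = (min_delay_ms + max_delay_ms) / 2 / 1000

# Upper bound: every thread issues a request, waits the mean delay, and repeats,
# with the server answering instantly.
max_requests_per_second = total_threads / mean_delay_s

print(total_threads, round(max_requests_per_second, 1))
```

Real throughput is lower than this bound because each request also takes time to be served.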

Scenario: Single-Node Web Cluster and Single-Node COM+ Application Cluster

In this first scenario, we wanted to push processor utilization up fairly high, which is why we reduced the amount of random delay that is used for TCP connections. Figure 10.7 shows the results we achieved.

Figure 10.7 Performance results on a single node Web cluster

The next scenario illustrates the effect that adding another server has on the performance indicators shown in Figure 10.7.

Scenario: Two-Node Web Cluster and Single-Node COM+ Application Cluster

For this scenario, we added the server ACDW802AS, which, as you may recall from Table 10.11, is a less powerful computer than the cluster controller. However, even this server had a significant impact on the controller's performance. Figure 10.8 shows the effect that this server had on the controller's processor utilization.

Figure 10.8 Cluster controller (ACDW516AS) performance after creating a two-node cluster

If you compare the new Processor Utilization (A), Web GET Requests (B), and ASP Requests (C) indicators with those for the same server in Figure 10.7, you can see a noticeable difference in resource usage after adding another cluster member. Throughput decreases as well, but only on the cluster controller. The graph lines labeled D and E show the cluster total throughput for HTML and ASP pages. As you can see, total throughput is higher than on a one-node cluster.

Scenario: Three-Node Web Cluster and Single-Node COM+ Application Cluster

For this scenario, we added a third member, ACDW518AS, to the Web cluster. As expected, resource utilization decreased on the two original cluster members, but as you will note in Figure 10.9, the new member is underutilized in comparison to the other members.

The graph labeled ACDW802AS in Figure 10.9 includes Processor Utilization (A1) for the cluster controller as well as the member. For clarity, the controller's Web GETs and ASP requests aren't displayed, but their performance is at the same level as the member. If you look at the performance graph (ACDW518AS in Figure 10.9) for the new member, you'll see that although ACDW518AS has the same level of throughput as the other members—indicating that the load is well-distributed on our test cluster—Processor Utilization is significantly lower on this member.

Figure 10.9 Resource utilization and throughput on ACDW518AS

In the next scenario, we'll adjust the load balancing weight to reduce resource usage on the controller (ACDW516AS) and ACDW802AS.

Adjusted Load Balancing Weight

In order to take advantage of the lower processor utilization on ACDW518AS, we decided to increase the load balancing weight on this member. In the membername Properties dialog box, we set the server weight at the midway mark between the Average load and More load indicators.
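As a rough mental model of what a weight change does, assume that each member's share of new requests is proportional to its weight. That is a simplification, and the numeric weights below are hypothetical: the Properties dialog box exposes a slider rather than these numbers. The Python sketch only illustrates the proportions.

```python
def expected_share(weights):
    """Return each member's expected fraction of requests, proportional to its weight."""
    total = sum(weights.values())
    return {member: weight / total for member, weight in weights.items()}

# Hypothetical weights: equal to start with, then raised on ACDW518AS.
before = expected_share({"ACDW516AS": 100, "ACDW802AS": 100, "ACDW518AS": 100})
after = expected_share({"ACDW516AS": 100, "ACDW802AS": 100, "ACDW518AS": 150})

for member in before:
    print(member, round(before[member], 3), round(after[member], 3))
```

Under this model, raising one member's weight lowers the share of every other member, which is the kind of shift the next figure illustrates.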

Note The impact of the amount of load added to a single member is more pronounced as the number of clients increases. Nonetheless, the graphs in Figure 10.10 serve to illustrate how a load balancing weight adjustment affects server performance.

Figure 10.10 Resource utilization on ACDW518AS after adjusting the load-balancing weight

Referring to Figure 10.10, the graph labeled ACDW516AS shows our base performance metrics for Processor Utilization (A), Web GET Requests (B), and ASP Requests (C) after load balancing was adjusted. Note the following:

Both graphs show a spike. Note that the throughput is dropping and processor utilization is increasing, just before the first date/time indicator on the chart. This is the point where the weight was adjusted on ACDW518AS and the convergence took place.
In the ACDW518AS graph, you can see where throughput and processor utilization increased after convergence.
The white plot line (A1) in the ACDW516AS graph shows processor utilization for the cluster controller, taken from a previous test that was run before the load balancing adjustment. As you can see, processor utilization did decrease on the controller, as did throughput. Similar performance results were experienced on the ACDW802AS cluster member.

Note Even though the controller is the same class of server as ACDW518AS, higher processor utilization on the controller was expected. There are two reasons for this. First, there is a performance cost associated with the controller role; and second, there is the monitoring cost. For our tests, we used the cluster controller for all the performance monitoring displays that were generated.

Our final scenario demonstrates the effect of scaling out the COM+ application cluster.

Scenario: Three-Node Web Cluster and Two-Node COM+ Application Cluster

Up to this point, we've been using a single server on the back-end component server tier to handle all the COM+ requests coming from the Web cluster. Processor utilization on this server typically ranged from 50 percent to 65 percent during our tests. Figure 10.11 provides two graphs. The first, labeled ACDW522AS/ACDW811AS, shows the processor utilization for the component servers in RKCOMCluster, the COM+ application cluster. The second graph, labeled RKWebCluster, provides a cluster-wide performance view.

Figure 10.11 Resource utilization after scaling out the COM+ application cluster

Let's examine the graphs shown in Figure 10.11 in more detail, starting with the component servers. The cluster controller's processor utilization graph is labeled A; the new member's graph is A1. As you can see, the point where the new member is brought online is noticeable (approximate time 2:53 P.M.) and the reduction in the controller's processor load is significant. After the cluster is scaled out, processor utilization for both servers is evenly matched, which indicates that component requests are being well distributed between the two servers.

The RKWebCluster graph provides a cluster-wide performance view, with Web GET Requests and ASP Requests aggregated as totals for all the members.

Note The throughput levels indicated in the RKWebCluster graph remained consistent in all of our test scenarios. There were, of course, occasional drops when members were added, load-balancing weights adjusted, or cluster synchronization took place.

Two processor utilization graphs (A and A1) are shown. The dark line is the graph for processor utilization after the COM+ application cluster is scaled out; the white line shows processor utilization when there was only one COM+ application server in RKCOMCluster. As these graphs indicate, scaling out the COM+ application cluster resulted in reduced processor utilization across the Web tier.

The testing we did for this chapter is by no means exhaustive, but it gives you an idea of the performance monitoring capability that is at your disposal. To summarize, the Application Center performance monitoring interface:

Provides a commonly used collection of pre-installed performance counters
Supports the creation of additional counters on a per-member or cluster-wide basis
Supports a single console view for monitoring an individual member, an entire cluster, and multiple clusters

Resources

The following books and Web sites provide additional information about performance analysis and tuning.

Books

Microsoft Internet Information Server Resource Kit, for Internet Information Server 4.0 (Microsoft Press, 1998)

Microsoft Internet Information Services Resource Guide, for IIS 5.0 (Microsoft Press, 2000)

Microsoft Windows 2000 Resource Kit (Microsoft Press, 2000)

Web Sites

"Capacity Planning" white paper

https://www.microsoft.com/technet/archive/itsolutions/ecommerce/default.mspx

"Maximizing IIS Performance" white paper

https://www.microsoft.com/technet/prodtechnol/windows2000serv/technologies/iis/maintain/optimize/perflink.mspx

Microsoft TechNet page for IIS-related information

https://www.microsoft.com/technet/prodtechnol/windowsserver2003/technologies/webapp/iis/default.mspx

Microsoft Web Application Stress Tool Web site

https://www.microsoft.com/technet/archive/itsolutions/intranet/downloads/webstres.mspx
