General Resource Consumption


By Paul Hinsberg


Chapter 5 from Windows NT Applications: Measuring and Optimizing Performance, published by MacMillan Technical Publishing

Before you start weeding through your code looking for a memory leak or planning a complete rewrite of functions to improve processor utilization, you will want to be sure that the problem is really contained within your application. This chapter provides definitive methods for isolating your application from the hardware and operating system that support it. We will examine system performance as a whole, then pick the system apart and offer concrete indicators of resource problems. We will then analyze the particular resources and the problems that are unique to each one. Special attention is paid to identifying when an application is at fault and when a hardware resource is at fault. Let's not forget the operating system either: poor configuration of the system can lead to problems for the applications and the users that the OS is trying to serve. We will work to single out the true resource bottleneck and identify its cause.


Examining Overall System Performance

When examining overall system performance, look at the entire system before making any rash judgments about the actual cause of a problem. We have already seen how memory can affect the performance of the disk. In addition, the network can affect the processor. Processor performance and disk performance can also be tied together, depending on your type of hardware. What this all means is that you will want to view the whole system's performance to get the big picture. Zooming in on what you might think is the problem can lead to misinterpretation, and when that happens you can spend a great deal of time working on a solution to the wrong problem. You could also be working on your application, trying desperately to get it to perform well, when you actually have a problem with the system's hardware performance. The best way to explain this is with an example.

Getting to the Source of the Problem

Let's assume that we have some performance issue on a computer we are using for application testing. In general, the application appears to be rather slow in its response. If we were rash, we might think that because this is a network application, the problem is a slow network response or even a slow response from the server component of the application. While this might be true, it is better to analyze the entire system and then zero in on the suspected problem. We start with the four basic building blocks of an application:

Memory

Processor

Disk

Network

Every process on the system will be using at least two of these resources at any time. Consider that, if an application is doing nothing else, it is using memory and thus, potentially the disk. To get a complete picture, we start up good old Performance Monitor and analyze the primary counters for each of the resources, as detailed in Table 5.1.

Table 5.1 Counters Enabling You to Pinpoint Performance Problems

Object                           Counter                  Description
------                           -------                  -----------
Memory                           Pages/sec                The rate at which data is retrieved from
                                                          and sent to the page file.

Processor                        %Processor Time          The percentage of time that the processor
                                                          is not idle.

Physical Disk or Logical Disk    Ave Disk Queue Length    The length of the queue of transactions
                                                          aimed at the hard drive.

Network Segment                  %Network Utilization     The percentage of the available network
                                                          bandwidth being utilized.

Let the analysis begin! Remember that the focus of this particular section is to discover where the problem is. After that has been discovered, the following sections will go into the details of analyzing each particular resource. You should start with the memory because it is the most common resource for applications to use and abuse. Chapter 2, "Windows NT Kernel Debugger," defined the process of paging and the use of the page file on Windows NT. For the most part, if the memory's Pages/sec value is more than 16, you might consider that you have a memory problem. However, this is not always the case. You will also need to figure out whether poor disk performance is actually the source of the problem.
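
If you would rather capture these big-picture numbers programmatically than watch the Performance Monitor display, the Performance Data Helper (PDH) library from the Platform SDK reads the same counters. The following is a minimal sketch of my own, not part of the book's tooling: the counter names are the English defaults and can vary, the disk counter assumes that diskperf -y has been run, and the Network Segment counter is omitted because it requires the Network Monitor Agent and a segment instance name. Link the program with pdh.lib.

/* bigpicture.c -- a minimal PDH sketch that samples the Table 5.1 counters.
   Counter names are the English NT defaults and may differ on your system. */
#include <windows.h>
#include <pdh.h>
#include <stdio.h>

int main(void)
{
    HQUERY   query;
    HCOUNTER pages, cpu, diskq;
    PDH_FMT_COUNTERVALUE val;
    int i;

    if (PdhOpenQuery(NULL, 0, &query) != ERROR_SUCCESS)
        return 1;

    PdhAddCounter(query, TEXT("\\Memory\\Pages/sec"), 0, &pages);
    PdhAddCounter(query, TEXT("\\Processor(_Total)\\% Processor Time"), 0, &cpu);
    PdhAddCounter(query, TEXT("\\PhysicalDisk(_Total)\\Avg. Disk Queue Length"), 0, &diskq);

    PdhCollectQueryData(query);              /* first sample primes the rate counters */
    for (i = 0; i < 10; i++) {
        Sleep(1000);
        PdhCollectQueryData(query);

        PdhGetFormattedCounterValue(pages, PDH_FMT_DOUBLE, NULL, &val);
        printf("Pages/sec              %8.2f\n", val.doubleValue);
        PdhGetFormattedCounterValue(cpu, PDH_FMT_DOUBLE, NULL, &val);
        printf("%% Processor Time       %8.2f\n", val.doubleValue);
        PdhGetFormattedCounterValue(diskq, PDH_FMT_DOUBLE, NULL, &val);
        printf("Avg. Disk Queue Length %8.2f\n\n", val.doubleValue);
    }

    PdhCloseQuery(query);
    return 0;
}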

Note: Keep in mind that throughout the text, when values for concern are indicated, the values must be sustained values. Throughout the operation of the system there will be spikes in performance. The situations that you want to zero in on are when the critical values are reached or exceeded for a suitable period of time. The suitable period of time will be relative to the various counters that you are looking at. In the processor's world even short spikes are significant—on the order of 2–3 seconds—because the processor is operating at the nanosecond level. For the hard drive, which is barely managing milliseconds for the standard workstation, a few seconds are not as significant. Keep the component you are observing and the speed of its expected operation in mind when examining counters and comparing them to critical thresholds.

After you have examined disk performance, you might consider looking at processor performance. Certainly, a slow system might be related to an overburdened processor. Generally, processor utilization above 80% will be cause for alarm. However, either hardware or software can cause this. In most cases, we won't worry about hardware. In Chapter 7, "The Web Server," you will see that there are situations when you will have to dig a little deeper into the relationships between software and hardware. You should also have a look at the processor's queue length. This particular counter—Processor Queue Length—is actually found under the System object. Understand that this is a measure of the number of threads the operating system has in a wait state while the processor completes other tasks.

Accessing Additional Counters and Objects

If you really have the desire, you can obtain some additional objects and counters that work with the Performance Monitor to track the activity of Intel-based Pentium and Pentium II processors. There are two places to get these additional counters.

The first place is the Windows NT Resource Kit, in the \NTRESKIT\PERFTOOL\CNTRTOOL folder. Occasionally, I have had problems when adding these counters: they can affect the availability of other extended counters by slightly corrupting a Registry key. You need only adjust the Performance Monitor's Registry keys to solve this problem. We discussed the keys for the counters earlier, in Chapter 1, "Introduction to Performance Monitor," when we were talking about the architecture of the Performance Monitor.

The second place you can find additional objects and counters is from the good guys at https://www.sysinternals.com. Search for "processor." You can download a utility that enables all sorts of additional counters for Pentium II processors. The details might require you to go out and get a good book on processor theory, but hey—if you're interested, it's there.

When disk activity is the source of your problem, it tends to be more obvious than the other resources. Hardware and operating systems both apply caching and other tricks in an attempt to improve the performance of the slowest subsystem on the computer. Your first indication is usually the little flashing light on the outside of the computer case. However, nothing is ever as simple as it seems. Disk performance can, of course, be related to the memory. It might also be related to the way that the application is reading information or to the condition of the file. Fragmentation and disk hardware configuration can certainly play their part.

Disk activity is highly transactional, especially in Windows NT, where NTFS actually works much like a mini-database system. We will see later, in Chapter 8, "Monitoring Database Systems," how closely these are related. In the meantime, understanding the transactional nature of the disk will be an important factor. For this reason, the disk queue length is an important counter. It tells us how many transactions are in the disk's queue for processing, which is really the I/O Manager's queue. Any queue larger than two will be considered a bottleneck and a reason to investigate the activity further.

Last, you have the network activity. The network, although often treated as a separate entity, is about as separate from the computer as the ocean is from the shore. The tidal wave might be made of water, but if you are on the beach when it hits, you're going to feel it. Activity on the network, even if it is not directly related to your computer or your application, will affect the performance of the system. The network often calls on the other components of the computer to interpret the traffic. Generally, we can see this interpretive request affecting the processor of our systems. Usually, network issues are related to infrastructure and external system configuration issues. Of course, there are exceptions.

So, we have hit upon each of the main resources that we are concerned with. From our brief discussions, you can see that an overview of the system is a necessary first step. After we have performed this necessary first step, we can isolate one or perhaps two of the components that we are really concerned with, and then move forward in our analysis. Table 5.2 summarizes the counters and the values that we will be concerned with in hunting for the source of problems.

Table 5.2 Primary Indicators of Problems Per Major System Resource

Object                           Counter                  Indications of Problems
------                           -------                  -----------------------
Memory                           Pages/sec                More than 16 Pages/sec is an indication
                                                          of an issue. However, this is relative to
                                                          the application.

Processor                        %Processor Time          More than 80% should get your interest.
                                                          Also, have a look at System: Processor
                                                          Queue Length.

Physical Disk or Logical Disk    Ave Disk Queue Length    Any queue with a length of more than two
                                                          is a bottleneck.

Network Segment                  %Network Utilization     More than 67% is a problem for most
                                                          networks.

Now, consider that these values are only guides to lead you in the right direction. They are not intended to be hard and fast rules. Many systems will behave differently. Systems from the same manufacturer are not necessarily made from the same parts—consistency is usually an added cost. This can lead to differences in performance. Also, systems running different components or different service pack revisions are expected to have different performance behaviors. This brings up the need for performance logs, which can be used as baselines.
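
One simple way to honor both these thresholds and the earlier Note about sustained values is to flag a counter only when it stays past its Table 5.2 guideline for an entire window of consecutive samples. The sketch below is my own illustration, not a rule from the book; the window size and the sample readings are made-up assumptions.

/* sustained.c -- illustrative helper: report a counter as suspect only when
   it exceeds its threshold for a whole window of consecutive samples. */
#include <stdio.h>

/* Returns 1 if every one of the last 'window' samples exceeds 'threshold'. */
static int sustained_over(const double *samples, int count, int window,
                          double threshold)
{
    int i;
    if (count < window)
        return 0;
    for (i = count - window; i < count; i++)
        if (samples[i] <= threshold)
            return 0;
    return 1;
}

int main(void)
{
    /* An invented minute of Pages/sec readings taken every five seconds. */
    double pages_sec[] = { 3, 5, 40, 6, 22, 25, 31, 28, 27, 30, 29, 33 };
    int n = sizeof(pages_sec) / sizeof(pages_sec[0]);

    if (sustained_over(pages_sec, n, 6, 16.0))
        printf("Pages/sec has stayed above 16 -- investigate memory further.\n");
    else
        printf("Only momentary paging spikes -- no sustained memory pressure.\n");
    return 0;
}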

The Procedures for Analysis

The best thing to do for the system that you will be testing on is to first have a benchmark or baseline of how the system behaves under normal conditions. This will allow you to judge exactly how your application might be affecting the performance of the system. Sure, you could look at Table 5.2 and say, "Wow, Pages/sec is above 24 for a whole two minutes. I must have a memory problem with my application." This might be true, but what if the system always runs at about 24 Pages/sec due to another application or service? Perhaps you are running with a low memory configuration or a poorly configured page file. Unless you have a baseline of how the system behaves normally, you might make a hasty choice. Hasty choices often lead to wasted time. If you have taken the time to performance test your application, you certainly don't want to waste time on a wild goose chase.

In Chapter 1, we thoroughly described how to create a Log file with the Performance Monitor. Prior to any testing of an application or a new version of an application, you should create a new baseline on the machine. If you have an old version of the application installed on the machine, create a baseline running through your normal test cycle with the old application, and then install the new one and test again. Compare the performance logs of the two tests and make a determination on whether you have met your performance objectives.

Now that you have your baseline for performance, you might think that you are ready; but effective analysis takes a little more planning. You must remain objective and scientific when you are analyzing a particular problem. You will want to perform the analysis in the following steps:

  1. Get the Big Picture.

  2. Hypothesize on the possible bottleneck.

  3. Test your hypothesis.

  4. Repeat Steps 2 and 3 until you have a solid hypothesis.

  5. Make an adjustment and start over with Step 1.

We have already discussed getting the big picture and establishing the baseline for comparison. After you have done this, formulate your hypothesis by looking at Table 5.2 and comparing values with your baselines. You then have to utilize the tools that you were given in the previous chapters to test the hypothesis. Reformulate and retest until you have strong confidence in the root of the performance issue. Then, make a change.

Sounds easy, doesn't it? However, this simple set of steps is actually very difficult to maintain without a certain level of discipline. In any company, especially a software development company, the need to produce is strong. Stopping to perform calculated and repeated steps for performance analysis might be difficult to do, while the guy from marketing and your boss are hanging over you like hyenas waiting for you to throw them a piece of meat. Therefore, with performance analysis you might have to choose your battles with care.

For example, if your application is historically poor on memory utilization, you will want to focus your attention only on the performance tests that indicate memory issues. You might see a processor or disk issue, but you might have to let them go in favor of meeting a deadline. I am not promoting this type of activity. However, we are not all working for Perfect Company, Inc.—we are operating in the real world. This will be amplified if the application you are working with is complex and has a long history already. Previous developers might have made choices based on technology that was present and proven at the time they wrote the code. Trying to fix everything at once is a risky business, and not one that usually results in enhanced performance.

So, in Step 5 when it says make a change, it doesn't mean make a bunch of changes, causing you to spiral into the oblivion of unknown solutions and never-ending repairs. If you have a problem and a hypothesis for the cause of the problem, make a single or very few incremental changes and re-test. This might require you to write small sample applications that mimic your primary applications' activities using different technologies. This will allow you to isolate the problem better and make direct comparisons between technologies or methodologies, prior to committing your primary application to a direction. After you decide which solution to implement, retest and compare it both to the previous application and the results from the sample application testing. Making sweeping changes to the application might fix the problem, but it has two other immediate ramifications:

  • You don't really know how you fixed the problem.

  • The chance that you have introduced other unknown problems is increased.

Consider a problem with memory consumption by your application. To correct the problem, you alter the code to cache more data, increase the physical RAM in the machine by 32MB, double the size of the page file, and reduce the reliance of the application on global variables. Each of these might assist in removing the symptom of excess memory consumption, but which one really fixed the problem? If in the next revision of the program you encounter the same problem, do you then repeat all the actions you took here?

Fixing the problem without understanding how you fixed it is worse than ignoring the problem altogether. Further, how do you know that you even fixed the problem? What if the real problem was a subroutine that failed to close handles to a Registry key causing a memory leak over time? By adding the memory and reducing the memory requirements of other parts of the application, you have only masked the symptom. The problem might occur later on, because the added memory only delays the problem temporarily. This also points out the necessity for thorough retesting of any solution. The other point is that, when you make large sweeping changes, you increase the probability that you will make other mistakes and introduce other problems, or perhaps magnify other problems. The more you handle the fine china, the greater the probability that you will drop it.

The final step in proper analysis is to retest any solution, no matter how small the fix. When any part of the system or the code is altered, there is a possibility that it will affect some other resource on the system. You will need to verify that:

  • You have fixed the problem that you intended to fix.

  • You have not created another problem.

  • You have not amplified an existing, but different problem.

You have to make sure that your changes have not adversely affected other parts of the system. In addition, you want to make sure that you have truly fixed the problem. Your hypothesis of the cause could be solid, yet your solution to the problem can still be incorrect for the circumstances. Starting over with the big picture will make sure that you do not miss anything.

Now that you are clear on how to proceed, we should begin a discussion of the specific resources, and how they affect system and application performance.

Specific Resources

After you have zeroed in on the particular resource that might be the problem, you will want to further analyze that resource utilization to make a more exact diagnosis of the problem. In this section the general rules for isolating the problem are presented. After you have the cause isolated, you can investigate your code and determine how to resolve the problem.

Note: The discussions in this section are relatively brief and assume some knowledge of hardware, operating systems, and the techniques for some of the basics of bottleneck detection. This section might review some of the techniques but is intended to expand on the techniques and utilize the tools from the previous sections. If you require more analysis of Windows NT internal architecture and basic bottleneck detection or implementation of hardware solutions to performance and OS tweaks, you might want to read Windows NT Performance: Monitoring, Benchmarking and Tuning, New Riders Publishing, 1998 (ISBN: 1-56205-942-4).

Memory

Once again, we start with memory. This section assumes that you have performed your original analysis and found some incriminating indication that the memory resource is exceptionally scarce on the system. This might have many causes:

  • Shortage of physical RAM

  • Disk issues creating memory issues

  • A memory leak or memory-hungry application

The shortage of RAM is an easy problem to pinpoint. It is the result of ruling out all the other possibilities. Most people make it their first choice to try and determine if they really need to add more memory. Occasionally, they add more memory and don't worry about why they might have to add memory. With the falling prices of memory, many administrators will take the easy way out and just buy more. They will go down to the local computer store and pick some up, grumbling all the way about how the developers should have coded the application so that it didn't use up all the memory on their systems. Sometimes, we (the developers) are at fault, but not always. Nonetheless, it is often left to the developer to defend his code and his honor. You will need to prove that it is either not your application causing the problem, or that the application, by the nature of what it is doing, must utilize the memory.

With any memory problem, the first step is to rule out a problem caused by another component. In this case, that component is the hard drive.

Keeping Disk and Memory Separate

Paging is what allows the system to offer 4GB of memory to every process on the NT system. Few if any servers will have 4GB of physical RAM for every process, so NT simulates it by moving unused code and data out to the page file. This process was described in detail earlier in Chapter 2. Check to see that memory paging is not slow due to a problem with disk I/O. When observing and comparing the disk to the memory, you will want to examine the following Performance Monitor counters:

Memory: Available Bytes

Memory: Pages/sec

Memory: Page Reads/sec

Physical Disk: %Disk Read Time

Physical Disk: Ave Disk Read Queue Length

Physical Disk: Disk Reads/sec

Physical Disk: Avg Disk Bytes/Read

Note: You are aware of the difference between the Physical Disk and Logical Disk objects. In this case, it is more beneficial to look at the Physical Disk object. Whenever you are unclear about what application is causing a potential memory or disk problem, you start with the Physical Disk object to make sure that you see all the activity affecting the resource. Once you can isolate the activity to an application or subsystem, you can focus on the drive that is being accessed. This can be done by using the Logical Disk object, which will allow you to focus on the counters for a particular partition.

First, the Available Bytes will give you an indication of exactly how low memory is. Remember that Windows NT likes to maintain 4MB of RAM for moving information around in memory. The Pages/sec will affirm that you are having a memory problem when the value is excessive over time. The %Disk Read Time will give you an indication of how busy the disk is—that is, how much time is being spent reading the disk.

Notice that many of the counters are focused on reads. Generally, when memory is a problem, the applications suffer when they generate page faults. A page fault, as you will recall, is when an application is looking for information within its Working Set and does not find it. When a page fault is generated, the VMM looks in the File System Cache and the PAGEFILE.SYS for the information. Thus, the application is forced to wait while the requested information is moved into physical RAM where it can be accessed. If the %Disk Read Time is low (less than 50%) and the memory response is still slow, you will most likely find a queue developing on the disk. The Ave Disk Read Queue Length will be a clear indicator of a queue forming.

Now, based on whether the %Disk Read Time is high or low, you will determine whether the disk activity is related to reads or writes. If the %Disk Read Time is low—indicating that most of the activity is due to writes—you most likely do not have a memory problem. You probably have a problem with an application performing some type of heavy I/O or some severe fragmentation on the disk.

However, if most of the activity is reads, you will need to determine if the reads are application I/O related or memory related. To see how much of the disk time is being spent servicing the memory paging requests, you need only compare the Memory: Page Reads/sec to the Physical Disk: Disk Reads/sec. Dividing the memory counter by the disk counter gives you the percentage of reads that are due to memory page faults. If this is a case of memory being used up too quickly, the percentage will be very high—if not 100%—indicating that all the disk's read time is being spent servicing memory page faults. This would point to a memory resource issue. You will then need to figure out whether it is a leak and whether it is your application causing the problem. You will also want to have a look at the Disk Reads/sec. When the I/O is primarily read related, as it usually is, this will be an indication of how many I/O operations are being performed per second.

EIDE disk subsystems and some older SCSI subsystems get about 30–60 I/O operations per second. With SCSI adapter cards with onboard processors and integrated cache, you can expect much higher values—in the realm of 1500 I/O operations per second on a good system. More complex systems, such as Storage Area Networks with fiber connections, can reach on the order of 2500 I/O operations per second. Determining how well the particular subsystem you are using should perform typically requires some simulation testing and building of performance baselines. At this point, let's summarize our discussion:

  1. Observe key memory and disk counters.

  2. Determine whether reads or writes account for the greater share of disk activity by examining the Physical Disk: %Disk Read Time.

    If writes are more frequent, you are more likely to be having a disk problem due to another application consuming a lot of the disk I/O.

  3. Investigate the possibility of a disk queue by observing the Physical Disk: Ave Disk Read Queue Length.

    If you concluded that disk reads are the problem, you can check to see if a queue is forming on the disk. This will be a strong indication that the disk is overburdened with requests.

  4. Determine if the reads are due to memory or application activity.

    If they are due to memory, the following equation will result in a high percentage, if not 100%:

    Memory: Page Reads/sec / Physical Disk: Disk Reads/sec

    If this ratio is close to 100%, the problem is memory and not the disk.

  5. Observe the Physical Disk: Disk Reads/sec to get a determination of how many I/O operations your system is performing.

From the information you acquire following these steps, you can tell what is memory and what is being caused by an inadequate disk subsystem. Although you will usually find that the memory is the problem, you can often find disk issues and repair the problem without too much trouble. Sometimes this requires some new hardware. Sometimes, it is as simple as reconfiguring the page files on the system to reduce fragmentation and decrease contention for I/O operations with other operating system components. Now that you have determined you have a memory problem, you will want to start figuring out exactly where the memory is going.
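
Before moving on, here is a rough sketch of the Step 4 calculation in code, using the PDH library. It is my own illustration rather than anything from the Resource Kit: it assumes the English counter names, that the disk counters have been enabled with diskperf -y, and a five-second sample interval; link with pdh.lib.

/* pagereads.c -- sketch of Step 4: the share of physical disk reads that
   are servicing memory page faults. */
#include <windows.h>
#include <pdh.h>
#include <stdio.h>

int main(void)
{
    HQUERY   query;
    HCOUNTER pageReads, diskReads;
    PDH_FMT_COUNTERVALUE pr, dr;

    PdhOpenQuery(NULL, 0, &query);
    PdhAddCounter(query, TEXT("\\Memory\\Page Reads/sec"), 0, &pageReads);
    PdhAddCounter(query, TEXT("\\PhysicalDisk(_Total)\\Disk Reads/sec"), 0, &diskReads);

    PdhCollectQueryData(query);      /* prime the rate counters */
    Sleep(5000);                     /* measure over a five-second interval */
    PdhCollectQueryData(query);

    PdhGetFormattedCounterValue(pageReads, PDH_FMT_DOUBLE, NULL, &pr);
    PdhGetFormattedCounterValue(diskReads, PDH_FMT_DOUBLE, NULL, &dr);

    if (dr.doubleValue > 0.0)
        printf("Page Reads/sec / Disk Reads/sec = %.0f%%\n",
               100.0 * pr.doubleValue / dr.doubleValue);
    else
        printf("No disk reads during the sample interval.\n");

    PdhCloseQuery(query);
    return 0;
}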

Accounting for Memory

You have determined that you have a memory problem, so you need to start figuring out where the memory is going. Initially, you will want to figure out what application or system process is consuming the memory. After you have done this, you will then want to examine the memory utilization of the particular process.

To determine which process is consuming a large amount of memory, you will need to examine a few Performance Monitor counters:

Memory: Available Bytes

Memory: Pool Paged Bytes

Memory: Pool NonPaged Bytes

Memory: Cache Bytes

Process: Working Set: _Total

Note: When examining memory counters, you will often see that the values don't add up to exactly the amount of memory that you have in your system. Memory allocations are very dynamic; while the various data is being collected, values are changing. These differences can lead to slight variances in the totals, or even variances in the same values between tools such as PMON.EXE, Task Manager, and the Performance Monitor. This is just one of those times when close is good enough.

From the information obtained from these counters, you can determine roughly how the memory is being used. Then, you can get more specific. However, let's see what we have. The Memory: Available Bytes gives us what is left in memory, untouched at that particular time by any other system, process, or application. The Memory: Cache Bytes tells us how much memory is being used by the File System Cache at that particular moment. Finally, we examine the Process: Working Set: _Total. This final counter shows us how much memory is being used by the active processes. This will include kernel operations as well as drivers and user processes. If you add these three counters together, you will get roughly the amount of physical memory in the system. So, you are able to see how the system has divided the memory.
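
A quick way to see this three-way split without building a Performance Monitor chart is a short PDH program such as the one below. It is only a sketch of mine, using the English counter names and linking with pdh.lib; as the Note above says, the totals will not add up exactly.

/* memsplit.c -- print the rough three-way split of physical memory. */
#include <windows.h>
#include <pdh.h>
#include <stdio.h>

static double read_counter(HCOUNTER c)
{
    PDH_FMT_COUNTERVALUE v;
    PdhGetFormattedCounterValue(c, PDH_FMT_DOUBLE, NULL, &v);
    return v.doubleValue;
}

int main(void)
{
    HQUERY   query;
    HCOUNTER avail, cache, wset;
    double a, c, w;

    PdhOpenQuery(NULL, 0, &query);
    PdhAddCounter(query, TEXT("\\Memory\\Available Bytes"), 0, &avail);
    PdhAddCounter(query, TEXT("\\Memory\\Cache Bytes"), 0, &cache);
    PdhAddCounter(query, TEXT("\\Process(_Total)\\Working Set"), 0, &wset);
    PdhCollectQueryData(query);      /* these are instantaneous values */

    a = read_counter(avail);
    c = read_counter(cache);
    w = read_counter(wset);

    printf("Available Bytes      %12.0f\n", a);
    printf("Cache Bytes          %12.0f\n", c);
    printf("Working Set (_Total) %12.0f\n", w);
    printf("Approximate total    %12.0f bytes of physical RAM accounted for\n",
           a + c + w);

    PdhCloseQuery(query);
    return 0;
}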

Generally, having an overly large amount of available bytes is not a problem. It is an indication that perhaps the memory is being underutilized and might have been better placed on another system. The Cache Bytes counter will let you know if the memory is being used for I/O operations instead of application processes. While you might have confirmed that you are not using the hard drives to a large degree, you might be moving large amounts of data across the network. In that case, the File System Cache is utilized, just as it is when you are writing to the hard drive. The System Cache will expand to meet the demands of the I/O operations. These operations might be simple file copies or ODBC-type information transfers. The latter is usually the case.

General file I/O (except for loading an initial file) is usually not enough to cause the system cache to expand and maintain a large size. However, the ODBC database commands and objects, such as dynasets, can create large file transfer operations when being built in the local memory. In addition, they will cause the Working Set of the process to grow equally as fast. So, if you have a large system cache, you might actually have a poorly constructed database query, as opposed to a memory allocation problem with your application. Most of the memory will usually be seen in the Working Set: _Total counter. We use the _Total instance to get an overview of how memory is being divided among the various systems. After that has been determined, you might continue by examining the Working Sets' sizes of each of the processes on the system. After you have found the one that appears to have the largest Working Set at the time, you can dig into the specifics of the memory utilization.

Note: Although the Performance Monitor is an excellent tool for examining performance values, in the case of a quick overview of the memory allocation, you might want to launch the Task Manager and examine the Performance tab. All the information discussed on the previous few pages is available in a simple display on the Task Manager.

For more detailed information, you can use PMON.EXE (see Chapter 2). It will display a large amount of information in a text-based, column-formatted display.

You might have noticed that we also included counters for the paged pool and the non-paged pool. These values give you an indication of how specifically NT is dividing up the system memory. The system memory is represented in the various Working Sets and the file system cache values. The use of paged pool and non-paged pool offers you more detail on how that memory is being used. This is especially important to those of you who are writing drivers for Windows NT. User processes and services typically will have sections of these memory pools allocated on their behalf by the NT Executive Services. Excessively large or increasing values for the Pool Paged Bytes or the Pool Non-Paged Bytes are strong indicators of driver memory leaks or even memory leaks in the NT Executive Services. That's right; no one is above making mistakes.

After you have determined which process is using all the memory, you will want to examine it in more detail, which we will do in Chapter 6, "Examining the Application." Our goal here is to be able to break down the performance information and determine the application or process that is consuming the most memory and potentially giving us the most problems. The application could simply require some tuning to more efficiently utilize the memory. Of course, you could potentially have a memory leak.

Identifying the Leak

A memory leak is more than an application that consumes a large amount of memory. What distinguishes a leak is that the application, process, or driver causing it does not return its allocation of memory to the system under any circumstances. When memory is plentiful, Windows NT will allow applications the luxury of keeping unused items in memory. When memory becomes scarce, NT will scavenge from the various processes any memory they are not actively using, causing that information to be paged out to disk. The leaky process will claim that it is using all its allocations and refuse to surrender any of its memory to this pruning process. This constitutes the true memory leak.

Memory leaks can take time. Some leaks, usually related to user applications and services, will consume memory rapidly. This would usually be on the order of hours or days. Others, such as device drivers and Kernel mode services, will consume memory slowly. Drivers and Kernel mode services tend to be rather slim on memory usage to begin with, so when there is a leak, it is a small one that builds over time. These types of leaks will cause problems over weeks or even a month. Identifying the leaks calls for either long logging processes or simulations.

You are already familiar with the logging process. You simply set up a Performance Monitor log to capture all the Process and Memory objects on the system that are suspected to have a memory leaking process. Then, when the system appears to run low on memory, stop and review the log. You will see the Working Set for the errant process growing, while the Available Bytes decreases over time. Again, this could take days or weeks.

To speed things up a bit, you can observe the processes closely and use the CLEARMEM.EXE tool to watch for changes to the minimum Working Set. A leaky application will not respond to the forcible removal of unused memory sections that CLEARMEM.EXE exercises. To find the leaky application using CLEARMEM.EXE, follow these steps (a small logging sketch follows the list):

  1. Reboot the system.

  2. Start all processes.

  3. Use Performance Monitor to log the size of all the Working Sets.

  4. Run CLEARMEM.EXE twice in a row.

  5. Use Performance Monitor to log the minimum Working Set of all the processes.

  6. Wait for some time, exercising as much of the system as possible.

  7. Repeat the last three steps, each time comparing the minimum Working Sets to the initial minimum Working Sets.

    By using this process, you should eventually be able to see a particular process growing in size, even if it is by small increments. You will still cut down on the amount of time that it takes to identify the process.
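
As promised above, here is a small logging sketch to go with these steps. It is illustrative only: MYAPP stands in for the instance name of the suspect process, the one-minute pause is an arbitrary assumption, and you still run CLEARMEM.EXE and exercise the system by hand between readings. Link with pdh.lib.

/* wsettrend.c -- illustrative only: record the suspect process's Working Set
   after each CLEARMEM.EXE pass and watch whether the post-scavenge value
   keeps climbing.  MYAPP is a placeholder instance name. */
#include <windows.h>
#include <pdh.h>
#include <stdio.h>

int main(void)
{
    HQUERY   query;
    HCOUNTER wset;
    PDH_FMT_COUNTERVALUE v;
    double previous = 0.0;
    int pass;

    PdhOpenQuery(NULL, 0, &query);
    PdhAddCounter(query, TEXT("\\Process(MYAPP)\\Working Set"), 0, &wset);

    for (pass = 1; pass <= 12; pass++) {
        /* Run CLEARMEM.EXE twice and exercise the application before each
           reading; this loop only does the record keeping. */
        PdhCollectQueryData(query);
        PdhGetFormattedCounterValue(wset, PDH_FMT_DOUBLE, NULL, &v);
        printf("Pass %2d: Working Set after CLEARMEM = %.0f bytes\n",
               pass, v.doubleValue);
        if (pass > 1)
            printf("         change since previous pass: %+.0f bytes\n",
                   v.doubleValue - previous);
        previous = v.doubleValue;

        Sleep(60000);    /* assumed one-minute gap before the next pass */
    }

    PdhCloseQuery(query);
    return 0;
}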

CPU

The processor's speed is the benchmark of the system's performance. It sets the standard for the users' expectations in many cases. The processor on a system has several duties. In addition to coordinating the activity between the local cache systems, the CPU must service the operating system as well as the other hardware components. With this in mind, our first duty will be to separate the hardware issues from the software issues. Because we are writing applications, software issues will be our priority. However, understanding the hardware issues will be necessary, so that you can eliminate them and get on with your appointed task of performance tuning your application. After you have cleared up any hardware issues, you will want to analyze the application issues more closely. With applications, especially more complex applications, you will be running Windows NT Services or even drivers. It is often a little more difficult to determine which services or drivers are producing problems, as they are somewhat hidden behind Kernel mode services. Here we will see how to determine exactly where the processor cycles are being spent.

Isolating Software

At this point, you have been able to determine that you have a processor problem. You should have examined the Processor: %Processor Time and the System: Processor Queue Length. You would have found that the processor is busy over 80% of the time and that the queue of waiting tasks is over two, indicating that the processor cannot keep pace with the demands being placed on it. Of course, this should cause a question to pop into your mind: Is the processor too busy because too many requests are being made of it, or is it not pulling its weight given a relatively low number of requests? Usually, the case will be that the processor is being overwhelmed, indicating that either hardware requests or application requests are too numerous.

Note: If the processor is not performing up to its specifications, you might have a hardware configuration problem. This was found to be especially true of pre-PII systems. The processor requires the use of an L2-Cache for moving information back and forth from RAM to the processor. If there is an overabundance of RAM without a suitable increase in L2-Cache, the Processor performance can suffer. The general rule is to load the system with the maximum allowed amount of L2-Cache, so that you do not have to worry about the ratio of L2-Cache to RAM. L2-Cache maximums are now about 1MB for high-powered workstations. On server models, you can get 2MB or greater for the L2-Cache. Keep in mind that this is for the Intel architecture. Alpha systems have higher capacities and will even employ the use of L3-Caches.

Applications use threads to place tasks in the processor's queue. Hardware does not use the same type of interface; it uses interrupts to get the processor's attention. Therefore, when we are trying to determine if a particular problem is related to hardware, we will want to track the interrupts, as well as the deferred procedure calls (DPCs). As developers, you already understand that a process has a priority, and that the base priority of the process determines the range of its threads' priorities. Within the realm of priorities for the various processes and threads on an NT system, there are 32 levels.

The first levels—from 0–15—are for the user processes. The remaining levels are for the Kernel mode processes, which often operate in real-time. However, above the Kernel mode processes are the DPCs, and above those are the hardware interrupts. Thus, when servicing a queue of tasks, the processor must first take care of all the interrupts; then the DPCs; then the Kernel mode threads; and finally the User mode threads. Knowing this, you can easily imagine situations where the interrupts and DPCs consume all the processor's time. Let's examine the processor just a little closer.

When a piece of hardware is detected on Windows NT, the hardware registers an interrupt with the system. The Hardware Abstraction Layer (HAL) is responsible for taking care of the registration and matching up the routines with the interrupts when required. When an interrupt hits the processor, all activity stops and the interrupt is interpreted. This means that the microkernel and the HAL work together to identify the component that is making the request and then contact that component to service the request. Some requests can be noted and delayed. The delayed processing is usually done for network cards. When new data is received, the NIC sends an interrupt to the processor, indicating that it needs attention. The processor, microkernel, and HAL determine that this is a NIC and defer processing until other interrupts are handled. This is the nature of the Deferred Procedure Call (DPC). The DPC is handled after all of the other hardware interrupts are handled.

Note: On multiple-processor systems, the handling of DPCs is altered. Generally, when a network card is added to a multiple-processor system, the network card is loosely bound to a particular processor. Thus, when that NIC receives information, it contacts that specific processor. When the interrupt is determined to be from a NIC, the processor defers the procedure. However, it might defer the procedure to run on another processor. This is partially dependent on the hardware vendor's implementation of DPC handling routines within the processor. With some vendors, the DPC will get queued on the same processor—on others, the DPC is queued on another. This scenario becomes more complicated when multiple NICs are involved. In Chapter 7, we will return to this discussion and its ramifications for Web servers.

At this point, it is clear that you will want a method for determining whether hardware or software is impacting the processor. Equally clear is that we will need to use the interrupts in making that determination. The counters that you will want to observe are

Processor: %Processor Time

Processor: Interrupts/sec

System: System Calls/sec

System: Processor Queue Length

Keep in mind that in a multiple-processor system, the processor counters should be selected for each instance of a processor. Also, note that the System object is global to the system and thus includes values for all processors, not just a single processor on the system. This will be important when you begin to make comparisons. If a single processor appears to be having a problem, examining the following additional counters may offer more insight into the nature of the problem:

Processor: %Processor Time

Processor: %Privileged Time

Processor: %User Time

Processor: %Interrupt Time

Processor: %DPC Time

Let's return to the first set of Processor counters. Here you are basically observing the %Processor Time and the Processor Queue Length to make sure that during your observation, you are actually experiencing the problem. The other two counters—Interrupts/sec and System Calls/sec—can be directly compared to assist in the determination of whether your problem is hardware-related or software-related. Generally, these two counters will be about equal. Keep in mind that we are looking for sustained values even in this comparison. The occasional spikes in hardware activity will show up. However, if the Interrupts/sec counter is consistently much higher than the System Calls/sec counter, then you are experiencing processor issues due to a hardware problem. Examining the other counters will assist in the determination of which component is creating the problem.

The %User Time counter will represent the amount of time that the processor is spending on non-idle user tasks. This will include the applications and the subsystems. It will also include services and some components of the Win32 Subsystem. (In this section, when we say services, we are talking about Windows NT services that are installed and viewable from the Control Panel's Services applet. We are NOT referring to Executive Mode kernel components.)

The Executive Services' activity will display in the %Privileged Time counter. The %Interrupt Time and %DPC Time can be combined to represent the amount of time that is being spent servicing hardware requests. This is not completely accurate, as some of the processing is actually represented in the %Privileged Time that we must attribute to the software's demand for processing time. However, the results are close enough for a determination. So, combining the first three counters in our list and comparing them to the interrupt and DPC counters again will tell us if more time is being spent servicing the hardware than the software. If the DPC and interrupt values are substantially higher, which would be over 60% of the overall processor time in general, then the cause is hardware. If they are about equal, then the problem is typically software. After you have ruled out the hardware factor, you will then want to break down the system calls and analyze the applications and processes individually.
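
Pulling these counters together, a short program along the following lines can make the hardware-versus-software comparison for you. This is a sketch of my own, using the 60% figure from the paragraph above as the cutoff; counter names are the English defaults, and the program must be linked with pdh.lib.

/* hwvssw.c -- is the processor's time going to hardware (interrupts and
   DPCs) or to software (user and privileged time)? */
#include <windows.h>
#include <pdh.h>
#include <stdio.h>

static double value_of(HCOUNTER c)
{
    PDH_FMT_COUNTERVALUE v;
    PdhGetFormattedCounterValue(c, PDH_FMT_DOUBLE, NULL, &v);
    return v.doubleValue;
}

int main(void)
{
    HQUERY   query;
    HCOUNTER intr, dpc, user, priv, ints, calls;
    double hw, sw;

    PdhOpenQuery(NULL, 0, &query);
    PdhAddCounter(query, TEXT("\\Processor(_Total)\\% Interrupt Time"), 0, &intr);
    PdhAddCounter(query, TEXT("\\Processor(_Total)\\% DPC Time"), 0, &dpc);
    PdhAddCounter(query, TEXT("\\Processor(_Total)\\% User Time"), 0, &user);
    PdhAddCounter(query, TEXT("\\Processor(_Total)\\% Privileged Time"), 0, &priv);
    PdhAddCounter(query, TEXT("\\Processor(_Total)\\Interrupts/sec"), 0, &ints);
    PdhAddCounter(query, TEXT("\\System\\System Calls/sec"), 0, &calls);

    PdhCollectQueryData(query);
    Sleep(5000);                       /* rate counters need two samples */
    PdhCollectQueryData(query);

    hw = value_of(intr) + value_of(dpc);
    sw = value_of(user) + value_of(priv);

    printf("Interrupts/sec %.0f vs. System Calls/sec %.0f\n",
           value_of(ints), value_of(calls));
    printf("Hardware share (%%Interrupt + %%DPC): %.1f%%\n", hw);
    printf("Software share (%%User + %%Privileged): %.1f%%\n", sw);
    if (hw > 60.0)
        printf("Hardware servicing dominates -- suspect a device or driver.\n");

    PdhCloseQuery(query);
    return 0;
}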

Some Causes for Increased Interrupts

Increased hardware interrupts do not always indicate a problem, although they can. Some of the possible causes are as follows:

Mis-configured hardware drivers

Faulty, outdated, or incorrect drivers

Hardware about to fail

Loose connections

I have had the opportunity to do some technical training. One company would send the computers to remote locations where they would be set up and installed with software to run the classes. The problem was that, through all the (sometimes-rough) movement, some of the adapter cards would pop loose. Not completely out, mind you, just half way. This would often lead to very high interrupt counts, but the system for the most part would continue to operate.

Aside from actual problems, you could simply have a system whose purpose in life is to work tightly with hardware components. A Web server is such a machine. For a large site, the network cards, especially if there are multiple cards, will generate a lot of interrupts and DPCs on the processor without there actually being a problem. Other examples might be customized instrumentation that is connected to Windows NT for monitoring some system—perhaps using an NT workstation to monitor flow and pressures through a water system.

Processes and the Processor

Okay, you have now eliminated the hardware from the equation. You need to analyze the various processes on the system to determine what is going on. To start, you would view the Process: %Processor Time for each process on the system. This will give you an indication as to which process is using up the processor's time. You might find a single process standing out in the crowd. But, you also might find that there is no single process taking up a majority of the processor's time. In the latter case, you will need to think about the various programs and services that you are running on your system because the processor is generally being overloaded. The other case will show which application is giving the processor the most grief. From there, you have a variety of tools to begin to break down what is going on. The first is the Performance Monitor. You will want to start by observing the following counters:

Process: %Processor Time

Thread: %Processor Time

Process: ID Process

Thread: ID Thread

This will help you break down which of the threads created by your program are taking up the CPU time. You might then begin investigating the various components being accessed by using APIMON.EXE and the Process Explode program from the Resource Kit. From these programs, you will be able to determine the API calls that are being made and the DLLs that are being used. This allows you to track the progress of the application and also associate various levels of the System Calls/sec counter with activities of your program.
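
To see that per-thread breakdown without charting every Thread instance by hand, something like the following can help. It is only a sketch: MYAPP is a placeholder for your process's instance name, only the first eight thread instances are examined, and the Thread object's instances are addressed using its process/index naming. Link with pdh.lib.

/* threadtime.c -- rough sketch: break a process's CPU time down by thread.
   MYAPP is a placeholder for your process's instance name. */
#include <windows.h>
#include <pdh.h>
#include <stdio.h>

#define MAX_THREADS 8    /* look at the first eight thread instances only */

int main(void)
{
    HQUERY   query;
    HCOUNTER proc, thread[MAX_THREADS];
    PDH_FMT_COUNTERVALUE v;
    TCHAR    path[128];
    int      i, added[MAX_THREADS];

    PdhOpenQuery(NULL, 0, &query);
    PdhAddCounter(query, TEXT("\\Process(MYAPP)\\% Processor Time"), 0, &proc);
    for (i = 0; i < MAX_THREADS; i++) {
        /* Thread instances are named "<process>/<thread index>". */
        wsprintf(path, TEXT("\\Thread(MYAPP/%d)\\"), i);
        lstrcat(path, TEXT("% Processor Time"));
        added[i] = (PdhAddCounter(query, path, 0, &thread[i]) == ERROR_SUCCESS);
    }

    PdhCollectQueryData(query);              /* rate counters need two samples */
    Sleep(5000);
    PdhCollectQueryData(query);

    PdhGetFormattedCounterValue(proc, PDH_FMT_DOUBLE, NULL, &v);
    printf("MYAPP process total: %.1f%% processor time\n", v.doubleValue);
    for (i = 0; i < MAX_THREADS; i++) {
        if (added[i] &&
            PdhGetFormattedCounterValue(thread[i], PDH_FMT_DOUBLE, NULL, &v) == ERROR_SUCCESS)
            printf("  thread %d: %.1f%%\n", i, v.doubleValue);
    }

    PdhCloseQuery(query);
    return 0;
}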

A system call is really a way for an application to get the attention of the processor and the Kernel mode services. The request originates from an environmental subsystem that is running some user process or service. The process will generate some event that results in a system call. The Executive Services have registered themselves, much like the hardware interrupts have done. The various system calls will indicate which services they are trying to reach by indicating the software interrupt level. After the Executive Service is contacted, parameters are copied from the User mode stack down to the Kernel mode stack. Further data that is required might be copied, depending on the method the programmer used to construct the data. If it is in a shared area, the Executive Services checks to see that the area can be accessed. Otherwise, the data will be copied or the system call rejected.

Disk

Another one of the specific resources to examine is the disk subsystem. The disk subsystem is always one of the slowest, yet most heavily demanded, resources on the computer. Making sure that your disk subsystem is properly configured is always the starting point in optimizing disk and memory performance. At this point, we will assume that you have taken the time to make sure that the disks are properly configured for the task at hand. We'll assume that you have an appropriate hard drive controller, such as SCSI, EIDE, or UDMA. In addition, we'll assume that the disk partitions and appropriate advanced disk configurations, such as striping and RAID, have been properly implemented.

Note: If you haven't gotten the message yet from this text, the disk subsystem has a lot of configuration concerns. These concerns and sample configurations can be seen in the previously mentioned book Windows NT Performance: Monitoring, Benchmarking, and Tuning. This book deals more with the administrative side of performance monitoring and system optimization.

With the disk subsystem, the first thing to do is establish the parameters of the system you are using. This implies the use of some of the Resource Kit tools. Then, you can begin to analyze the effect of your application on the disk's performance.

Understanding Disk Utilization

Disk performance is very much dependent on the type of I/O that is being done and factors such as:

Sequential versus random I/O

Writes versus reads

Large files versus smaller files

Disk configurations

Buffered I/O

File fragmentation

All of these can affect the performance of the disk subsystem. To get accurate readings on the effect of your application or a particular type of I/O on the disk subsystem, you will need to establish a good set of baselines for its performance. In Chapter 1, you learned about logging information to the Performance Monitor log, and in Chapter 3, you learned about the Response Probe. In this chapter, you want to put those tools and techniques to use.

The Response Probe can be used in this situation to produce predictable output for the system. You want to first simulate the best conditions, which are buffered reads/writes in a sequential order. This will give the disk the best performance marks. Then, you will want to deviate and begin to get closer to the way that you anticipate your application will be using the disk.

Perhaps you are using a Jet database and want to see how the performance of the system will fare when the database approaches 20MB. This is generally the point at which Microsoft suggests that you consider compacting the database manually to maintain integrity and performance for the MS Jet database. You can create a file of this size and then simulate multiple random reads and writes. You will also want to adjust the Response Probe files so that the size of the reads matches the record size of the database. This will give you more of an indication of how the system will behave and how a standard disk configuration will react to the type of I/O you are planning. Based on the information you find, you might want to choose a different database system or go with a proprietary database system that you build yourself. When performing this type of analysis, the primary counters to watch will be as follows:

Logical Disk: Avg Disk Bytes/Read

Logical Disk: Avg Disk sec/Read

Logical Disk: Disk Read Bytes/sec

Logical Disk: Disk Reads/sec

Processor: %Interrupt Time

Processor: %Processor Time

You might notice that two processor counters are in the mix. Disk I/O to some extent always utilizes the processor. You will want to watch the processor, at least a few times, to see if there is any significant effect on the processor's performance.

Note: Disk I/O and Network I/O devices are generally programmable I/O devices. This means that they rely on the processor to perform some of their work. Of course, some technology is different from others. IDE ISA controller cards are severe in their use of the processor; they can use as much as 40% of the available processor time to perform disk I/O. SCSI bus mastering cards with their own processor and cache will take up almost no time on the processor. Network interface cards (NICs) always consume processor resources. Some vendors are now coming out with cards that have some intelligence and can buffer data/interrupts if the processor is being overwhelmed.

The Disk Read Bytes/sec is the overall measure of throughput for the hard drive. You will want to keep a close eye on this value throughout your testing; it will let you know how much you are affecting performance. Generally, aside from database applications and perhaps some utilities, disk I/O performance is not too much of an issue. Later, in Chapter 8, when we are specifically discussing databases, we will return to this topic with more vigor.
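
Returning to the 20MB Jet example, if the Response Probe is not available, even a crude simulation can be informative. The program below is my own stand-in, not a book tool: TESTDATA.DAT is a placeholder file that you create at 20MB beforehand, the 2KB record size is an assumption, and FILE_FLAG_NO_BUFFERING is used so the reads reach the disk rather than the file system cache. Watch the Logical Disk counters listed above while it runs.

/* randread.c -- a crude stand-in (not the Response Probe) for simulating
   random, record-sized reads against a 20MB test file. */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

#define FILE_SIZE   (20 * 1024 * 1024)   /* roughly the 20MB Jet threshold */
#define RECORD_SIZE 2048                 /* assumed database record size   */
#define READS       5000

int main(void)
{
    HANDLE h;
    char  *buffer;
    DWORD  bytesRead, offset;
    int    i;

    /* FILE_FLAG_NO_BUFFERING keeps the reads from being satisfied by the
       file system cache, so the disk counters reflect real disk work.  It
       requires sector-aligned buffers and offsets, hence VirtualAlloc. */
    h = CreateFile(TEXT("TESTDATA.DAT"), GENERIC_READ, 0, NULL,
                   OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        printf("Could not open test file (error %lu)\n", GetLastError());
        return 1;
    }
    buffer = (char *)VirtualAlloc(NULL, RECORD_SIZE, MEM_COMMIT, PAGE_READWRITE);

    srand((unsigned)GetTickCount());
    for (i = 0; i < READS; i++) {
        /* Pick a random, record-aligned offset within the file. */
        offset = (DWORD)(rand() % (FILE_SIZE / RECORD_SIZE)) * RECORD_SIZE;
        SetFilePointer(h, offset, NULL, FILE_BEGIN);
        ReadFile(h, buffer, RECORD_SIZE, &bytesRead, NULL);
    }

    VirtualFree(buffer, 0, MEM_RELEASE);
    CloseHandle(h);
    return 0;
}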

Isolating the Application

One of the biggest problems, after you notice some drastic disk I/O, is figuring out what is causing it. The Performance Monitor can tell you there is a problem and even point you to the partition, but that is as close as you are going to get. To determine the files and the exact application, you can use the FILEMON utility discussed in Chapter 4, "Freeware/Shareware Tools." This will allow you to identify the files and the application that is producing the I/O problem. Keep in mind that it is still necessary to understand the nature of the problem by reviewing the Performance Monitor statistics. You might see a program that is accessing a very large file, but it might not be the problem. A program accessing numerous scattered small files or highly fragmented files can be just as damaging to performance.

Network

The network is the most problematic component of any computer system. The real issue is that the problem is not localized to the machine that you are working on. When reviewing network problems, you must consider the server side and the network components in between the systems, as well as the performance of the networking systems on the local workstation.

The Network's Effects on the System

We already know that the network components can affect the processor by generating excessive interrupts that the processor has to deal with. Knowing this, we can understand the concern over broadcasts and multiple protocols. A broadcast is a network packet sent out with no particular destination in mind. Every system on the network must react to the broadcast, and thus, must interrupt the processor to interpret the information, even if it is little more than to determine that the packet of information is of no value. Therefore, the health of the network and the general reduction of broadcasts on the network are very important. For the local machine, the reduction of protocols and unnecessary services is a good place to start. Multiple protocols on a workstation or server will do the following:

Use memory resources

Use processor resources

Result in increased reactions to broadcasts

Increase general I/O delays

For the most part, selecting a single protocol for the enterprise is very important. From a development perspective, making sure that the size and frequency of network communications are efficient is paramount to an efficient application.

Tracking Bandwidth

When you are reviewing the performance of the network, you will start by analyzing how much of the bandwidth on the local segment is being utilized. The Network Segment: %Network Utilization counter will show you the amount of bandwidth currently in use. Should you see more than 67% in use, you should start to worry that there is a problem on the network somewhere. Checking the %Broadcasts will show you whether the traffic is related to broadcasts on the network. More than likely, it is. Broadcast storms, as they are sometimes called, occur when systems on the network are mis-configured, but they can also come about when server-side applications are poorly written. If you have an application that is going to rely on large amounts of data being moved across the network, you will want to observe the general health of the network and gather some overall statistics about system performance in general. Much like disk I/O, the use of baselines will be important to any analysis of network performance.

Note: For more demanding analysis of the networking, you might want to get a hold of a copy of the Network Monitor to collect the packets themselves as well as the statistics about the network's health and performance.

The exact counters you will want to examine will be dependent on the protocol that you are using. Most sites are using TCP/IP, so that is what we will speak to in most of this text. In the case of TCP/IP, you will want numbers related to the amount of traffic and the efficient packaging of the information for transport. This type of information can be found in the following counters:

TCP: Segments Sent/sec

TCP: Segments Received/sec

TCP: Segments/sec

TCP: Segments Retransmitted/sec

When you are observing the network interface, you will also want to watch the following processor counters:

Processor: %Processor Time

Processor: %Interrupt Time

Processor: %DPC Time

Recall that the NIC and the network traffic can have a dramatic effect on the processor. The TCP counters describe the amount of TCP traffic going in and out of the system on the network. Most of the other network statistics we have discussed have been related to overall network utilization; these counters focus on the individual server or workstation that you are monitoring. When looking for issues related to the transmission of information on the network, you should look at Segments Retransmitted/sec. This is an indication of the amount of traffic that is causing problems on the network. Retransmissions are segments that were not acknowledged by the receiving node and had to be sent again. You will usually want to perform a small calculation:

Segments Retransmitted/sec / Segments Sent/sec

Multiplied by 100, this gives you the percentage of sent segments that were retransmissions of earlier, failed transmissions. You will notice that we focused on the receipt and transmission of data and not necessarily on the frame sizes. When analyzing the network traffic, your focus will be on the effect the traffic is having on the endpoints, although we cannot completely ignore what is occurring on the network.
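As a rough illustration of that calculation, the following sketch samples the two TCP counters over a five-second interval with the Performance Data Helper (PDH) API and prints the resulting percentage. It is a minimal example under the same assumptions as the earlier sketch (pdh.dll present, English counter names), with error handling trimmed for brevity.

/*
 * Sketch: compute the retransmission percentage
 *   Segments Retransmitted/sec / Segments Sent/sec * 100
 * from the NT "TCP" performance object. Link with pdh.lib.
 * (On later Windows versions the object is named TCPv4.)
 */
#include <windows.h>
#include <pdh.h>
#include <stdio.h>

int main(void)
{
    PDH_HQUERY query;
    PDH_HCOUNTER sent, retrans;
    PDH_FMT_COUNTERVALUE sentVal, retransVal;

    if (PdhOpenQuery(NULL, 0, &query) != ERROR_SUCCESS)
        return 1;

    PdhAddCounterA(query, "\\TCP\\Segments Sent/sec", 0, &sent);
    PdhAddCounterA(query, "\\TCP\\Segments Retransmitted/sec", 0, &retrans);

    /* Rate counters require two collections to produce a value. */
    PdhCollectQueryData(query);
    Sleep(5000);                      /* sample over a five-second interval */
    PdhCollectQueryData(query);

    PdhGetFormattedCounterValue(sent, PDH_FMT_DOUBLE, NULL, &sentVal);
    PdhGetFormattedCounterValue(retrans, PDH_FMT_DOUBLE, NULL, &retransVal);

    if (sentVal.doubleValue > 0.0) {
        double pct = retransVal.doubleValue / sentVal.doubleValue * 100.0;
        printf("Segments Sent/sec:          %.1f\n", sentVal.doubleValue);
        printf("Segments Retransmitted/sec: %.1f\n", retransVal.doubleValue);
        printf("Retransmission percentage:  %.2f%%\n", pct);
    }

    PdhCloseQuery(query);
    return 0;
}

For example, 2 segments retransmitted per second against 100 segments sent per second works out to 2 percent; a figure that stays much above a few percent usually deserves a closer look at the network path between the endpoints.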

When reviewing the traffic, you will usually want to analyze some of the chief components involved in the transmission and receipt of data. In many cases, this communication takes place between two basic Windows NT components: the Server service and the Redirector.

The Server service is the networking component responsible for responding to connection requests on a system that has some set of resources to share, typically a shared folder or perhaps a printer. Certainly there are other types of connections that might not involve these components directly, but the general concept is the same. The Redirector is the corresponding service on the requesting workstation. When a request is made, for example, for a file that actually resides on another server, the Redirector steps in and routes the communication to the remote system. The remote system's Server service picks up the request and processes it. To monitor the performance of this communication, you need to monitor both the Redirector and the Server; performance issues could be related to either one or both of these services, as opposed to the network in general. The counters to use for monitoring this type of communication are

Server: Bytes Total/sec

Network Interface: Output Queue Length

Redirector: Bytes Total/sec

Redirector: Current Commands

Redirector: Network Errors/sec

The Bytes Total/sec counters give you an idea of exactly how much traffic is being sent and received by the system. The server's outbound queue is approximated by Network Interface: Output Queue Length. The Redirector's Current Commands value displays the number of commands waiting in the Redirector's queue; recall that a queue of more than two on any transaction-based system is considered a bottleneck. Last, the Network Errors/sec counter will give you an indication of problems on the network or with one of the services involved in the communication.
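If you want to keep an eye on the Redirector side from code, a small polling loop against the same counters is enough. The sketch below is again only an illustration under the earlier assumptions (pdh.dll, English counter names); it flags any sample in which Current Commands rises above the rule-of-thumb threshold of two.

/*
 * Sketch: poll the Redirector queue and error rate once a second.
 * Current Commands is an instantaneous count; Network Errors/sec is a rate
 * and therefore needs a priming collection first. Link with pdh.lib.
 */
#include <windows.h>
#include <pdh.h>
#include <stdio.h>

int main(void)
{
    PDH_HQUERY query;
    PDH_HCOUNTER commands, errors;
    PDH_FMT_COUNTERVALUE cmdVal, errVal;
    int i;

    if (PdhOpenQuery(NULL, 0, &query) != ERROR_SUCCESS)
        return 1;

    PdhAddCounterA(query, "\\Redirector\\Current Commands", 0, &commands);
    PdhAddCounterA(query, "\\Redirector\\Network Errors/sec", 0, &errors);

    PdhCollectQueryData(query);       /* prime the rate counter */

    for (i = 0; i < 30; i++) {        /* sample once a second for 30 seconds */
        Sleep(1000);
        PdhCollectQueryData(query);
        PdhGetFormattedCounterValue(commands, PDH_FMT_DOUBLE, NULL, &cmdVal);
        PdhGetFormattedCounterValue(errors, PDH_FMT_DOUBLE, NULL, &errVal);

        printf("Current Commands: %5.0f   Network Errors/sec: %6.1f\n",
               cmdVal.doubleValue, errVal.doubleValue);

        if (cmdVal.doubleValue > 2.0)
            printf("  -> Redirector queue above two; possible bottleneck.\n");
    }

    PdhCloseQuery(query);
    return 0;
}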

Remember that you should be examining the server side as well as the workstation (Redirector) side of the communications. Usually, errors will be an indication of server timeouts on connections or repeated transmission failures.

Network performance monitoring is more specialized than monitoring any of the other resources. Many factors and external components can affect network operations, and we cannot cover every potential scenario here. However, in Chapter 7 we will cover the communications issues directly related to Web servers. Remember that, as we have done here, you need to monitor the components on both sides of the communication as well as any issues related to general network bandwidth availability.

Summary

In this chapter, we walked step-by-step through each of the various resources that a computer system is managing. You learned about some of the issues with monitoring and analyzing each of the various components. In addition, you learned about the objects and counters that you will want to use in your analysis. These counters have been organized for your convenience in Table 5.3. All this was built upon the knowledge of the tools and techniques that you read about in Part I, "Arm Yourself!: Tools for Performance Monitoring." Again, the best way to really get to know the material is to try it out. Review the counters for the various computer components and do some analysis on your workstation or server. See how various operations affect the system performance and compare them to some of your applications. This will get you closer to understanding how your programming style affects the performance of the application and the system overall. The next chapter, Chapter 6, will expand on this knowledge to tie it closely with the application operations and investigative tools in Part I.

Table 5.3 Summary of Problems and Applicable Counters

Indications of a Problem

Counters to use in Analysis

Memory issues: Pages/sec greater than 16, regular reboots required, high disk activity.

Memory: Available Bytes
Memory: Pages/sec
Memory: Page Reads/sec
Physical Disk: %Disk Read Time
Physical Disk: Avg. Disk Read Queue Length
Physical Disk: Disk Reads/sec
Physical Disk: Avg. Disk Bytes/Read

Memory issues for drivers: Pages/sec greater than 16, blue screens, regular reboots of the system required.

Memory: Available Bytes
Memory: Pool Paged Bytes
Memory: Pool NonPaged Bytes
Memory: Cache Bytes
Process: Working Set : _Total

Processor issues: %Processor Time greater than 80%, slow response to interactive user.

Processor: %Processor Time
Processor: Interrupts/sec
System: System Calls/sec
System: Processor Queue Length

Processor issues: %Processor Time greater than 80%, slow response to interactive user. These counters offer insight into hardware causes such as an overactive network.

Processor: %Processor Time
Processor: %Privileged Time
Processor: %User Time
Processor: %Interrupt Time
Processor: %DPC Time

Disk issues: Analyzing how the disk is being used. An Avg. Disk Queue Length of more than two is an indication of a problem.

Logical Disk: Avg. Disk Bytes/Read
Logical Disk: Avg. Disk sec/Read
Logical Disk: Disk Read Bytes/sec
Logical Disk: Disk Reads/sec
Processor: %Interrupt Time
Processor: %Processor Time

General process tracking. These counters are used whenever you are tracking any of the other issues for a particular process or thread.

Process: %Processor Time
Thread: %Processor Time
Process: ID Process
Thread: ID Thread

Network issues: %Processor Time greater than 80%, network utilization greater than 60%. These counters give you an indication of how much data is actually being sent and received via TCP. The processor counters are present to measure the NIC's effect on the processor.

TCP: Segments Sent/sec
TCP: Segments Received/sec
TCP: Segments/sec
TCP: Segments Retransmitted/sec
Processor: %Processor Time
Processor: %Interrupt Time
Processor: %DPC Time

Network issues: %Processor Time greater than 80%, network utilization greater than 60%, slow interactive user response. These counters look at another layer of the network communications. Viewing them allows you to analyze the need for adjustments to system parameters.

Server: Bytes Total/sec
Network Interface: Output Queue Length
Redirector: Bytes Total/sec
Redirector: Current Commands
Redirector: Network Errors/sec
Processor: %Processor Time
Processor: %Interrupt Time
Processor: %DPC Time

About the Author

Paul Hinsberg, MBA, MCSE, is the owner and operator of CRDS Inc., a computer consulting company in the Silicon Valley region.

Copyright © 1999 by MacMillan Technical Publishing

We at Microsoft Corporation hope that the information in this work is valuable to you. Your use of the information contained in this work, however, is at your sole risk. All information in this work is provided "as is", without any warranty, whether express or implied, of its accuracy, completeness, fitness for a particular purpose, title or non-infringement, and none of the third-party products or information mentioned in the work are authored, recommended, supported or guaranteed by Microsoft Corporation. Microsoft Corporation shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, even if it has been advised of the possibility of such damages. All prices for products mentioned in this document are subject to change without notice. International rights = English only.

