Microsoft TechNet

Performance Tuning for Windows NT Workstation 4.0

Archived content. No warranty is made as to technical accuracy. Content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.
By Charles Perkins, Matthew Strebe, and James Chellis


Performance Tuning

Chapter 14 from MCSE: NT Workstation 4, published by Sybex Inc.

Windows NT implements a number of automatic performance optimizations to ensure that any Windows NT Workstation will operate very well. However, as with an automobile, understanding how and why resources of the system function (and knowing how to measure their performance) will help you tune your system for optimal performance.

Performance tuning is finding the resource that slows your system the most, speeding it up until something else has the most impact on speed, and then starting over by finding the new slowest resource. This cycle of finding the speed-limiting factor, eliminating it, and starting over will allow you to reach the natural performance limit of your computer in a simple, methodical way.

In this chapter we first cover the automatic optimizations that Windows NT performs to ensure that a system will operate smoothly and respond quickly to user requests at almost any load level. Then we dig into performance-tuning theory and definitions, explaining how the different software and hardware resources interact to achieve the smooth, responsive system performance you expect from Windows NT.

Next we cover the performance monitor, the tool that implements most performance-tuning procedures in Windows NT. After you understand how the performance monitor works, we show you how to ferret out processor, memory, and hard disk bottlenecks. Finally, we discuss how to speed up specific applications in order to improve system responsiveness for your specific needs.

Microsoft Exam Objective

Optimize system performance in various areas.

Microsoft Exam Objective

Implement advanced techniques to resolve various problems.

Bottlenecks

Bottlenecks are factors that limit performance in a computer. For instance, slow memory limits the speed at which a processor can manipulate data—thus limiting the computer’s processing performance to the speed at which the processor can access memory. If the memory can respond faster than the processor, the processor is the bottleneck.

Note: The terms processor, microprocessor, and central processing unit (CPU) are synonymous throughout this book.

There is always a bottleneck in system performance. You may not notice it because your computer may be quite a bit faster than you actually need for the work you perform. Chances are, if you use your computer only for word processing, the speed of your machine has never slowed you down. On the other hand, if you use your computer as a CAD workstation or to compute missile trajectories, chances are you’ve spent a lot of time waiting for your computer.

Note: There is always a bottleneck that limits system performance when you use your computer. Ideally, it's you.

Performance tuning is the systematic process of finding the resource experiencing the most load and then relieving that load. You can almost always optimize a machine to make it work better for you. Although tuning a server for maximum network performance is more crucial (and more difficult) than tuning a workstation, understanding how Windows NT Workstation achieves its performance and how you can increase its performance is important. Even if you don’t need to make your computer any faster, understanding performance tuning can help you diagnose problems when they arise.

Before we get too far into our discussion of computer performance, you should understand a few of the terms we will be using in the context of performance tuning.

  • Resources are hardware components that provide some quantifiable work capacity in the context of performance tuning. Software processes load down hardware resources.

  • Bottlenecks are resources with performance limitations that affect the responsiveness of a computer. When used in the singular, bottleneck refers to the most limiting component of the system.

  • Load is the amount of work that a resource has to perform. For example, the microprocessor is "under heavy load" if it is performing a number of complex math operations. The disk drive is under load any time files are read from or written to it.

  • Optimizations are the measures taken to reduce the impact of a bottleneck on performance. Optimizations may include eliminating unnecessary loading, sharing loads across devices, or finding ways to increase the speed of a resource.

  • Throughput is the measure of information flow through a resource. For instance, disk I/O throughput is the measure of how much data can be read from or written to a disk in a given time period, usually one second.

  • Processes are software services running concurrently on your computer that perform a certain function. Applications and many system services run as processes. A process has its own address space and is therefore protected from other processes in Windows NT. Refer to Chapter 15 for more information on processes.

  • Threads are software chains of execution that run concurrently to perform the functionality of a process within the address space of that process. A process contains one or more threads. Threads are the basic unit of work that Windows NT schedules among processors in a multiprocessing environment.

Note: The term bottleneck comes from the observation that the neck of a bottle limits the flow of water through it. To visualize a bottleneck, imagine the difference between turning over a cup of water and turning over a bottle of water.

Exercise 14.1 will help you see the difference between threads and processes by introducing you to the Task Manager. If you are running any other applications, you can leave them running, but the Task Manager will display more information than the exercise describes.

Exercise 14.1

Viewing Applications, Processes, and Threads

  1. Select Programs > Accessories > Paint from the Start menu.

  2. Select Programs > Accessories > WordPad from the Start menu.

  3. Press Ctrl+Alt+Del.

  4. Click Task Manager.

  5. Select the Applications tab.

  6. Notice the number of applications running. You should see Paint and WordPad listed in the Task list box.

  7. Select the Processes tab.

  8. Notice how many processes are running. Find the MSPAINT.EXE and WORDPAD.EXE processes in the list. In this case each application has only one process. The other processes you see are system processes that run all the time.

  9. Select the Performance tab.

  10. Notice how many threads are running in the Totals box.

  11. Close the Windows NT Task Manager.

Now that you understand the terms used in performance tuning, we can discuss how performance tuning works. A slow hardware resource, such as a hard disk drive, causes the microprocessor and system RAM (both fast) to wait for it to complete I/O requests. Thus during disk I/O, the speed of the hard disk is the speed of the computer.

Although you cannot make your hard disk faster (unless you replace it with a faster one), you may be able to reduce the number of times the computer needs to access it or limit the amount of information transferred. You may also be able to spread the load across many hard disk drives, thus dividing the time you spend waiting for drive access by the number of drives available.

You will reach a point when you have a limitation you cannot overcome. This point is the natural limit of your machine and the ultimate goal of performance tuning. If you find you need speed beyond the natural limit of your machine, you will need to upgrade the hardware resource causing the limitation.

Finding Bottlenecks

Ferreting out bottlenecks involves a little understanding of how computers work, and it requires some software. Without proper monitoring tools, even the best system engineers can only guess at what causes a complex system to run slowly. Windows NT provides a comprehensive set of tools for finding and eliminating bottlenecks.

Microsoft Exam Objective

Identify and resolve a given performance problem.

To find a bottleneck, you must be able to measure the speed of the different resources in your system. Measurements enable you to find the one resource that is performing at its peak, thereby causing the bottleneck.

Note: The hardware resource that is operating at its maximum performance level is the bottleneck.

The measurements you will need to make differ among resources. For instance, disk throughput is measured in megabytes per second, whereas interrupt activity is measured in interrupts per second. To compare resources, you must use comparable measurements. In most cases Windows NT provides a basic "percentage of processor time spent doing this" metric that you can use to compare very different resources.
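The following Python sketch shows one rough way to put unlike measurements on a common scale. It expresses each reading as a percentage of the resource's capacity and reports the resource closest to its peak. The resource names, readings, and capacities are hypothetical values chosen for the example. Windows NT does not report them.

```python
# Hypothetical measurements: (observed rate, maximum rate the resource can sustain).
# These capacities are illustrative assumptions, not Windows NT figures.
measurements = {
    "disk (MB/sec)": (4.5, 5.0),
    "network (KB/sec)": (300.0, 1250.0),
    "processor (% busy)": (55.0, 100.0),
}

def utilization(observed, capacity):
    """Express a raw measurement as a percentage of the resource's capacity."""
    return 100.0 * observed / capacity

def find_bottleneck(samples):
    """Return the resource running closest to its peak, with its utilization."""
    name = max(samples, key=lambda r: utilization(*samples[r]))
    return name, utilization(*samples[name])

name, pct = find_bottleneck(measurements)
print(f"{name} is at {pct:.0f}% of capacity")  # disk (MB/sec) is at 90% of capacity
```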

The first step to finding a bottleneck is to run the performance monitor application. You then have to put your computer under the load that causes it to perform more slowly than you want. Run your CAD program and import a file from another format. Attach to your network file server and start copying a lot of files. Run that graphic-intensive game. Run whatever software you want to make run faster.

Using the performance monitor, you will then look at a few broad measures that will show you where to search more deeply to find the exact bottleneck. For example, if, after charting processor time and disk time, you see that the disk is running at its peak, you know to concentrate on disk-related measurements to find the bottleneck.

Note: Make certain you've found the bottleneck before concentrating on detailed performance monitoring. Since performance-limited resources hide behind other, slower resources, you won't be able to see the difference if you make changes to objects that are not truly the bottleneck.

Eliminating Bottlenecks

Finding a bottleneck is only half the battle. Eliminating it (making it fast enough that something else is now the primary bottleneck) may involve changing a Control Panel setting or replacing an old, slow hard disk. You will have to determine how to relieve the load placed on the resource.

Most of the time you will be able to look at more detailed measurements to determine the specific activity that is loading your system down. For instance, if you determine that your microprocessor is the bottleneck, you can look at the time spent in each process to determine exactly which process is causing the most load. Discontinuing the use of the application that relies on that process, or replacing it with equivalent software that creates less load, will relieve your bottleneck.

Note: When troubleshooting, make only one change at a time. Otherwise, you will not be able to tell which change fixed the problem.

The Perpetual Cycle

You can achieve maximum performance from your hardware through a continuous cycle of improvement. Once you’ve eliminated the major bottleneck in your system, start over and eliminate the next new bottleneck. There will always be a bottleneck in your system because one resource will always cause other resources to wait for it.

Keep eliminating bottlenecks until you either make your computer so fast that you never need to wait for it, find the component to replace or upgrade, or realize that you can’t afford to buy any new components and settle for what you have, knowing that your system is running as fast as it can.
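The cycle can be sketched as a loop. In this hypothetical Python model, the system runs only as fast as its slowest resource. "Eliminating" a bottleneck is simulated by doubling that resource's throughput. Both the numbers and the doubling rule are assumptions made for illustration.

```python
# Hypothetical throughput of each resource, in work units per second.
resources = {"disk": 40, "memory": 200, "processor": 120, "network": 90}

def tune(resources, target, max_rounds=10):
    """Repeatedly relieve the slowest resource until it meets the target.

    Relieving a bottleneck is modeled, purely for illustration,
    as doubling that resource's throughput.
    """
    rates = dict(resources)                      # leave the caller's data alone
    history = []
    for _ in range(max_rounds):
        bottleneck = min(rates, key=rates.get)   # the resource everything waits on
        history.append((bottleneck, rates[bottleneck]))
        if rates[bottleneck] >= target:          # fast enough: stop tuning
            break
        rates[bottleneck] *= 2                   # eliminate it, then start over
    return history

for name, rate in tune(resources, target=100):
    print(f"bottleneck: {name} at {rate} units/sec")
```

Relieving the disk twice exposes the network. Relieving the network then exposes the processor. There is always a bottleneck, and the loop stops only when the slowest resource is fast enough.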

Windows NT Self-Tuning Mechanisms

You may never have to deal with manual performance tuning because Windows NT tunes itself very well for most users and for most situations. Unlike with many operating systems, you will not have to manually adjust arcane environment variables to improve Windows NT performance. Windows NT takes care of that for you. The tuning you will do to optimize Windows NT performance involves determining which hardware resources are under the greatest load and then relieving that load. Windows NT comes with some very powerful tools to assist you, but because of the system’s self-tuning nature, you may never have to use them.

Windows NT implements a number of automatic performance optimizations. They are

  • Multiprocessing

  • Avoiding physical memory fragmentation

  • Swapping across multiple disks

  • Prioritizing threads and processes

  • Caching disk requests

Multiprocessing

Multiprocessing divides the processing load across several microprocessors. Windows NT uses symmetric multiprocessing, a technique in which the total processor load is split evenly among processors. Simpler operating systems use asymmetric multiprocessing, which splits the processing load based upon some metric other than load. Those operating systems usually put all system tasks on one processor and all user tasks on the remaining processors.

Note: Windows NT Workstation ships with support for two microprocessors. If you have a computer that uses more than two microprocessors, contact your OEM vendor for the support files for your computer.

Scheduling and resource assignment between processors take computing time. Because of this overhead, two processors are not twice as fast as one. Windows NT with two processors generally runs at about 150 percent of the speed of one, depending upon the type of programs run. An application that has only one thread cannot use more than one processor at a time.

In many computing problems the result of one thread depends upon the results of other threads. This circumstance is like a relay race in which a runner (thread) must wait for the baton (results) before taking off. Obviously, splitting these threads among processors will not make the application faster. Multiprocessing works best with large data sets that can be broken into chunks and solved independently.
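Amdahl's law is a common way to quantify this effect. It is a general model, and the chapter does not name it. If only a fraction of the work can be split among processors, the rest runs serially and caps the speedup. The Python sketch below ignores scheduling overhead, so it overstates real gains. The two-thirds parallel fraction is simply the value that reproduces the chapter's rough 150 percent figure. It is not a measured property of Windows NT.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's law.

    parallel_fraction: share of the work (0..1) that can run on any processor.
    The remaining serial share runs on one processor however many exist.
    Scheduling overhead is ignored, so real systems do somewhat worse.
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)

# A single-threaded program gains nothing from a second processor.
print(amdahl_speedup(0.0, 2))               # 1.0
# If two-thirds of the work can be split, two processors give about 150 percent.
print(round(amdahl_speedup(2 / 3, 2), 2))   # 1.5
# Even perfectly parallel work only doubles with two processors.
print(amdahl_speedup(1.0, 2))               # 2.0
```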

Symmetric Multiprocessing

Symmetric multiprocessing shares the total processing load among all available processors as equally as possible. When processor time becomes available, a routine determines which thread gets that processing time, depending upon its priority in the thread queue. Figure 14.1 shows a hypothetical four-processor computer running two multithreaded applications. The height of the bars indicates the total computing capacity of the processors. The shaded areas indicate how much of that capacity each process uses.

Figure 14.1: Symmetric multiprocessing


Asymmetric Multiprocessing

Asymmetric multiprocessing dedicates certain threads to certain processors. For instance, all system threads and drivers might be run on one processor, and user threads may be run on another. Asymmetric multiprocessing does not allow the operating system to make the most effective use of processor time. Figure 14.2 shows another hypothetical four-processor computer running two multithreaded applications. Compare Figures 14.1 and 14.2 to see why symmetric multiprocessing works better. Notice that in Figure 14.2 the second CPU is at maximum performance (making it the bottleneck), while the third and fourth CPUs are not working at all.

Computer designers often use asymmetric multiprocessing to give the system processor hardware-level access to input/output devices and deny that access to the user processor(s), thus protecting system resources and reducing the need for security in the operating system.

Figure 14.2: Asymmetric multiprocessing

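A toy model can reproduce the difference between the two figures. The thread names and loads in this Python sketch are hypothetical. The symmetric scheduler gives each thread to the least-loaded processor. The asymmetric one pins system threads to the first processor and user threads to the second, as in Figure 14.2. Real schedulers hand out processor time slice by slice rather than placing whole threads, so this is only an approximation.

```python
# Hypothetical threads: (name, kind, percent of one processor it wants).
threads = [
    ("driver", "system", 20), ("kernel", "system", 15),
    ("app1-a", "user", 40), ("app1-b", "user", 35),
    ("app2-a", "user", 30), ("app2-b", "user", 25),
]
CPUS = 4
CAPACITY = 100  # percent of one processor

def symmetric(threads):
    """Give each thread to whichever processor is currently least loaded."""
    load = [0] * CPUS
    for _name, _kind, demand in threads:
        least = load.index(min(load))
        load[least] += demand
    return load

def asymmetric(threads):
    """Pin system threads to processor 0 and user threads to processor 1."""
    load = [0] * CPUS
    for _name, kind, demand in threads:
        load[0 if kind == "system" else 1] += demand
    return load

sym = symmetric(threads)
asym = asymmetric(threads)
print("symmetric: ", sym)                                           # [45, 45, 40, 35]
print("asymmetric:", [min(cpu_load, CAPACITY) for cpu_load in asym])  # [35, 100, 0, 0]
# Demand the saturated second processor cannot serve has to wait.
print("demand left waiting:", sum(max(0, cpu_load - CAPACITY) for cpu_load in asym))  # 30
```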

Memory Optimizations

Windows NT performs a number of optimizations to make the most effective use of random access memory (RAM). In Windows NT, memory is divided into 4KB chunks called pages. Each page can be used by only one thread. A thread may be stored in any number of pages. Therefore, a 13KB thread will actually take 16KB of physical RAM because the remaining 3KB in the last page cannot be used by anything else.

Some operating systems use 64KB pages in order to maximize swapping speed (64KB is the maximum size of a single block transfer to SCSI and IDE hard disks). Unfortunately, this optimization forces each thread to use a minimum of 64KB. If the average size of an executing thread is 96KB, each thread occupies two 64KB pages (128KB), so 25 percent of physical RAM would be wasted on unusable excess storage. Windows NT gives up the performance benefit of 64KB pages in favor of leaving more physical memory available, which reduces the need for swapping.
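The rounding described in the last two paragraphs is easy to check. This Python sketch computes how much physical RAM a thread occupies when every partial page costs a full page, and what fraction of that RAM is wasted. The function names are illustrative.

```python
import math

def pages_needed(size_kb, page_kb):
    """Whole pages required to hold size_kb; a partial page still costs a full page."""
    return math.ceil(size_kb / page_kb)

def ram_used(size_kb, page_kb):
    """Physical RAM actually consumed, in KB."""
    return pages_needed(size_kb, page_kb) * page_kb

def wasted_fraction(size_kb, page_kb):
    """Share of the allocated RAM that holds nothing useful."""
    used = ram_used(size_kb, page_kb)
    return (used - size_kb) / used

print(ram_used(13, 4))          # 16   -> a 13KB thread occupies 16KB with 4KB pages
print(ram_used(96, 64))         # 128  -> 96KB needs two 64KB pages
print(wasted_fraction(96, 64))  # 0.25 -> 25 percent wasted
print(wasted_fraction(96, 4))   # 0.0  -> 96KB fills exactly 24 4KB pages
```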

The system must have enough memory to store all the executing threads. If the amount of memory is insufficient, Windows NT uses a portion of the hard disk to simulate memory by swapping memory pages not currently in use to a special system file called the virtual memory swap file (PAGEFILE.SYS). When the system needs the pages that were swapped to disk, Windows NT trades pages in RAM for pages on the hard disk. This process is completely hidden from the threads, which do not need to know anything about the memory swapping process.

The more memory you have, the less time the system spends on page swapping. Windows NT systems having less than 32MB of memory will spend a significant amount of time swapping pages to the virtual memory page file, especially if they are running more than one application at a time. This swapping activity slows the computer dramatically, since hard disks are very slow (but very cheap) compared to physical RAM.
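The following Python sketch shows why more memory means less swapping. It counts page faults for a fixed sequence of page references as the number of physical page frames grows. It uses least-recently-used (LRU) replacement purely for illustration. Windows NT's actual working-set management is more involved, and the reference sequence is hypothetical.

```python
from collections import OrderedDict

def count_page_faults(references, frames):
    """Count page faults for a reference string under LRU replacement.

    LRU is an illustrative stand-in, not the exact policy Windows NT uses.
    """
    resident = OrderedDict()              # pages in RAM, least recently used first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)    # hit: mark as most recently used
            continue
        faults += 1                       # miss: the page must be read from disk
        if len(resident) >= frames:
            resident.popitem(last=False)  # evict the least recently used page
        resident[page] = None
    return faults

# Hypothetical page references from two applications sharing the machine.
refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
for frames in (3, 4, 5):
    print(frames, "frames:", count_page_faults(refs, frames), "faults")
```

With this sequence, the fault count drops from 10 to 8 to 5 as frames are added. Each fault stands for a slow trip to the disk.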

The faster page swapping can be made, the lower its impact on system responsiveness. To speed this process, Windows NT supports simultaneous writing to more than one hard disk for its virtual memory paging file. Since physical drives can operate simultaneously, splitting the virtual memory swap file among different disks allows Windows NT to divide the time spent processing virtual memory swaps by the number of physical disks. Exercise 14.2 shows you how to split your swap file among more than one disk. (You must have more than one hard disk drive to perform this exercise.)

Exercise 14.2

Splitting the Swap File among Disks

  1. Select Settings > Control Panel from the Start menu.

  2. Double-click the System control panel.

  3. Select the Performance tab.

  4. Click Change in the virtual memory area.

  5. Select the primary volume on the first physical disk.

  6. Set the Page file Initial size to 16MB.

  7. Set the Page file Maximum size to 48MB.

  8. Click Set.

  9. Repeat steps 5–8 for the primary volume on each physical disk.

  10. Click OK.

  11. Click Close.

  12. Answer Yes to restart your computer.

Windows NT allows you to split your swap file among volumes on the same physical disk, but doing so will not improve disk performance. In fact, splitting the files increases swap time by forcing the drive head to move a great deal more than normal during swapping. You should set only one swap file per physical disk.
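An idealized model can approximate the benefit of spreading the paging file over several physical disks. This Python sketch makes three assumptions: page transfers are dealt out round-robin, every disk works in parallel, and each transfer costs a fixed, hypothetical 10 milliseconds. Two volumes on one physical disk share a single drive head, so the disks parameter should count physical drives, not volumes.

```python
def paging_time_ms(page_transfers, disks, ms_per_transfer=10.0):
    """Elapsed time to service page transfers spread round-robin over
    independent physical disks that all work at once.

    ms_per_transfer is a hypothetical per-page cost; real disks vary
    with seek distance and transfer size.
    """
    per_disk = [0] * disks
    for i in range(page_transfers):
        per_disk[i % disks] += 1          # round-robin: next transfer, next disk
    # The disks run simultaneously, so the busiest one sets the elapsed time.
    return max(per_disk) * ms_per_transfer

for disks in (1, 2, 3):
    print(disks, "disk(s):", paging_time_ms(300, disks), "ms")
# 1 disk(s): 3000.0 ms
# 2 disk(s): 1500.0 ms
# 3 disk(s): 1000.0 ms
```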

Prioritizing Threads and Processes

In a multitasking operating system, if each thread of each process got equal processor time in round-robin fashion, the computer would respond to user requests very slowly. Some system processes, such as moving the mouse cursor or updating the screen, must happen all the time, far more often than most other system processes.

Windows NT prioritizes each thread based upon its importance to system responsiveness or any requirements it may have to respond to external (real-time) events in a timely fashion. Windows NT does a good job of setting thread priorities by default. However, Microsoft cannot predict exactly how you will use your computer, so it leaves you some ability to tune priorities.

Processes start with a base priority of 7 on a scale of 0 to 31. Each thread of a process inherits the base priority of the process. Windows NT can automatically vary priority levels up to two priorities higher or lower as the system runs, allowing the system to prioritize as it sees fit. Users can also start processes with higher than normal priorities. Figure 14.3 shows the Windows NT thread priority scale.

Figure 14.3: Thread priorities in Windows NT


Real-time applications start with priorities higher than 15. These real-time processes require processor time quite frequently to ensure that they can respond to external real-time events. Drivers, which must respond to hardware events very close to the time the device demands attention, run in these priority levels.

Only administrators may start processes with a priority higher than 23. These processes demand so much processor time that they can make all other processes run very slowly. Starting a regular application with a priority this high will make even moving the cursor slow and laborious. Starting processes with other than normal priorities is shown in Exercise 14.15 in the "Application Performance" section of this chapter.

You can also use the Task Manager to increase the priority of an already running program. This step will normally not be necessary, but it is a good way to test the demands a process will make on the system at different priority levels.
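A few lines of code can model the priority rules described in this section. This Python sketch applies the chapter's figures: a base priority of 7 for normal processes, automatic variation of at most two levels either way, and a 0 to 31 scale. It dispatches ready threads strictly highest priority first. The thread names and adjustments are hypothetical. The real Windows NT scheduler also preempts and time-slices threads, which this sketch does not attempt.

```python
import heapq

MIN_PRIORITY, MAX_PRIORITY = 0, 31
NORMAL_BASE = 7          # the chapter's base priority for a normal process
MAX_DYNAMIC_CHANGE = 2   # the chapter's limit on automatic variation

def effective_priority(base, adjustment):
    """Apply a dynamic adjustment, limited to two levels either side of the
    base priority and kept on the 0-31 scale."""
    adjustment = max(-MAX_DYNAMIC_CHANGE, min(MAX_DYNAMIC_CHANGE, adjustment))
    return max(MIN_PRIORITY, min(MAX_PRIORITY, base + adjustment))

def dispatch_order(ready_threads):
    """Order in which a strict priority scheduler would pick ready threads:
    highest effective priority first, ties broken by arrival order."""
    heap = []
    for arrival, (name, base, adjustment) in enumerate(ready_threads):
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(heap, (-effective_priority(base, adjustment), arrival, name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

# Hypothetical ready threads: (name, base priority, dynamic adjustment).
ready = [
    ("background-compile", NORMAL_BASE, -2),
    ("word-processor", NORMAL_BASE, +2),
    ("spreadsheet", NORMAL_BASE, 0),
    ("sound-driver", 16, 0),   # a real-time level, above 15
]
print(dispatch_order(ready))
# ['sound-driver', 'word-processor', 'spreadsheet', 'background-compile']
```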

Caching Disk Requests

Windows NT uses disk caching to reduce the amount of input/output traffic to the hard disk drive. Caching works by reserving a portion of memory as a staging area for hard disk reads and writes. When data is read from the disk, it is stored in the cache. If the same data needs to be read again, it is retrieved from the very fast memory cache, rather than from the disk.

Note: In this book, the term memory is synonymous with random access memory (RAM), not with hard disk space.

Actually, disk read operations don’t just bring in the data requested. Entire clusters are transferred from the hard disk to the memory cache because read and write operations are most efficient at the cluster size. Consequently, a good portion of the data on the hard disk located immediately after the data that is requested also comes into the memory cache. Since read accesses tend to be sequential, chances are good that the next read request will also be in the cache.
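A small simulation shows the benefit of reading whole clusters. In this Python sketch, a miss loads the entire cluster containing the requested block, so later requests for neighboring blocks are served from memory. The eight-blocks-per-cluster figure and the unlimited cache size are simplifying assumptions.

```python
CLUSTER_BLOCKS = 8   # hypothetical: disk blocks per cluster

def simulate_reads(block_numbers, cluster_blocks=CLUSTER_BLOCKS):
    """Count disk reads and cache hits when every miss pulls in the whole
    cluster containing the requested block. The cache never fills up here,
    which real caches do."""
    cached_clusters = set()
    disk_reads = hits = 0
    for block in block_numbers:
        cluster = block // cluster_blocks
        if cluster in cached_clusters:
            hits += 1
        else:
            disk_reads += 1
            cached_clusters.add(cluster)   # neighboring blocks arrive too
    return disk_reads, hits

print(simulate_reads(range(64)))                   # sequential: (8, 56)
print(simulate_reads([i * 8 for i in range(64)]))  # scattered:  (64, 0)
```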

The disk cache is also used for write operations. The Windows NT file system (NTFS) doesn’t write data to the hard disk immediately. It waits for system idle time so as not to impact the responsiveness of the system. Data writes are stored in the memory cache until they are written to disk. Often, especially in transaction-oriented systems like databases, write data in the cache will be superseded by new changes before being written from the cache to the hard disk—meaning that the write cache has completely eliminated the need to write that data to disk.

Data writes waiting in the cache can also be read back if they are subsequently requested, which allows yet another cache-related optimization. The type of caching used in Windows NT is called write-back caching, as opposed to write-through caching, which immediately writes data to the disk while preserving it in the cache for subsequent rereads. Write-through caching is used in operating systems that cannot otherwise guarantee the integrity of data on the disk if power is lost while data is in the cache waiting to be written to disk.
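The savings from write-back caching can be counted directly. In this Python sketch, each integer is a write to a numbered block. The string "flush" marks idle time, when pending writes reach the disk. The operation sequence is hypothetical, and any writes still pending at the end are flushed once.

```python
def disk_writes(operations, write_back):
    """Count physical disk writes for a sequence of block writes.

    Integers are writes to numbered blocks; "flush" marks idle time,
    when a write-back cache writes each pending block to disk once.
    """
    pending = set()            # blocks whose latest data is only in the cache
    writes = 0
    for op in operations:
        if op == "flush":
            writes += len(pending)   # always 0 for write-through
            pending.clear()
        elif write_back:
            pending.add(op)          # a newer write supersedes an unwritten older one
        else:
            writes += 1              # write-through: every write goes to disk
    return writes + len(pending)     # flush whatever is still waiting

ops = [7, 7, 7, 9, 7, 9, 7, "flush", 7]
print("write-through:", disk_writes(ops, write_back=False))  # 8
print("write-back:   ", disk_writes(ops, write_back=True))   # 3
```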

Caching is analogous to using your refrigerator to store food rather than going to the grocery store each time you need an egg or a glass of milk. By estimating your future needs, you are able to make one trip out to the slow resource (the grocery store) and store the data (food) you need very close to you in the cache (refrigerator). (Don’t try to extend this analogy to write-back caching, though.)

Note: The caching schemes used in hardware to make your microprocessor run faster operate on exactly the same cache theory as presented here.

Windows NT devotes to the disk cache all the memory that remains free after the running processes have the memory they need. Windows NT dynamically changes the amount of memory assigned to the disk cache as new processes are started to ensure the optimal performance boost from caching. Windows NT balances the amount of disk cache and the amount of virtual memory page swapping to optimize the use of physical memory.

Although you cannot change any software parameters to impact caching performance, you can add more memory, up to the limit your motherboard will support. Windows NT Workstation runs best when used with 24MB of RAM or more. Windows NT can make good use of all the RAM you give it.

Performance Monitoring

The Windows NT performance monitor is an amazing tool, unique to the Windows NT operating system, that lets you inspect the performance of just about every process and resource in your computer. The performance monitor allows you to determine the cause of almost any performance-related problem your computer experiences. Figure 14.4 shows the performance monitor charting some processor and disk activity.

Microsoft Exam Objective

Monitor system performance by using various tools.


Figure 14.4: The performance monitor

Performance and the performance monitor are broad topics. An entire book could be dedicated to the various features and the work flow theory used to discern where and why bottlenecks occur. Windows NT makes most adjustments for you automatically, though, so that level of detail is not required to make your computer run well for most tasks.

This section explains how the performance monitor works and tells you which indicators to watch in order to quickly narrow down performance problems. You should feel free to play with the performance monitor to see the effect of the different low-level indicators. You cannot harm your system by experimenting with the performance monitor.

Heisenberg’s uncertainty principle is often paraphrased as saying that to measure a quantum phenomenon is to change it. The same is true of performance monitoring. Running the performance monitor takes a small amount of CPU time, and enabling disk monitoring will slow input/output requests slightly. Therefore, you cannot measure system performance without causing the performance to change slightly. In almost every case, this change in performance is slight and will have no real effect on your measurements or the validity of your conclusions, but you should be aware that it is happening.

Note: Be sure to let your computer finish the various logon processes before using the performance monitor to measure performance. A number of services start in the background after you log on, and they will affect performance measurements taken right after booting.

Exercise 14.3 shows how to start the performance monitor. The remaining exercises in this chapter will assume you have the performance monitor loaded before beginning the exercise.

Exercise 14.3

Starting the Performance Monitor

  1. Select Programs > Administrative Tools > Performance Monitor from the Start menu.

  2. Size the Performance Monitor window so that it takes up about one-quarter of your screen.

  3. Select Add to Chart in the Edit menu.

  4. Click Add when the Add to Chart dialog box opens with %Processor Time selected.

This value is the measure of how busy the microprocessor is. Leave this measurement running throughout the remaining exercises.

Object Counters

The performance monitor doesn’t actually measure anything. It is only a graphical tool used to inspect the measurements that occur constantly throughout the running processes in Windows NT.

Counters associated with each Windows NT software object are incremented every time that object performs a function. For instance, each time a network device driver reads a packet, the device driver increments the packet read counter by one and the byte read counter by the size of the packet. Likewise, each time the system switches threads, it updates a counter that records the time spent in that thread.
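The following sketch shows how a raw counter relates to a per-second figure. The PacketCounters class stands in for a hypothetical network driver that only increments running totals. The rate is derived afterward from two samples taken a known interval apart. This is a simplified model of how a monitoring tool can turn ever-increasing counts into rates. It is not the actual Windows NT counter interface.

```python
class PacketCounters:
    """Hypothetical driver-side counters, incremented as work happens."""

    def __init__(self):
        self.packets_read = 0
        self.bytes_read = 0

    def on_packet(self, size):
        self.packets_read += 1      # one more packet
        self.bytes_read += size     # plus its size in bytes

def per_second(before, after, seconds):
    """Turn two samples of an ever-increasing counter into a rate."""
    return (after - before) / seconds

nic = PacketCounters()
before = (nic.packets_read, nic.bytes_read)   # first sample
for size in (1514, 64, 1514, 512):            # packets arriving over 2 seconds
    nic.on_packet(size)
after = (nic.packets_read, nic.bytes_read)    # second sample, 2 seconds later

print("Packets/sec:", per_second(before[0], after[0], 2.0))  # 2.0
print("Bytes/sec:  ", per_second(before[1], after[1], 2.0))  # 1802.0
```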

These counters permeate all Windows NT objects, and they allow meaningful measurement to occur by accounting for everything that happens that may be of interest. Windows NT uses many of these counters to measure performance for its own automatic optimizations and is the first PC operating system to include this level of support for performance monitoring. Table 14.1 shows the built-in objects that you can monitor with the performance monitor.

Table 14.1 Windows NT Object Counters

| Object | Purpose |
|---|---|
| Cache | File system (disk) cache performance |
| Logical disk | Mass storage performance, including network storage |
| Memory | Memory performance and usage |
| Objects | Process and thread counts |
| Paging file | Virtual memory usage |
| Physical disk | Hard disk drive performance |
| Process | Process performance |
| Processor | Microprocessor performance |
| System | Windows NT performance |
| Thread | Individual thread performance |

In addition to these, you will see objects for each network service you have installed. Actually, any software can be written to register performance monitor counters with the system, so you may see even more counters than are shown here.

Network performance is monitored through the Network Segment counters. However, these counters are not gathered until the Network Monitor Agent is installed on the Windows NT Workstation. Once the agent is installed, these counters can be read locally with Performance Monitor or remotely with Performance Monitor, NT Server’s Network Monitor, or SMS’s Network Monitor.

Processor Performance

The microprocessor is generally the fastest component in a computer. In Pentium class and higher computers, the microprocessor is rarely the cause of a bottleneck unless you are running scientific, mathematical, or graphical software that puts a heavy load on the floating point unit of the microprocessor.

Windows NT was designed to run on fast microprocessors. If you are using a computer with a processor slower than a Pentium, you may be experiencing processor bottlenecks routinely.

Monitoring Processor Performance

Monitoring processor performance is simple in Windows NT. As with all performance objects, a few measurements will give you a good idea of whether the processor is a bottleneck in your system. Important processor-related counters are:

  • Processor: %Processor Time

  • Processor: Interrupts/sec

  • System: Processor Queue Length

Processor: %Processor Time

The Processor: %Processor Time counter shows how busy the microprocessor is. The microprocessor does not become a bottleneck until this counter shows a sustained utilization of 80 percent or more. The processor will spike to 100 percent at times; these spikes are normal and do not indicate a bottleneck. As long as the processor normally runs somewhere between 0 and 80 percent, it is sufficient for the workload. If, after tuning your computer to eliminate processor bottlenecks, your processor still runs above 80 percent, you need to upgrade to a faster microprocessor or add another one. Exercise 14.4 shows you how to add this counter to the performance monitor.

Exercise 14.4

Adding Processor: %Processor Time to the Performance Monitor

  1. Click + in the performance monitor toolbar.

  2. Select Processor in the Object drop-down list.

  3. Select %Processor Time in the Counter drop-down box.

  4. Click Add.

  5. Close the Add to Chart window.

After adding this counter, let the computer sit idle for a moment. Now move your mouse around on the screen and notice the effect on the Processor: %Processor Time measure. Dramatic, isn’t it?
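A moving average can make the "sustained, not spiky" rule of thumb concrete. This Python sketch flags a series of %Processor Time samples only if some run of consecutive samples averages 80 percent or more. The five-sample window is an assumption, since the chapter does not define how long "sustained" is. The sample values are hypothetical.

```python
def sustained_above(samples, threshold=80.0, window=5):
    """True if the average of any `window` consecutive samples reaches the
    threshold. Isolated spikes to 100 percent are averaged away."""
    for start in range(len(samples) - window + 1):
        chunk = samples[start:start + window]
        if sum(chunk) / window >= threshold:
            return True
    return False

spiky = [20, 35, 100, 30, 25, 100, 40, 30, 20, 35]   # normal, with spikes
busy = [70, 85, 90, 95, 88, 92, 84, 90, 87, 93]      # sustained load

print(sustained_above(spiky))  # False
print(sustained_above(busy))   # True
```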

Processor: Interrupts/sec

Processor: Interrupts/sec measures the rate of service requests from peripheral devices. An unusually high rate on this counter without a corresponding increase in system activity indicates that a hardware component is malfunctioning and sending spurious interrupts. This counter should normally stay between 100 and 1,000, but spikes up to 2,000 are acceptable. Exercise 14.5 shows you how to add this counter to your system.

Exercise 14.5

Adding Processor: Interrupts/sec to the Performance Monitor

  1. Click + in the performance monitor toolbar.

  2. Select Processor in the Object drop-down list.

  3. Select Interrupts/sec in the Counter drop-down box.

  4. Select 0.1 in the Scale drop-down list.

  5. Click Add.

  6. Close the Add to Chart window.

System: Processor Queue Length

System: Processor Queue Length counts the number of threads waiting for attention from the processor. Each thread requires a bit of microprocessor time. A large number of running threads may exceed the supply of processor time, causing the microprocessor to become a bottleneck. A sustained thread queue greater than two indicates a processor bottleneck; too many threads are standing in line awaiting execution, which bogs down the processes that rely upon those threads.

If you try to watch only the processor queue length indicator, you will notice that it always sits at zero. This reading occurs because the performance monitor must be monitoring a thread-related counter in order to determine how many threads are awaiting execution. To see the true value of the processor queue length counter, you must also be monitoring a thread counter of some sort. Exercise 14.6 shows how to monitor the processor queue length.

Exercise 14.6

Adding System: Processor Queue Length to the Performance Monitor

  1. Click + in the performance monitor toolbar.

  2. Select System in the Object drop-down list.

  3. Select Processor Queue Length in the Counter drop-down box.

  4. Click Add.

  5. Select Thread in the Object drop-down list.

  6. Select Context Switches/sec in the Counter drop-down box.

  7. Leave Total selected in the Instance drop-down box.

  8. Click Add.

  9. Close the Add to Chart window.

Remember that in order to monitor the processor queue length, you must also be monitoring a thread-specific counter. Context Switches/sec shows how many thread switches occur each second.
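One simple way to interpret "sustained" for the processor queue is to average the readings over a period of normal work, as in the Python sketch below. The function name and the plain average are assumptions for illustration. The readings must come from a session in which a thread counter was also being charted; otherwise they will all be zero.

```python
from statistics import mean


def queue_bottleneck(queue_samples, limit=2):
    """Return True if the average System: Processor Queue Length over the
    sampled period exceeds `limit` (two threads, per the rule of thumb).

    Collect the samples while a thread counter is also being charted;
    otherwise every reading is zero.
    """
    if not queue_samples:
        raise ValueError("no samples collected")
    return mean(queue_samples) > limit


print(queue_bottleneck([0, 1, 0, 2, 1, 0]))  # False: average is about 0.67
print(queue_bottleneck([3, 4, 2, 5, 3, 4]))  # True: average is 3.5
```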

Troubleshooting Processor Performance

If you have determined that your processor is truly a bottleneck, you may not be able to find an inexpensive way to fix your problem. Before you run out and buy a new processor, though, check your computer for the following common problems:

  • Do you have sufficient external processor cache?

  • Are your internal and external caches enabled?

  • Is the BIOS processor startup speed set to Fast?

Sufficient Processor Cache

Do you have sufficient level 2 cache? Reboot your computer and enter the BIOS. Find the area that describes the amount of external cache your computer uses. Your system should have at least 256K of external cache; some Pentium-class computers ship with less. If your computer does not have at least this much cache memory, increase it to at least 256K.

Note: Some computers ship with EDO RAM (which is faster than normal memory) in order to eliminate the necessity for an external cache. Unfortunately, EDO RAM does not speed your computer as much as a 256K external cache. Even if you have EDO RAM in your computer, you should add an external cache if you can.

Enabling Caches

Are your processor level 1 and level 2 caches enabled? Using the manual that came with your computer or motherboard, enter the BIOS settings when you reboot your computer and verify that the CPU internal cache is enabled and that the external (or level 2) cache is enabled. If they are not, enable them.

Note: Changing settings in your BIOS without knowing exactly what the setting does may cause your computer to become erratic or fail to work. If you are not an absolute computer genius, have an experienced PC technician make these changes for you.

Deciding What to Upgrade

If, after checking these items, your processor is still a bottleneck, you will need to upgrade to a newer microprocessor or computer. If you can’t get a microprocessor that is twice as fast to work in your computer, don’t bother upgrading the microprocessor. Upgrade the entire computer.

Disk Performance

Disks are usually the single biggest bottleneck in your computer. Booting, application loading, data storage and retrieval, and swap file performance all depend on the disk, and disks are much slower than the processor or memory. For these reasons, the speed of your disk(s) has a large impact on the overall speed of your computer.

As with all performance monitoring in Windows NT Workstation, you can use the performance monitor to profile your disk activity. However, your computer also comes with a performance indicator that works in any operating system: the hard disk drive light. If your disk light is on most of the time under normal working conditions, you need to add RAM. You can’t avoid this solution, and all the performance monitoring on the planet isn’t going to uncover a different answer.

Physical versus Logical Disk Performance

In Table 14.1 you’ll notice two disk-related objects: logical disk and physical disk. Logical disk is used to measure performance at a higher level than physical disk.

The logical disk object can measure the performance of network connections that are mapped as drives and the performance of volume sets and stripe sets that cross physical disks. You will use the logical disk object to uncover bottlenecks initially and then move to the physical disk object to uncover the reasons why that bottleneck is occurring.

Physical disk measures only real transfers to and from actual hard disk drives (or a RAID set in the case of RAID controllers, discussed later in this chapter). This object is used only when you want to isolate performance differences between disks in your system or when you want detailed information about the specific performance of a certain disk.

High-Impact Counters

Disk counters cause a measurable performance degradation by distracting the processor at critical input/output periods. These counters are disabled by default. If you attempt to monitor physical or logical disk performance without enabling these counters, you will not see any disk data.

On Intel i386-based computers, the disk counters cause about a 2 percent degradation in overall performance. You should enable them only when you need to monitor disk performance and disable them when you are finished. Enabling the disk counters is shown in Exercise 14.7.

Exercise 14.7

Enabling the Disk Performance Counters

  1. Choose Programs and Command Prompt from the Start menu.

  2. Type diskperf -y at the command prompt and press Enter. A message will indicate that the disk performance counters on the system are set to start at boot time.

  3. Restart your system.

When you have finished monitoring disk performance, remember to disable the disk performance counters. Leaving them enabled serves no purpose and slows down your machine. Exercise 14.8 shows how to disable them.

Exercise 14.8

Disabling the Disk Performance Counters

  1. Choose Programs and Command Prompt from the Start menu.

  2. Type diskperf -n at the command prompt and press Enter. A message will confirm the change.

  3. Restart the system.

Monitoring Disk Performance

Once you’ve enabled the disk performance counters as shown in Exercise 14.7, you’ll be able to make meaningful disk throughput measurements.

Important counters you’ll want to watch are:

  • Memory: Pages/sec

  • Logical Disk: %Disk Time

  • Logical Disk: Disk Bytes/sec

  • Logical Disk: Avg. Disk Bytes/Transfer

  • Logical Disk: Current Disk Queue Length

Memory: Pages/sec

Why a memory counter in the disk performance section? Because every page counted by this counter is read from or written to disk. Leave this counter showing in the performance monitor while watching %Disk Time to see how dramatically paging affects your overall performance. Add Memory: Pages/sec to your performance monitor graph using Exercise 14.9.
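To estimate how much of your disk traffic comes from paging, multiply Pages/sec by the page size. Then compare the result with the total Logical Disk: Disk Bytes/sec reading taken over the same interval. The Python sketch below assumes a 4KB page, which is the page size on Intel x86 systems. The function name and sample figures are hypothetical. Because the two counters are sampled separately, the result is only an estimate.

```python
PAGE_SIZE = 4096  # assumed: bytes per page on Intel x86 systems


def paging_share(pages_per_sec, disk_bytes_per_sec, page_size=PAGE_SIZE):
    """Estimate the fraction of disk throughput caused by paging.

    pages_per_sec      -- Memory: Pages/sec reading
    disk_bytes_per_sec -- total Logical Disk: Disk Bytes/sec reading
    The result is capped at 1.0 because the two counters are sampled
    separately and can disagree slightly.
    """
    if disk_bytes_per_sec <= 0:
        return 0.0
    paging_bytes = pages_per_sec * page_size
    return min(paging_bytes / disk_bytes_per_sec, 1.0)


# Hypothetical readings: 130 pages/sec against 1.5MB/sec of disk traffic.
share = paging_share(130, 1_500_000)
print(f"{share:.0%} of disk traffic is paging")  # 35% of disk traffic is paging
```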

Exercise 14.9

Adding Memory: Pages/sec to the Performance Monitor

  1. Select Programs > Administrative Tools > Performance Monitor from the Start menu.

  2. Click + in the performance monitor toolbar.

  3. Select Memory in the Object drop-down list.

  4. Select Pages/sec in the Counter drop-down box.

  5. Click Add.

  6. Close the Add to Chart window.

%Disk Time

This counter shows the percentage of elapsed time that the selected disk is busy servicing read and write requests. It is a good broad indicator of whether your hard disk drive is a bottleneck during activities when you would not normally expect to wait for it. Note that this counter measures the disk, not the processor. Compare it against Processor: %Processor Time: if the disk is busy most of the time while the processor is largely idle, the processor is waiting on the disk. Use Exercise 14.10 to measure how much of the time the disk spends servicing requests.
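The Python function below sketches a rough triage based on comparing disk and processor activity. The thresholds are assumptions for illustration. The 80 percent figure echoes the processor rule of thumb given earlier in this chapter, and 75 percent is an assumed level for a disk that is busy most of the time. Adjust both to your own baseline.

```python
def likely_bottleneck(disk_time_pct, processor_time_pct,
                      disk_limit=75.0, cpu_limit=80.0):
    """Rough triage from two averaged counter values.

    disk_time_pct      -- %Disk Time: percentage of time the disk was busy
    processor_time_pct -- Processor: %Processor Time
    disk_limit and cpu_limit are assumed thresholds for illustration.
    """
    disk_busy = disk_time_pct >= disk_limit
    cpu_busy = processor_time_pct >= cpu_limit
    if disk_busy and not cpu_busy:
        return "disk"        # the processor is mostly waiting on the disk
    if cpu_busy and not disk_busy:
        return "processor"
    if disk_busy and cpu_busy:
        return "both"        # check Memory: Pages/sec before buying hardware
    return "none"


print(likely_bottleneck(90, 20))  # disk
print(likely_bottleneck(15, 95))  # processor
```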

Exercise 14.10

Adding Logical Disk: %Disk Time to the Performance Monitor

  1. Click + in the performance monitor toolbar.

  2. Select Logical Disk in the Object drop-down list.

  3. Select %Disk Time in the Counter drop-down box.

  4. Click Add.

  5. Close the Add to Chart window.

Disk Bytes/sec

This counter shows how fast your hard disks are transferring data. Turn this counter on and then copy a large directory of files between disks to get a good baseline of the speed at which your disks run. Exercise 14.11 shows how to monitor this counter.

Exercise 14.11

Adding Logical Disk: Disk Bytes/sec to the Performance Monitor

  1. Click + in the performance monitor toolbar.

  2. Select Logical Disk in the Object drop-down list.

  3. Select Disk Bytes/sec in the Counter drop-down box.

  4. Click Add.

  5. Close the Add to Chart window.

Average Disk Bytes/Transfer

This metric shows how large the average transfer is. Larger average transfers make more efficient use of disk hardware and execute faster. Looking at this metric will tell you if small transfer sizes are causing your computer to work too hard to write them to disk. Perform Exercise 14.12 to monitor this counter.
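Average transfer size is the bytes moved divided by the number of transfers over the same interval. You can therefore also derive it from Logical Disk: Disk Bytes/sec and Logical Disk: Disk Transfers/sec. The Python sketch below is a hypothetical helper with made-up sample values. It shows how the same throughput can be delivered as many small transfers or as a few large ones.

```python
def avg_bytes_per_transfer(disk_bytes_per_sec, disk_transfers_per_sec):
    """Average size of each disk transfer, in bytes."""
    if disk_transfers_per_sec == 0:
        return 0.0
    return disk_bytes_per_sec / disk_transfers_per_sec


# The same 1MB/sec of throughput, moved two different ways:
print(avg_bytes_per_transfer(1_048_576, 256))  # 4096.0  (many small transfers)
print(avg_bytes_per_transfer(1_048_576, 32))   # 32768.0 (fewer, larger transfers)
```

With small transfers, the disk and controller handle eight times as many requests to move the same amount of data.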

Exercise 14.12

Adding Logical Disk: Average Disk Bytes/Transfer to the Performance Monitor

  1. Click + in the performance monitor toolbar.

  2. Select Logical Disk in the Object drop-down list.

  3. Select Avg. Disk Bytes/Transfer in the Counter drop-down box.

  4. Click Add.

  5. Close the Add to Chart window.

Current Disk Queue Length

Current Disk Queue Length shows how many disk requests are outstanding (waiting for or being serviced by the disk) at the moment the reading is taken. Many processes must wait for disk requests to be serviced before they can continue. A long disk queue indicates that many processes are being delayed by disk speed. Exercise 14.13 shows how to monitor this counter.

Exercise 14.13

Adding Logical Disk: Current Disk Queue Length to the Performance Monitor

  1. Click + in the performance monitor toolbar.

  2. Select Logical Disk in the Object drop-down list.

  3. Select Current Disk Queue Length in the Counter drop-down box.

  4. Click Add.

  5. Close the Add to Chart window.

Troubleshooting Disk Performance

The best way to eliminate disks as bottlenecks is to use them as little as possible. Add a lot of RAM to your computer to increase the size of your disk cache and reduce the need for swapping pages to disk. This improvement will increase the performance of your computer more than any other.

If you cannot add more memory or if your computer already has all it can use, you will need to take other measures to improve disk performance. Your options are:

  • Use a newer, faster, or higher capacity hard disk

  • Move to a faster hard disk controller interface

  • Create stripe sets across multiple disks

  • Use a redundant array of inexpensive disks (RAID)

Upgrading Your Disk

If your hard disk is more than two years old, you can probably increase your performance by upgrading it. New hard disk drives, especially hard disks larger than 1GB, transfer data quite a bit faster than the drives of just a few years ago. However, if your disk is relatively new, replacing it won’t speed up your system much. Good, fast hard disk drives can transfer data at between 1.5 and 2MB per second. A single drive at this speed is well within the capacity of a hard disk controller. However, several fast hard disks running on a slow controller can easily swamp it, causing the controller to become the bottleneck.

Faster Hard Disk Controllers

Hard disk controllers affect the speed at which data can be transferred from your hard disk. Original SCSI and IDE both have a maximum limit of 5MB per second per controller bus, and new hard disks can exceed this limit. Fast SCSI (SCSI-2 Fast) runs at 10MB per second for devices that support it.

Hard disk controllers running in ISA slots also have a hard limit of about 8MB per second. Also, since ISA controllers can address only the bottom 16MB of RAM, disk requests from regions higher in memory must be moved by the processor, creating an additional load.

If you have a SCSI or IDE controller running in an ISA slot and you have a PCI slot available, you should replace the ISA controller with a PCI controller.

Finally, if you are using a PCI controller and you need more speed, consider moving to wide or ultra SCSI. Wide SCSI increases the width of the SCSI bus from 8 bits to 16, and Ultra SCSI doubles the bus clock rate. Each change doubles the amount of data that can be transferred on the bus. Your disk must support wide or ultra SCSI, or upgrading the controller will have no effect. Table 14.2 shows the performance maximums for various types of hard disk controllers. Note that in all cases a single hard disk drive runs slower than the maximum speed of its controller. The load on the controller, however, is the sum of the sustained transfer rates of all attached drives.

Table 14.2 Hard Disk Controller Technologies.

Controller Technology              Max Transfer Rate    Devices
BIOS hard disk (MFM, RLL, ESDI)    8MB/s*               2
IDE                                5MB/s                2
SCSI                               5MB/s                7
SCSI-2 Fast                        10MB/s               7
SCSI-2 Wide                        10MB/s               15
SCSI-2 Fast/Wide                   20MB/s               15
Ultra SCSI                         20MB/s               7

*This rate is the theoretical maximum for a BIOS-controlled hard disk running in an ISA slot. In practice, you will not achieve this result. Controllers running in local bus slots may achieve higher burst throughput, but these types of drives will have sustained transfer rates less than 1MB/s.

Stripe Sets

Stripe sets increase the speed of a logical disk by splitting it across many physical disks. Because the disks can operate simultaneously, striping can, in the best case, multiply the speed of a logical drive by the number of physical drives it comprises, up to the maximum speed of the shared bus or controller. For example, Figure 14.5 shows how Windows NT splits data across physical drives to improve performance. Creating stripe sets is covered in detail in Chapter 5 under the section on the disk administrator utility.

Figure 14.5: A stripe volume across three disks
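Stripe set scaling and a swamped controller follow the same arithmetic. The drives' sustained rates add up until they reach the limit of the shared controller or bus. The Python sketch below models that best case. It assumes identical drives and evenly spread requests. The figures (2MB/s drives on a 5MB/s original SCSI controller) come from the ranges given earlier in this chapter. Real stripe sets will usually fall short of this ideal.

```python
def stripe_throughput(drive_mb_s, drives, bus_limit_mb_s):
    """Best-case sustained throughput of a stripe set, in MB/s.

    Assumes every drive sustains drive_mb_s and requests are spread evenly,
    so the drives' rates add up until the shared controller/bus limit.
    """
    return min(drive_mb_s * drives, bus_limit_mb_s)


# 2MB/s drives on a 5MB/s original SCSI controller:
for n in (1, 2, 3, 4):
    print(n, stripe_throughput(2.0, n, 5.0))
```

Adding a third drive in this example gains only 1MB/s, and a fourth gains nothing. At that point the controller, not the disks, is the bottleneck.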


RAID

A redundant array of inexpensive disks (RAID) works on the same principle as a stripe set. The difference is that a RAID controller replaces your regular SCSI controller and makes the stripe set look like one physical disk to Windows NT.

RAID controllers include a microprocessor that handles breaking up and recombining the disk data so that the computer’s microprocessor doesn’t have to. Most RAID controllers also have some RAM used as a cache to increase the speed of transfers to and from the controller. This cache works the same way as the Windows NT cache described in the memory optimization section.

RAID controllers essentially perform the same service as stripe sets, but because they relieve the computing burden of stripe sets from the processor and add a memory cache dedicated to disk transfers, they can help relieve processor bottlenecks. Unfortunately, they are very expensive. RAID controllers are generally used only in servers. RAID is covered in depth in the companion book MCSE: NT Server Study Guide.

Application Performance

You can change the performance of applications running in Windows NT to optimize their responsiveness for your situation. Windows NT automatically changes priorities for processes (and therefore their threads) based on what the user is doing. When you bring an application to the foreground, Windows NT automatically raises the priority levels of its processes to ensure a quick response to your requests.

Microsoft Exam Objective

Start applications at various priorities.

Remember that boosting priorities changes only the way the processor divides time among running processes. If you are using only one application, it competes for processor time only with system processes that must be serviced in a timely manner. Raising priorities when you are running only one application will not make the application run faster, because the processor is already dedicating all of its free time to the application.

Changing Default Application Responsiveness

If you normally run many different applications and need them all to run at the same speed regardless of which one is in the foreground, you can change Windows NT Workstation’s default behavior. You can also manually launch applications with higher-than-normal priority if you need to increase the processor time spent in that process.

The performance boost slider has three settings. Maximum boost provides the best foreground application responsiveness by increasing the foreground application’s processes by two priorities. The middle setting makes foreground applications somewhat more responsive than background applications by increasing the priority level by one. None (No boost) makes foreground and background applications run with the same priority. Exercise 14.14 shows how to change default application responsiveness.

Exercise 14.14

Changing the Default Application Responsiveness

  1. Select Settings > Control Panel in the Start menu.

  2. Double-click the System control panel.

  3. Select the Performance tab.

  4. Slide the Performance slider from Maximum Boost to None.

  5. Click OK or Apply.

  6. Answer Yes when asked if you want to restart your computer.

Remember to slide the Performance slider back to Maximum Boost when you return to normal working conditions.

Launching a High-Priority Process

You can launch an application with a higher-than-normal priority by using the start command at the command prompt. The start command also allows you to run Win16 applications in their own memory spaces (the /separate switch), so that if one 16-bit application crashes, it does not affect other 16-bit applications. Exercise 14.15 shows how to start processes with other-than-normal priority.

Exercise 14.15

Starting Processes with Other Than Normal Priority

  1. Select Programs > Command Prompt from the Start menu.

  2. Insert the CD-ROM that came with this book into the CD-ROM drive. Change drives to the CD-ROM drive. (For example, if your CD-ROM shows up as drive F: you would type F: and then press Enter at the command prompt.)

  3. Type start /low \exercise\globe32.exe.

  4. Select the command prompt window and type start /normal \exercise\globe32.exe.

  5. Select the command prompt window and type start /high \exercise\globe32.exe.

  6. Select the command prompt window and type start /realtime \exercise\globe32.exe.

  7. Select the command prompt window. Notice the difference in execution speed among the four instances of the program.

  8. Close each instance of GLOBE32.EXE.
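The four switches used in this exercise select Windows NT priority classes, each with its own base priority. The Python sketch below records those base priorities: 4, 8, 13, and 24 for the Idle, Normal, High, and Realtime classes. It also builds the command lines used in the exercise. The helper name is hypothetical, and the program path is the one from the exercise. The higher the base priority, the more readily the scheduler runs that instance. This is why a realtime instance can delay even mouse and keyboard input.

```python
# Base priority of the Windows NT priority class each start switch selects.
START_SWITCH_BASE_PRIORITY = {
    "/low": 4,        # Idle priority class
    "/normal": 8,     # Normal priority class
    "/high": 13,      # High priority class
    "/realtime": 24,  # Realtime priority class; can delay mouse and keyboard input
}

PROGRAM = r"\exercise\globe32.exe"  # path used in Exercise 14.15


def start_command(switch, program=PROGRAM):
    """Return the command line that launches `program` with the given switch."""
    if switch not in START_SWITCH_BASE_PRIORITY:
        raise ValueError(f"unsupported priority switch: {switch}")
    return f"start {switch} {program}"


# List the instances from the one the scheduler favors most to the least.
for switch in sorted(START_SWITCH_BASE_PRIORITY,
                     key=START_SWITCH_BASE_PRIORITY.get, reverse=True):
    print(f"{start_command(switch):35} base priority "
          f"{START_SWITCH_BASE_PRIORITY[switch]}")
```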

Chapter Summary

Windows NT provides low-level support for performance monitoring by including counters in every object that can be meaningfully measured. Windows NT uses these counters to perform a number of automatic optimizations, such as multiprocessing, spreading virtual memory swap files across multiple disks, prioritizing threads, and caching disk requests.

Windows NT also provides a performance monitor tool to allow you to measure system performance through object counters. You can use the performance monitor to find bottlenecks, or performance-limiting resources, in your computer. The performance monitor allows you to inspect the value of the object counters in real time so you can watch the effect that various activities have on the resources of your computer.

Tuning a computer’s performance is a perpetual cycle of finding performance bottlenecks, eliminating them, and starting over with the next most limiting factor. When a computer can no longer be tuned for greater performance, it is at its natural performance limit for the software being used.

To effectively find bottlenecks, you must look at the overall performance of your computer under a typical load. Using more general counters and averages will give a good indication of where to look for specific bottlenecks. Processor performance, memory performance, and disk performance are the three major capacities that should be checked for performance.

You can use the System control panel’s Performance tab to change application performance to meet your specific software requirements. This tab allows you to change the priority of foreground applications to more effectively share processor time for your specific needs.

Review Questions

  1. Your computer system is a Pentium II 300 MHz CPU, with 32MB RAM, and a 4GB IDE hard drive hosting Windows NT Workstation. Using the performance monitor, you discover that the disk queue length metric is high and that your system often waits for the disk to retrieve or write files. Which of the following changes to your computer system is most likely to result in improved performance?

    1. A second CPU.

    2. Replace the current drive with a high-speed SCSI drive.

    3. Add a second, identical IDE drive and create a stripe set.

    4. Add more physical RAM.

  2. You have several utilities on your Windows NT Workstation computer that analyze data and perform complex calculations. The data from these utilities is saved in a text file. While the results of the utilities are important, they can take hours to complete. You notice that if you attempt to perform other normal activities such as check e-mail, type a document, and download files while these utilities are executing, the system is very slow. You need to be around while the calculations are performed, just in case the utilities encounter errors or require user interaction, but you also need to perform other work. You really can’t afford to sit around and wait, so what can you do to improve your situation?

    1. Increase the foreground priority boost on the Performance tab of the System applet.

    2. Launch the utilities in a separate memory space.

    3. Launch the utilities at a lower execution priority: start /low <application>.

    4. Decrease the size of the paging file.

  3. What performance monitor counter will read zero until another thread-related counter is watched?

    1. Processor: Interrupts/sec

    2. Physical Disk: Current Disk Queue Length

    3. Memory: Pages/sec

    4. System: Processor Queue Length

  4. Installing a high-speed disk controller and establishing a multi-volume stripe set on your Windows NT Workstation computer is a possible solution for what?

    1. An underused CPU

    2. A storage device bottleneck

    3. A slow network connection

    4. Excessive paging

  5. What does the following command do?

    start /low <application>

    1. Adds an application to the Start Menu on the lowest level.

    2. Launches an application minimized.

    3. Launches an application at low execution priority.

    4. This command is not correct, it should be: NET USE START /R:low /P:<application>.

  6. Your Windows NT Workstation computer seems to be thrashing your disk rather often. You use the Performance Monitor to determine that the virtual memory manager is swapping memory pages so much that it accounts for more than 30 percent of the disk activity. Your system is a Pentium II 300 MHz CPU, with 16MB RAM, and a 4GB Fast-Wide SCSI hard drive. Which of the following is most likely to improve the performance of your system?

    1. Add an IDE hard drive.

    2. Increase the size of the paging file.

    3. Increase the physical RAM.

    4. Install a second CPU.

  7. Your computer system is a Pentium II 300 MHz CPU, with 32MB RAM, and two hard drives (a 2GB IDE hard drive used for the system files and a 4GB high-speed SCSI drive used only for data) hosting Windows NT Workstation. A current project requires you to use four memory intensive programs all at the same time (this allows you to cut and paste material from one to the other). Which one of the following changes to your computer system is most likely to result in improved performance?

    1. Adding a second CPU

    2. Moving the paging file to the faster hard drive

    3. Installing a faster video card

    4. Increasing the size of the paging file on the IDE drive

  8. The performance monitor is a utility of Windows NT Workstation that is used for what purpose?

    1. To filter network packets

    2. To monitor users’ access trends to identify security breaches

    3. To inspect the performance activity of software and hardware

    4. To directly improve the performance of a system’s CPU through x86 emulation

  9. Using performance monitor, you monitor your Windows NT Workstation during normal activities. You notice the following counters and their average values:

    • System: Processor Queue Length - 0 or 1

    • Memory: Pages/sec - 130

    • Processor: % Processor Time - 20%

    • Physical Disk: % Disk Time - 90%

    What problem do these values diagnose?

    1. The CPU is too slow.

    2. There is insufficient storage capacity.

    3. The system is paging too much, thus there is too little physical RAM.

    4. The CPU’s math co-processor has failed.

  10. Your computer system is a Pentium II 300 MHz CPU, with 64MB RAM, and a 4GB Fast-Wide SCSI hard drive hosting Windows NT Workstation. You are using a multi-threaded database program to perform some complex cross-referencing searches. Each task takes upwards of 10 minutes to complete. You use the Performance tab of the Task Manager to discover that the CPU utilization is around 98 percent during each database task. Which of the following changes to your computer system is most likely to result in improved performance for this application?

    1. A second CPU

    2. More physical RAM

    3. Increasing the size of the paging file

    4. Adding a second hard drive

  11. While using the performance monitor, you notice that your %Disk Time remains at 75 percent or more and the Current Disk Queue Length is often above 5. What does this indicate about your system?

    1. You have a processor bottleneck.

    2. You have a memory bottleneck.

    3. You have a storage device bottleneck.

    4. Your system is performing optimally.

  12. Launching an application at which priority level can cause mouse and keyboard input to be significantly delayed or sporadic and the screen to update unevenly?

    1. Low user

    2. High user

    3. Realtime

    4. Normal

  13. One of your applications seems to be running slower than usual. You suspect that it is so slow due to the size of the database that it must manage to produce the desired output. You close all other applications but notice no significant improvement. You terminate the application and re-launch it with the following command:

    start /realtime <application>

    What effect does this have?

    1. Nothing different; this command launches the application at its default priority.

    2. The application may demand so many system resources that mouse movements and keystrokes are significantly delayed.

    3. The application is launched at the highest possible priority level.

    4. This command simply re-configures the application’s PIF; you must launch the application with its shortcut to see the effect.

About the Authors

Charles Perkins is an MCSE with years of experience managing local and wide area networks. Co-author of four Network Press MCSE Study Guides, he is now a consultant specializing in Windows NT.

Matthew Strebe, MCSE, is co-author of four Network Press MCSE Study Guides and owner of Netropolis, a network integration firm specializing in high-speed networking and Windows NT.

James Chellis, MCP, is President of EdgeTek Technical Education, a national network training company and Microsoft Solution Provider specializing in Windows NT.

© 1999 Sybex Inc. All Rights Reserved.

We at Microsoft Corporation hope that the information in this work is valuable to you. Your use of the information contained in this work, however, is at your sole risk. All information in this work is provided "as-is", without any warranty, whether express or implied, of its accuracy, completeness, fitness for a particular purpose, title or non-infringement, and none of the third-party products or information mentioned in the work are authored, recommended, supported or guaranteed by Microsoft Corporation. Microsoft Corporation shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, even if it has been advised of the possibility of such damages. All prices for products mentioned in this document are subject to change without notice.

International rights = English only.
