NT Performance Tuning Techniques: Practical Applications
By Mark T. Edmead and Paul Hinsberg
Chapter 10 from Windows NT Performance: Monitoring, Benchmarking, and Tuning, published by New Riders Publishing
This chapter covers the following topics:
Guide to Performance Analysis. This section summarizes many of the rules and tips discussed throughout the book.
Performance Tuning: Staying Ahead of the Game. This section takes performance monitoring to a different level, using it to anticipate need instead of resolving immediate problems.
Performance for the Small Business Office. This section discusses some of the issues that small business offices face.
Additional Resources on NT Optimization. Many places offer additional help, sometimes for free. This section reviews some of my favorites and discusses their merits.
In this chapter, you will find a review of sorts, bringing together the tips and techniques explored throughout this book. You will combine the techniques to provide a rare service to yourself and your company: proactive performance maintenance. You will also have an opportunity to see how techniques might differ in a small business or single office scenario. Lastly, because no one can know everything, I'll mention some of the Web sites and information resources used in troubleshooting and writing this book.
Guide to Performance Analysis
Throughout this book, you have been subjected to all sorts of information regarding the analysis of NT and the techniques for bottleneck detection. This chapter attempts to describe the basic framework that you have already put to use throughout this book. Perhaps you will learn a few new things, and perhaps you will just be reminded of a few old ones you forgot. Reviewing these topics certainly can't hurt. Let us begin with what I, along with my seventh-grade science teacher, like to call the scientific method.
The Scientific Method
Whenever you are attempting to figure out a problem or expose a model of behavior for some process or mechanism, you use a process. Some people are consciously aware of this process; others call it intuition or guesswork. In any case, each of us has a process for figuring out how the world around us works. When troubleshooting computer problems, you use a process as well. However, as problems and variables become more complex, the process of figuring out the behavior of the problem must rise to the occasion. It is no longer sufficient to "guess," rely on old methods, or try a couple things and then give in or reinstall. If you really want to be good at troubleshooting, you must first instill within yourself a desire to never surrender. I like to think of it as a "savage pursuit of the truth." When you have such a desire, you do not expect or accept the response "it just works that way." Such an answer must be first backed by hours of relentless testing to ensure that no answer can be deduced or exposed through experimentation. I can promise you that the problem to which you simply say, "It just works that way," will be the one that pops up repeatedly and at the most awkward moments.
The scientific method is a simple set of rules roughly applied to problem solving. If you follow the rules, you can solve almost any problem or at least be sure that you have exhausted all the avenues to a solution, which then warrants calling in other resources if the problem is worth solving. The model for the scientific method follows:
Get the overall picture.
Formulate a hypothesis.
Test the hypothesis.
Refine the hypothesis.
Design a solution.
Test a solution.
Reevaluate the system (return to step 1).
Getting the Overall Picture
Think about what the problem is and why you want to solve it. For example, developers are having a problem connecting to the SQL database system. Their connections periodically drop while they are using TCP/IP sockets. The named pipes appear to be working correctly, however. They could use named pipes connections without worrying further. Is there a problem?
Your first reaction should be to start asking questions. Why are they trying to use TCP/IP sockets at all if they know named pipes work? What is it costing the developers in time and effort while this solution is not working? These questions are important. They help to describe the motivation and effort level you might need to put into a solution. Perhaps the developers are using TCP/IP sockets because clients use that type, or perhaps TCP/IP sockets offer other features that the other connection type doesn't. Cost can never be ignored. Every problem and every downtime has a cost. If a development team sits idle for 30 minutes, you can bet that is going to cost the company in real dollars and cost the developers in frustration and motivation. However, suppose only one developer is experiencing the connection problem. He is trying to use the TCP/IP sockets to satisfy his own curiosity and not for any job requirement. This puts the issue in a different light.
Getting the big picture is more than just assessing cost and how much effort you must put into a solution. After you decide to actually tackle the problem, you must assess the entire situation, not just the details. All too often, we lunge at the problem without first looking at what is going on with the entire system or network. It is like chasing your child's pet rabbit in a field because she left the cage door open again. You get so focused on trying to grab the rabbit you don't realize you are rolling around in poison ivy.
In a more topical analogy, consider a client I was working with. The client had a server failure. The application that the client was trying to make work simply was not working. The application was a client/server database application. The person troubleshooting the problem focused on the server and the network card. He replaced the card, reconfigured the server, and reconfigured the software until he had exhausted all options. This server had given us network problems in the past that were hardware related. The issues had concerned the chipset on the motherboard for the PCI interface. The server was old and was under consideration for replacement in the next budget cycle. However, with this sudden failure and the support person failing to find a solution, the company decided to buy a new server. While the buyers were out shopping, a more senior engineer came in, looked at the system, and plugged the ethernet cable back into the hub where it had become loose. Wham! Everything started working. The lesson is that you always evaluate the entire situation.
In the world of optimization and bottleneck detection, what this means is that you want to get a complete picture of the computer, the purpose the computer was built for, and the environment where the system is running. If the server is an application server, you expect certain resources, such as the processor, to be more heavily utilized than the disk. A file server, as you have seen, is more active on the network and the disk than the memory or processor. (Chapter 6, "Optimizing CPU Performance," presented this information originally.) You will want to know about the network and the client software that the system is exposed to. These components affect the server directly. Chapter 9, "Optimizing Network Interface Performance," demonstrated how the network and NIC interrupts can affect the processor. After collecting all that information, which actually takes only a few moments, you will want to start the Performance Monitor. Take a look at the four basic computer groups:
CPU. Processor object : % Processor Time counter
Memory. Memory object : Pages/sec counter
Disk. PhysicalDisk/LogicalDisk object : Disk Queue Length counter
Network. Network Segment object : % Network Utilization counter
Taking a look at these counters will give you an overall picture of what resources are being utilized.
Formulating a Hypothesis
Formulating a hypothesis is basically a fancy way of saying that from what you have learned, you take a guess at where the problem lies. From reading Chapters 6–9, you should have a good idea where the numbers are leading you. For example, if the Memory Pages/sec is 23 and is sustained over the duration of your observation, memory issues are the likely target. At this point, don't make your guess, or hypothesis, too refined. You do not want to box yourself into a cause without further investigation.
Testing the Hypothesis
Testing the hypothesis is little more than taking the next step. You adjust the counters on the Performance Monitor to turn a critical eye on the component that is probably the problem. In the preceding example, memory seemed to be the issue. What do we know about memory problems? Chapter 7, "Optimizing Memory Performance," indicates there are basically three possibilities:
You just don't have enough memory.
You have a memory leak in some application or service.
You have a misconfigured application (or poorly written one).
At this point, you should investigate all the possibilities using the techniques of this book. Based on what you learn, can you guess what you will do next?
Refining the Hypothesis
Adjust your hypothesis to better suit what you have observed. Let us continue the memory issues scenario. You see that the available memory is low and decreasing ever so slightly as you observe the system. It might be a memory leak. Next step: Find out which application is creating the problem. At this point, you will end up spending a few cycles in step 3, "Testing the Hypothesis," and step 4, "Refining the Hypothesis." You will continue to refine your guess until it is no longer a guess. You end up knowing which application is causing the problem. You may even end up with a pretty good guess of what type of code the programmer wrote to produce the problem. Furthermore, because you went through this process step by step, you have proof or validation of your claim. All right, now what? You design a solution, of course!
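As a rough illustration of "decreasing ever so slightly," the following Python sketch fits a straight line to a series of available-memory readings and reports the slope. The readings are invented for the example, and the function name is hypothetical; in practice the values would come from a Performance Monitor log exported to a file.

```python
def trend_per_sample(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical available-memory readings in MB, one per sample interval.
available_mb = [48.0, 47.6, 47.5, 47.1, 46.8, 46.6, 46.1, 45.9]

slope = trend_per_sample(available_mb)
print(f"Available memory changes by {slope:.2f} MB per sample")
if slope < 0:
    print("Steady decline: worth checking for a leak, process by process.")
```

A consistently negative slope over a long observation supports the leak hypothesis. It does not say which process is responsible; that still takes the per-process investigation described above.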
Designing a Solution
Based on all the information you have collected, you will usually have a pretty good idea how to fix the problem. You may be able to adjust some configuration of NT; adjust the program, service, or driver; or simply stop using the application causing the problem. Keep in mind everything that you do will affect the system somehow, usually in more than one way. You might reconfigure the application based on information from a vendor white paper. You might make a Registry change based on a Microsoft white paper. You might upgrade software components or drivers. You might do a BIOS upgrade on an adapter card or system BIOS. You might do nothing more than put a service call into the vendor regarding the problem. Whatever solution you apply, you want to make sure you do it gradually. Avoid trying multiple things at once. For example, the vendor says updating the driver might fix the problem, but you also find a Microsoft article that says changing a Registry setting may alleviate the problem. Do you do both? No, of course not. You test one possible solution at a time.
Testing a Solution
You may stay in a bit of a loop between step 5, "Designing a Solution," and step 6, "Testing a Solution," for a while. However, I am sure you want more than to simply fix the problem. You must know exactly what the problem is so that in the future you can recognize it and resolve it more quickly. Computers, software, and users come in all sorts of configurations. The same problem may pop up in slightly different forms within your enterprise. If you understand the true nature of the problem, you will be able to detect it even though it may be camouflaged in the dense brush of varied software configurations and user preferences. At some point, you will get to a solution. Then, you are done, right? Wrong!
Reevaluating the System: Back to Step 1
It's time to return to the big picture. More than any other step, the reevaluation step is often missed in a rush to fix a problem and move on to the next item on the agenda. Remember that everything you change on a computer is bound to affect some other component or resource. You must reevaluate the entire system before dusting off your hands and calling it a done deal. You may have fixed the memory issue but caused processor performance to suffer because of it. Say you decide to add more memory because the system was simply being overworked by the growing use of the applications by the users. If you don't reevaluate the system, you might not realize that the processor is running less efficiently. You forgot that you should check your L2-Cache levels, and you have a new problem with L2-Cache/memory ratios that is creating overhead for the processor (see Chapter 6).
Using a scientific process of setting a hypothesis and experimenting will help you treat problems consistently and achieve accuracy in finding solutions the first time. You will also begin to see your hypothesis/testing and solution/testing loops get shorter as you improve in your ability to guess or hypothesize. No matter how good you think you are, never skip a step. The one time you skip the step, you'll end up having to go back and fix a problem you caused because you did not properly reevaluate or test.
Now that you have a method of problem evaluation and resolution, let's review some general tips and tricks for performance monitoring.
Performance Monitoring Considerations
The last section discussed the rules for working through a problem. The topics in this section deal with a more specific issue of performance monitoring. You must keep in mind some general bits of wisdom while you are performance monitoring. You will perhaps guess some of the tips from reading the rest of this book; others may not have been so obvious.
One Bottleneck Masking Another
Masking is generally the principle behind "getting the big picture." Remember to think about how the various hardware and software components relate to one another. Memory problems might really be memory problems, but they might also be manifestations of a disk problem. If the VMM cannot get information to the disk quickly enough because of contention or bad pagefile configuration, don't blame the memory. Also, if you notice that the disk has an excessive disk queue, your first thought might be that you need a better disk subsystem (controller and disk). Of course, after reading this book, your first thought should be to make sure that the disk queue isn't being caused by excessive page faults.
How about this one: A user complains that since she installed IE 4.0, her system has been exceptionally slow. She knows a little about the NT Performance Monitor, so she shows you how the system uses almost 75% of the CPU time since adding IE 4.0. She has a Pentium 100MHz machine—not the fastest, but such awful performance can't be related to speed. Are you thinking of the big picture yet? You start Performance Monitor and look at all the counters. You notice that not only is the processor at 79%, but also the Processor: % Interrupt time is 55%. Did IE 4.0 do that? No.
However, you find out that the user installed the full version. You start IE 4.0 and point it to the user's machine because she installed Peer Web services. She built her own Web page to offer a cute little game for downloading. Needless to say, eight other folks downloaded the little game, which wasn't so little. This older machine had an ISA NIC and an ISA EIDE controller. The NIC and disk controller interrupted the processor so much that it had a noticeable effect on the performance of the system.
Seeing a Bottleneck Because of When You Are Looking
This book provides many case studies. Don't be fooled by these seemingly straightforward examples. It takes time to track down a bottleneck on a system. NT is as complex an operating system as any other, so the bottleneck causes can be elusive. Become familiar with not only interactively observing performance, but also logging data. The Performance Monitor will log information to a file so that you can view it later. This feature is extremely helpful when you have more than one user.
Understanding Performance Calculations
Your mother told you that you need rudimentary math skills. Although most of the performance-monitoring calculations you need to understand are rudimentary, you should get used to looking at fractions, percentages, and ratios. You may also want to get a small book on statistics. If you are really going to get down and dirty in the dark regions of NT performance, you may find yourself in need of a correlation coefficient or two. You might even find yourself running an all-out linear regression test on some data. Hey, it could happen! Luckily, some software packages are readily available to do this kind of analysis for you. Spreadsheet programs such as Lotus 1-2-3 and Excel can do the job. Other programs offer more sophistication, but spreadsheets usually are sufficient for the task.
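For readers who prefer a script to a spreadsheet, here is a minimal sketch of a correlation coefficient computed by hand in Python. The two series are made-up sample values (paging rate and disk queue length), used only to show the arithmetic; the function name is likewise just for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical samples taken at the same moments.
pages_per_sec = [3, 5, 12, 20, 23, 8, 4]
disk_queue_length = [0.4, 0.6, 1.5, 2.6, 3.1, 1.0, 0.5]

r = pearson(pages_per_sec, disk_queue_length)
print(f"correlation between paging and disk queue: {r:.2f}")
```

A value near 1 suggests the two counters rise and fall together, which is the kind of evidence that helps separate a genuine disk problem from a paging problem. Correlation alone does not prove which one causes the other.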
Ratios and Minor Calculations and the Details of a Bottleneck
Speaking of calculations, some of the most minor calculations often have revealing results. Often, it is difficult to conceptualize some of the absolute numbers that you will get from the Performance Monitor. For example, if you are getting 24 Page Faults/sec, is that good or bad? Comparing the Input Pages/sec with the Page Faults/sec gives you the percentage of hard page faults. This percentage usually has more meaning than the raw number, but it also leads to the next tip.
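The ratio described here is simple enough to sketch in a few lines of Python. The 24 Page Faults/sec figure is the one from the text; the Input Pages/sec value of 6 is an assumed reading chosen only to make the example concrete.

```python
def hard_fault_percentage(input_pages_per_sec, page_faults_per_sec):
    """Share of page faults that had to be satisfied from disk, as a percentage."""
    if page_faults_per_sec <= 0:
        return 0.0  # no faults at all, so nothing to compare
    return 100.0 * input_pages_per_sec / page_faults_per_sec


# 24 faults/sec comes from the text; 6 input pages/sec is an assumed reading.
percentage = hard_fault_percentage(6, 24)
print(f"{percentage:.0f}% of page faults were hard faults")
```

Seen this way, 24 Page Faults/sec with a low hard-fault percentage is far less worrying than the same raw number with most faults going to disk.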
Averages and Generalities
You no doubt recall that among the available counters in Performance Monitor are average values. You must also be aware that some counters are averages that don't explicitly say that they are averages. This is especially true of some of the raw counters, such as Disk Queue Length. Averages will reveal trends and general activity but will hide details about what is going on.
Using the Time Window in Performance Monitor
As much as generalities lack details, using too many data points while viewing a chart also hides the details. Remember that the chart will display only 100 data points across. If you have 10,000 data points in a performance log file, you will see only every 100th data point. You could be missing valuable details. To avoid this, you use the Performance Monitor Time window. The steps for using the Time window effectively follow:
Set the alert for appropriate values (such as % Processor Time > 80%).
Adjust the Time window to focus on the value in question.
Adjust the width of the Time window so that it includes 100 data points or fewer to see more details. When you have fewer than 100 data points, the data ends before the right side of the screen.
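To see why narrowing the Time window matters, consider this small Python sketch. It assumes, as described above, that a chart limited to 100 points simply shows every Nth sample; the data and the function name are invented for the illustration.

```python
def charted_points(samples, width=100):
    """Samples a chart limited to `width` points would show (every Nth sample)."""
    if len(samples) <= width:
        return list(samples)
    stride = len(samples) // width
    return samples[::stride][:width]


# Hypothetical log: 10,000 samples of % Processor Time at a steady 20%,
# with a short spike to 95% between samples 4,250 and 4,259.
samples = [20.0] * 10_000
for i in range(4_250, 4_260):
    samples[i] = 95.0

full_view = charted_points(samples)
print("peak visible across the whole log:", max(full_view))

# Narrow the Time window to 100 samples around the suspect period.
window = samples[4_200:4_300]
print("peak visible in the narrowed window:", max(charted_points(window)))
```

Across the whole log the chart lands on samples 4,200 and 4,300 and skips the spike entirely. With the window narrowed to 100 samples, every data point is drawn and the spike is plain to see.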
Understanding the Heisenberg Principle
Roughly speaking, the Heisenberg principle states that the act of measuring something inherently changes the results. Whenever you use Performance Monitor, you are adding to the drain on resources. The trick is to not affect the same component that you are trying to analyze. For example, suppose you want to check the throughput on the disk drive you just installed. While doing this, you do not want Performance Monitor to be logging data to the same physical drive on which you are testing throughput. This would skew the data. No matter what you do, you will affect the performance. Even if you monitor the system from a remote location, you are still affecting memory, network, and processor resources.
Bottlenecks at Less than 100 Percent Utilization
As documented in Chapter 6, "Optimizing CPU Performance," and Chapter 8, "Optimizing Disk Performance," many components are based on servicing a queue. Remember the clerk at the grocery store? The effects of queuing theory basically dictate two important truths:
Random use of a resource for irregular intervals can form long queues at less than 100% utilization.
The more regular the intervals and the more consistent the duration of use, the more you can accomplish at a lower utilization.
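Both truths can be demonstrated with a short simulation. The Python sketch below models a single clerk serving requests in order of arrival and compares perfectly regular traffic with random traffic at the same 80% average utilization. The arrival and service figures, the use of exponential distributions for the random case, and the fixed seed are all assumptions made for the example.

```python
import random


def average_wait(arrival_gaps, service_times):
    """Average time a request spends waiting in line for a single server."""
    clock = 0.0            # arrival time of the current request
    server_free_at = 0.0   # when the server finishes its current work
    total_wait = 0.0
    for gap, service in zip(arrival_gaps, service_times):
        clock += gap
        start = max(clock, server_free_at)
        total_wait += start - clock
        server_free_at = start + service
    return total_wait / len(arrival_gaps)


n = 20_000
rng = random.Random(42)  # fixed seed so the run is repeatable

# Regular traffic: one request per time unit, each needing 0.8 units (80% busy).
regular = average_wait([1.0] * n, [0.8] * n)

# Irregular traffic with the same averages (exponentially distributed).
irregular = average_wait(
    [rng.expovariate(1.0) for _ in range(n)],
    [rng.expovariate(1 / 0.8) for _ in range(n)],
)

print(f"regular arrivals:   average wait {regular:.2f}")
print(f"irregular arrivals: average wait {irregular:.2f}")
```

The regular stream never waits at all, while the irregular stream builds a noticeable queue even though the server is idle about a fifth of the time in both cases.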
Testing Incremental Changes and Controls
I know you got an earful of this in the last section, but I just think that it is exceptionally important in troubleshooting, performance monitoring, and life in general. When you are trying to correct a problem, make sure you follow a set of rules, as outlined in the previous section. Incremental changes will allow you to exactly determine what fixed the problem and also help you confirm the cause of the problem in most cases.
Systems and Their Differences
Machines all operate differently. Considering all the hardware options, software revisions, BIOS revisions, configuration options, and just plain human fallibility, there seems actually to be little chance that any two systems will behave in an identical fashion. Although you can expect some systems to operate within certain parameters, you must avoid hastily making generalized statements about what is good and bad performance. A machine's performance depends on what it is being used for. You may recall that Chapters 8 and 9 place a lot of emphasis on getting performance baselines. Baselines serve as a reference point for comparing other values. The concept of baselines and their uses is covered in more detail in the next section.
Performance Tuning: Staying Ahead of the Game
You have read about baselines and profiling your systems several times throughout this book. In this section, the concepts are brought into full bloom. Almost every situation with computer performance tuning requires you to have a reference point, a place where you completely understand the conditions that the machine was working under. When you have this information, you have a baseline of what the performance of the machine is supposed to be. When you compare a more recent collection of performance data with the baseline, you can tell how the usage of your system has changed over time. You can also tell where you are getting into trouble with potential bottlenecks. The process is collectively called capacity planning.
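The comparison against a baseline can be sketched in Python as follows. The counter values and the 25% tolerance are assumptions picked for the example, not recommended thresholds, and the sketch treats a higher value as worse, which suits the four counters shown.

```python
def compare_to_baseline(baseline, current, tolerance_pct=25.0):
    """Return counters whose current average exceeds the baseline by more than tolerance_pct."""
    flagged = {}
    for counter, base in baseline.items():
        now = current.get(counter)
        if now is None or base == 0:
            continue  # nothing to compare against
        change_pct = 100.0 * (now - base) / base
        if change_pct > tolerance_pct:
            flagged[counter] = round(change_pct, 1)
    return flagged


# Hypothetical averages: the baseline taken before production, and last week's log.
baseline = {
    "% Processor Time": 35.0,
    "Pages/sec": 4.0,
    "Disk Queue Length": 0.8,
    "% Network Utilization": 12.0,
}
current = {
    "% Processor Time": 41.0,
    "Pages/sec": 9.0,
    "Disk Queue Length": 1.9,
    "% Network Utilization": 13.0,
}

for counter, change in compare_to_baseline(baseline, current).items():
    print(f"{counter}: up {change}% over baseline")
```

In this made-up data, paging and disk queue length have both more than doubled while the processor and network have barely moved, which points the investigation toward memory before anything else.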
The goals of capacity planning depend on the goals and needs of your company and department. In general, however, the idea is to provide a structured approach to collecting data, performing regular analysis, and maintaining records. The goals ensure that you can predict and provide for future system and resource demands prior to the demands affecting the performance of the systems and the business in general.
The first action to perform in capacity planning is to make sure that the personnel with a stake in the operations of the computers are aware of the goals. System administrators have long fallen into the pitfall of being reactive. System administrators react to problems. Your printer does not work; call the system administrator. You can't connect to a network share; call the system administrator. You just installed a new game and it messed up your entire system; call the system administrator and go to lunch before he gets there. System administrators are the fixit folks. It breaks; they fix it. For most cases, this might be a decent concept for the end user. For a company's administration to have the same concept is detrimental to the department and the health of the company's critical computer systems. For this reason, capacity planning goals and procedures should be documented and regularly put in front of the company administration.
Capacity planning is proactive. It is a process whereby the company's business side and computing side attempt to work together to achieve a common goal of a stable computing environment that leads to a successful business. Yes, I know you probably cringed at documentation and becoming involved with the business side of the company. All that I can say is, too bad. It is no longer sufficient, if it ever was, for a company simply to add computers as the business grows. A concept of the system's current capacities and abilities is required as a first step in knowing what needs to change in the coming years to accommodate a growing and changing business. Adaptation to adversity seems to be a key phrase separating the successful business from the failing business. Computer systems in many cases play a key role in a company's ability to adapt. After you have the first documentation, you will want to continue a trend of documenting and record keeping, which make up the next step to capacity planning.
Write it down. Whenever you make a change to a server, router, or other network component, you should write it down. Basically, you should have a binder containing a collection of information about each system. Whether this binder is a physical folder, a three-ring binder, or a set of electronic documents is irrelevant—as long as there is one for each component on your network. You need this binder system no matter how many administrators there are. For a group of many administrators, you want to make sure that each person is aware of changes made to servers or other components. This alleviates the problem of trying to find out who changed whatever is making the system perform so much differently than it did before rebooting. For a site with only one administrator, record keeping is especially critical because people move on, sometimes abruptly. Without information about each of the systems, the new administrator has no starting point. What should the binders contain?
Log pages. Log pages are useful so that people can record changes to anything on the system.
Printout of IPCONFIG.EXE /ALL. This printout will show all of the computer's TCP/IP configuration settings.
WINMSDP.EXE. This is a utility from the NT Resource Kit. It pipes the contents of the Windows NT Diagnostic Tool into a text file. The information will contain hardware settings, services loaded, drivers loaded, the OS version information, and more.
Screen capture of Disk Administrator. This record gives you an idea about the disk's configuration in case the system crashes hard. You can find this information in other places, but the graphical display seems easiest to understand.
Now that you have a binder, you will need to create at least four disks for emergencies:
Emergency repair disk
NT boot disk
Disk configuration disk
Boot sector disk
You need an emergency repair disk for each machine because they tend to be pretty particular. You use a utility called RDISK.EXE to create them. RDISK.EXE is installed when you install NT, but no icon is created for it. You will find it in the %SystemRoot%\System32 folder. When you run the utility, make sure that you click Update Repair Info prior to making a new disk (see Figure 10.1).
This updates the information from the Registry in the emergency repair file before updating the disk.
The NT boot disk is a necessity if you are using any kind of mirror set. This disk must be formatted on an NT Workstation or Server and contain the following files:
The idea is that you can have an NT boot disk on a FAT file system that you can carry around and edit if necessary. Suppose a mirror set fails. You need to boot from the alternate disk. If you had not configured the BOOT.INI on the hard drive to do so, you would not be able to reboot the machine. If your boot partition is NTFS, you would not be able to edit the file easily. You can, however, boot the system from a floppy that contains the right files. You have to modify the BOOT.INI on the floppy to make sure that the path names correctly point to the system partition where you installed Windows NT. Details on the path names are supplied in Chapter 3, "Simulating System Bottlenecks," and Chapter 7, "Optimizing Memory Performance."
You will generate the disk configuration disk from the Disk Administrator to contain information regarding the current systems and disk configuration information from the Registry.
For the boot sector disks, if you want to get a little fancy, you can use the Disk Save (DISKSAVE.EXE) or Disk Probe (DISKPROBE.EXE) programs to save the boot sector from the hard drive. You can avoid going through the NT install and using the repair option, which is often time consuming. The two utilities are both from the NT Resource Kit. The Disk Save program is DOS-based and thus is useful when the system cannot boot at all and you don't have an NT boot disk. Disk Probe is a graphical tool that allows you to perform low-level editing on the disks, which is useful in a couple of ways. You can make repairs to the Master File Table (MFT) on the NTFS partitions. It is also handy for creating images of floppies. You can create the image of a DOS boot floppy and save it as a file. Then, if you need to make a DOS disk, you can do it from NT.
So far, I've mentioned a lot of information for record keeping, but it is still not enough. As silly as it may sound, keeping a regular journal is a good idea. It isn't a personal diary; it is a log of the events that affect the computer systems currently or in the future. Note business activities that you have been informed of. Note when you have installed new machines on the network. Write notes about moving hardware around. This journal doesn't have to be an extravagant service call system, although it can contain service call information as well. Typical journal entries might resemble the following:
Company announced a merger with another small company 40 miles north of us. No further announcements indicated; however, we anticipate having to integrate the two networks.
This entry is simple, yet says a lot. Behind these lines is a host of preparation to do. Another example might be:
Date: Another Day
Connected WAN through router-5. No further user or server configuration changes were needed. Server seemed to be able to communicate now.
Clearly, this entry records a critical change to the network that would not get recorded in a server log or anywhere else. You capture such events in the journal. It is handy to refer to your journal if you require information about other events or what has occurred in the past. You can always count on some user announcing two months after you made a change, "Oh yeah, my computer hasn't worked ever since I saw an email from you about some WAN thing." Your log should say what, when, and where things changed that may have upset the user's connection or workstation.
Keeping records is hard work. It is simple but hard. Sticking to your guns and recording information in binders, updating disks, and keeping journal entries can seem more like interruptions. However, when you need them, you will be pleased with yourself for making the effort.
After finishing all of the other information collection, you are finally ready to consider collecting performance data. In this case, NT's Performance Monitor will do most of the busy work, but setting it up to collect the information will take some thought.
The starting point of collecting the data is the baseline. The baseline is the performance of the system under controlled circumstances. This is your reference point. When you first install a server, you will get a baseline of the server before you ever put it into production. You will install all the services and software that you are going to run on the machine and then establish a baseline for the performance of the system. At that point, you will get an idea of how the system will perform under stress. Remember Chapter 3? We talked about simulating activity on a machine. The time to do it is before the system ever gets into the greedy paws of the users. You can then compare the results of the system under no load, under a regular load, and under extreme duress. This exercise will help you see signs of overuse or unusual usage before they become a problem. How to collect this data is the topic of this section.
Data collection is a theme throughout the book. In your efforts to maintain the performance of your systems, you need to constantly monitor them. You could spend your entire day staring at the Performance Monitor and waiting for something bad to happen, but somehow I don't think you'll have the time to do that. Performance Monitor offers a couple methods for collecting the data.
The first method simply uses a remote machine to monitor the system or systems that concern you. You might have a low-powered machine such as a 486. You probably also have a spare disk or two. That is all that you need to collect information from your machines. Install NT on the machine. Start Performance Monitor in Log view. From here, you will collect the data to a log by following these steps:
Add objects to the Performance Monitor for logging.
Set the log file storage options.
Calculate the log file size.
Start the logging procedure.
Store log file information and relog if necessary.
Adding Objects to the Performance Monitor for Logging
First, you press the plus sign (+) and add some objects to the Performance Monitor (see Figure 10.2). Make the computer selection in the dialog box first. Other counters will show up only if you have the proper computer selected. The most basic baseline or performance statistics include the following objects:
PhysicalDisk or LogicalDisk
You will set more objects based on the type of system that you are monitoring. For example, you might want to include SQL Server objects or Exchange objects according to the primary function of that particular machine.
Setting Log File Storage Options
The next step involves setting the Performance Monitor options to store a log file in a location of your choice (see Figure 10.3). You will also set the interval for collecting data. For now, set the collection time to manual. At this point, you do not want to start the collection.
Calculating the Log File Size
You will want to calculate the size of the log file prior to actually doing the collection. The calculation is simply:
# of Samples x Size of the Samples = Disk Space Required
The number of samples is a combination of how often you are going to take samples and how long you plan to take the samples. For example, you are going to take samples every 15 seconds for the next hour:
1 sample/15 seconds x 60 secs/1 minute x 60 minutes/hour x 1 hour = # of Samples = 240 samples
You then need to know how big a sample will be. First, check the size of the log file now in the Performance Monitor Log interface. Record the value. Recall that you set the interval to manual. In the Performance Monitor Log interface, click the snapshot icon (the camera). Click the camera four times and record the result. Then, subtract the first value from the last and divide by four. (Every fourth collection contains a little more administrative data that adds to the size of the log.) Multiply the new value by the number of samples to get the amount of disk space that you need.
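If you would rather not do this arithmetic by hand each time, the calculation can be sketched in a few lines of Python. This is only an illustration of the formula above; the function names are my own, and the log sizes in the example are made-up readings, not values from a real Performance Monitor log.

```python
def samples_needed(interval_seconds, duration_seconds):
    """Number of samples taken over a period at a fixed interval."""
    return duration_seconds // interval_seconds


def bytes_per_sample(size_before, size_after, snapshots=4):
    """Average growth of the log file per manual snapshot."""
    return (size_after - size_before) / snapshots


def disk_space_required(interval_seconds, duration_seconds, sample_bytes):
    """# of Samples x Size of the Samples = Disk Space Required."""
    return samples_needed(interval_seconds, duration_seconds) * sample_bytes


# One sample every 15 seconds for one hour, as in the example above.
count = samples_needed(15, 60 * 60)

# Hypothetical readings: the log grew from 2,000 to 10,000 bytes
# over four clicks of the camera.
per_sample = bytes_per_sample(2_000, 10_000)

print(count, per_sample, disk_space_required(15, 60 * 60, per_sample))
```

Substitute the two sizes you recorded from the Log interface for the hypothetical readings, and leave yourself some headroom on the disk beyond the figure you get.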
Starting the Logging Procedure
The next procedure involves simply logging the data. Access the Log options and adjust the interval. Fifteen seconds is a good interval for general logging. For more specific processor or memory problems, you will adjust the collection interval to every second. Keep in mind that such a setting will result in an increased number of samples and thus a larger log file. Then, start your logging.
From a single computer, you may log multiple machines at the same time. All you have to do is start multiple instances of the Performance Monitor. Performance monitoring is not that taxing on a machine, with the exception of the disk activity. You can usually monitor five or six servers from a single 486 machine with a PCI disk controller without fear of crippling the performance to the point of jeopardizing the statistics you are collecting. Remember that the Performance Monitor is just another program. If the machine where you are running it starts to have problems, the collection process may experience problems as well. While you are doing this kind of monitoring, a user must stay logged in to the NT Workstation that the collection is running from. You can log in and lock the workstation.
The other way to perform the collection is to run a service to take care of the process. You can use the Monitor service from the NT Resource Kit, which consists of two pieces. The first piece is DATALOG.EXE, which is the service that does the collection. The second piece is MONITOR.EXE, which is a command-line interface to the service. Because MONITOR.EXE is command line, you will run it from a batch file. You should be aware of a few procedural items when using the performance monitoring service:
The service runs on the server whose data is being collected.
The service will only be able to log data to a local file on the server. You want to make sure that you are logging to a disk where you are not watching the performance and that you have the appropriate amount of disk space.
The MONITOR.EXE interface can be run from any remote workstation.
To set this up, you will first install the service as follows:
From the NT Resource Kit, copy the files DATALOG.EXE and MONITOR.EXE to a floppy disk or a network share.
Copy DATALOG.EXE to the %SystemRoot%\System32 directory on each server that you want to monitor.
On the server where you want to install the service, open a command prompt and run C:\> MONITOR SETUP to install the service. The service is set to a manual startup, which is appropriate in most cases.
The service is now ready. You must now prepare the Performance Monitor for collecting data. These steps are similar to the steps in the first method used to log data, with a couple of exceptions:
You want to visit the actual server where you are going to collect the data. On the server, start the Performance Monitor and configure all your settings. Save the settings to a *.PML or a *.PMW file in the %SystemRoot%\System32 folder. Don't forget to consider the disk space that will be used for the collection process.
At a command prompt from any machine, enter C:\> MONITOR [\\COMPUTERNAME] [MYFILE.PML]. COMPUTERNAME is the name of the computer where you want the service to run. MYFILE.PML (or MYFILE.PMW) is the configuration file that you created in Performance Monitor.
Start the service.
To start and stop the service, you use the MONITOR \\COMPUTERNAME START and MONITOR \\COMPUTERNAME STOP commands. You can control the DATALOG.EXE service of several machines from a single server or workstation that basically acts as a scheduling machine. Using the Command Scheduler from the NT Resource Kit, you can set the services to start and stop at certain times throughout the day.
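When you control several servers from one scheduling machine, you end up typing the same command for each of them. As a rough sketch, the lines for a batch file could be generated like this. The script only builds text using the MONITOR syntax shown above; it does not run anything, and the server names are hypothetical.

```python
def monitor_command(computer, action):
    """Build one MONITOR command line; action must be START or STOP."""
    action = action.upper()
    if action not in ("START", "STOP"):
        raise ValueError("action must be START or STOP")
    return "MONITOR \\\\" + computer + " " + action


def batch_lines(computers, action):
    """One command line per server, ready to paste into a batch file."""
    return [monitor_command(name, action) for name in computers]


# Hypothetical server names; replace them with your own.
servers = ["SERVER1", "SERVER2"]
print("\n".join(batch_lines(servers, "start")))
```

You would generate one batch file for the START commands and another for the STOP commands, and then hand each to the scheduler at the appropriate time of day.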
For the sake of preserving disk space, you probably do not want to run the service for an entire 24-hour period because the systems are usually not busy during non-business hours. You should pick particular times when the system is expected to be busy or when you are noticing regular performance problems. Keep in mind a couple of things about the scheduler:
If the server running the scheduler is down for any amount of time and passes the time when one of the services was supposed to stop or start, the scheduler will not perform the scheduled task. The service only performs tasks as it encounters them. If for some reason, it never encounters the time when the action was to be performed, it will not perform the action during that cycle.
Scheduling too many jobs too close together can cause the scheduler to miss a job. This is a manifestation of the first point. The scheduler waits for a job to start, and if a time passes before it returns to see what task is next, it will miss a job.
Storing Log File Information and Relogging
As you collect information on a regular basis, you will want to review that information. In addition, you need a method for storing the information. For example, you might be collecting the statistics of a particular server on a daily basis. However, you do not want to stockpile logs of data. This situation is when relogging becomes important. When you log information to a file, you can read it back into Alert view, Chart view, and Report view. However, you can also relog the data by reading it into the Log view. This arrangement offers the following benefits:
You can reduce the size of the logged data by adjusting the objects you want to save. Select the objects from the available objects that appear in the original log. Only the objects selected are collected to the new log.
You can reduce the size of the logged data by adjusting the time interval for collecting. If originally you were collecting the data at 1-second intervals, you could set the new log to collect the data at 15-second intervals. You get every fifteenth point from the original log, which is usually sufficient for standard record keeping practices. Of course, if you have some event of interest in the log, you want to preserve the interval.
You can adjust the time window. Only the data collected within the smaller time window that you specify is carried from the original log into the new one. Adjusting the time window is an excellent way of collecting only the data of specific interest. Perhaps you examined the logs and noticed a 5-minute interval in which the processor was at 95%. You might want to save this time slice and compare it to future similar events. Also, if you were doing a simulation, you might want to save particular segments of the simulation as notes about how the system would react under particular circumstances. Then, if you run across similar activity in the production world, you can compare your notes and diagnose the situation.
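To see why relogging shrinks a log, it helps to picture the log as a list of time-stamped samples. The following sketch is not how Performance Monitor does it internally; it simply mimics the interval and time-window reductions described above on a made-up list of (seconds, value) pairs.

```python
def relog(samples, start=None, stop=None, keep_every=1):
    """Keep samples inside [start, stop], then every keep_every-th one.

    samples is a list of (seconds, value) pairs in time order.
    """
    in_window = [
        (t, v)
        for (t, v) in samples
        if (start is None or t >= start) and (stop is None or t <= stop)
    ]
    return in_window[::keep_every]


# Hypothetical one-minute log collected at 1-second intervals.
original = [(t, t % 7) for t in range(60)]

# Keep every fifteenth point, as when relogging at 15-second intervals.
reduced = relog(original, keep_every=15)

# Keep a short time slice at the original interval.
time_slice = relog(original, start=10, stop=12)

print(len(original), len(reduced), time_slice)
```

Note that the time slice keeps the original interval, which is what you want when the slice contains an event of interest.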
Of course, implementing relogging implies that you are examining the data on a regular basis, which is the topic of the next section.
You may be collecting data for your capacity planning on a regular basis; however, you cannot forget to analyze the data. This is clearly an important step but one that takes some discipline to achieve on a regular basis. Before getting into the technical methods, you should understand the general methodology for analyzing the data:
Filter log file data with alerts.
Adjust the Performance Monitor time window.
Perform a preliminary data analysis.
Export the data for critical analysis.
Filtering Log File Data with Alerts
After you have the data you want in a log, start your analysis with the Performance Monitor Alert view. Often, only some of the data points within a log are interesting enough to warrant your attention. Ferret out the interesting points with alerts. The alerts that you will start with will probably resemble the following:
Processor Object : % Processor Time > 80%
Memory Object : Pages/sec > 16
Physical Disk Object : Disk Queue Length > 2
Network Segment Object : % Network Utilization > 50%
The preceding items should look familiar to you. They are the basic objects and counters for most of our initial analyses throughout the book. Some other counters may interest you, depending on the type of software you have installed. After you have a nice set of alerts, save them in the Performance Monitor to a *.PMA or *.PMW file. The *.PMA is a Performance Monitor Alert file. The *.PMW is a Performance Monitor Workspace file. The difference is *.PMA files will only save information regarding the settings for the alerts, whereas *.PMW files will save the settings for all views within the Performance Monitor.
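The same filtering idea applies if you ever work with the data outside Performance Monitor. As a minimal sketch, assuming you already have the samples as rows of counter values, the thresholds listed above can be applied like this. The row layout and the sample values are invented for illustration.

```python
# Starting thresholds taken from the alert list above.
THRESHOLDS = {
    "% Processor Time": 80,
    "Pages/sec": 16,
    "Disk Queue Length": 2,
    "% Network Utilization": 50,
}


def find_alerts(rows, thresholds=THRESHOLDS):
    """Return (time, counter, value) for every value over its threshold."""
    alerts = []
    for row in rows:
        for counter, limit in thresholds.items():
            value = row.get(counter)
            if value is not None and value > limit:
                alerts.append((row["time"], counter, value))
    return alerts


# Hypothetical samples; a real log would hold many more rows.
rows = [
    {"time": "10:00:00", "% Processor Time": 35.0, "Pages/sec": 4.0},
    {"time": "10:00:15", "% Processor Time": 92.5, "Pages/sec": 21.0},
]

for alert in find_alerts(rows):
    print(alert)
```

Keeping the thresholds in one table makes it easy to add counters for SQL Server, Exchange, or whatever else the machine runs.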
Adjusting the Performance Monitor Time Window
Write down a few of the alerts that are revealed. Adjust the time window around the alerts to get 100 data points. You can figure this out because you will know the collection interval, and the time window displays the time stamps for each data point. Let's say that you were collecting a data point every 15 seconds. You would adjust the time window start and stop times so that they were 1,500 seconds (25 minutes) apart.
Remember that the Chart view can display only 100 data points across, which is why it is important to adjust the time window to view only 100 data points. Also, understand that the time window displays time stamps to the hundredth of a second. Don't confuse the seconds with the minutes or the hundredths of a second.
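The width of the window is just the collection interval multiplied by 100 data points. A small sketch of the arithmetic follows. Centering the window on the alert is my own choice for the example, and the alert time is hypothetical.

```python
from datetime import datetime, timedelta

CHART_POINTS = 100  # the Chart view displays 100 data points across


def window_seconds(interval_seconds, points=CHART_POINTS):
    """Width of the time window needed to show the given number of points."""
    return interval_seconds * points


def window_around(alert_time, interval_seconds, points=CHART_POINTS):
    """Start and stop times for a window centered on an alert."""
    half = timedelta(seconds=window_seconds(interval_seconds, points) / 2)
    return alert_time - half, alert_time + half


# Hypothetical alert time, with data collected every 15 seconds.
alert = datetime(1998, 6, 1, 10, 30, 0)
start, stop = window_around(alert, 15)
print(window_seconds(15), start.time(), stop.time())
```

With a 1-second interval, the same calculation gives a window of only 100 seconds, which is why finely sampled logs need more careful positioning of the window.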
Performing a Preliminary Data Analysis
After you have a nice section of data, you will look at it with a critical eye. Using all the techniques in this book, determine whether the data you see is a critical problem or simply a manifestation of some user performing an unusual activity. In addition, you'll look at the intervals to see which resource is being used to a critical limit and why. Even if a sign is just a fluke, it is good practice to perform a quick process of figuring out the bottleneck and the possible cause. This exercise will keep your skills sharp.
Exporting the Data for Critical Analysis
For more serious analysis, you might want to export the information to another file type so that it can be loaded into a database, spreadsheet, or statistical package. This step will allow you to analyze the data in a more critical manner. Often, one of the easiest options to use is a spreadsheet program. To get the information to another program, export the data from the log to a tab-delimited file that can be retrieved into almost any package. The steps are simple:
Reduce the data set to a section that interests you by adjusting the Performance Monitor alerts and the time window. Of course, if you are looking for more general descriptive statistics on the usage patterns of your system, you will use the entire data set.
Click File and Export from the Performance Monitor drop-down menu. Give the file a name, and click OK (see Figure 10.4).
Figure 10.4: Exporting the Performance Monitor data to a tab-delimited file.
Now, choose an appropriate tool for editing the file, such as a spreadsheet or a word processor. You especially want to edit the information if you are going to import it into a statistics package or database. The Performance Monitor exports a lot of header information with the data (see Figure 10.5), which could interfere with the import functions of other packages.
Figure 10.5: Removing extraneous header information from Performance Monitor exported data in a spreadsheet or word processor.
After you have successfully cropped the header information, you will also check the data points themselves. Performance Monitor is only a software application. It can have errors in retrieving data from the local or remote machine's Registry. You will do a quick search of all the data points for blanks. Also, consider the existence of 0 values. Sometimes, the values are significant, and other times, they simply mean that the system could not retrieve a value. You will have to consider this on a case-by-case basis. In general, for raw counts that are 0, you have to consider what you are counting and whether it is reasonable to have a 0. For example, it is certainly plausible to have a Disk Queue Length of 0 at a particular point in time. You may also notice that the number of users connected to SQL or to an Exchange post office is 0. However, it is very unlikely to see Available Memory at 0, which would be a situation of severe consequences.
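If you export logs often, the cropping and checking can be scripted instead of done by hand. The sketch below assumes only that the file is tab-delimited with some number of header lines at the top. The number of header lines is something you must count in your own export, and the sample text here is invented, not real Performance Monitor output.

```python
import csv
import io


def load_export(text, header_rows):
    """Drop the first header_rows lines and split the rest on tabs."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    return rows[header_rows:]


def suspect_values(rows, first_value_column=1):
    """Return (row, column, reason) for blank, zero, or non-numeric cells."""
    findings = []
    for r, row in enumerate(rows):
        for c in range(first_value_column, len(row)):
            cell = row[c].strip()
            if cell == "":
                findings.append((r, c, "blank"))
                continue
            try:
                number = float(cell)
            except ValueError:
                findings.append((r, c, "not numeric"))
                continue
            if number == 0:
                findings.append((r, c, "zero"))
    return findings


# Invented sample: two header lines followed by two data rows.
sample = (
    "Hypothetical header line\n"
    "Time\tCounter A\tCounter B\n"
    "10:00:00\t12.5\t0\n"
    "10:00:15\t\t3\n"
)

data = load_export(sample, header_rows=2)
print(suspect_values(data))
```

The script only points out the suspicious cells. Deciding whether a 0 is a real measurement or a failed retrieval is still up to you.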
After you have sorted the data, you can load it into your favorite statistics package. Then, perform your analysis. For a taste of what types of statistics you might perform, here is a list I have used in the past:
Descriptive statistics on users connected to Exchange, SQL, and IIS servers. These are the basics of figuring out system usage patterns.
Regression analysis (correlations) on various memory and processor counters. This is a discovery session for learning how certain counters and NT components are related.
Trend analysis. This is a common step for determining how memory and disk resource usage trends are changing.
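A statistics package will do all of this for you, but the three kinds of analysis are simple enough to sketch with Python's standard library. The function names are mine, the numbers are made up, and a least-squares line is only one of several ways to look at a trend.

```python
from statistics import mean, stdev


def describe(values):
    """Basic descriptive statistics for one counter."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "stdev": stdev(values),
    }


def pearson(xs, ys):
    """Pearson correlation between two counters sampled together.

    Both series must vary; a constant series would divide by zero.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def trend_slope(ys):
    """Least-squares slope of a counter against its sample number."""
    xs = list(range(len(ys)))
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sum((x - mx) ** 2 for x in xs)


# Made-up weekly averages of disk space in use, in megabytes.
disk_used = [410, 425, 437, 452, 470]
print(describe(disk_used))
print(trend_slope(disk_used))
```

A positive slope on a disk or memory counter tells you how much more of the resource you are using per sample period, which is the number you need for capacity planning.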
Data analysis is more intensive than the other aspects of capacity planning and performance monitoring in general, mostly due to the amount of time you must contribute to the task. The other tasks involve investments in time primarily to get set up and operational. After that, tasks such as record keeping and data logging are relatively modest in terms of the amount of time you have to spend on them. The analysis will require your attention and sometimes deep thought. I find it best to work on such tasks either away from the office or outside normal hours when interruptions are fewer.
Performance for the Small Business Office
Many of the discussions in this book have considered large enterprises with multiple sites and a number of servers. This will not always be the case. You may be running a startup business, a single office of a larger company, or simply a small business franchise not in need of large-scale computing. In any case, the techniques of this book will still work for you. You still need to make sure that the server you have can run at the best performance. This performance may even be more important when you have only one server. There is no option to offload some of the workload to another machine. The machine you have must do it all. For a small office situation, I have a few extra tips:
Do not ignore capacity planning and regular monitoring simply because you figure your office will never grow.
Analyzing a small system doesn't mean you can forget the scientific method.
Be creative when testing.
There is typically a limit to the amount of performance you get by upgrading a server.
Keep the users informed.
Ignoring capacity planning and regular monitoring simply because you figure your office will never grow is not a good idea. As systems change and the applications that you use change, the resource usage will change. Perhaps the business will alter critical processes that affect the usage of new software or old software in a different way. Without monitoring and capacity planning, you could be caught off guard by the changes. Even in a small business, it is sometimes difficult to predict how changes can affect a system. Even so, you can reduce the monitoring of a small system rather than drop it altogether. I recommend picking one day out of the week to collect data on the system. Rotate this day if possible. If you always monitor Monday, you'll have a very good idea about what occurs on Monday, but not Tuesday. Rotating the days when you collect data will help.
A small business environment does not permit you to omit the proper steps of troubleshooting. The scientific method was not built with a size difference in mind. The same techniques work for small and large businesses.
A small business environment enables you to be creative when testing. You might want to install NT twice on your server. In one situation, you will have a test server, and in another, you will have a production server. They can share the same common boot partition (typically the C: drive), but you will put the operating system partitions on separate logical drives if possible. This might cost you some money for a little more disk space, but it is cheaper than buying a second server. In this situation, you can install software to the test version of NT and mess with it without hurting the production version. Of course, this means that you need to do some off-hours work. However, you still should fight the temptation to test on the production server. Eventually, you will cause a problem.
There is typically a limit to the improvement in performance you will see by upgrading a server. At some point, after you have added several major applications and your server is starting to feel the sting of resource contention, you might want to consider buying another machine instead of pumping more resources into the existing server. Even a cheaper or low-powered machine can be a greater benefit than adding more memory or another CPU to a system. This is especially true if you are looking at adding more server applications. Even in a small office scenario, SQL Server will run better on its own than when sharing resources on a high-powered server that also doubles as a file and print server.
Most importantly, in a small business environment, you need to keep the users informed. You probably don't have a lot of users, but you need to make sure you keep them informed of your efforts, anyway. Large or small, periodic changes to the system can annoy users. However, I have found that if you keep them well informed of the benefits of optimizing the system, periodic intensive systems data logging, and configuration changes, the users will find it easier to accept. In a small company, you usually don't have any bureaucracy to hide behind.
You'll find that, as an administrator, the small business is usually a good place to learn. Often, you will have the chance to sit down with a book such as this and try a lot of the suggestions and examples. Set a couple of goals for yourself and improve your own performance as well as your system's.
Microsoft Operating System Options
When selecting Windows NT, you have several options. You must figure out how you are going to use the server prior to actually purchasing the hardware and software. You must consider how you are going to use the server now and later down the road. Considering this will help you configure a machine that will be useful past its original purpose. When considering a Microsoft operating system such as NT, you should consider the variety of ways to purchase it. Each way has its benefits and limitations. The sections that follow discuss the variety of ways NT is packaged and include some comments on why you might make each selection. When purchasing any of these packages, you should always consider how the company is going to grow and not just the immediate needs.
Windows NT Workstation
No one said your system has to be big to be effective. For a small office of fewer than 15 users, Windows NT Workstation may be a viable solution for a server. NT Workstation will provide many of the security features and other features of Windows NT Server without the big price. Also, the resource requirements of the system aren't quite as intense. Windows NT Workstation has a limitation of 10 concurrent user connections. If you are purchasing a system so that users can connect to it and stay connected, and if you have more than 10 users, consider NT Server instead. In addition, if you are looking for more than file and print services, be cautious about purchasing Windows NT Workstation. Many server-based components detect that they are running on NT Workstation and refuse to install.
Windows NT Server
By itself, NT Server is an excellent choice for almost any type of service. NT really shines as an application server for other products such as Web, database, and email services. How many users NT will support is primarily dependent on the nature of the services and the number of users; this determination of course is the whole nature of optimization and is the topic of this book.
Small Business Server
Microsoft put together a nice bundle of services built on top of Windows NT Server in a single product called the Microsoft Small Business Server (SBS). This product is suitable for businesses with 25 users or fewer. A software limitation on the server keeps it from supporting more than 25 users. However, if the numbers are right, SBS can be a real value for a business. SBS offers NT Server as well as Exchange for email, IIS for Web services, and SQL for database products. Keep in mind the 25-user limit when considering these other powerful components. In addition to these components, SBS includes components geared toward Internet connectivity and communication. One product sadly not found in BackOffice or NT by itself is the Shared Fax Services. This service allows users to send faxes via Exchange through the NT Server and a single fax card. This product is also provided by third parties at quite a cost, in most cases.
Microsoft BackOffice
Microsoft BackOffice is a suite of products like Microsoft Office. Whereas Office contains Word, Excel, and PowerPoint, BackOffice contains Exchange, SQL Server, Systems Management Server, IIS, and SNA Server. This extremely powerful combination of software can fulfill many needs of a medium-sized business. Larger, more complex companies often buy the individual products and put them on NT, due to the often highly intensive nature of a larger company's resource usage. Any company with 100 to 1,000 users should consider the BackOffice suite as an excellent beginning.
Additional Resources on NT Optimization
Is this book the end-all-be-all of optimization? I daresay my credibility would suffer greatly to even suggest it. I have provided you with all the information you need to take a hard look at Windows NT and its internals. However, you have much more than simply NT to worry about. You must consider Web services, database services, email servers, and other application and networking services that can run on Windows NT. Many services carry their own set of objects, counters, and methods of optimizing. With the methods outlined in this book, you can easily work with many other products, but to get deep into those other products, you need some other resources.
Many other books offer valuable NT optimization information; however, you should also consider several Web sites for some excellent information. Some tried and true sites include:
http://www.winntmag.com. Windows NT Magazine's Web site, as well as the periodical itself, has proven worthy of mention. Despite the name of the magazine, it also provides fair presentations of non-Microsoft products that run on NT. More than once, I have relied on and been satisfied with the analysis and comparisons of the various software products examined by the folks at Windows NT Magazine. The well-developed Web site contains a host of information regarding past articles and software comparisons.
http://www.infoworld.com. Another magazine also worth mentioning, InfoWorld has presented a large variety of ideas consistently throughout its publication. It has a fine staff of professionals who analyze software and the industry in general. Although the magazine and Web site are more industry-news related, they still can be mined for some good analysis of emerging technologies.
http://www.sysinternals.com. If you are looking for a tool to do a nifty little job on NT and it's not in the NT Resource Kit, visit this site. These folks have put together an excellent collection of NT tools and tips that are beyond the norm of freeware and shareware.
http://www.jerold.com. This guy has put together an impressive collection of how-tos and tips. Although he is a little overzealous in the promotion of his own site, it certainly deserves a look. Solutions to many commonly reported problems (and a few rare ones) are posted on this site.
http://www.compaq.com. These guys know their hardware! At their site, you will find all sorts of white papers on the newest chip technologies, integration of the technology into servers and workstations, and how operating systems (NT included) work with the new technology. In addition, they have performed a lot of analysis on their own hardware that can be very informative about how NT works in general.
I am sure that even more sites offer excellent information if you can stomach a little Web searching. Most of the products I have noted are Microsoft BackOffice products. With this in mind, I want to offer a few Microsoft-specific resources.
First off, buy TechNet. It is one of the best resources you can get for a decent price. For a few hundred bucks, you get a stack of CDs with the following items:
Current NT/Win95/Win98 Resource Kits
Office/IE and other Resource Kits
All the service packs on CD
A software library of all sorts of tools
The Resource Kits mentioned have electronic versions of the books on TechNet, as well as copies of all the latest tools. For the price (currently $299.00 U.S.), it is hard to beat. Consider also that when you call Microsoft's front-line support, they generally check TechNet for your problem. If they find it, they'll ship you the article and charge you a service fee.
Another subscription of excellent value, but a little narrower in its scope, is the Microsoft Developer Network (MSDN). This CD set in its most basic form contains several CDs with all sorts of samples, solutions, and documentation for writing code for the Microsoft platform. There is some overlap between TechNet and MSDN, but MSDN will contain much more programmer-oriented information, so it isn't for everyone. However, if you are doing some serious development work internally or externally, you'll want to consider at least the basic subscription to MSDN. The various levels of MSDN subscription range from the basic to the universal. I won't bore you with all the details, but the universal subscription contains all the MS platforms, BackOffice products, and development platforms.
This is all well and good, but where are the freebies? Those more monetarily challenged may try the Microsoft Web site (http://www.microsoft.com). Microsoft has posted a significant portion of the MSDN library on the Web site, although searching there is more tiresome than using the CDs.
One news server to consider is msnews.microsoft.com, hosted, obviously, by Microsoft. You can typically post to one of its newsgroups and get an answer to your question. Microsoft engineers do not answer the questions, but a host of other people, such as consultants and other professionals in the industry, review the newsgroups on a regular basis. Other Internet service providers, such as AOL and CompuServe, also host forums where you can discuss technology issues. In any case, it's free, and if you have Internet access, you can often get an answer to an important question.
In this farewell chapter, you have reviewed the various performance tips mentioned throughout the book. In addition, you have had the opportunity to learn more about methodology and the application of all that you learned in previous chapters. You were able to take your knowledge one step further and apply it to capacity planning, the process of staying ahead of the game. The process involves obtaining baselines for performance under controlled conditions. I also presented techniques for data collection.
I have also given you a series of resources where you can find other information. Newsgroups are a favorite resource of mine because I can get answers as well as share my own experience. Finally, I offered more information on the choices available to you when you are purchasing Windows NT. You can purchase a Windows package fit for an enterprise of any size.
About the Author
Mark T. Edmead is president of MTE Software, Inc., a San Diego Microsoft Solutions Provider specializing in Windows NT BackOffice consulting.
Paul Hinsberg, MBA, MCSE, is the owner and operator of CRDS Inc., a computer consulting company in the Silicon Valley region.
Copyright © 1998 by New Riders Publishing
We at Microsoft Corporation hope that the information in this work is valuable to you. Your use of the information contained in this work, however, is at your sole risk. All information in this work is provided "as is", without any warranty, whether express or implied, of its accuracy, completeness, fitness for a particular purpose, title or non-infringement, and none of the third-party products or information mentioned in the work are authored, recommended, supported or guaranteed by Microsoft Corporation. Microsoft Corporation shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, even if it has been advised of the possibility of such damages. All prices for products mentioned in this document are subject to change without notice. International rights = English only.