Chapter 5 Keeping Connected

Troubleshooting and Performance Tuning Windows NT

When you have finished reading this chapter, you will understand:

  • The principles of preventive maintenance

  • Performance monitoring and tuning procedures

  • Basic mechanisms of Windows NT troubleshooting

  • Windows NT Registry

  • Tools provided with Windows NT 4.0 Workstation and Server

  • Third-party tools and resources

  • Getting technical support

    You are not expected to feel comfortable facing the diagnosis of a fault in a Windows NT system on your own. No competent technician ever feels so confident. But you should feel comfortable taking a crack at it. You will understand the preventive maintenance techniques that will help you avoid trouble whenever you can, and you should know when to cry "uncle!" and call for professional help.

Read This First

The odds are quite good that if you've turned to this page, you're faced with a system that is not operating as it should and you are desperately seeking help. This is the worst possible time to read about troubleshooting procedures, but we're all too aware that it's often the only time we do. If you look carefully at the edge of the book, you will see that some pages have been tinted. These pages, later in this chapter, constitute a troubleshooting section listing the most common errors in Windows NT, their symptoms, and the steps you need to take to correct them. So, read the rest of this paragraph and then go ahead to the colored pages and the best of luck to you. But when you've finished that, when your bug is fixed, come back here and read the rest of this chapter because it will tell you how to avoid having to go through this again.

The preceding sentence will strike some readers as an appallingly bad joke. It is not!

In many situations a complex piece of equipment or complex piece of software (such as Windows NT) is installed by someone whose most urgent consideration is bringing the thing up as fast as possible. Once installed it will run until it breaks, at which time that same individual will be desperately looking for help, and that's the reason for that first paragraph. But those who have taken the trouble to read a chapter like this ahead of time will know that there's a much better approach. This approach, taught by the United States Air Force among others, is called preventive maintenance or PM. The principle of PM is simple: Don't wait until the system breaks—fix it before it breaks. Replace parts that you know will wear out before they wear out.

How do you find out which parts of the system are wearing out and need replacement? By applying actuarial statistics and the mathematics of fault prediction (see Appendix 6 for details). Basically, you need to keep a maintenance log for the system, recording how performance varies over time, along with the date and time of any failures. By examining the log, you can generally predict the overall reliability of the system and perform maintenance tasks in advance of an actual failure.
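As a rough illustration of what a maintenance log makes possible, the sketch below computes the average interval between recorded failures. It is a simple stand-in, not the fault-prediction mathematics of Appendix 6; the function name and the log entries are invented for the example.

```python
from datetime import datetime


def mean_time_between_failures(failure_times):
    """Return the average interval between consecutive failures, in hours."""
    ordered = sorted(failure_times)
    if len(ordered) < 2:
        return None  # not enough history to estimate an interval
    gaps = [
        (later - earlier).total_seconds() / 3600.0
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return sum(gaps) / len(gaps)


# Hypothetical log entries (invented dates, for illustration only).
log = [
    datetime(1997, 1, 1, 8, 0),
    datetime(1997, 1, 11, 8, 0),
    datetime(1997, 1, 31, 8, 0),
]
print(mean_time_between_failures(log))  # 360.0
```

A shrinking interval from one review to the next would be one signal that maintenance is due before the next failure arrives.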

There's a second benefit to PM. Because it forces you to undertake regular, scheduled maintenance, it also gives you the foundation for performance tuning: keeping throughput as high as possible by "tweaking" the system to eliminate bottlenecks. Windows NT gives us some particularly sophisticated tools with which to determine system throughput. For example, it's not necessary to go through any complicated calculation to determine the packets per second the server is handling. It is necessary only to go to the Performance Monitor and look at it. With this theory under our belts, we'll now take a look at the specifics of performance tuning and troubleshooting in Windows NT systems.

Performance Tuning in Windows NT

As discussed in Appendix 6, the overall throughput of a system is an end-to-end process, a chain in which total system throughput is no greater than the throughput of the slowest individual component. So performance tuning generally amounts to the process of determining this component, referred to as a bottleneck that's "bogging" the system, and increasing its throughput either by changing system settings or by replacing the component with a faster one. In individual Windows NT systems the components that can be performance tuned (aside from components that will be tuned to suit individual preferences, such as the keyboard and the mouse) include the central processor, memory, disk, video, and network.
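The chain idea can be sketched in a few lines: if each component is assigned a sustainable throughput in the same unit, the system as a whole runs no faster than the smallest figure. The component names and numbers below are invented for illustration.

```python
def find_bottleneck(throughputs):
    """Return (component, throughput) for the slowest link in the chain."""
    name = min(throughputs, key=throughputs.get)
    return name, throughputs[name]


# Hypothetical figures: requests per second each component can sustain.
components = {"cpu": 500, "memory": 800, "disk": 120, "video": 400, "network": 300}
print(find_bottleneck(components))  # ('disk', 120)
```

Speeding up any component other than the one returned leaves total throughput unchanged, which is why tuning starts by finding it.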

General Methods of Performance Tuning

The principal tools an administrator or technician will use to perform routine performance monitoring and tuning on Windows NT systems are the Performance Monitor (covered in Chapter 3), Configuration Registry Editor (covered later in this chapter), and Event Viewer (see Chapter 3). For version 4.0, Microsoft introduced a powerful new tool: Network Monitor (covered in this chapter). In the sections that follow we discuss which Performance Monitor counters to track, what threshold values to look for, and what steps you should take when a threshold value is reached. In some cases, such as a CPU speed bottleneck, there will be little that you can do short of moving the user to a faster machine. In other cases it may be possible to modify various Windows NT configuration values to produce a performance improvement. You will generally do so using the Windows NT Configuration Registry Editor (a.k.a. REGEDT32.EXE), illustrated in Figure 5.1.

Be forewarned that the Configuration Registry has some features in common with a nuclear reactor. It is potentially an immensely powerful tool. It is also fairly dangerous. No, it won't irradiate you, but if it's not used with care, it can render a system unusable (effectively irradiating your career!). So always take great care when making a configuration change using the registry. In particular, make sure you have the emergency repair diskette for the system you are working on close at hand.1 (This diskette is created during the installation process and may be recreated or updated using the Rdisk utility described later in this chapter.)

Performance Monitor

In what follows, we constantly refer to Performance Monitor (see Figure 5.2) objects. To review (Performance Monitor is covered in detail in Chapter 3), these are selections from the Objects pull-down list that appears in the Add to Chart (or Add to View) dialog box after you select Add to Chart (or Add to View) from the Edit menu. The pull-down lists all system objects that have registered themselves with the Performance Monitor service. Each object has an associated set of counter variables that can be charted or on which alerts can be set. In the sections on subsystem tuning that follow, we refer to these counters and to their parent objects.

CPU Tuning

Since the central processing unit (CPU) is the "brains" of the system, it is not surprising that monitoring CPU performance is one of the most important functions an administrator can undertake. Windows NT provides a very high degree of capability to monitor the CPU, including measuring total CPU utilization, percent of time in privileged (operating system) mode, percent of time in user (application) mode, and frequency with which the system is context switching between tasks. All of these measurements can be extremely useful, and most can be monitored not only for the entire system but on a per-processor basis on symmetric multiprocessor (SMP) machines. The relevant counters to monitor for the System Object are:

  • % Total Privileged Time—This is the percentage of the total system time (time for all processes in the system) that is being spent in "privileged" (that is, in operating system) mode. This measurement generally is a reflection of how much time the system is expending performing system-level tasks such as disk I/O and video display operation. If the system is bottlenecked at the CPU and this counter is high, there is a configuration problem in your system. To diagnose the problem further, see % Total DPC Time.

  • % Total User Time—This is the percentage of system time that is being expended running user-level or application code. If the system is bottlenecked at the CPU and this counter is a high percentage, it may be possible to improve performance by changing the way applications are being used on the system. You can consider having in-house vertical applications rewritten in a more efficient way, for instance, or you may want to examine the way a user is operating on the system to see if some additional efficiency can be achieved.

  • % Total Processor Time—This measurement indicates the percentage of system time the processor is spending doing useful work and is effectively the total of the percent privileged time and the percent user time. When this figure approaches 100%, it indicates that the processor has become a bottleneck in the system. Windows NT will then be forced to suspend certain tasks to give others time to run, and the system will slow down in much the way a time-sharing system slows down when too many users are logged into it. At this point, you have two alternatives: increase the number or speed of processors in a scalable processor system or move the user or server, as the case may be, to a faster CPU.

  • % Total DPC Time—This percent measures the time the processor is spending in Deferred Procedure Calls (DPCs). DPCs are mechanisms for efficiently handling interrupts. Rather than executing interrupt code immediately, NT may elect to handle it in a DPC. DPCs run at a lower priority than hardware interrupts, so deferring execution can allow higher interrupt rates to be handled, but a very high interrupt rate can still bog the processor. Related counters worth checking include Processor Queue Length and Interrupts/sec.

  • Context Switches/Sec.—This counter indicates how frequently Windows NT is performing a context switch between tasks. By default, Windows NT will task switch several times each second to give every task in a system a chance to run. If this counter becomes very high (around 1,000 context switches per second), it may indicate that Windows NT is blocking on one or more shared resources in the system, quite possibly a video resource. To diagnose this, observe the % Total Privileged Time and % Total User Time counters of the System object. If both of these are at or near 50% and the total processor time is at or near 100%, multiple threads within the system are contending for a single shared resource and are doing so with such frequency that the resource can't keep up (a form of contention, a topic described more fully in Appendix 5). This can happen, for example, if intensive use is being made of a video application and the video card is not fast enough to keep pace.

  • Processor Queue Length—This measurement indicates the number of threads queued for execution on a processor (you must also monitor at least one Thread counter to generate Queue Length data; otherwise, it always indicates zero). Sustained values higher than two indicate congestion. You'll need to identify which process is causing the congestion, then reconfigure the process, switch to a faster system, or (if you have the capability) add a processor to your system.

  • System Calls/Sec.—This counter indicates the frequency of calls to Windows NT system routines, not counting the graphical routines. If the preceding values are high (including Processor Queue Length and % Total Privileged Time at or near 50% and % Total Processor Time at or near 100%) but the System Calls/Sec. is low, in all probability, you have a video problem, particularly if you are running graphically intensive applications. See the section on video performance troubleshooting for more information.

  • Total Interrupts/Sec.—This counter indicates the rate at which interrupts are being generated by hardware in the system for all processors. This indicator should tend to closely track with the System Calls/Sec. (with the exception of high mouse, keyboard, and serial port activity). If it does not, it may indicate that some hardware device is generating an excessive number of interrupts. Attempt to determine whether the device in question is the video card, the network interface card, the hard disk driver, or perhaps some other device, such as the mouse.

  • % Registry Quota in Use—This indicator shows the percentage of registry quota currently in use by the system. This is a critical item to monitor on Primary and Backup Domain Controllers (PDC/BDC) because user accounts, system policies, and related information can cause a registry quota to become exhausted, especially on large networks. If this value begins to approach 100%, it's time to increase the total registry size (set in Control Panel/System's Virtual Memory tab). If this happens on an NT Workstation (or a Server not functioning as a PDC/BDC) you probably also want to examine the Registry to determine why it has grown so large.
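The rules of thumb in the counter list above can be collected into a small checker. This is a sketch only: the dictionary keys are hypothetical names for the System object counters, the readings are invented, and the numeric cutoffs chosen for "at or near 100%" and "at or near 50%" are assumptions, since the text gives no exact figures. A single reading also stands in for what should really be a sustained value.

```python
# Assumed cutoffs: the text says "at or near" without giving exact numbers.
NEAR_FULL = 95.0           # stands in for "at or near 100%"
NEAR_HALF = (40.0, 60.0)   # stands in for "at or near 50%"


def near_half(value):
    low, high = NEAR_HALF
    return low <= value <= high


def diagnose_cpu(sample):
    """Return a list of findings for one set of System object readings."""
    findings = []
    saturated = sample["total_processor_time"] >= NEAR_FULL
    if saturated:
        findings.append("processor bottleneck")
    if (
        saturated
        and sample["context_switches_per_sec"] >= 1000
        and near_half(sample["total_privileged_time"])
        and near_half(sample["total_user_time"])
    ):
        findings.append("threads contending for a shared resource")
    if sample["processor_queue_length"] > 2:
        findings.append("processor queue congestion")
    return findings


# Hypothetical readings, invented for illustration.
sample = {
    "total_processor_time": 99.0,
    "total_privileged_time": 48.0,
    "total_user_time": 51.0,
    "context_switches_per_sec": 1400,
    "processor_queue_length": 4,
}
print(diagnose_cpu(sample))
```

In practice the readings would be averaged over a monitoring interval before being tested, so that a momentary spike does not raise a false alarm.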

Like the System object, the Processor Object provides indications of % Privileged Time, % Processor Time, % User Time, and Interrupts/Sec. However, it does so on a per-processor basis rather than on a system-wide basis. On a single-CPU system, the Processor counters should yield the same results as the System counters. On a symmetric multiprocessor (SMP) system, the Processor object will have multiple instances, and you can examine these instances (in particular, % Processor Time for all processors) to check the load balancing of applications across processors. All processors in the system should tend, on average, to report approximately equal utilization. If this isn't happening, you likely have a problem with one of your processor boards (or if you observe an imbalance only when running certain applications, such as older versions of Microsoft SQL Server, it may be a programming problem) and you need to investigate further.
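A load-balance check of this kind might look like the following sketch. The 15-point tolerance is an assumption chosen for the example (the text says only "approximately equal"), and the utilization figures are invented.

```python
def unbalanced_processors(utilization, tolerance=15.0):
    """Return indexes of processors whose average % Processor Time
    differs from the mean across all processors by more than `tolerance`."""
    mean = sum(utilization) / len(utilization)
    return [i for i, u in enumerate(utilization) if abs(u - mean) > tolerance]


# Hypothetical average % Processor Time for a four-processor system.
print(unbalanced_processors([70.0, 72.0, 68.0, 20.0]))  # [3]
```

The input should be long-run averages rather than instantaneous readings, because short-term imbalance between processors is normal.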

Floating Point Performance

Unfortunately, Windows NT does not provide a direct counter for floating-point (FPU) operations, which would be useful in determining whether the system is being bogged by floating-point performance when running applications such as computer aided design (CAD). However, in general, if a system is performing an application known to be floating-point intensive and is indicating a CPU bogging condition (% Processor Time at or near 100%) with no other indication of a bogging condition (such as a high number of System Calls/Sec., high number of Interrupts/Sec., etc.), the odds are quite good that the system is floating-point bogged.

You need to investigate to see whether the system in question, in fact, includes floating-point processor hardware.2 No 386-based or 486SX series Intel computers have built-in floating-point hardware, but all 486DX computers, all Pentium and Pentium Pro processors, and most RISC processors will have it built in. If a user is experiencing a CPU-bogged condition of this type and is operating on a 386 or a 486SX workstation, you may want to consider moving that user to a 486DX, Pentium, or RISC-based workstation to see if the problem clears up.

Windows NT does provide a performance counter for floating-point emulation; it's the System object's Floating Emulations/Sec. If the system shows signs of processor bogging (high % Total Processor Time) and Floating Emulations/Sec. is high, you are running a floating-point-intensive application on a processor that lacks hardware floating-point support.3

Memory Tuning

The Memory Object has the following counters to monitor:

  • % Committed Bytes in Use—New for NT 4.0, this counter gives the ratio of Committed Bytes/Commit Limit, expressed as a percentage. If you observe virtual memory thrashing (see Pages Per Second, below), this is the counter to check: if it's running close to 100%, you need more memory! See the next item for an explanation of committed memory and the commit limit.

  • Commit Available Bytes, Committed Bytes, and Commit Limit—These three counters indicate the state of the virtual memory management subsystem. Commit Available Bytes is an instantaneous indicator of the available virtual memory in the system (i.e., virtual memory not being used in the system). This value fluctuates with time and is interesting to monitor but does not provide a reliable indicator of total memory available. The Committed Bytes value, on the other hand, is an instantaneous indicator of the total amount of virtual memory committed (reserved memory space for which there must be backing store available). Commit Limit is the total amount of space that is available for committing and is generally equal to slightly less than the size of physical memory plus the size of the page file (just slightly less because of memory the system reserves to itself).

Note: If the Committed Bytes counter approaches the Commit Limit, the system is running out of virtual memory, and it will become necessary to expand the page file. You can use this as an indicator to expand the page file manually, avoiding an automatic page file expansion and the associated deterioration of system performance.

  • Pages per Second—This is an indicator of the total paging traffic in the system—the rate at which memory pages are being swapped between the paging file and physical memory. Systems with lots of physical memory will tend to show a zero value for Pages per Second. Systems operating with a minimal amount of physical memory (16MB in workstations, 24MB in servers) will generally show zero Pages per Second in an idle state but may show paging activity (100 Pages per Second or less) as applications are opened and closed in the system. A rise in Pages per Second to a sustained value above 100 indicates a thrashing condition, meaning that the system has reached a state in which the demands made on the virtual memory manager exceed its capacity—so more RAM is needed. Therefore, when the Committed Bytes indicator approaches within 10% of the Commit Limit, begin watching the Pages per Second to see if the system is thrashing.

  • Pool Nonpaged Bytes—This counter measures the total number of bytes in the pool of nonpaged memory. Nonpaged memory is reserved and cannot be paged out into virtual memory (disk space) on demand. In effect, it's the total amount of memory the system is using that must at all times remain in the physical RAM. If this value rises to within 4MB of the total amount of memory in the system (for example, if it rises to over 12MB in a system that contains only 16MB of memory), performance is compromised.

    Whenever an application is launched from Windows NT, Windows NT temporarily requires a substantial amount of space for buffers, for loading subsystems (such as the 16-bit WOW system for 16-bit applications), and other activities. In an instantaneous state wherein less than 4MB of nonpaged pool is available, Windows NT will begin to "swap" severely in an effort to free up enough memory to get a new application started. In this situation, the best thing to do is provide the user with more memory in the system. You can also use this value in conjunction with the Working Set and Working Set Peak counters of the Process object(s) to determine the total amount of memory required by a particular user, which brings us to the Process Object.

  • Working Set—This counter measures the total memory used by an application. It's particularly helpful in detecting memory hogs, as illustrated in Figure 5.3.
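The memory thresholds described above can be sketched as a single check. The function and its parameter names are hypothetical, the readings are invented, and "sustained" is assumed here to mean that every sample in the observation window exceeds 100 Pages per Second.

```python
def diagnose_memory(committed_mb, commit_limit_mb, pages_per_sec_samples):
    """Return (% committed in use, findings) for one observation window."""
    percent_in_use = 100.0 * committed_mb / commit_limit_mb
    findings = []
    # "Within 10% of the Commit Limit" is read here as 90% or more in use.
    if percent_in_use >= 90.0:
        findings.append("committed bytes within 10% of commit limit")
    # "Sustained" is assumed to mean every sample in the window exceeds 100.
    if pages_per_sec_samples and min(pages_per_sec_samples) > 100:
        findings.append("thrashing: sustained paging above 100 pages/sec")
    return percent_in_use, findings


# Hypothetical readings, invented for illustration.
percent, findings = diagnose_memory(55, 60, [140, 180, 125, 160])
print(round(percent, 1), findings)
```

A brief burst of paging while an application starts would not trip the second test, which matches the distinction the text draws between launch-time paging and thrashing.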

This seems as good a place as any to take a bit of time out and explore the entire subject of virtual memory in a bit more detail.

Memory hogs have been, unfortunately, all too common in Windows NT. Until quite recently, Microsoft's own 32-bit VC++ compiler for Intel CPUs implemented a run-time memory allocator that did not return memory allocated by applications to the OS unless specifically instructed to do so. As a result, applications could exhaust system memory, even virtual memory, if they continually allocated and de-allocated large memory blocks.

If you encounter a memory hog (the symptoms are obvious: excessive memory paging when you are doing normally innocuous things such as moving the mouse, appallingly low Memory/Available Bytes, appallingly high Memory/Committed Bytes, and possibly also the dreaded "Low Virtual Memory" message discussed later in this chapter), you can determine which application is causing the problem with Process/Working Set, then simply shut that process down. It is not necessary to restart Windows NT.4

Virtual Memory and Swapping

As described in Chapter 1, Windows NT is a virtual memory operating system, meaning that it can employ hard disk space as auxiliary memory to hold information that is not immediately required in RAM. The strategy that Windows NT uses to do this depends on the operation of several sections of memory known as memory pools in conjunction with the cache manager. To begin with, there is a non-paged pool that stores memory that cannot be paged out to disk; that is, memory required to be immediately on hand in order for Windows NT system components and applications to perform their functions. This memory generally appears to run in a pool of 2 to 3MB in most configurations. There is also a paged pool of memory that is pageable and can be swapped to disk but is kept ready for immediate access. This generally will contain the memory pages that are being most frequently requested by system components or applications. Paged pool may vary in size from a few megabytes up to the total capacity of physical memory, depending upon the configuration and available free space.

Windows NT also caches disk activity within the virtual memory space and can employ up to one-half of the physical memory's space to store disk cache information. That is, on a 16MB system, up to 8MB will be employed for cache. When so many applications and system components are running and requesting memory that the system cannot fulfill those requests from within the range of pages available in the Physical Page Pool, the system will begin to page less frequently used pages out to hard disk, freeing them to fill those requests. This process will continue until the commit limit is reached. The commit limit specifies the total amount of memory that can be committed (that is, for which data space is required in either the physical memory or the virtual memory paging file) without expanding the paging file. When the commit limit is reached, Windows NT will attempt to expand the paging file.

Notice that we have two separate threshold situations involved here where the paging file becomes a consideration. In the first, Windows NT is paging information into the file without the commit limit being affected. In this situation disk I/O is special cased in a manner analogous to that used by Windows 3.1's permanent swap file. That is, if you have a 16MB system with 24MB set as the initial size for your paging file, the commit limit for the memory system will be about 37MB (24MB plus the physical memory, 16MB, less the space reserved for the Paged and Nonpaged Pools, which must be retained in physical memory). Until that commit limit is reached, Windows NT will perform special case I/O—essentially, raw reads and writes within the paged file space—a relatively efficient process. Paging will occur, but the impact on system performance will tend to be minimal.
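The commit-limit arithmetic in this example can be written out directly. The 3MB reserved figure is an assumption, chosen to match the "about 37MB" result and the 2 to 3MB pool size mentioned earlier; the function name is invented for the sketch.

```python
def estimate_commit_limit_mb(physical_mb, page_file_mb, reserved_mb):
    """Physical memory plus initial page file, less memory that must stay resident."""
    return physical_mb + page_file_mb - reserved_mb


# 16MB of RAM, 24MB initial page file, roughly 3MB reserved (assumed).
print(estimate_commit_limit_mb(16, 24, 3))  # 37
```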

When the commit limit is reached, however, Windows NT is forced to expand the paging file, and a completely different situation occurs, analogous in many respects to the temporary swap file in Windows 3.1. It is now necessary for Windows NT's system software to carry out create operations in an attempt to find more room on the disk. As a result, once the commit limit begins to increase, performance becomes abysmal. This is a situation to be avoided at all costs, particularly in file servers, because it can rapidly reach a point where the system becomes totally bogged and almost useless. But we haven't quite hit the ultimate limit. That happens when Windows NT either reaches the maximum size of the paging file (set in the Control Panel/System/Virtual Memory) or, worse, runs out of physical disk space because application and data files on the disk partition containing the paging file don't leave enough room for the page file to grow to its maximum size.

At this point it becomes impossible for Windows NT to fill the application and system requests for memory and you may expect a series of events, beginning with a "System low on virtual memory" alert that will escalate through various error messages until the system crashes. This need not happen. When multiple page files are available, Windows NT will distribute paged virtual memory more or less equally across all of them, allowing for more total paging and improving performance, provided that each swap file exists on a separate physical disk. Note, however, that creating multiple paging files on a single physical disk will slow the system down—page file I/O alternates between two separate locations on the same disk, keeping the disk head in constant motion.

The best performance can be achieved if the page file is on a partition or disk by itself—indeed, the ultimate performance can be achieved if a separate controller is available for the page file because this will allow page file operations to occur independently of other disk operations, which is something to consider when you are setting up large, multivolume file servers.
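The placement rule for multiple page files (one per physical disk, never two on the same disk) is easy to check mechanically. In this sketch the mapping from page file path to physical disk number is supplied by hand; the paths and disk numbers are invented, and nothing here queries a real system.

```python
from collections import Counter


def disks_with_multiple_page_files(page_files):
    """page_files maps a page file path to the physical disk number holding it.
    Return the disks that hold more than one page file."""
    counts = Counter(page_files.values())
    return sorted(disk for disk, n in counts.items() if n > 1)


# Hypothetical layout: C: and D: are partitions on the same physical disk 0.
layout = {
    "C:\\pagefile.sys": 0,
    "D:\\pagefile.sys": 0,
    "E:\\pagefile.sys": 1,
}
print(disks_with_multiple_page_files(layout))  # [0]
```

Any disk reported is a candidate for consolidating its page files into one, since two page files on one spindle keep the disk head in constant motion.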

Why Not Just Add More RAM and Forget About Paging?

You might think that the solution to all these paging problems is simply to add enough RAM to the machine to prevent it from ever carrying out paging operations, on servers particularly. We know from experience that this is probably not a wise strategy where Windows NT is concerned. Windows NT has been designed to be efficient (nay, stingy) in its use of memory resources. It likes to run with just a few megabytes of RAM available as a ready reserve pool for emergency use to maximize disk performance, which in Windows NT is outstanding.

Essentially, the Windows NT cache manager takes over as much as possible of the free physical RAM to use for disk caching. Even on systems with what one would expect to be rather large amounts of memory (e.g., 32MB) it turns out to be relatively easy to force Windows NT to engage in some swapping behavior, particularly during application start. When applications are loaded, Windows NT attempts to load the full binary image of the application in memory and in doing so begins to release pages from its pageable pool (with resulting flush operations on the disk cache). This is one reason that first-time users of Windows NT may think it's slower than Windows 3.1 (or OS/2 2.1). It really is slower, where application launch is concerned. Steady-state performance of applications after they're launched, however, is quite another matter.

It's not possible to configure Windows NT so that it won't engage in this behavior (although you can minimize it by adjusting the Control Panel/Network/Server configuration). As long as sufficient virtual memory is available to handle peak cache loads without exceeding the commit limit, this doesn't have any significant impact on performance. In fact it will not be noticed at all unless applications are continually started and stopped. Applications that just run in a steady state for the most part will be completely unaffected—indeed, they benefit from significantly higher effective disk performance because of the large disk cache size.

The one major performance situation to watch out for is that in which page file limits are not sufficient and Windows NT starts raising the commit limit. To avoid this hazard run Windows NT systems during a burn-in period for the first few days (or weeks) of operation, observe the commit limit, and note any increase in the page file size. If the page file size increases over and above the preset size during the burn-in, reset the Initial Page File Size in the Control Panel, increasing it by 20%. This strategy will take care of most peak loading situations, give you a little "head room," and minimize any performance impact due to further page file growth. You needn't do this if Windows NT did not expand the page file during the burn-in period, because it's probably already big enough.
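The burn-in rule can be sketched as follows. The function name is hypothetical, sizes are assumed to be whole megabytes, and the result is rounded up so the new setting is never less than a full 20% increase.

```python
def recommended_initial_page_file_mb(initial_mb, peak_observed_mb):
    """Apply the burn-in rule: raise the initial size by 20% only if the
    page file grew beyond its preset size during the burn-in period."""
    if peak_observed_mb <= initial_mb:
        return initial_mb  # never expanded: probably already big enough
    return (initial_mb * 120 + 99) // 100  # +20%, rounded up to a whole MB


# Hypothetical burn-in observations, invented for illustration.
print(recommended_initial_page_file_mb(50, 55))  # 60
print(recommended_initial_page_file_mb(50, 50))  # 50
```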

In either case, observe the Commit Limit using Performance Monitor. Add 10% to that value and set it as a performance monitor alert on all servers and workstations. As an example, if the Commit Limit is 60MB, set an alert at 66MB. Make sure, of course, that the maximum page file size is more than 66MB and that there is sufficient free space on the partition containing the page file to store the additional space if it becomes necessary.

The steps outlined will, essentially, set a trip wire. When the system begins to expand its paging file, as soon as that 10% threshold is crossed, the alert will be transmitted and you'll likely have a chance to react to the problem. You'll want to react quickly, particularly if it happens on a server. Expansion of the commit limit doesn't indicate an imminent crash, but it indicates a fairly severe problem that will become very severe if you leave it alone.
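The trip-wire arithmetic is simple enough to write down. This sketch reproduces the 60MB to 66MB example and the check on the maximum page file size; the function names are invented, and sizes are assumed to be whole megabytes.

```python
def commit_alert_threshold_mb(commit_limit_mb):
    """Commit Limit plus 10%, using integer arithmetic to avoid rounding noise."""
    return commit_limit_mb * 110 // 100


def max_page_file_allows_alert(commit_limit_mb, max_page_file_mb):
    """True if the maximum page file size exceeds the alert threshold."""
    return max_page_file_mb > commit_alert_threshold_mb(commit_limit_mb)


print(commit_alert_threshold_mb(60))        # 66
print(max_page_file_allows_alert(60, 96))   # True
print(max_page_file_allows_alert(60, 64))   # False
```

The value returned by the first function is the figure to enter as the alert level for the Commit Limit counter in Performance Monitor.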

Paging on Workstations

Paging on workstations is a little different. The most common situation encountered is one in which a Windows NT workstation over time starts seeing a sufficient load so that the page file starts to increase, and an adequately configured system starts subjecting its user to severely frustrating behavior because whenever the user does anything, the page file grows (with associated thrashing).

Again, you can anticipate a problem situation by setting an alert based on a 10% growth in the Commit Limit. This isn't a crisis. For example, you have a basic Windows NT workstation outfitted with what would appear to be plenty of memory, say 24MB or double the 12MB Microsoft recommends. Initially, the system's user will be delighted with its performance and may be running a suite of applications. At first, the user will employ the system very much the way one would Windows 3.x. That is, the user will perform task-switching rather than multitasking on the system.

Over time the user finds that it's more convenient to start all of the applications first thing in the morning, iconize the ones not immediately being used to buttons on the taskbar, and just work away with the one on top, switching from application to application with the taskbar buttons as needed. This works fine, of course, in Windows NT. It is a preemptive multitasking system, and the intelligence built into NT's virtual memory subsystem is such that the applications that are iconized (and not in use) take up a minimal amount of memory.

At some point, however, your user will find the threshold for the commit limit, regardless of how high you set this initial threshold. Even on a system with plenty of memory and a Pentium Pro CPU, which you would expect to be an excellent performer, you will find that System/% Total Processor Time is relatively low but that Memory/Pages per Second is intermittently hitting a relatively high value (in the hundreds of Pages per Second, at least) nearly every time a new application is started, and often when an application is closed.

Problems occur because the working set for the user's applications exceeds the memory available in the system with the page file at its default size. Windows NT then starts expanding the page file. It does this in a very stingy manner, expanding only a little bit at a time, which means it buys just enough room to have the crisis come again 10 seconds later (it would be awfully convenient if the system were designed so that administrators could selectively control the growth of the paging file or cause an alert to be displayed suggesting that the user resize the paging file or call the administrator).

Unlike the server situation, in which this condition presages a crisis, for end users it's probably not urgent. Moreover, it's likely that the Commit Limit problem will grow slowly over time. Because Windows NT workstations can be inspected remotely, you can sit at any workstation and (with administrative privileges) open a Performance Monitor session on any other user's station. The most desirable approach is probably to log Commit Limit and Working Set sizes for users on an infrequent basis, such as once a week, observe users who are approaching their commit limit, and (when time is convenient) expand their page file for them. In this way they will never see the problem. You can also take advantage of this situation to observe the free space availability on the disk that holds the paging file and suggest that users move files as necessary to save enough room for the page file in case it needs to expand. In this way you achieve invisibility, that ultimate goal of administration discussed in Chapter 3.
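The weekly check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the workstation names, the readings, and the 90% "approaching" cutoff are invented for the example. Only the 10% growth figure comes from the text.

```python
# Hypothetical weekly readings per workstation, in megabytes:
# (committed memory, commit limit). The numbers are invented for illustration.
WEEKLY_READINGS = {
    "WS-ALPHA": (30.0, 60.0),
    "WS-BETA": (56.0, 60.0),
}

def commit_limit_grew(baseline_limit, current_limit, growth=0.10):
    """True if the commit limit has grown by at least `growth` (10% by default)."""
    return current_limit >= baseline_limit * (1.0 + growth)

def approaching_limit(committed, limit, fraction=0.90):
    """True if committed memory has reached `fraction` of the commit limit.

    The 0.90 default is an assumed cutoff, not a Microsoft recommendation.
    """
    return committed >= limit * fraction

def users_to_visit(readings):
    """Return the workstations whose page file should be expanded soon."""
    return sorted(name for name, (committed, limit) in readings.items()
                  if approaching_limit(committed, limit))

print(users_to_visit(WEEKLY_READINGS))
```

Run once a week against whatever readings you have logged, a check like this gives you a short list of stations to visit before their users notice anything.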

Controlling Memory Use

In most respects, Windows NT is a self-tuning operating system. At installation, certain configuration settings will be made to optimize performance for the amount of memory in the system. In most cases, these settings will provide the best performance, but there are exceptions.

By default, Windows NT Servers run a Large System Cache model, in which all available RAM not otherwise used by applications or the system is available for disk caching. Windows NT Workstations, by contrast, run a Small System Cache, in which the cache manager will page out least-recently-used memory in an attempt to keep 4MB of RAM free for application launch.

In some circumstances, you may want to change this behavior. For instance, if an NT Server is being used in nondedicated mode by someone running it as a desktop system, using the small cache model may speed local application performance (at the expense of Server performance). Likewise, NT Workstation users who spend most of their time running a preloaded set of applications, but rarely launching new ones, may benefit from a large cache model (especially on systems with limited RAM).

To control which model is set, use the NT configuration registry editor, and reset HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Memory Management\LargeSystemCache (this is a REG_DWORD value). A value of 1 sets large cache mode; a value of 0 sets small cache mode.
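If you would rather check the setting from a script than from the registry editor, something like the following sketch would do it. It assumes a Python interpreter with the standard winreg module, which exists only on Windows. The function names are our own, and the script only reads the value; it never changes it.

```python
import sys

# Registry location named in the text (under HKEY_LOCAL_MACHINE).
KEY_PATH = r"System\CurrentControlSet\Control\Session Manager\Memory Management"
VALUE_NAME = "LargeSystemCache"

def cache_model(value):
    """Translate the REG_DWORD value into the cache model it selects."""
    if value == 1:
        return "large"
    if value == 0:
        return "small"
    raise ValueError(f"unexpected {VALUE_NAME} value: {value!r}")

def read_large_system_cache():
    """Read the current value. Works only on Windows, where winreg exists."""
    import winreg  # imported here so the rest of the sketch runs anywhere
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    return value

if __name__ == "__main__" and sys.platform == "win32":
    print(cache_model(read_large_system_cache()))
```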

Virtual Memory Settings

Aside from large/small cache mode, you can tune NT's virtual memory subsystem using the Control Panel/System icon's Virtual Memory settings (see Figure 5.4). This lets you set the initial and maximum page file sizes, determine on which disk(s) page file(s) reside (as mentioned earlier, systems with multiple physical disks can benefit from having multiple page files), and control growth of the Windows NT configuration registry database.

Control Panel/Server

To further refine NT memory use, you can select any one of the four optimization settings for Server operation (Minimize Memory Used, Balance, Maximize Throughput For File Sharing, and Maximize Throughput For Network Applications) from Control Panel/Network's Services tab. Select Server from the list of installed software and then click the Properties button, as illustrated in Figure 5.5. The first setting is obvious; it is designed for a maximum of 10 network connections and is suitable only for lightly used workstations. This setting should never be selected on a file server (unless it's doing local file services on a very small network of 10 clients or fewer). The Balance setting allocates memory for up to 64 sessions and is useful primarily for departmental servers. Maximize Throughput For File Sharing allocates as much memory as is required for file sharing (it has no inherent upper limit) and is the basic setting for Windows NT Servers. Maximize Throughput For Network Applications de-tunes the Windows NT Virtual Memory System to be less aggressive in reserving physical memory to provide a buffer for application launch. This setting reduces swapping and is a good choice for servers that run primarily network applications (such as SQL Server). Indeed, this is probably the optimal setting for Server installations that have adequate memory (greater than 32MB).

Video Performance

The Windows NT Performance Monitor includes no specific video object. It is nonetheless possible to get an indirect indication of video activity in the Windows NT system. If the majority of video activity is in text mode, the best way to do this is with the Process Object, specifically the % Processor Time counter for the CSRSS process.

CSRSS is a Windows NT Executive subsystem that carries out graphical activities on behalf of text-mode applications (in versions of NT prior to 4.0, it did so for graphical applications as well). It contains one thread for each application. If CSRSS % Processor Time is continually absorbing a very high proportion of the overall system activity (that is, if one observes a high percent of processor time on the system and then traces it to CSRSS), in all probability your system is being bogged down by excessive text display. If you have multiple open text windows that display fast-changing data, consider minimizing them or at least reducing their size.

For graphical applications, the situation is more complex. With NT 4.0, Microsoft modified the video architecture to eliminate CSRSS as an intermediary process for graphics-based applications. All graphic operations are now carried out in the NT Executive. You can, however, get an indication of video load on your system by tracking the System Object/% Total Privileged Time. If that's consistently very high while graphical applications are running, you are probably being bogged down by a slow video card.

Disk Performance

Microsoft recommends monitoring two counter values when you attempt to determine disk performance. The first is Average Disk Sec./Transfer from the Logical Disk Object on any logical disk. The second is Current Disk Queue Length. Average Disk Sec./Transfer gives a direct measure of disk access speed, although determining a transfer rate will also require you to look at the Average Disk Bytes/Transfer to estimate the size of the block being transferred. Current Disk Queue Length gives a direct indication of the number of disk transfer requests that are being stored temporarily because the disk is unable to respond to the request. A sustained Current Disk Queue Length above one probably indicates that the disk is becoming a bottleneck in the system.

Note: It is not possible to measure any of these values without turning on disk counters (see the next paragraph).

Because disk performance monitoring incurs a 10% to 15% overhead, it should not be permanently turned on unless it's absolutely necessary. Disk performance monitoring is something that you want to do only during maintenance intervals or when problems are suspected. It may be left on permanently on servers if, in fact, you can determine that a 10% disk performance hit will not materially affect overall responsiveness of the system. NT 4.0 implements an enhancement to disk counters that allows performance of individual drives in a RAID array to be measured and allows you to turn disk counters on for remote systems on the network. To see the options, type diskperf -? at the command prompt. It will show the following display:

DISKPERF=====================

Starts and stops system disk performance counters.

Used without the command switches, DISKPERF reports whether disk
performance counters are enabled on the local or specified computer.

Enhanced Disk performance counters can be specified to report the
performance of the individual physical drives in a software striped
disk set. Normally software striped disk sets are reported as a single
logical and single physical drive. Note that when using the Enhanced
Disk performance counters, the Logical drive counters will not be
correct when measuring software striped disk sets.

DISKPERF [-Y[E] | -N] [\\computername]

-Y[E] Sets the system to start disk performance counters
when the system is restarted.

E  Enables the disk performance counters used for measuring
performance of the physical drives in striped disk set
when the system is restarted.
Specify -Y without the E to restore the normal disk
performance counters.

-N    Sets the system disable disk performance counters
when the system is restarted.

\\computername Is the name of the computer you want to
see or set disk performance counter use.

Thus, typing diskperf -YE \\MIPS40 will turn on enhanced disk counters the next time server MIPS40 is rebooted.5
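If you manage more than a handful of machines, you may want to generate these commands rather than type them. The sketch below simply assembles an argument list that follows the syntax line in the help text, DISKPERF [-Y[E] | -N] [\\computername]. The function name and its parameters are our own invention, and the sketch builds the command without running it.

```python
def diskperf_command(enable, enhanced=False, computer=None):
    """Build a diskperf argument list from the documented switches.

    enable=True   -> -Y (or -YE when enhanced=True)
    enable=False  -> -N
    computer      -> optional remote machine name, without the backslashes
    """
    if enhanced and not enable:
        raise ValueError("the E modifier only applies together with -Y")
    switch = "-N"
    if enable:
        switch = "-YE" if enhanced else "-Y"
    args = ["diskperf", switch]
    if computer:
        args.append("\\\\" + computer)  # two literal backslashes, then the name
    return args

# Example: enhanced counters on the server MIPS40 used in the text.
print(" ".join(diskperf_command(True, enhanced=True, computer="MIPS40")))
```

Remember that, whichever switch you choose, the change takes effect only when the target system is restarted.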

Once performance counters have been enabled, you can begin to monitor counters from the Logical Disk Object:

  • The % Free Space and Free Megabytes counters indicate respectively the percentage of disk space that is not filled and the number of megabytes of disk space that are not filled. If Free Megabytes falls near or below the space needed to hold the page file at its maximum size, the system might be unable to grow the paging file and will start giving you "Out Of Virtual Memory" indications. In general, it is probably wise to set alerts on % Free Space less than 5% on all drives on servers.

  • % Disk Time—This counter indicates the activity of the disk drive, including both reads and writes as a percentage of total elapsed time. It is a good indicator for excessive disk activity. If this value achieves a sustained level greater than 50%, the disk is approaching a full duty cycle, and you may have a thrashing condition, indicating that some corrective action needs to be taken. You may wish to examine % Disk Time on all the volumes of a server to see how load is being balanced across the disk drives and consider moving files as necessary (particularly in database server applications) to try to equalize load on the drives on the system.

  • The Physical Disk object provides a set of counters similar to those used for the Logical Disk objects. These will give you information about performance of a physical disk platter but will not give you information that can be broken down by partition and therefore is probably less useful in most circumstances. However, Microsoft does make one interesting recommendation,6 which is to observe Average Disk Access Time for physical disks. If you have multiple platters available, particularly in a SCSI disk system in which the disks could be striped, striping will probably improve disk performance if average disk access time for the physical disk is less than average disk time divided by the number of disks available striped.

With respect to setting alerts on disk performance counters, again, bear in mind that turning on disk performance counters (using the diskperf -y command syntax) will incur a 10% to 15% performance penalty on disks for which performance monitoring has been enabled. However, on servers for which you suspect that disk performance may represent a system bottleneck, it might well be advisable to turn on disk performance monitoring as a debugging aid and then set an alert on the Disk Queue value in the Logical Disk Object for any disks on which you suspect that performance may be a problem. Set the alert to trip if a sustained value greater than one is achieved. This will indicate that disk transfer requests are being received faster than the disk can accommodate them. Monitoring this value might indicate when a particular disk is accessed more frequently than the physical disk hardware can sustain, in which case you need to consider moving files around on the disk or replacing the existing disk setup with a stripe set.
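The rules of thumb given for the Logical Disk counters (% Free Space below 5%, sustained % Disk Time above 50%, sustained Disk Queue above one) can be collected into a small checking routine. The sketch below is an illustration only. In particular, treating "sustained" as "the last three samples all exceed the threshold" is our assumption, and the function names are hypothetical.

```python
def sustained_above(samples, threshold, min_samples=3):
    """True if the most recent `min_samples` readings all exceed `threshold`.

    Using three consecutive samples to mean "sustained" is an assumption.
    """
    recent = samples[-min_samples:]
    return len(recent) == min_samples and all(s > threshold for s in recent)

def disk_alerts(free_space_pct, disk_time_samples, queue_samples):
    """Return the list of alert conditions that currently apply to one disk."""
    alerts = []
    if free_space_pct < 5.0:
        alerts.append("low free space")
    if sustained_above(disk_time_samples, 50.0):
        alerts.append("high disk time")
    if sustained_above(queue_samples, 1):
        alerts.append("queue backlog")
    return alerts

# Invented readings for a single logical disk.
print(disk_alerts(4.0, [60, 70, 80], [0, 2, 1]))
```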

You should also be concerned if you see a Disk Queue higher than one and cannot account for it. If the level of traffic is such that the disk ought to be able to handle it, consider monitoring Average Disk Bytes/Transfer and Average Disk Sec./Transfer. You can use this information by dividing Average Disk Bytes/Transfer by Average Disk Sec./Transfer. You will get a transfer rate in Bytes/Sec. Comparing this with the specifications for the disk drive may indicate if a disk drive is starting to lose performance due to wear, fragmentation, and so on. Periodic monitoring of this value and historical logging of this information on a month-to-month basis may help you determine when a disk needs to be reformatted to eliminate fragmentation or when the disk hardware is beginning to have problems.
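The division described above is simple enough to do by hand, but a short sketch makes the units explicit. The readings and the rated figure below are invented for illustration, and the function names are our own.

```python
def transfer_rate(avg_bytes_per_transfer, avg_sec_per_transfer):
    """Average Disk Bytes/Transfer divided by Average Disk Sec./Transfer.

    The result is a transfer rate in bytes per second.
    """
    if avg_sec_per_transfer <= 0:
        raise ValueError("Average Disk Sec./Transfer must be positive")
    return avg_bytes_per_transfer / avg_sec_per_transfer

def percent_of_rated(measured_rate, rated_rate):
    """Express a measured rate as a percentage of the drive's rated figure."""
    return 100.0 * measured_rate / rated_rate

# Invented readings: 8,192 bytes per transfer at 31.25 ms per transfer.
rate = transfer_rate(8192, 0.03125)
print(rate, percent_of_rated(rate, 524288))
```

Logging the result month by month gives you the historical record against which a slow decline becomes visible.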

Network Performance

Up to now we've been concerned with monitoring other parts of the system to detect and overcome system bottlenecks. But this is a book about networking, and as any network administrator knows, the odds are much higher that you will experience performance bottlenecks on your network than on almost any other component. The classic approach to this problem (other than guesswork, jiggling the network cables, and so forth, which are always good ideas if you're having a network problem on a workstation), is to break out the Protocol Analyzer, and this remains the preferred method of dealing with a wide variety of network problems (with NT 4.0, you might try Network Monitor first).

Where NetBIOS is used (NetBEUI, NBT, NBIPX), Windows NT actually provides built-in performance monitoring that will give you almost (but not quite!) the same information you'd get from a protocol analyzer. You can't get down into the wire and actually look at the bits in the packets, but you can look at data rates and collisions. You can in fact perform a sophisticated level of system performance monitoring in the software itself. There are also performance counters that can be used in monitoring performance of some of the critical software components, including the LAN Manager workstation and the LAN Manager server. We examine all of those in what follows.

As you will recall from Chapter 1, the Redirector is a software component in the Windows NT Executive, which essentially acts as a traffic cop and determines when data transfers need to be handled by local resources (such as hard disks) and when they need to be handled over the network. It is, therefore, the component that sits nearest the center of the Windows NT network and is a good place to look for network bottlenecks.

Several parameters of the Redirector Object can be monitored here that may prove useful in problem detection and network tuning:

  • Bytes Total/Sec.—This value provides an overall indication of how busy the redirector is and provides the simplest direct measure of network performance (in combination with the same counter for the Server object, below).

  • Current Commands—This counter is the number of redirector commands waiting in queue to be serviced. If it rises to a value significantly higher than the number of network cards in the system, you're dealing with a severely bottlenecked network server.

  • Network Errors/Sec. —This counter indicates the number of serious network errors (generally collisions) being experienced in the system. You can look for further information in the System Error Log (using Event Viewer) because there will be an entry every time a network error is generated. In any case, if Network Errors/Sec. rises above zero on a well-behaved network (or above some small background value in a heavily loaded network), you have a problem somewhere in the subnet, and you'll need to trace it down.

  • Reads Denied/Sec. and Writes Denied/Sec.—These counters indicate that a remote server is refusing to accommodate requests for raw reads or writes. Raw reads or writes are techniques that Windows NT uses to increase data rates in large data transfers. Instead of transferring packet frame information for each data packet, a virtual circuit connection is opened and a whole stream of raw data packets is transmitted, maximizing the throughput rate for the duration of the virtual circuit connection. If the server is running low on memory, it may refuse to participate in this kind of a connection because it cannot allocate the necessary local buffer space. Therefore, the Reads Denied/Sec. and Writes Denied/Sec. counters are direct indications of memory problems at the file server.

    Obviously, the preferred solution to this problem is to increase the memory in the server (or at any rate, examine the file server and determine why it is running so low on memory that it's refusing to allocate space for raw reads and writes). If it is impossible to fix this problem promptly (i.e., you don't have extra RAM to put in the server or cannot immediately take it off-line), you can add UseRawReads and UseRawWrites values to the Parameters sub-key of the LANManWorkstation entry in the system registry and set them to False. This action will stop futile attempts to use raw I/O, thus increasing throughput. Again, however, the preferred method is to correct the problem at the server. One further registry setting that might help where networks are heavily used is to create a UseNTCaching value in the Parameters sub-key of the LANManWorkstation registry sub-key and set it to True. This will cache I/O requests during file writes, reducing the number of requests transmitted across the network. In effect, repeated writes will be cached locally, and then a single request for transfer will transmit all the information across the network. When a network is heavily loaded, this setting may improve performance.

All Windows NT systems are to some extent servers, whether they are dedicated as file servers or functioning as desktop workstations. And operations in which services are provided and resources are shared are managed by the Server Object. This setup can be monitored from the Server Object in the Performance Monitor. Appropriate counters and indicated performance are as follows.

  • Bytes Total/Sec.—This value provides an overall indication of how busy the server is and should probably be monitored on file servers because an increase over time indicates a need to expand server memory (or perhaps even to consider upgrading your server hardware).

  • Errors Access Permissions, Errors Granted Access, Errors Logon—All of these indicate security problems. These may be as innocuous as someone forgetting a password but could indicate that someone's attempting to "hack" your system. In particular, a high value for Errors Logon may indicate that someone is trying to hack the system using a password-cracking program. You will want to examine the system security log (using Event Viewer), and you may want to enable auditing (from User Manager) to track what's happening. This is also a classic application for a protocol analyzer (sniff the LAN and see where those errors are coming from!), and in NT 4.0 you may want to fire up Network Monitor.

  • Errors System—This counter will show the number of unexpected system errors that the server is experiencing and indicate that there is a problem with the server. Check to see whether the server is running out of memory and check the system error log to see if you have a hardware problem. If neither is indicated, call a Microsoft-certified professional technician or Microsoft technical support.

  • Pool Nonpaged Bytes, Pool Nonpaged Failures, and Pool Nonpaged Peak—These counters give an indication of the physical memory situation with respect to the Server Object. Pool Nonpaged Bytes indicates the amount of non-pageable physical memory that the server is using; Pool Nonpaged Failures indicates the number of times it attempts to allocate memory that is not available. Any value above zero for the latter indicates that the physical memory in the system is too small. Pool Nonpaged Peak tracks the maximum value that Pool Nonpaged Bytes has reached since the server was started—a direct measure of how much memory the Server object needs. If you get an indication that the server is running out of memory, reset the Server Object in the Control Panel/Network settings and consider using the Minimize Memory Used optimization setting. However, doing so will reduce system performance and will likely prove inadequate where you are attempting to establish connections with more than five systems at once. Increasing the physical memory is always the preferred solution to this problem.

  • Pool Paged Bytes, Pool Paged Failures, and Pool Paged Peak—These parameters give a similar indication for pageable memory used by the server. In this case, the solution to the problem may be to increase the page file size on the system (set in Control Panel/System's Performance tab—press the Change button in the Virtual Memory section).

  • Server Sessions—This parameter counts the number of sessions currently open on the server—a direct measure of server activity (note that individual users can have more than one session open at a time).

  • Sessions Errored Out and Sessions Timed Out—These parameters give an indication of the number of times that network errors are causing a session to be disconnected or, alternatively, the number of times that an administrative auto disconnect setting (from User Manager) is disconnecting users with idle connections. Sessions Timed Out may be useful on a system with a heavily loaded server that's experiencing memory problems.

  • Sessions Logged Off and Sessions Forced Off—These parameters count the number of users who have logged off normally and those who were forced to log off (either by active intervention of an administrator or because of the time limits set in their profile). The latter counter may be useful on a system with a heavily loaded server that's experiencing memory problems.

There are also various NetBEUI, NBT Connection, AppleTalk, NwLink, and NetBIOS/IPX/SPX Objects, which all provide similar counters, most notably Bytes Total/Sec. and Packets/Sec., measuring respectively the total data transfer for all packets containing data and the total number of packets transmitted. You can work out the average packet size by dividing an average of the Bytes Total/Sec. by Packets/Sec. If that number begins to change (particularly if it begins to drop), it probably indicates a collision condition in which you have a large number of packets that don't contain any data. Some of the protocol objects present additional counters that may be helpful in diagnosing specific problems. To see the counters and a brief explanation of what each does, start Performance Monitor, select Add to Chart, select the object, and press the Explain>> button. A Counter Definition will appear, as illustrated in Figure 5.6, and you can scroll through the list of counters to see what each indicates.
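The packet-size calculation can be sketched as follows. The function names are our own, and the 20% tolerance used to decide that the size has "dropped" is an assumption chosen for the example, so adjust it to what is normal on your own network.

```python
def average_packet_size(bytes_total_per_sec, packets_per_sec):
    """Bytes Total/Sec. divided by Packets/Sec., in bytes per packet."""
    if packets_per_sec <= 0:
        return 0.0  # no traffic, so there is no meaningful packet size
    return bytes_total_per_sec / packets_per_sec

def size_dropped(baseline_size, current_size, tolerance=0.20):
    """True if the current size is more than `tolerance` below the baseline.

    The 20% default is an assumed figure, not a documented threshold.
    """
    return current_size < baseline_size * (1.0 - tolerance)

# Invented readings: 512,000 bytes/sec carried in 1,000 packets/sec.
baseline = average_packet_size(512000, 1000)
print(baseline, size_dropped(baseline, 300.0))
```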

  • Application-Specific Performance Counters—In addition to the standard NT counters, many applications—including those in Microsoft's BackOffice family—export their own counters that can be charted, logged, and tracked in exactly the same way as the built-in ones. Among the most valuable of these are the counters for IIS (covered in Chapter 7) and those from Microsoft SQL Server. If you have a server application, check the documentation for Performance Monitor support.

Performance Monitor-Logging

Besides using Performance Monitor to examine instantaneous counter values for troubleshooting, you can use it to create performance data logs over extended periods of time—which is particularly useful on servers. To do so:

  1. Start Performance Monitor, and select View/Log.

  2. Select Edit/Add to Log, and add the Processor, Logical Disk, Memory, Redirector, and Server objects. (Note that using the Logical Disk object requires you to start disk performance counters as described earlier in this chapter.)

  3. Select Options/Log. Specify a full pathname for your log file and how often you want to update the log (for example, once each 3600 seconds, which is once per hour).

  4. Click the Start Log button. Performance Monitor will start collecting data, and as it goes, it will display the file size. You can minimize it to a button on the task bar and go on with other work. When you're finished collecting data (for example, after 24 hours), select Options/Log again, and press the Stop Log button.

To view the resulting data, select View/Chart, then Options/Data From..., and a file selector will appear. Type in the name of your log file. You may now add counters to the chart just as you would for a regular chart, but the data will come from the log file (you can also export data in comma-separated value format, which can be read in by most spreadsheets). For example, see Figure 5.7. This log was taken on a busy corporate e-mail server. It shows a typical diurnal cycle, with logons peaking in the morning and afternoon. Logging information like this can be a huge help in tracking how your system performs over time.

Some things to look for include excessive Memory/Page Faults per Second (if the number is consistently 100 or higher, you need more RAM) and Processor/% Total CPU (if it is consistently less than 100%, you are not CPU bound and do not need to buy a faster system to improve server performance). Peak and average network throughput (Server and Redirector/Bytes Total per Second) will tell you if you're saturating your network and need to consider upgrading to 100Base-T or FDDI.
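Once log data has been exported to a comma-separated file, the two checks above are easy to automate. The sketch below is a rough illustration under loud assumptions: the column names and sample readings are invented and do not reproduce Performance Monitor's actual export layout, so you would need to adapt the parsing to the file you really get.

```python
import csv
import io

# Hypothetical export: the column names and readings are invented for this
# example and do not reproduce Performance Monitor's actual export layout.
SAMPLE_EXPORT = """time,page_faults_per_sec,total_cpu_pct
09:00,140,35
10:00,155,42
11:00,120,38
"""

def load_samples(text):
    """Parse the export into two lists: page faults/sec and total CPU %."""
    rows = list(csv.DictReader(io.StringIO(text)))
    page_faults = [float(row["page_faults_per_sec"]) for row in rows]
    cpu = [float(row["total_cpu_pct"]) for row in rows]
    return page_faults, cpu

def summarize(page_faults, cpu):
    """Apply the two rules of thumb, reading "consistently" as "every sample"."""
    return {
        "needs_more_ram": bool(page_faults) and all(v >= 100 for v in page_faults),
        "not_cpu_bound": bool(cpu) and all(v < 100 for v in cpu),
    }

print(summarize(*load_samples(SAMPLE_EXPORT)))
```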

A Final Word About Performance Tuning, Logging, and Maintenance History

The built-in tools (such as Performance Monitor and the configuration Registry Editor) in Windows NT are quite powerful and can make life much easier for a support professional who needs to maintain multiple servers and workstations. They can also, however, lead you into making a grave mistake. It's all too easy to install a Windows NT system, conduct some initial performance tuning, and then forget about it until something breaks, at which point one is left with no record of how well the system performed when it was installed.

Whenever a server is put in, you will most likely carry out an initial performance measurement (and tuning, if necessary). At that time, record the performance results you achieve in a performance history. This can be either a log document that is kept on the server (although if it is in electronic form, keep a copy somewhere else because even if the server goes down, you may need to access the maintenance information) or a separate physical record.

The point of the maintenance history is that the next time you need to conduct a performance tuning or routine check on the system, you have a base of comparison. That is, you know what the system performance was when you conducted the initial tuning and you know how it differs when you look at it later. This base can be enormously valuable in detecting problems. A routine performance tune-up once per month, for example, is probably a good idea. Record values for basic performance criteria such as Nonpaged Pool and Paged Pool sizes from the Memory Object, Total Processor Time from the System Object, Logical Disk Available Space, Free Space, % Free Space, Average Disk Bytes/Transfer, Disk Queue and Average Disk Sec./Transfer, NetBEUI Bytes Total/Sec., and Packets/Sec., among others. Comparing these values from one check to the next makes it possible to identify conditions that will eventually need to be corrected.

For example, if you find that the Nonpaged Pool is rising continuously, you know that eventually you must increase the physical memory in the system. If you find that the Paged Pool is rising consistently, you might need to expand the size of the paging file, consider adding more virtual memory to the system, consider distributing the paging file over multiple disks to improve performance, etc., etc., etc. Use your common sense. Keep a record of this information, look at it periodically, and think about it. That way you will not have to resort to using the troubleshooting information presented later in this chapter.
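A maintenance history lends itself to a very simple kind of trend check. The sketch below assumes you have recorded one reading per monthly tune-up, oldest first. The function names and the sample figures are invented for illustration.

```python
def rising_continuously(history):
    """True if every recorded value is higher than the one before it."""
    return len(history) >= 2 and all(
        later > earlier for earlier, later in zip(history, history[1:])
    )

def change_from_baseline(history):
    """Percentage change between the first (baseline) and latest readings."""
    baseline = history[0]
    return 100.0 * (history[-1] - baseline) / baseline

# Invented monthly Nonpaged Pool readings in megabytes, oldest first.
nonpaged_pool = [2.0, 2.5, 3.0]
if rising_continuously(nonpaged_pool):
    print("Nonpaged Pool up", change_from_baseline(nonpaged_pool), "percent")
```

A check this crude will not replace common sense, but it will point you at the counters worth thinking about.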

Windows NT Configuration Registry

Windows NT provides an advanced approach to configuration tracking and maintenance that can be an absolute godsend to system administrators. This approach is mediated through a special tool called the Configuration Registry Editor (REGEDT32.EXE), which is a full-featured database editor for the examination and manipulation of configuration registry information.

Warning: The Registry Editor is one of the most powerful administrative tools provided with Windows NT. It is also potentially one of the most dangerous. Editing registry entries and making changes to the registry blindly may render the system completely unstable. Use this tool with care.

The Configuration Problem

How many times have you been faced with this problem? A Windows user comes to you and says, "My system won't work." You ask, "What did you change?" Your user says "Nothing!" You examine the system and find that it won't boot. You know that however sincere the user may be, something changed in the system because it booted before. After further discussion, you find that the user recently added some applications, removed others, and in all probability edited the CONFIG.SYS file, AUTOEXEC.BAT file, and/or any of the dozen or so *.INI files in the Windows\SYSTEM directory (or the PROTOCOL.INI file on a Windows for Workgroups or LAN Manager system). You are now faced with the nightmare of system administrators the world over—trying to correct configuration problems in the absence of any backup information at all. The odds are quite good that the solution to the problem will be to reinstall Windows, reinstall networking, and reinstall applications, because there really isn't anything else you can do.

Windows NT attempts to solve this problem with a configuration registry, a true database organized as a multiple tree structure and maintained individually on every Windows NT server or workstation. This database contains all (well, in theory all, but in practice most) of the information that is contained in the AUTOEXEC.BAT, CONFIG.SYS, *.INI files of a Windows system, or in the enormous CONFIG.SYS file of an OS/2 system, or in the PROTOCOL.INI file of a LAN Manager system. Furthermore, the data is inherently backed up. Multiple copies are maintained, and a special tool is provided for manipulating the data, which, among other things, organizes the data in a logical structure and makes it possible to access the data remotely—a dream come true for many system administrators. This tool is called the Configuration Registry Editor (REGEDT32.EXE).

The Bad News

The availability of a centralized configuration database and a proper tool for managing it is a dream come true for system administrators, up to a point. Unfortunately, the current implementation of the Registry Editor is less than perfect. It looks and behaves much like the Windows 3.x File Manager—neither the best nor the worst thing that one could think of to use as a model—but its most unfortunate feature is that (much like the various *.ini files it replaces), the Registry continues the system management tradition of providing configuration information in the form of thousands of incomprehensible key values that are not documented anywhere.7 This situation is extremely frustrating and potentially dangerous. It means that when you first examine the Registry, you need to be very careful not to change anything. If you do, it's almost impossible to get the initial value back because there's no place to go look it up. It also means that finding the appropriate values to modify in a system is difficult.

Configuration Registry Structure

As mentioned above, the Configuration Registry is organized as a multiple tree database. This is stored in such a manner that it is fully backed up in a system, as we will see. Changes to the Registry are made through a Registry Editor, which enforces a high degree of atomicity in the database—you are guaranteed to see either an old or a new value for any registry key. You will never see a mixture of old and new values even if a system crash occurs. That's the good news.

Physical Data Structure

The registry is organized as a series of hive files, stored (with associated logs) in binary form on your computer's hard disk.8 You can also back up the registry manually using REGBACK and REGREST from the NT resource kit, thereby providing yourself with a fallback in case the files become corrupt.

Fortunately, Windows NT goes to considerable lengths to make sure that the Registry doesn't become corrupted, and it provides a last known good configuration recovery menu during system start. So you will usually be able to recover at least to a previous known state in a system reboot (provided, of course, that nobody has been making dramatic Registry changes in an ill-thought-out manner).

Logical Data Structure

Because you will invariably access the Registry through the Registry Editor, the data structure of most importance is the logical data structure that you see when observing the Registry Editor. This is organized at the top level into five9 registry keys, or five entry points into the four major tree structures that contain the system Registry information. HKEY_LOCAL_MACHINE is the tree structure describing the hardware and software configuration of the machine whose Registry Editor you are running or whose Registry you have loaded remotely. HKEY_CURRENT_USER is the Registry information applying to the currently logged-in user of the system. HKEY_CLASSES_ROOT is Windows NT's OLE database. HKEY_USERS maintains the list of users in the local machine's local login database and the security identification number (SID) for each user along with the program groups, control panel settings, environment variables, and so forth associated with each user's login. HKEY_CURRENT_CONFIG stores settable parameters (video display settings and network enabled/disabled) for the hardware profile currently in use.

Of the keys, by far the most useful for system maintenance is HKEY_LOCAL_MACHINE, which contains, again, the actual description of the system and the settings that would formerly have been found in CONFIG.SYS, AUTOEXEC.BAT, or *.INI files. This is the Registry key with which we are most concerned in this chapter.

The HKEY_LOCAL_MACHINE Key

Starting from the HKEY_LOCAL_MACHINE entry there are five sub-keys: HARDWARE, SAM (Security Account Manager), SECURITY, SOFTWARE, and SYSTEM. Of these, the SAM and SECURITY sections are of interest to us only insofar as we know that they exist. SAM cannot be accessed except through the appropriate APIs, and the SECURITY entry cannot be accessed at all. These registry entries contain the security information used to validate logons into the system and to validate privileges and user access rights. They cannot be edited manually.

The HARDWARE key contains a description of the system, which is updated every time the system restarts. This is done through use of a hardware recognizer, one component of the Windows NT boot process. Examining the HARDWARE key, you'll find sub-keys for DESCRIPTION, DEVICEMAP, and RESOURCEMAP. A sub-key of the DESCRIPTION will be System, which will contain information about components such as the central processor (or processors) and the various adapters in the system. The DEVICEMAP sub-key will contain a list of the I/O devices in the system, as will the RESOURCEMAP sub-key. This information is used by the various Windows NT system software components, such as the network components and the Control Panel, which will examine the HARDWARE key in the Registry to identify any or all network cards in the system and test their settings. It can be used by an administrator to determine what hardware is in the system and the status of the hardware (this is better done using the Windows NT Diagnostics tool described later in this chapter), but obviously it can't be changed (other than if you change the hardware and restart the computer).

The SOFTWARE sub-key contains, first of all, the sub-key called Classes, which provides the software class associations used by Windows Explorer (the same data is pointed to by HKEY_CLASSES_ROOT); that is, it associates a three-letter file extension with a program. This is followed by a Description sub-key that appears to be used currently only as a temporary repository for Microsoft Remote Procedure Call (RPC) addresses and sub-keys for each vendor that supplies software to the system. In Windows NT systems today you are certain to find a sub-key called Microsoft—and there is some small probability that you will see sub-keys called Lotus, Borland, or whatnot in the future (if you have the NetWare Requester for Windows NT installed, for instance, you'll see a Novell sub-key).

Within each vendor (such as Microsoft) sub-key you will see sub-keys for programs or components, and within those component sub-keys are sub-sub-keys for versions of the products. Within those sub-sub-keys you might find information about the product and product settings. From an administrator's point of view, the value of this information lies solely in that it does provide a central resource for determining the versions of software currently installed in the system. You can examine the SOFTWARE entries for each vendor, and if you click, for example, on the LAN Man Server entry under Microsoft, you'll see a sub-key called Current Version. Clicking on that will list description and installation date, major version, minor version, and other data. This information can be used by software such as Microsoft's SMS to automatically track and update software versions across the network.

If ODBC drivers are installed on your system, there will be an ODBC sub-key containing information about the drivers that are installed and the servers that are supported.

The SOFTWARE key will also contain a Secure sub-key (the purpose of which is not clear at the moment), a Program Groups sub-key listing any Windows (or NT) 3.x program manager groups that have been converted to links on the NT 4.0 desktop, and a Windows 3.1 Migration Status sub-key. This sub-key will indicate the status of any migration information for systems providing dual boot between Windows 3.1 and Windows NT that have been upgraded from a Windows 3.1 or Windows for Workgroups installation to a Windows NT installation (this key is really obsolete in NT 4.0 and may appear only if you've upgraded from NT 3.x).

After the SOFTWARE sub-key, there is only one more sub-key of the HKEY_LOCAL_MACHINE, the SYSTEM sub-key. This is the one that contains practically everything of interest to a support professional.

Opening the SYSTEM sub-key, we find a number of sub-sub-keys. The most important are the ControlSets: CurrentControlSet, ControlSet001, and ControlSet002. A ControlSet is a tree structure containing information on all the main services of a Windows NT system, including parameter settings. The system maintains a CurrentControlSet, which is the one currently being used in the system, and two fall-back copies representing previous configurations. During shutdown the CurrentControlSet will be copied into ControlSet001, so that it always contains the ControlSet in use when the system was last shut down. That, in turn, replaces ControlSet002 during system start if the system starts correctly. If the system fails to start correctly, an attempt will be made to start it using the earlier configuration. You could also have the option of doing this manually using the last known good configuration menu, which comes up during a Windows NT system start. This feature alone is immensely valuable to system professionals because it means the system automatically protects users from themselves. If you have a system that starts to misbehave, there is a very good chance that by reverting to one of the two last known good configurations, you will be able to recover.

In addition to the control sets, the SYSTEM key contains DISK, Select, and Setup sub-keys. The DISK sub-key contains a binary disk signature. The Select sub-key tells you which of the Control Sets is in use. Examining this list, you'll see entries for Current, Default, Failed, and LastKnownGood, which (by default on a system operating normally) will have a Current value of one, Default value of one, LastKnownGood value of two, and a Failed value of zero. If a configuration corruption is detected during startup, the Failed value will rise, and the system will attempt to use the last known good entry as the current entry instead of using the default entry.
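To make the Select values concrete, here is a minimal sketch in Python (our choice of language for illustration, not something supplied with Windows NT). It works on a plain dictionary standing in for the Select sub-key rather than on a live Registry. It assumes that a value of n refers to the sub-key ControlSet00n, and that a normal start uses Default while a recovery start uses LastKnownGood. The function names are hypothetical.

```python
# Stand-in for the values under HKEY_LOCAL_MACHINE\SYSTEM\Select, using the
# defaults the text gives for a system that is operating normally.
SELECT_NORMAL = {"Current": 1, "Default": 1, "LastKnownGood": 2, "Failed": 0}


def control_set_name(number):
    """Map a Select value to a ControlSetNNN sub-key name (0 means none)."""
    if number == 0:
        return None
    return "ControlSet{:03d}".format(number)


def set_to_boot(select, use_last_known_good=False):
    """Pick the control set to start from.

    Assumption: a normal start uses Default, and a recovery start
    (automatic, or chosen from the menu) uses LastKnownGood.
    """
    key = "LastKnownGood" if use_last_known_good else "Default"
    return control_set_name(select[key])


if __name__ == "__main__":
    print(set_to_boot(SELECT_NORMAL))
    print(set_to_boot(SELECT_NORMAL, use_last_known_good=True))
    print(control_set_name(SELECT_NORMAL["Failed"]))
```

Reading the Select values this way is a quick check on which ControlSet sub-key you are actually looking at before you compare it with CurrentControlSet.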

The Setup sub-key of the SYSTEM key contains information about the Windows NT system setup that was performed when the system was installed. This includes the network card, the type of setup performed, and the setup command line employed. There is an entry for system setup in progress. If you ever examine this entry and it is other than zero, something has gone dreadfully wrong, and it will indicate the path to the system setup files.

By far the most important information, again—from a support professional's point of view—is the information contained in CurrentControlSet, which we examine next.

The CurrentControlSet Key

CurrentControlSet contains four10 sub-keys: Control, Enum, Hardware Profiles, and Services.

  • The Control sub-key contains information such as the load order for the device drivers and services (in the GroupOrderList and ServiceGroupOrder sub-sub-keys) along with much of the Control Panel and Setup data. This will rarely be edited directly by an end user or administrator, but will simply reflect the settings set for the computer using other tools. So from an administrator's standpoint it is the Services sub-key, finally, that contains the parts that are a matter of concern.

  • The Enum sub-key contains information used by NT's Hardware Enumerator at boot time to determine what hardware devices are attached to the system. The most useful portions of this key are found in the Enum/HTREE/ROOT/0 sub-key, which will contain two entries: a multi-string list called AttachedComponents and a dword value for FoundAtEnum (normally 1). This might be useful in troubleshooting a system that refuses to identify a peripheral. If it's not in the list, it wasn't enumerated, which means NT didn't recognize it. Enum/ROOT contains a series of entries listing all the devices NT looks for at boot time. Currently, all are listed as Legacy devices, presumably in preparation for the introduction of plug-and-play support in a future release.

  • The Hardware Profiles sub-key contains numbered entries for every hardware profile on the system, each of which will have a Software and System sub-key of its own. These keys indicate only those items that have profile-specific settings—typically the display driver settings and settings for disabled services.

  • The Services sub-key provides individual sub-keys associated with each subsystem or hardware device driver. Within each sub-key the linkage of the subsystem or driver to other devices appears in a sub-key, and there may be a parameters sub-key that will have any user-set parameters for the component. Some sub-keys will also have an auto-tuned parameters key associated with them, which will incorporate parameters dynamically tuned by the component itself.
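The Enum check described in the list above (if a device is not in AttachedComponents, it was not enumerated) amounts to a simple membership test. The following Python sketch works on lists you have copied out of the Registry Editor by hand. The device strings shown are made up for illustration, and comparing them without regard to case is our assumption.

```python
def not_enumerated(attached_components, expected_devices):
    """Return the expected devices that are absent from AttachedComponents.

    Assumption: entries are compared without regard to case.
    """
    seen = {entry.lower() for entry in attached_components}
    return [device for device in expected_devices if device.lower() not in seen]


if __name__ == "__main__":
    # Hypothetical entries; copy the real ones from the Registry Editor.
    attached = [r"Root\LEGACY_EXAMPLEDISK\0000", r"Root\LEGACY_EXAMPLENET\0000"]
    wanted = [r"Root\LEGACY_EXAMPLENET\0000", r"Root\LEGACY_EXAMPLETAPE\0000"]
    for device in not_enumerated(attached, wanted):
        print("not enumerated:", device)
```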

If you start the Registry Editor (by typing REGEDT32 from the command line) you will see the Registry Editor display containing five windows, one for each top-level Registry key. Select the window called HKEY_LOCAL_MACHINE and double-click the HKEY_LOCAL_MACHINE key entry to list its sub-keys; double-click the SYSTEM sub-key; double-click the CurrentControlSet sub-key; double-click the Services sub-key. This will give you a list of all of the services and hardware components in the system. If you now double-click on the Browser sub-key, you'll see Parameters, Linkage, and Security. Double-clicking on Parameters will give you a list of parameters for the sub-key.

Note that this list is not necessarily complete—and this is one of the problems with the Registry as it currently exists. It's possible for a parameters entry in a sub-key entry for a component to be empty. This does not mean that there aren't any parameters. It means that the component is using the default parameters, whatever those might be.

On a particular Windows NT Server system, the Parameters for Browser are IsDomainMaster, which is of type REG_SZ (a string data type) and is set to False, and MaintainServerList, which is again of type REG_SZ and is set to Yes (there is also a DirectHostBinding value listing the protocols to which the browser service is bound). Other possible values would be True for IsDomainMaster and No for MaintainServerList. What these settings do, in fact, is determine the operation of the system browser, the component that determines the response to a net view command or to clicking the Connect Net Drive icon in File Manager. IsDomainMaster determines whether the system in question stores the browse list, that is, the list of systems that can be accessed on the local workgroup or domain.

In this case, even though the system in question is a backup domain controller for the Windows NT Server logon domain in question, it is not the domain browse master. In fact, one of the workstations on the system is functioning as browse master. However, because MaintainServerList is set to Yes, the system does maintain a list of the available systems and can act as fall-back to the browse master if it does not respond to a browse request from other workstations. (See the section on browsing in Chapter 9.)
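Two points from the walk through the Browser sub-key lend themselves to a short sketch: finding a sub-key by its path, and the rule that an empty Parameters sub-key means the defaults are in force. The Python below models a fragment of the tree as nested dictionaries. It does not touch a real Registry. The ExampleService entry and its default are hypothetical, and the Browser values are the ones quoted above for one particular server.

```python
from collections import ChainMap

# Toy fragment of the tree, not a complete or authoritative layout.
REGISTRY = {
    "HKEY_LOCAL_MACHINE": {
        "SYSTEM": {
            "CurrentControlSet": {
                "Services": {
                    "Browser": {
                        "Parameters": {
                            "IsDomainMaster": "False",
                            "MaintainServerList": "Yes",
                        },
                        "Linkage": {},
                        "Security": {},
                    },
                    # Hypothetical service whose Parameters key is empty.
                    "ExampleService": {"Parameters": {}},
                }
            }
        }
    }
}


def open_key(root, path):
    """Walk a backslash-separated path through nested dictionaries."""
    node = root
    for part in path.split("\\"):
        if not isinstance(node, dict):
            raise KeyError(path)
        # Registry names are not case sensitive, so compare without case.
        matches = [name for name in node if name.lower() == part.lower()]
        if not matches:
            raise KeyError(path)
        node = node[matches[0]]
    return node


def effective_parameters(stored, defaults):
    """Stored values win; anything missing falls back to the default."""
    return dict(ChainMap(stored, defaults))


if __name__ == "__main__":
    base = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
    print(open_key(REGISTRY, base + r"\Browser\Parameters"))
    empty = open_key(REGISTRY, base + r"\ExampleService\Parameters")
    # The default below is invented; real defaults live in the component.
    print(effective_parameters(empty, {"ExampleTimeout": 30}))
```

The catch, of course, is that the Registry gives you no such table of defaults. You would have to build it from documentation or experiment.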

To edit any of these entries, such as the IsDomainMaster entry, it is necessary only to double-click on it. Because these entries are of the type REG_SZ, the String Editor will then appear, allowing you to type in a character string. Here, again, we meet one of the unfortunate problems with the Registry database. Obviously only certain strings will provide acceptable entries for string data types, yet there's nothing to indicate how a string should or should not be typed. In fact, the TRUE and FALSE values are uppercased, while the yes and no values are lowercased. You must find this kind of information by examination (for that matter, as this is written, we are unsure whether the choice of case is even significant; it may not be).

Another data type is REG_DWORD, the double-word data type, which contains a 32-bit binary value. Double-clicking on one of these, such as the LMAnnounce parameter in the LAN Man server sub-key, you will be presented with a Dword Editor, which will show the data in question in your choice of a binary, decimal, or hexadecimal representation. This can be of some use to you in setting a particular value because you can type it in using the most convenient form. Again, however, there is no explanation of what the acceptable values are. The LMAnnounce value, in point of fact, has legal values of zero or one, a one indicating that the system is to perform LAN Manager 2.x-compatible system announcements and a zero indicating that it is not. Fortunately, as with most entries in the system sub-key, it is not necessary to edit this value from the Registry Editor. You can edit the value, in fact, by using the Control Panel/Network Settings, Services tab: select Server from the list of installed network software, and click the Properties button. You will then see a screen offering a choice of four possible optimizations and a checkbox titled Make Browser Broadcasts to LAN Manager 2.x Clients. Checking this box and clicking OK will change the Registry value from zero to one, and if you return to the Registry Editor, you'll see, in fact, it updates itself and the LMAnnounce value will now be set to 0x1 as type REG_DWORD.

You will also notice a Size value in the LAN Man server sub-key. Size, which is a REG_DWORD, represents the server optimization value that has been selected from the Control Panel. Because the four possible values are one through four, it is obvious that a value of zero or five, for instance, would be illegal, yet there is nothing in the Registry Editor that would indicate this.

Why Go On and On About the Limitations of the Parameter Settings?

Why do we keep harping on the limitations of the parameter settings? Because it's dangerous to edit settings in the Registry Editor! Never do this if there is an alternative. Do not change the LMAnnounce setting with the Registry Editor—change it from the Control Panel. Do not change the server size from the Registry Editor—change it from the Control Panel. Whenever you examine a setting in the Registry Editor and consider changing it, try to find an alternative way to change it first. And these ways are usually available in one or the other of the Control Panel components on a Windows NT system.

There really ought to be a button associated with the Registry Editor that would examine the LMAnnounce parameter and tell you that it can be changed in Control Panel/Network Settings (much like the Explain>> button in Performance Monitor's Add to Chart dialog). And because that way of changing it is available, the ability to edit it directly ought to be disabled. There are, of course, circumstances in which you have no choice.

The registry also allows you to configure systems remotely. From the Registry menu of the Registry Editor, you can choose Select Computer, select another Windows NT server or workstation on the network, and edit that computer's Registry (though when you do so, only the HKEY_LOCAL_MACHINE and HKEY_USERS windows will be available). If you must set a parameter remotely, that may be the only way to do it. But this is something that must be done with extreme care: when you use the Registry Editor to make a parameter change, you run the risk of typing an illegal parameter or deleting a value and not being able to remember what it was. Possibly the worst thing that you could do would be to delete a value, then wish to re-establish it, and re-establish it with the wrong type.

Suppose, for example, we delete the LMAnnounce parameter. Blindly looking at the registry editor and thinking about the LMAnnounce parameter—remembering that it only has two possible states, on or off—we might very well tend to restore it as LMAnnounce type REG_SZ with a value of True or False. That would not work properly. Worse, it might cause the browser to malfunction, rendering the system unstable. We repeat: Do not make parameter changes using the Registry unless you have no alternative.
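Because the Registry Editor checks neither the type nor the range of what you type, one defensive habit is to keep your own notes on the values you know and check against them before you commit a change. The Python sketch below shows the idea on the two values discussed above. The table holds only what this chapter states (LMAnnounce is a REG_DWORD of zero or one; Size is a REG_DWORD of one through four), and the function name is hypothetical.

```python
REG_SZ = "REG_SZ"
REG_DWORD = "REG_DWORD"

# Personal notes, not an official list: value name -> (type, legal data).
KNOWN_VALUES = {
    "LMAnnounce": (REG_DWORD, {0, 1}),
    "Size": (REG_DWORD, {1, 2, 3, 4}),
}


def check_value(name, value_type, data):
    """Return a list of problems; an empty list means nothing was found."""
    if name not in KNOWN_VALUES:
        return ["no notes recorded for " + name]
    expected_type, legal = KNOWN_VALUES[name]
    problems = []
    if value_type != expected_type:
        problems.append(
            "wrong type: expected {}, found {}".format(expected_type, value_type)
        )
    elif data not in legal:
        problems.append("illegal data {!r}; legal: {}".format(data, sorted(legal)))
    return problems


if __name__ == "__main__":
    print(check_value("LMAnnounce", REG_DWORD, 1))
    print(check_value("LMAnnounce", REG_SZ, "True"))
    print(check_value("Size", REG_DWORD, 5))
```

The second call is exactly the mistake described above: restoring LMAnnounce as a string because it "feels" like an on/off switch.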

Registry Value Types

The types of entries that can be accepted in a Registry value include:

  • REG_DWORD is a double word value that can be represented as a decimal, hexadecimal, or binary number. By default, when displayed in the Registry, it will be displayed in hexadecimal format.

  • REG_SZ is a Registry string value, and this will be a data string.

  • REG_EXPAND_SZ is a special string type used when you need to include environment variables within the string. For example, a legal REG_EXPAND_SZ could contain the value %SystemRoot%\SYSTEM32\whatever. The %SystemRoot% environment variable will be expanded to the appropriate directory path at the time that the string is evaluated.

  • REG_MULTI_SZ is a multiple string type. Double-clicking on a REG_MULTI_SZ value will bring up a multi-string editor with scroll bars, allowing you to enter multiple strings with one string on each line in the editor.

  • REG_BINARY is used for binary data storage, and the Binary Editor is necessary to edit it. The Binary Editor can also be used to edit other types. It provides a bit-by-bit representation of the data similar to that used by the Dword Editor with the binary type selected. You can use the Binary, String, Dword, and Multi-string options under the Edit menu in the Registry Editor to select whether the Binary, String, Dword, or Multi-string Editor is used with a particular Registry entry, and all Registry entries are, in fact, 32-bit entries. Registry names are not case sensitive, but they do preserve case, and they are Unicode compatible.
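Two of the types in the list above are easier to picture with a worked example: the three ways the Dword Editor can display one 32-bit value, and the expansion of an environment variable inside the expandable string type (REG_EXPAND_SZ in the Win32 headers). The Python below is a sketch of both on ordinary values. The function names are ours, C:\WINNT is only an example directory, and leaving an unknown variable untouched is our assumption about the expansion.

```python
import re


def dword_forms(value):
    """Show one 32-bit value in the three forms the Dword Editor offers."""
    if value not in range(0x100000000):
        raise ValueError("a DWORD must fit in 32 bits")
    return {
        "binary": format(value, "032b"),
        "decimal": str(value),
        "hex": "0x{:x}".format(value),
    }


def expand_sz(text, environment):
    """Replace %NAME% references using the supplied environment mapping.

    Assumption: names match without regard to case, and a name that is
    not defined is left in the string unchanged.
    """
    lookup = {name.lower(): value for name, value in environment.items()}

    def replace(match):
        return lookup.get(match.group(1).lower(), match.group(0))

    return re.sub(r"%([^%]+)%", replace, text)


if __name__ == "__main__":
    print(dword_forms(1))
    env = {"SystemRoot": r"C:\WINNT"}  # example value only
    print(expand_sz(r"%SystemRoot%\SYSTEM32\whatever", env))
```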

Registry Capacity and Size

Currently,11 the total size of the NT registry files is limited to approximately 2GB (the limit of NT's 32-bit address space) or the free disk space available on the system volume, whichever is less. However, NT continues to require a maximum registry size to be set (Control Panel/System, Performance tab—press the Change button in the Virtual Memory section) and indicates how large the registry has become in comparison to that maximum with the % Registry Quota performance monitor counter, described earlier.
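The quota comparison is simple arithmetic: the current registry size as a percentage of the configured maximum. A minimal Python sketch, assuming both figures are in the same unit (the sizes in the example are invented):

```python
def registry_quota_percent(current_size, maximum_size):
    """Current registry size as a percentage of the configured maximum."""
    if not maximum_size > 0:
        raise ValueError("the maximum registry size must be positive")
    return 100.0 * current_size / maximum_size


if __name__ == "__main__":
    # Invented figures: a 6 MB registry against an 8 MB maximum.
    print(registry_quota_percent(6, 8))
```

A figure that creeps toward 100 is the cue to raise the maximum registry size before the system starts refusing registry writes.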

One Last Time...

Finally, a reminder: the Registry is an extremely powerful tool. It's tremendously useful when properly controlled. But if you get in there and meddle around blindly, you will mess up your system beyond repair. Treat it with care.

Other Tools

Windows NT Diagnostics

One of the most overlooked tools for troubleshooting NT systems is a 32-bit version of the Microsoft System Diagnostics (MSD) program. NT's version of MSD is actually implemented as a Windows application with a graphical interface, and its executable file is therefore named WINMSD.EXE. To launch it, select the Windows NT Diagnostics item from the Start Menu's Programs/Administrative Tools folder (see Figure 5.8).

Using Windows NT Diagnostics

Beginning with NT 4.0, WINMSD was redesigned and significantly enhanced. It now sports a Windows 95-style tabbed dialog user interface and provides more information about the system, and best of all, it can be used over a network to examine a remote system. This works because, unlike MSD on DOS systems, NT Diagnostics is actually reporting information from the NT registry.

The information available in WINMSD is extensive and includes:

  • Version—The topmost tab on the WINMSD display shows system version information, including the NT version number, type (workstation or server), build, and build type (free or checked; the latter implies an instrumented kernel and is used mainly for development), CPU architecture, and multiprocessor support. It also displays the distribution CD's serial number and the name of the person to whom this copy of NT is registered.

  • System—This tab provides system-level information about the hardware on which WINMSD is being run, including vendor ID, Hardware Abstraction Layer (HAL) type, BIOS date, and a description of the CPU(s).

  • Display—This tab lists the video BIOS date (if available), display processor and DAC (Digital-to-Analog converter) types, driver type and revision, currently set video resolution, amount of video RAM, vendor, and lists of all the associated files.

  • Drives—This tab provides a tree display, sorted by drive type or drive letter, of each logical disk drive (be it a separate physical disk, logical disk partition, or network drive) known by the system. Double-clicking on any drive brings up a Drive Properties dialog (see Figure 5.9) showing general information including the drive letter, serial number, and disk space available and in use (displayed both in clusters and in bytes). A File System tab on the Properties dialog box gives general information about the file system (NTFS, FAT, CDFS, NetWare-Compatible, etc.) in use, including the maximum number of characters in a filename, and tells whether the namespace is case sensitive, supports Unicode, supports Compression, and so forth.

  • Memory—This tab gives details on system memory utilization, including the total number of handles, processes, and threads in use; amounts of physical RAM, kernel (non-pageable) RAM, committed RAM, and page file space in use and available; and the location and size of all page files. Most importantly, it records peak usage for both committed RAM and page file space, providing a simple way to determine whether a system is running low on virtual memory, which can then be changed with the virtual memory settings described earlier in this chapter.

  • Services—This tab displays essentially the same information as Control Panel/Services, with the additional refinement that it can display identical information for device drivers (when you press a Devices button on the bottom of the tab). Clicking the Properties button at the bottom of the NT Diagnostics dialog in this tab brings up a Service Properties dialog for the service or driver in question (see Figure 5.10), which displays the executable file associated with the service or driver, its start type, the user account with which it is associated (normally LocalSystem), any error associated with it, and its service flags (driver type, whether it runs in its own memory space, whether it can interact with the NT desktop). A Dependencies tab allows you to see on what other services the service or driver in question depends (which can help in diagnosing why a particular service or driver fails to start).

  • Resources—This unique (and most useful!) NT Diagnostics tab displays information about hardware resources, including interrupts (IRQ), I/O Ports, Direct Memory Access (DMA), Memory, and Device Drivers. For each type of resource, a list is displayed indicating the associated device driver, bus, and bus type. Clicking the Properties button when this tab is displayed will yield a Resource Properties dialog box (see Figure 5.11) giving details on the device driver that "owns" the resource and telling whether the resource is shared (for device drivers, the dialog lists resources owned by the driver). A check-box on this tab allows you to choose whether resources owned by the NT HAL are listed.

    We cannot overemphasize the usefulness of the Resources tab. If you encounter a hardware problem such as a network card refusing to operate, after checking the physically obvious (i.e., making sure the network cable is actually present and connected), start Control Panel/Network's Adapters tab, and press the Properties button to view any card settings, such as the I/O port and IRQ. Then launch WINMSD, bring up its Resources tab, and see what driver owns those resources. If the two disagree, believe WINMSD. Both it and the Control Panel are getting their information from the NT Registry, but the Control Panel indicates the setting you requested, while WINMSD indicates the setting NT has actually used. Once you've identified such a problem, it's generally possible to correct it.12

  • Environment—This tab displays all environment variables and values. By default it shows global (system-level) values, but a Local User button shows user-specific entries as well.

  • Network—A close second in value to the Resources tab, the Network tab provides a wealth of network-specific information, including the network version, a list of logged-in users, transport-level protocols in use and associated Ethernet addresses, internal network Settings, and cumulative Statistics since system start. The Settings information is especially valuable: it corresponds to various registry entries for the Server and Workstation services, but centralizes them all in one place (and gives readable names for them!), which is of great value. Unfortunately, this value is somewhat lessened by the fact that WINMSD help provides no information on how to change any of the settings displayed.13
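The Resources-tab check described above (compare what you requested in Control Panel with what WINMSD says NT actually used) can be written down as a small comparison. The Python sketch below works on values you have read off the two dialogs by hand; it does not query WINMSD or the Registry. The driver names and settings are invented for illustration.

```python
def resource_mismatches(driver, requested, owners):
    """List requested resources that another driver (or nobody) owns.

    requested maps a resource kind to the setting asked for in Control Panel.
    owners maps (kind, setting) pairs to the owning driver as WINMSD shows it.
    """
    problems = []
    for kind, setting in sorted(requested.items()):
        owner = owners.get((kind, setting))
        if owner != driver:
            problems.append((kind, setting, owner))
    return problems


if __name__ == "__main__":
    # Hypothetical readings from Control Panel/Network and from WINMSD.
    requested = {"IRQ": 5, "I/O Port": 0x300}
    owners = {("IRQ", 5): "ExampleSound", ("I/O Port", 0x300): "ExampleNic"}
    for kind, setting, owner in resource_mismatches("ExampleNic", requested, owners):
        print("{} {} is owned by {}".format(kind, setting, owner))
```

Any line of output is a disagreement between the two tools, and as noted above, WINMSD is the one to believe.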

Using NT Diagnostics Remotely

To view diagnostic information on a remote computer, use the File/Select Computer… menu. Most information displayed will be identical to that available if you run WINMSD locally. Exceptions may include certain details on the Display tab and Environment properties for the locally logged-in user.

Task Manager

Like NT Diagnostics, Task Manager has been significantly enhanced for NT 4.0. You can now launch it by right-clicking in the taskbar at the bottom of the NT desktop (in an area not covered by an iconized application button) and selecting Task Manager from the resulting context menu. By default, it will appear with its Applications tab selected, as shown in Figure 5.12, which is useful mainly for shutting down hung applications. However, two other tabs are of special use in troubleshooting problems:

  • Processes—This tab lists all processes running in the system. It's a much longer and more detailed list than the top-level Applications tab (see Figure 5.13), providing process name (typically the name of the associated .EXE file, though some internal processes have descriptive names instead), ID number, and the amount of CPU time (as a percent of the total available) and memory in use by each process. This vastly simplifies troubleshooting memory hogs, because you need only bring up the Task Manager and look for them: processes using thousands of Kbytes are easy to spot! It also helps in diagnosing the occasional problem with one or more invisible and hung instance(s) of an application.14 If a user complains to you that a program won't start, no matter how often he or she clicks on its Start menu entry or icon, examine Task Manager/Processes and look for the associated .EXE file. If you find it, use the End Process button to stop it (and any duplicates). In all probability this will solve the user's problem.

  • Performance—This tab provides memory details (similar to those from the WINMSD Memory tab) along with a graphic display of both CPU and memory utilization that's comparable to what you get with Performance Monitor's Processor and Memory objects. However, it's much faster to just launch Task Manager and look. Among other uses, this display makes it immediately obvious whether excessive disk activity (you don't need software to see that; look at the drive light) is being caused by virtual memory thrashing (Commit Charge Total nearly equal to Limit and CPU utilization constantly high), in which case a change to virtual memory settings is called for. It can also help you diagnose a processor hog. If the user complains that his or her system seems excessively slow, first check Task Manager's Performance tab for consistently high CPU utilization when the system should be idle and eliminate thrashing as a possibility by checking that Commit Charge Total is under the limit (as well as looking at the drive light and listening to the drives). Then switch to the Processes tab and see what process has the highest CPU value. It may even be a hung application with an invisible window. End that process, and performance is liable to improve (if it's a system process, you'll need to inspect the relevant settings to determine why it was using so much CPU).
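The slow-system routine just described boils down to two questions: is the commit total crowding its limit while the CPU stays busy, and if not, which process is eating the processor? The Python sketch below encodes both on figures read off Task Manager by hand. The 90 percent and 80 percent thresholds are our own guesses at "nearly equal" and "constantly high", and the process names are invented.

```python
def looks_like_thrashing(commit_total_kb, commit_limit_kb, cpu_percent,
                         near_limit=0.9, busy_cpu=80.0):
    """True when commit total is near its limit and the CPU is busy.

    Assumption: the two thresholds are rules of thumb, not documented values.
    """
    near = commit_total_kb >= near_limit * commit_limit_kb
    return near and cpu_percent >= busy_cpu


def busiest_process(processes):
    """Return the entry with the highest CPU figure from the Processes tab."""
    return max(processes, key=lambda proc: proc["cpu"])


if __name__ == "__main__":
    # Hypothetical readings copied from Task Manager.
    if looks_like_thrashing(95000, 100000, 95.0):
        print("Thrashing is likely; review the virtual memory settings.")
    readings = [
        {"name": "example1.exe", "cpu": 2},
        {"name": "example2.exe", "cpu": 97},
    ]
    print("Highest CPU:", busiest_process(readings)["name"])
```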

Network Monitor: Wiretapping for NT LANs

A protocol analyzer has always been the network technician's tool of last resort. Basically one step above plugging an oscilloscope directly into the network cable, a protocol analyzer shows you the actual packets on the wire. Implemented as a special-purpose computer (and a fairly simple one, at that), but packaged as a piece of test equipment, a protocol analyzer (colloquially, "sniffer") costs thousands of dollars.

Beginning with version 4.0, NT Server includes a new tool: Network Monitor, which amounts to nothing less than a protocol analyzer implemented in software!

Installing Network Monitor

Network Monitor is implemented as a network service. As such, it's installed from Control Panel/Network's Services tab. Press the Add… button on that tab, select Network Monitor Tools and Agent from the list, and press OK. NT Server Setup will prompt you for a path to the files (in the /i386, /mips, /alpha or /ppc directory of the distribution CD or a network directory where you've already copied the same information) and then copy the necessary files to your system directory. It then instructs you to reboot your server. Do so.

After your server reboots, Network Monitor will be available in the Start menu's Administrative Tools folder. On startup, it displays an empty capture window (see Figure 5.14).

Capturing Data

Before you can perform protocol analysis on your network, you need to capture some network data. To do so, select Capture/Start. Network Monitor will allocate buffers (you can control the size it allocates with Capture/Buffer Settings) and begin capturing data. If your network is busy, you'll see the network statistics numbers in the upper right corner of the display change rapidly.

If your network appears to be idle, go to another machine on the LAN that's connected to your server, log in, and browse the network. If that doesn't produce any activity, select Capture/Networks and pick a different network segment (Network Monitor monitors only one network at a time). If you're getting lots of activity but no actual frames captured, select Capture/Filter (see Figure 5.15), select the line below "[AND] (Address Pairs)," and Delete it.

When the Captured Statistics section on the right of the Capture Window shows a dozen or so frames captured, select Capture/Stop and View. A Capture Summary will appear. Double-click the first line in the Summary. The Capture Summary window will shrink to make room for Detail and Hex windows (see Figure 5.16).

You can adjust the relative sizes of the Capture Summary, Details, and Hex windows—and you'll want to do so (in particular, make sure the Hex window has enough room to display about a dozen lines of text, because some frames require that much room). You'll find yourself looking at a lot of very cryptic data, but it's worth its weight in gold!

Analyzing Captured Data. To completely understand what Network Monitor is showing you requires a full understanding of network protocols, something far beyond the scope of this book (or the experience of most technicians). But you don't need to understand protocols completely to gain some benefit (if you don't understand anything about network protocols or if what follows seems to be one incomprehensible acronym after another, read Appendix 2).

Browse your way down the capture display until you find an entry that lists SMB as the protocol (if you can't find one, run Capture/Start again, and execute a NET VIEW from the command line). You should see a display something like this (with minor variations):

7 34.743 NCR_NT  PANTHER40 SMB C negotiate, Dialect = NT LM 0.12 NCR_NT  PANTHER40 IP 
----------------------------------------------------------------------------
+FRAME: Base frame properties
+ETHERNET: ETYPE = 0x0800 : Protocol = IP:  DOD Internet Protocol
+IP: ID = 0x75EC; Proto = TCP; Len: 214
+TCP: .AP..., len:  174, seq: 105127909-105128082, ack: 182164614, win: 8756, 
src: 1285  dst:  139 (NBT Session) 
+NBT: SS: Session Message, Len: 170
+SMB: C negotiate, Dialect = NT LM 0.12
----------------------------------------------------------------------------
00000:  02 60 8C 4C BC 99 00 00 1B 48 D2 AA 08 00 45 00   .`.L.....H....E.
00010:  00 D6 75 EC 40 00 80 06 68 28 0A 02 04 09 0A 02   ..u.@...h(......
00020:  04 01 05 05 00 8B 06 44 1F E5 0A DB 9C 86 50 18   .......D......P.
00030:  22 34 A5 A8 00 00 00 00 00 AA FF 53 4D 42 72 00   "4.........SMBr.
00040:  00 00 00 18 03 00 00 00 00 00 00 00 00 00 00 00   ................
00050:  00 00 00 00 FE CA 00 00 00 00 00 87 00 02 50 43   ..............PC
00060:  20 4E 45 54 57 4F 52 4B 20 50 52 4F 47 52 41 4D    NETWORK PROGRAM
00070:  20 31 2E 30 00 02 58 45 4E 49 58 20 43 4F 52 45    1.0..XENIX CORE
00080:  00 02 4D 49 43 52 4F 53 4F 46 54 20 4E 45 54 57   ..MICROSOFT NETW
00090:  4F 52 4B 53 20 31 2E 30 33 00 02 4C 41 4E 4D 41   ORKS 1.03..LANMA
000A0:  4E 31 2E 30 00 02 57 69 6E 64 6F 77 73 20 66 6F   N1.0..Windows fo
000B0:  72 20 57 6F 72 6B 67 72 6F 75 70 73 20 33 2E 31   r Workgroups 3.1
000C0:  61 00 02 4C 4D 31 2E 32 58 30 30 32 00 02 4C 41   a..LM1.2X002..LA
000D0:  4E 4D 41 4E 32 2E 31 00 02 4E 54 20 4C 4D 20 30   NMAN2.1..NT LM 0
000E0:  2E 31 32 00                                      .12.

The top section (the one beginning 7 34.743 NCR_NT) is from the Capture Summary at the top of the display. It shows that this is frame #7, capture time 34.743, and it came from NCR_NT. Following on from there, we can see that the destination address was PANTHER40 (the machine on which this capture occurred), and the protocol was SMB (Server Message Block). A description of the packet is next.

The next section (the one beginning +FRAME) is the Detail window. It's actually a tree, so clicking any of the + characters will expand that line to show all the properties in question. Thus, clicking the first line yields:

-FRAME: Base frame properties
FRAME: Time of capture = Aug 31, 1996 21:45:16.602
FRAME: Time delta from previous physical frame: 4 milliseconds
FRAME: Frame number: 7
FRAME: Total frame length: 228 bytes
FRAME: Capture frame length: 228 bytes
FRAME: Frame data: Number of data bytes remaining = 228 (0x00E4)

From this window we can see when the frame (or network packet) was captured, along with other information. Clicking each + in turn will show us the packet's Ethernet properties, its IP properties (this packet was captured on an IP network), TCP and NBT properties, and finally its SMB properties.
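The header values that the Detail window reports can be cross-checked by hand against the Hex window. The sketch below decodes the first 58 (0x3A) bytes of frame 7, copied from the dump shown earlier, using Python's standard struct module. It assumes plain Ethernet II framing and a TCP header without options beyond what the data-offset field declares; the function and variable names are ours.

```python
import struct

# First 0x3A bytes of frame 7, copied from the Hex window
FRAME_HEAD = bytes.fromhex(
    "02608C4CBC99" "00001B48D2AA" "0800"               # Ethernet header
    "450000D675EC400080066828" "0A020409" "0A020401"   # IP header
    "0505008B" "06441FE5" "0ADB9C86" "5018" "2234" "A5A8" "0000"  # TCP header
    "000000AA"                                          # NBT session header
)


def decode_headers(frame):
    """Decode Ethernet, IP, TCP, and NBT session headers from raw bytes."""
    dst_mac, src_mac, ether_type = struct.unpack("!6s6sH", frame[:14])
    (ver_ihl, _tos, ip_len, ip_id, _frag, _ttl, proto, _csum,
     src_ip, dst_ip) = struct.unpack("!BBHHHBBH4s4s", frame[14:34])
    tcp_start = 14 + (ver_ihl & 0x0F) * 4        # IP header length in bytes
    src_port, dst_port, seq, ack, off_flags, window = struct.unpack(
        "!HHIIHH", frame[tcp_start:tcp_start + 16])
    nbt_start = tcp_start + (off_flags >> 12) * 4  # TCP header length in bytes
    _nbt_type, _nbt_flags, nbt_len = struct.unpack(
        "!BBH", frame[nbt_start:nbt_start + 4])
    return {
        "dst_mac": dst_mac.hex(":"), "src_mac": src_mac.hex(":"),
        "ether_type": ether_type,
        "ip_len": ip_len, "ip_id": ip_id, "protocol": proto,
        "src_ip": ".".join(str(b) for b in src_ip),
        "dst_ip": ".".join(str(b) for b in dst_ip),
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "tcp_flags": off_flags & 0x3F, "window": window,
        "nbt_len": nbt_len,
    }


for field, value in decode_headers(FRAME_HEAD).items():
    print(f"{field:10} {value}")
```

The decoded values line up with the Detail window: IP length 214 and ID 0x75EC, source port 1285 and destination port 139, sequence number 105127909, acknowledgment 182164614, window 8756, and an NBT session message length of 170. The TCP flag byte 0x18 is PSH plus ACK, which is what the ".AP..." notation abbreviates.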

Server Message Block (SMB) is the top-level protocol for NT's built-in networking: irrespective of the underlying name resolution protocol (in this case, NBT), control protocol (TCP), transport protocol (IP), and wire protocol (Ethernet), it's SMB that defines how NT does things like directory browsing. On another LAN the protocols could just as easily be NB-IPX, SPX, IPX and Token Ring, but the top-level protocol for NT is always SMB. Armed with that knowledge, let's look at the Hex dump at the bottom of the display:

00000:  02 60 8C 4C BC 99 00 00 1B 48 D2 AA 08 00 45 00   .`.L.....H....E.
00010:  00 D6 75 EC 40 00 80 06 68 28 0A 02 04 09 0A 02   ..u.@...h(......
00020:  04 01 05 05 00 8B 06 44 1F E5 0A DB 9C 86 50 18   .......D......P.
00030:  22 34 A5 A8 00 00 00 00 00 AA FF 53 4D 42 72 00   "4.........SMBr.
00040:  00 00 00 18 03 00 00 00 00 00 00 00 00 00 00 00   ................
00050:  00 00 00 00 FE CA 00 00 00 00 00 87 00 02 50 43   ..............PC
00060:  20 4E 45 54 57 4F 52 4B 20 50 52 4F 47 52 41 4D    NETWORK PROGRAM
00070:  20 31 2E 30 00 02 58 45 4E 49 58 20 43 4F 52 45    1.0..XENIX CORE
00080:  00 02 4D 49 43 52 4F 53 4F 46 54 20 4E 45 54 57   ..MICROSOFT NETW
00090:  4F 52 4B 53 20 31 2E 30 33 00 02 4C 41 4E 4D 41   ORKS 1.03..LANMA
000A0:  4E 31 2E 30 00 02 57 69 6E 64 6F 77 73 20 66 6F   N1.0..Windows fo
000B0:  72 20 57 6F 72 6B 67 72 6F 75 70 73 20 33 2E 31   r Workgroups 3.1
000C0:  61 00 02 4C 4D 31 2E 32 58 30 30 32 00 02 4C 41   a..LM1.2X002..LA
000D0:  4E 4D 41 4E 32 2E 31 00 02 4E 54 20 4C 4D 20 30   NMAN2.1..NT LM 0
000E0:  2E 31 32 00                                      .12.

The last line in the Detail window (the one beginning +SMB) called this a Negotiate packet, and that's certainly what it looks like. Note the long list of compatible systems—a veritable history of Microsoft's network software: PC Net 1.0, Xenix, MS-Net 1.03, LAN Manager 1.0, WFWG 3.1, LAN Manager 1.2, LAN Manager 2.1, NT LM 0.12…

NT LM 0.12? Yes, the truth comes out! Elsewhere in this chapter, we've referred to the LanManServer and LanManWorkstation objects; that's NT's built-in networking. The 0.12 appears to be a new version numbering scheme.
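The dialect list in the Negotiate packet is easy to pull apart mechanically. In the dump, the SMB data area runs from offset 0x5D to 0xE3 (the 0x87 that precedes it is the byte count), and each dialect name is introduced by a 0x02 byte and terminated by a zero byte. The sketch below reproduces those bytes as a Python bytes literal and splits them; the function name is ours, and the parsing assumes nothing beyond that 0x02/zero framing.

```python
# Bytes 0x5D-0xE3 of frame 7 (the SMB data area), written as a bytes literal
NEGOTIATE_DATA = (
    b"\x02PC NETWORK PROGRAM 1.0\x00"
    b"\x02XENIX CORE\x00"
    b"\x02MICROSOFT NETWORKS 1.03\x00"
    b"\x02LANMAN1.0\x00"
    b"\x02Windows for Workgroups 3.1a\x00"
    b"\x02LM1.2X002\x00"
    b"\x02LANMAN2.1\x00"
    b"\x02NT LM 0.12\x00"
)


def parse_dialects(data):
    """Split a Negotiate data area into its list of dialect names."""
    dialects = []
    pos = 0
    while pos < len(data):
        if data[pos] != 0x02:
            raise ValueError(f"expected 0x02 marker at offset {pos}")
        end = data.index(b"\x00", pos + 1)   # find the terminating zero byte
        dialects.append(data[pos + 1:end].decode("ascii"))
        pos = end + 1
    return dialects


for number, name in enumerate(parse_dialects(NEGOTIATE_DATA)):
    print(number, name)
```

The literal is 135 (0x87) bytes long, which agrees with the byte count in the dump, and the list comes out oldest dialect first, with NT LM 0.12 last.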

Browsing down to later SMB packets (to make this easier, select Capture/Filter, double-click the Protocol==ANY line of the resulting Display Filter dialog shown in Figure 5.15, and disable everything except the SMB protocol, then click OK) will show things like the logon process (you'll see the workstation request \\panther40\ipc$) and eventually a list of servers on the network.

For certain commands, SMB will transmit native NT data, which is built on the international Unicode character set (see Chapter 1 and Appendix 1), so the names will appear as .P.A.N.T.H.E.R (the periods represent the zero-byte of the Unicode character).
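The effect is easy to reproduce. The following sketch encodes a name as 16-bit little-endian Unicode (our assumption about the byte order on the wire) and renders it the way the text column of a hex viewer does, with a period for every non-printable byte. The helper name is ours.

```python
def hex_window_text(data):
    """Render bytes as a hex viewer's text column: '.' for non-printables."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)


wire_bytes = "PANTHER".encode("utf-16-le")   # two bytes per character
print(wire_bytes.hex(" "))
print(hex_window_text(wire_bytes))
```

Whether a period shows up before or after each letter in a real capture depends on the bytes surrounding the string, but the alternating pattern is the giveaway that you are looking at Unicode text rather than plain ASCII.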

Following this same approach, you can examine various packets in the system, and by looking at them (in conjunction with a good understanding of the material in Appendix 2 and perhaps a good reference book on protocols) begin to make some sense of them. In trying to understand what the packets do, keep in mind that again, it's beyond the scope of this chapter (or even the whole book) to explain all the protocols you're likely to encounter, but some of the more interesting ones (all acronyms that follow are defined in Appendix 2 unless otherwise noted) will include the following:

  • RIP, which carries router data on both IP and IPX networks

  • ARP, which resolves IP (10.2.3.4) addresses into hardware (Ethernet MAC) addresses

  • NBT and NBIPX, which put NetBIOS (Microsoft's favorite protocol) onto the IP or IPX transport protocols

  • SAP, which is used to announce service availability on IPX nets

  • Microsoft Remote Procedure Call (MSRPC), which is used to carry out underlying operations (You'll frequently find that an SMB command packet is followed by several RPC packets.)

How can you troubleshoot with this? For starters, look for packet types that don't belong. For instance, if you're running an all-IP net, you shouldn't see any NBIPX, IPX, SPX, or SAP packets. The same applies for NBT, IP, TCP, UDP, and ARP packets on an all-IPX net. Look for an excessive amount of broadcast activity—typically UDP, NBF, or SAP packets. Those may indicate a browser problem.
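The "packet types that don't belong" check amounts to a simple tally. The sketch below assumes you have copied the protocol column of the Capture Summary into a list of strings by hand; the two protocol families come straight from the paragraph above, while the function name and the "ip"/"ipx" labels are our own.

```python
from collections import Counter

# Protocol families as listed in the text
IP_FAMILY = {"IP", "TCP", "UDP", "ARP", "NBT"}
IPX_FAMILY = {"IPX", "SPX", "SAP", "NBIPX"}


def unexpected_protocols(protocols, network="ip"):
    """Count captured protocol names that do not belong on this network."""
    if network not in ("ip", "ipx"):
        raise ValueError("network must be 'ip' or 'ipx'")
    unwanted = IPX_FAMILY if network == "ip" else IP_FAMILY
    return Counter(p.upper() for p in protocols if p.upper() in unwanted)


summary_column = ["SMB", "TCP", "NBT", "SAP", "SAP", "SMB", "IPX"]
print(dict(unexpected_protocols(summary_column, network="ip")))
```

Any nonzero count on a net that is supposed to be single-protocol points to a machine with a leftover protocol binding, which is worth tracking down by its source address.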

Learning to use Network Monitor (or any protocol analyzer) takes time and patience. It often helps to exploit downtime when the net is idle or use a private subnet when you can control all traffic. You can turn on capture, perform some operation, stop capture, and view the results. Sometimes those can be very enlightening, as for example:

3588 1152.575 NCR_NT PANTHER40 FTP Req. from Port 1341, 'PASS test' NCR_NT  PANTHER40 IP 
-----------------------------------------------------------------------------------
+FRAME: Base frame properties
+ETHERNET: ETYPE = 0x0800 : Protocol = IP:  DOD Internet Protocol
+IP: ID = 0x7EFD; Proto = TCP; Len: 51
+TCP: .AP..., len: 11, seq:111121206-111121216, ack:188156601,win:8675,src:1341 dst:21
(FTP) 
+FTP: Req. from Port 1341, 'PASS test'
---------------------------------------------------------------------------------
00000:  02 60 8C 4C BC 99 00 00 1B 48 D2 AA 08 00 45 00   .`.L.....H....E.
00010:  00 33 7E FD 40 00 80 06 5F BA 0A 02 04 09 0A 02   .3~.@..._.......
00020:  04 01 05 3D 00 15 06 9F 93 36 0B 37 0A B9 50 18   ...=.....6.7..P.
00030:  21 E3 15 30 00 00 50 41 53 53 20 74 65 73 74 0D   !..0..PASS test.
00040:  0A                                              .

This display is quite real—it was captured on a machine running the IIS FTP Service (see Chapter 7), which had been configured to allow both anonymous access and access by password. And here you see the password, test, in plain text, readable to anyone with a protocol analyzer.
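To see how little work it takes to recover that password, here is a sketch that strips the Ethernet, IP, and TCP headers from the frame above and returns whatever is left. The bytes are copied from the Hex window; the code assumes Ethernet II framing, and the function name is ours.

```python
# Frame 3588, copied from the Hex window
FRAME = bytes.fromhex(
    "02608C4CBC9900001B48D2AA08004500"
    "00337EFD400080065FBA0A0204090A02"
    "0401053D0015069F93360B370AB95018"
    "21E3153000005041535320746573740D"
    "0A"
)


def tcp_payload(frame):
    """Return the application data carried in a TCP/IP Ethernet frame."""
    ip_start = 14                                    # after Ethernet header
    ip_header_len = (frame[ip_start] & 0x0F) * 4     # IHL field, in bytes
    ip_total_len = int.from_bytes(frame[ip_start + 2:ip_start + 4], "big")
    tcp_start = ip_start + ip_header_len
    tcp_header_len = (frame[tcp_start + 12] >> 4) * 4  # data offset, in bytes
    return frame[tcp_start + tcp_header_len:ip_start + ip_total_len]


print(tcp_payload(FRAME).decode("ascii").rstrip())
```

The 11 bytes that remain are the FTP command line itself, PASS test followed by a carriage return and line feed, which matches the TCP length of 11 in the Detail window.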

Security Issues. The subject of passwords brings us to the matter of security. As we said earlier, Network Monitor is nothing less than a protocol analyzer implemented in software. To put it another way, Network Monitor is the NT network equivalent of a wiretap on a telephone line. There is no greater risk to network security. Using Network Monitor, you or anyone else with access to the Administrative Tools group will have the ability to "sniff" any and all network packets sent to or from your server, and not just simple protocols like FTP (everyone knows that's insecure!). The following is an example of e-mail:

123 166.720 PANTHER40 NCR_NT SMB R read & X, Read 0x74 PANTHER40 
NCR_NT IP 
---------------------------------------------------------------------------------
+FRAME: Base frame properties
+ETHERNET: ETYPE = 0x0800 : Protocol = IP:  DOD Internet Protocol
+IP: ID = 0x2662; Proto = TCP; Len: 220
+TCP: .AP..., len:  180, seq: 189081087-189081266, ack: 112034892,win: 8760, src:  139
 (NBT Session)  dst: 1355 
+NBT: SS: Session Message, Len: 176
+SMB: R read & X, Read 0x74
---------------------------------------------------------------------------------
00000:  00 00 1B 48 D2 AA 02 60 8C 4C BC 99 08 00 45 00   ...H...`.L....E.
00010:  00 DC 26 62 40 00 80 06 B7 AC 0A 02 04 01 0A 02   ..&b@...........
00020:  04 09 00 8B 05 4B 0B 45 25 FF 06 AD 84 4C 50 18   .....K.E%....LP.
00030:  22 38 0A 06 00 00 00 00 00 B0 FF 53 4D 42 2E 00   "8.........SMB..
00040:  00 00 00 98 00 20 00 00 00 00 00 00 00 00 00 00   ..... ..........
00050:  00 00 01 10 FE CA 03 08 40 2C 0C FF 00 00 00 FF   ........@,......
00060:  FF 00 00 00 00 74 00 3C 00 00 00 00 00 00 00 00   .....t.<........
00070:  00 00 00 75 00 00 00 00 6A 72 75 6C 65 79 00 00   ...u....jruley..
00080:  00 00 00 52 45 3A 20 45 2D 4D 61 69 6C 20 53 65   ...RE: E-Mail Se
00090:  63 75 72 69 74 79 00 00 00 00 00 00 00 00 00 00   curity..........
000A0:  00 00 00 00 00 00 00 00 00 00 00 00 33 00 35 00   ............3.5.
000B0:  17 00 1F 00 08 00 CC 07 00 30 30 30 30 30 30 30   .........0000000
000C0:  36 00 00 00 45 08 00 00 00 00 00 00 00 00 18 00   6...E...........
000D0:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
000E0:  00 00 35 F7 C8 00 00 00 00 00                    ..5.......

Note the address (jruley) and the title (RE: E-Mail Security). The message body itself isn't visible (it's in a packed binary format in one of the follow-up packets), but the point is surely made: Network Monitor is a serious security risk.

In fact, it's not as bad as all that. Microsoft took steps with NT Network Monitor to make sure it stays under control. To wit:

  • No promiscuous mode support: Most sniffers employ a special network card operating mode in which the card becomes promiscuous: that is, captures any packet that goes over the wire, irrespective of whether it was intended for the card in question. Instead of doing this, Microsoft uses a new feature of the NDIS 4.0 driver specification to capture packets as they are sent from or received by your card. Thus, Network Monitor can capture only packets sent from or to one of your server's network cards.

  • Password protection: In addition to placing Network Monitor only on NT Servers, and running it from the Administrative Tools group (which isn't available to end users), you can set capture and display passwords through Control Panel/Monitoring Agent. This will ensure that even among Administrative users, only those who know the passwords can use Network Monitor.

  • No remote operation: NT 4.0 Workstations come with the same Network Monitor Agent as NT Server, but Network Monitor cannot connect to those agents. It can use them only for identification purposes (besides typing in capture and display passwords, Control Panel/Monitoring Agent lets you associate a text description with your network card(s), which can save puzzling out which Ethernet MAC address belongs to which workstation on the net). Network Monitor can be operated only locally. You must be physically present and logged in at the NT console to use it.

On the other hand, if you really want promiscuous mode support and remote operation, check out SMS Network Monitor, covered (briefly) in Chapter 8.

Disk Fragmentation

Since NT was first introduced, Microsoft's position on disk fragmentation has been as follows:

  1. Use NTFS because it doesn't require defragmenting.

  2. If you run a DOS-compatible FAT partition, boot DOS and use a DOS-based defragmenter (or an OS/2 defragmenter on HPFS partitions).

  3. As a last resort, back everything up on tape, reformat your disk, and restore the tape.

Of course, NTFS does in fact require defragmentation. We've seen a 2:1 or better15 performance improvement by defragmenting NTFS partitions, and other users (especially those running heavily used servers) report much the same results.

As it happens, this isn't the first time Dave Cutler and his ex-Digital crew have missed the boat on fragmentation. Cutler's VMS operating system had much the same problem on VAX computers, and Executive Software16 eventually filled the gap with a line of Diskeeper products. In 1994 they brought the same technology to NT with Diskeeper for Windows NT. Separate versions are available for NT Workstation and Server, and a "Light"17 version for NT 4.0 recently became available that's free for the download from Executive Software's Web site: https://www.execsoft.com.

Two caveats on Diskeeper: First, read the release notes before installing it. We periodically hear from users who've had files corrupted and blame Diskeeper. Invariably, they did not follow instructions. Diskeeper operates as a low-level adjunct to the NT file system drivers, and as such it cannot safely defragment files from applications (such as Oracle server) that bypass the file system and manipulate bits on the disk directly. This is documented in the release notes, and you can deal with it by adding the files in question to a list of files that Diskeeper will not touch. Second, be aware that in the past, each revision of NT (including not only major version number changes, but also service packs) has required a new version of Diskeeper. Again, Diskeeper acts as an adjunct to the NT file system—when Microsoft modifies kernel drivers that affect the file system, Diskeeper is affected in turn. With NT 4.0, Microsoft has provided "hook" APIs to the file system that should allow Diskeeper to work even if the file system changes. But to be on the safe side, check with Executive Software before applying an NT service pack or upgrade.

Undeleting Files

Currently, no file system undelete18 programs for NT are available. However, three approaches may retrieve an accidentally deleted file. First, if you're using the DOS-compatible FAT file system, you can boot your computer to DOS (using either NT multiboot or a DOS boot diskette) and use DOS undelete software (likewise, for NT versions prior to 4.0, you can undelete files on HPFS partitions by using an OS/2 boot diskette and OS/2 disk utilities).

Alternatively, if you are not using NTFS disk compression, try the DiskProbe application from the Windows NT 4.0 Resource Kit.19 Although this doesn't provide a simple undelete, it does provide a way to search the disk cluster by cluster for data on any partition type, including NTFS. If you find a cluster containing your data, you can copy it (and closely adjacent clusters) to a new file.

Finally, you can always restore a file from backup, provided you've been keeping regular backups.

Several vendors have expressed interest in providing a true undelete capability for NT, along with other disk maintenance tools. None is available at this writing. As and when such a utility ships, you can expect us to report it on our electronic update, at the location mentioned in the Introduction.

Resources

A variety of available command-line tools can be of help when you are trying to track down low-level protocol problems. Among these are the IP network utilities ping, arp, nbtstat, netstat, nslookup, tracert, and route (all covered in Chapter 6), and the IPX network utility ipxroute (Chapter 10).

The Windows NT Resource Kit (covered in Appendix 4) includes a very wide range of maintenance and support tools, including tools to back up and restore registry files, monitors for the browser and domain services, and even an upgrade for the single-CPU version of NT that adds multiprocessor support. (The hardware, needless to say, is not included!)

Microsoft Systems Management Server (SMS) provides a wide variety of troubleshooting and maintenance tools, including an enhanced version of Network Monitor that supports both promiscuous mode and remote operation (which makes it an even more effective network wiretap!), along with remote software installation/upgrade capability. With SMS version 1.2, Microsoft added remote control support for NT 3.51 and 4.0 as well. It's of interest primarily to larger sites and is covered in Chapter 8, where we also cover troubleshooting and maintenance of Microsoft's Remote Access Services (RAS) and other wide-area networking issues.

Finally, don't neglect the release notes that come with an NT distribution CD. Aside from whatever printed documentation you find, check the CD for .TXT and .WR* files. Currently, NT 4.0 setup copies README.WRI (general release notes) into \winnt\system32, while NETWORK.WRI (network card issues, and related material) and PRINTERS.WRI (printer issues) are copied into your \winnt directory.

NT Messages

Windows NT can produce a variety of messages during its normal operation along with a wide range of error messages. Prior to NT 4.0, you could expect all NT distribution CDs to include a Messages database in a run-time Microsoft Access format. Unfortunately, this does not seem to be included with either NT 4.0 Workstation or Server. We assume it will be included with the NT 4.0 Resource Kit—and in any case, all editions of that kit have included an extensive manual on NT Messages. We cover the Resource Kit in Appendix 4.

Character Mode, Stop, and Hardware Malfunction Messages. The ultimate worst-case situation you have to deal with in Windows NT is the "blue screen crash." This happens when the Windows NT kernel encounters a completely unrecoverable error either in the kernel software or in hardware. The system will stop and display a screen similar to that illustrated in Figure 5.17.

In a "blue screen crash," the first line displayed on the screen will generally be of the form

*** STOP 0x000000nn DESCRIPTION

The 0x000... number is a hexadecimal identifier for the STOP message and indicates the cause of the crash. The text immediately following it is a text description of the crash. This will be followed by a system trace, including an identification of the address areas in which the crash occurred, a register dump, and a system call tree indicating the various functions that are in the tree of system calls above the function in which the crash occurred. These are of value only to a system developer or hardware support engineer, but if the same crash occurs repeatedly, it is worth writing the information down, in particular the first two or three lines on the screen, so that it can be presented when you call Tech Support.
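If you keep a log of crashes transcribed from the screen, the first line can be split into its number and description automatically. The sketch below follows the line format given above; accepting an optional colon after STOP is our own allowance for variations in how the line gets typed in, and the sample description is only an example.

```python
import re

# "*** STOP 0x000000nn DESCRIPTION"; the optional colon is an assumption
STOP_LINE = re.compile(r"^\*\*\*\s+STOP:?\s+(0x[0-9A-Fa-f]{8})\s*(.*)$")


def parse_stop_line(line):
    """Return (stop_code, description) or None if the line does not match."""
    match = STOP_LINE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2).strip()


# Example line as it might be copied down from a blue screen
print(parse_stop_line("*** STOP 0x0000000A IRQL_NOT_LESS_OR_EQUAL"))
```

Recording the code as a number makes it simple to spot when the same STOP keeps recurring across several crashes, which is the case worth reporting to Tech Support.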

The follow-up to a blue screen crash generally involves making a change to the hardware settings in the system, removing hardware devices from the system, or taking other relatively drastic steps. The list of troubleshooting problems and work-arounds in this chapter gives some suggestions for certain well-known errors (such as the 0x0000000A IRQL problem) but the general nature of this kind of crash is that it's serious.

Hardware Malfunctions. If a low-level hardware problem occurs on a system at such a level that the Windows NT kernel cannot handle it at all (technically, a non-maskable interrupt or NMI), you're likely to see a message beginning "Hardware malfunction..." and ending "...call your hardware vendor for support." And the message says it all: call the vendor.

Status and Warning Messages. Status and warning messages will appear as a Windows alert and indicate some specific matter of concern for the system. They may simply indicate some piece of system information that is of general interest such as "Password too complex." They may warn of a problem with some components of the system such as a "Printer Out Of Paper" message. They may report a more serious problem such as the "Access Denied" message that indicates that an application has tried to do something for which it doesn't have the necessary security permissions. Consult the Resource Kit's NT Messages manual for details on the specific message.

Network Messages. Errors that occur within the network components of Windows NT and Windows NT Server will be identified as network errors and will have a four-digit number associated with them. In addition to the messages database, you can get a brief description of each error by typing net helpmsg and the message number; for instance:

D:\>net helpmsg 2102

The workstation driver is not installed.

EXPLANATION

Windows NT is not installed, or your configuration file is incorrect.

ACTION

Install Windows NT, or see your network administrator 
about possible problems with your configuration file.

But the explanation for this message in the NT Messages manual (or database) will be far more complete.

Online Troubleshooting Guides

Microsoft has a series of useful step-by-step guides to troubleshooting problems that are available from its Web site at https://www.microsoft.com/support (pick Windows NT Workstation or Server from the GO! List). Topics with online guides available include:20 Applications, Directory Replication, Fault Tolerance, Licensing, Remote Access, Support Resources, User Profiles, File Systems, Joining a Domain, Printing, Setup, and Trust Relationships. In addition, there is some good troubleshooting information available in NT Server Books Online. Check under Troubleshooting in the Index for a list of topics.

Service Packs

Microsoft periodically makes fixes and upgrades available for NT as service packs. These are numbered, and you will find from time to time that a given piece of software may require that one be installed. You can find out about the latest service packs on Microsoft's Web site at https://www.microsoft.com/support (pick Windows NT Workstation or Server from the GO! List).

Two caveats about service packs: Once you have installed one, you must reapply it any time you install new components from your original NT setup CD (the service pack may include updates for the new component(s) you've installed). And there have been problems with some service packs—after all, they update system software, and it's all but impossible for Microsoft to check the update against all possible hardware/software combinations. It's wise to check the NT support newsgroups to see what problems have been reported before applying a service pack.

Getting NT Tech Support

Even the best technicians get in over their heads from time to time and must call in the support engineers. Unfortunately, calling for NT technical support can be expensive. Microsoft provided 30 days of free support for installation problems with NT 3.1, but dropped all free support in NT 3.5. With NT 3.51, support was reintroduced, but only the first call was free (and then only if you called on a setup issue). With NT 4.0, support has been increased to two free calls (again, setup-related issues only). After that, you're expected to pay.

Currently, Microsoft's least expensive telephone support for NT is $195 per incident (what Microsoft calls "priority support"). Microsoft justifies this pricing by calling NT a "Business Systems" product rather than a personal product, but it seems excessive for NT Workstation. Fortunately, Unisys offers a low-cost21 support program for setup problems, and they cover all versions of NT on all platforms.

Larger organizations that want to purchase a support contract or one of Microsoft's higher-end "Premier" or "Select" support options may call Microsoft Product Support Services at (800) 426-9400. Microsoft can also refer you to a local "solution provider" if you prefer to deal with someone in your area.

Crash Recovery

Windows NT is a very reliable operating system, but it can crash because of errant drivers, hardware problems, or—rarely—undetected operating system (or application) bugs. Beginning with NT 3.5, you have some options for handling such crashes. The most important of these are the Recovery settings, controlled by the Control Panel/System object's Startup/Shutdown tab (see Figure 5.18).

The most obvious of the recovery options is, of course, the one to reboot automatically after a crash. However, if the problem that caused the server to crash in the first place recurs, you can put your server into an infinite loop: reboot, crash, reboot, crash...

Obviously, you should enable the options to write a system event (and possibly to send an administrative alert) if you're enabling the automatic reboot feature. Enabling the memory dump feature can also help, though decoding it will most likely require cooperation from a Microsoft support engineer.

Incidentally, the crash recovery behavior of NT is controlled, like so much else, through the configuration registry. The HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl key contains entries that match all the control panel settings. As we've said before, making these settings in Control Panel is preferable to making them directly in the registry, but if you're managing several servers on a LAN or WAN, you may find it simpler to set them using the registry editor.
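For a read-only look at those entries without opening the registry editor, a short script will do. The sketch below uses Python's standard winreg module, so it assumes a Windows system with a current Python installed (not something NT 4.0 itself shipped with); it simply lists whatever values it finds under the key rather than assuming particular value names. The function names are ours.

```python
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def read_crash_control():
    """Return {value_name: data} for the CrashControl key (Windows only)."""
    import winreg  # standard library, present only on Windows
    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        value_count = winreg.QueryInfoKey(key)[1]   # number of values in key
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            values[name] = data
    return values


def format_values(values):
    """Format registry values as sorted 'name = data' lines."""
    return [f"{name} = {values[name]!r}" for name in sorted(values)]


if __name__ == "__main__":
    try:
        print("\n".join(format_values(read_crash_control())))
    except ImportError:
        print("winreg is unavailable; run this on a Windows system")
```

Because the script only reads, it is safe to run against each server in turn when you want to confirm that the recovery settings match across machines.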

DRWATSON and DRWATSON32

When applications crash in Windows NT, Dr. Watson will appear. This is a simplified run-time debugger application that (optionally) performs a crash dump and maintains a log of application errors. It is not of much immediate help, but if you find that an application is crashing repeatedly, having the logs available may help you (or more likely, the vendor) diagnose the problem. For more information, launch Dr. Watson manually by typing drwtsn32 at an NT command prompt, and press the Help button.

Making and Updating Boot and Emergency Repair Diskettes

You cannot create an old-fashioned, DOS-style boot into character mode for Windows NT, but you can nonetheless boot the operating system from a floppy22 (although the NT system files will still have to load from the hard disk), which can be a lifesaver if the boot sector on your hard disk accidentally gets overwritten. You can do this by running winnt32.exe (from the /i386, /mips, /ppc, or /alpha directory of the distribution CD-ROM) with the /ox command-line switch. You will need three formatted floppy diskettes to hold all the files.

Of course, a good support person covers all the bases, so having both the boot diskettes and an emergency disk is a good idea. The latter is normally created during the NT setup process, but if you need to make one later (or update the data on the original, which is a good idea, especially after you install any software packages or modify the user database on an NT Server), you can use the RDISK utility provided with NT versions from 3.5 on (see Figure 5.19).

Warning: If the Emergency Disk was created during the installation process and never updated, it contains the original registry settings for the computer, which most likely will include only default user accounts. Restoring the registry from such a diskette will destroy any user accounts created after the installation. Because NTFS tracks directory permissions based on security access rights, it can make accessing data impossible as well.

With the boot diskettes and Emergency Disk available, it is possible to recover from most "soft" errors (corrupt files) that prevent NT from starting up normally. Insert the NT Startup disk in your A: drive and reboot the system. When NT Setup starts, it will give you an option to carry out a new installation or repair an existing installation. Select the latter option. You will then be asked to insert the Emergency Diskette, which is used to recover Registry data. You will also have the opportunity to copy system files from the CD.

On Intel-based systems, you also need to make a recovery diskette. This is simply a formatted diskette onto which you have copied the key system files that may need to be restored in order for NT to boot (note that you will need to clear the system/hidden/readonly attributes to copy these files):

  • NTLDR

  • NTDETECT.COM

  • NTBOOTDD.SYS (On systems with older SCSI drives—if the file doesn't exist, you don't need it.)

  • BOOT.INI

  • BOOTSECT.DOS (If you have a dual-boot setup)

On RISC-based systems, the needed files are OSLOADER.EXE and HAL.DLL. Use the ARCS menu to create an additional boot option with the following values:

  • OSLOADER= SCSI(0)DISK(0)FDISK(0)\OSLOADER.EXE

  • SYSTEMPARTITION=SCSI(0)DISK(0)FDISK(0 or 1, depending on whether you want to boot from the first or second floppy drive)

  • OSLOADPARTITION and OSLOADFILENAME to the same values they're set for in your regular boot menu

Troubleshooting Hit List

In any system as complex as Windows NT, a broad range of errors and problems can occur. As we note in Appendix 6, the potential for errors increases enormously when the system is networked. So it's impossible for us to present a comprehensive list of the errors you are likely to encounter and directions for fixing them. However, certain errors are more likely to occur than others, so we present some of the most frequently encountered errors with suggestions about how to troubleshoot and fix them. We've arranged them by general category.

Failure to Boot

In general, when a Windows NT system that has otherwise operated correctly suddenly refuses to boot (or recover from a reboot), you have to expect that one of two things has happened. Either there has been a major hardware failure or something has changed in the configuration. Major hardware failures or boot problems that occur when a system is first created are generally of the type that we covered in the Troubleshooting section of Chapter 2, and we urge you to look there. The following are possible explanations for the problem:

  • Misconfigured System. It's worth remembering that many problems that appear to be due to boot failure can actually reflect misconfiguration. For instance, if you change the video type in Windows NT Setup to one that's not compatible with your particular hardware, you may have a completely successful boot (NT is still running) but find yourself looking at a blank screen. So the best initial step to take with any boot problem is to try selecting the previous configuration from the Last Known Good configuration menu, or if that doesn't work, try using the Windows NT Emergency Diskette. Get the boot diskette originally supplied with Windows NT, insert it in drive A, reboot the computer, and when it asks whether you want to do an installation or attempt a repair, select repair and insert the emergency disk. The odds are good that this will allow the system to "heal itself." But if that doesn't work, some other boot problems that may occur include those listed below.

  • Unrecognized Partition Types and BOOT.INI. When you install Windows NT on a system in which an unusual partitioning scheme is used or a partition type is presented that Windows NT does not recognize, it is possible for Windows NT to install but for the system partition to be incorrectly identified. The boot subsystem may assume that system files are on partition 0 when they are in fact on partition 1, for instance. Inspect the BOOT.INI file to make sure that it refers to the correct partition or logical disk drive and directory. You may also want to check and examine the arc system formatted syntax for the initial partition location. This will be in a format like:

SCSI(0)DISK(0)RDISK(0)PARTITION(1)\WINDOWS="Windows NT"

MULTI(0)DISK(0)RDISK(0)PARTITION(1)\WINDOWS="Windows NT"

(The format begins with BUS(*number*), where the bus can be SCSI or AT-bus, the latter represented by MULTI. It is followed by the disk controller number, represented by DISK; the disk itself, represented by RDISK, where R stands for Rigid; and the PARTITION.)

PARTITION(1) is the most likely cause of a problem here, although on some machines the controller, represented by DISK(0), could be the cause, as noted in the section on installation. In this case, try changing the partition number to 2 (PARTITION(0) refers to the entire unpartitioned physical disk) or to another partition number, depending on the contents of your partition table (which you can examine using the *fdisk* program on DOS machines).

  • Boot NTLDR Not Found. If for any reason the NTLDR file is deleted from the root of the C drive, Windows NT will be unable to boot. You can cure this situation by using the NT boot disks with the NT Setup repair option and copying NTLDR back onto the hard disk from the recovery diskette.

  • NTDETECT.COM Deleted. When Windows NT starts on x86 systems, it employs the NTDETECT.COM program to detect the hardware configuration on the system—which updates the hardware information in the Configuration Registry and begins to carry out the boot process. This insulates Windows NT from configuration errors that may occur when someone changes a hardware component. However, it also means that if NTDETECT.COM is deleted, the system will fail to boot, generally failing with a fatal general system error of 0x00000067-Configuration Initialization Failed. This can also indicate that an error has been introduced into the BOOT.INI file (an indicator for this is if an additional line appears in the BOOT.INI file besides those for NT and for any alternate operating systems that existed when NT was first installed). So check the BOOT.INI file as well. If the BOOT.INI file is found to be correct, you will need to restore the NTDETECT.COM file from the installation CD or recovery diskette.

  • Problems in the OS Loader. If the BOOT.INI is sufficiently correct for the OS to start loading but then presents a bad path, it's possible that the OS Loader blue screen will start but will then fail with one of these errors:

    Could not read from the selected boot disk.

    The system did not load because it could not find the following file:...

    Either of these errors, again, indicates a problem with the BOOT.INI file, and as with NTLDR and NTDETECT.COM, the solution is to restore it from your recovery diskette.

  • Failure to Boot Back to a Previous Operating System. On systems in which a dual boot is installed (NT+DOS, Windows 95, OS/2, etc.), NT uses a hidden file called BOOTSECT.DOS to store information about the physical layout of the hard disk so that the system can boot back into DOS (or other operating systems) from Windows NT. If this file is inadvertently deleted or cannot be found during an attempt to boot to an alternate operating system, the boot process will fail with the message: "Couldn't open boot sector file." Once again, to solve the problem, restore the file from the copy on your recovery diskettes.

  • OS/2 Boot Manager Problems. The Boot Manager that IBM supplies with OS/2 versions 2.0 and 2.1 attempts to perform very much the same functions that the Windows NT Flexboot performs. Unfortunately, each tends to compete with the other to a certain extent, so it's possible that a system that has been set up with the OS/2 Boot Manager will fail to operate properly after the Windows NT Flexboot has been installed. You can get around this problem by booting OS/2 from the installation disk, pressing Escape at the first opportunity to get to the OS/2 command line, bringing up the OS/2 fdisk, and reinstalling Boot Manager, adding entries for each of the bootable partitions in the system.

    When Boot Manager is installed after the Windows NT Flexboot, it generally seems to operate correctly. OS/2 Boot Manager will give you the option either to boot DOS or OS/2—there won't be any mention of Windows NT, but if you boot to DOS, you will get the Windows NT Flexboot—giving you the option to use Windows NT or DOS. Another option is to avoid the use of the OS/2 Boot Manager entirely and instead use the OS/2 Dual Boot feature in conjunction with Windows NT, although this method does not give the same flexibility in terms of booting from multiple partitions on the disk.

    A related common problem with the OS/2 Boot Manager is that the Boot Manager and Windows NT Flexboot may disagree on which drive letters represent which partitions in the system. The simplest solution is to install the OS/2 Boot Manager in the last partition on the drive and put Windows NT on the primary (first) partition at the start of the drive. Both systems will then agree on the drive letter assignments for all partitions (unless, of course, the "sticky drive letter" feature of Windows NT has been used to modify the drive letters used with Windows NT).

If you've tried all of the above and NT still won't start, it's time to call tech support, but to save yourself time (and possibly money), check the online troubleshooting guides mentioned earlier. Then check the "Before You Call…" topics in README.WRI. Then call technical support.
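When inspecting BOOT.INI as suggested under "Unrecognized Partition Types and BOOT.INI" above, it can help to split each entry into its numbered fields so the partition and disk numbers are easy to compare against the partition table. The following Python sketch does that for entries in the format shown earlier. The function name and the field names in the returned dictionary are illustrative assumptions, and the pattern covers only the SCSI and MULTI forms given in this chapter.

```python
import re

# Matches entries such as:
#   MULTI(0)DISK(0)RDISK(0)PARTITION(1)\WINDOWS="Windows NT"
ARC_ENTRY = re.compile(
    r'^(?P<bus>scsi|multi)\((?P<bus_no>\d+)\)'
    r'disk\((?P<disk>\d+)\)'
    r'rdisk\((?P<rdisk>\d+)\)'
    r'partition\((?P<partition>\d+)\)'
    r'(?P<directory>\\[^=]*)'          # system directory, e.g. \WINDOWS
    r'(?:="(?P<label>[^"]*)")?',       # optional menu text in quotes
    re.IGNORECASE,
)


def parse_arc_entry(line):
    """Split one BOOT.INI entry into its fields, or return None if it does not match."""
    match = ARC_ENTRY.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("bus_no", "disk", "rdisk", "partition"):
        fields[key] = int(fields[key])   # numeric fields become integers
    fields["bus"] = fields["bus"].upper()
    return fields


entry = parse_arc_entry(r'MULTI(0)DISK(0)RDISK(0)PARTITION(1)\WINDOWS="Windows NT"')
print(entry["bus"], entry["partition"], entry["directory"])
```

A result of `None` for a line that is supposed to be a boot entry is itself a hint that the line has been damaged or mistyped.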

CPU Problems

Generally, a problem with the central processing unit (CPU) in a Windows NT system will be detected during the installation process, and the system will fail to install properly. Again, there are a few things to watch out for. The first (which, again, is an installation problem) is to make sure you are installing on a CPU that supports NT. Windows NT requires a 25MHz 386 or higher processor. Note that it does not support version B1 and earlier 386 chips. If you have such a chip, you'll need a CPU upgrade. Also look for the following:

  • Machine Check Exception on Pentium Chips. Windows NT machines equipped with early Intel Pentium (P5) CPUs may experience a machine check exception fault during operation, particularly if they have been in heavy use over an extended period of time. A machine check exception on the Pentium processor chip is an indication that the processor self-test hardware has detected an internal fault. It most commonly indicates an overheat condition. This is not unknown on early model Pentium CPUs, and it generally indicates a cooling problem in the system. The first solution, of course, is to turn the computer off and let it cool down. If the problem happens repeatedly, you may want to open up the case and make sure any on-chip cooling fan is operating and make sure that there isn't any obstruction in the airflow, and consider moving the system so that the airflow holes are not being obstructed by walls, desks, or other obstructions. Finally, contact your system manufacturer to see about some kind of an upgrade.

  • Poor CPU Performance. This topic really belongs with the tuning section earlier in this chapter. If the computer is running but seems to be dead slow and the processor appears bogged down with tasks that should not tax it, first check whether the "turbo switch" (if any) is depressed. Second, you may need to reboot the computer and examine the CMOS settings to see if the computer is set for one or more memory wait states. A computer operating with one wait state is effectively operating at half the stated CPU clock rate, because after every clock cycle involving a memory access it will idle or "wait" one cycle to give memory a chance to stabilize. If your system is using one or more wait states, try resetting it to zero wait states. If the computer then refuses to run, your memory is physically incapable of operating at the processor's full speed, and the solution is to buy faster memory chips. Beyond that, refer to the section on Performance Tuning earlier in this chapter for suggestions on how overall system throughput may be increased.
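The wait-state arithmetic above can be written out as a short calculation. This Python sketch uses the simplified model from the text, in which every memory access cycle is followed by a fixed number of idle cycles; the function name is an illustrative assumption, and a real system that is not entirely memory-bound will lose less than this.

```python
def effective_memory_rate_mhz(clock_mhz, wait_states):
    """Effective rate for memory-bound work under the simplified model in the text.

    Each memory access cycle is followed by `wait_states` idle cycles,
    so useful cycles occur at clock / (1 + wait_states).
    """
    if wait_states < 0:
        raise ValueError("wait_states must be zero or more")
    return clock_mhz / (1 + wait_states)


# A 33MHz machine with one wait state versus zero wait states.
print(effective_memory_rate_mhz(33, 1))
print(effective_memory_rate_mhz(33, 0))
```

With one wait state the 33MHz machine comes out at 16.5MHz for memory accesses, which is the "half the stated clock rate" figure given above.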

COM Port Problems

Problems with serial (COM) ports will generally be due to one of the following mistakes:

  • Attempting to use one port for two applications. Aside from the usual problems with improperly matched baud rate, parity, stop bits, and so forth, between an application and the device attached to a COM port, Windows NT presents one new class of problem. It absolutely, positively will not let you assign a COM port to another application or device when one is already using it. You can see this by looking at Control Panel/Ports. If you have a mouse installed on COM1 port for instance, the COM1 port will not appear in the Control Panel listing even though it does exist in the system. The reason is that Windows NT has assigned the COM port permanently to the mouse, and it will not allow that port to be used by any other application or service until and unless the mouse releases it.

    You can determine which ports are assigned to devices in this way by inspecting the HKEY_LOCAL_MACHINE/HARDWARE/DESCRIPTION/System/Multifunction Adapter/0/Serial Controller entry (this may say EISA adapter instead of multifunction adapter on EISA machines, etc.), as illustrated in Figure 5.20.

    The ports will be stored in a sub-key numbered from zero through one less than the number of COM ports. Zero through three respectively, for instance, represent COM1 through COM4. The device using the COM port will appear within the numbered sub-key for the COM port in question. If no hardware device is using the COM port, the next thing to look for is the possibility that you have some application or service using the port.

    An example of this would be if COM1 is physically attached to a modem and you attempt to use COM1 from a communications program while Remote Access Service (RAS) is running, bound to COM1 through the Network Control Panel. Windows NT won't let you assign the port to the communications program. The solution is to stop RAS (or the other service in question) using the Services applet in the Control Panel while you use the communications program, then close the communications program (or select another COM port temporarily) and start RAS again.

  • Incompatible Hardware. Most other COM port problems will be improper matches between the COM port settings and the external device as mentioned above, or in rare cases, you may run into a COM port using a universal asynchronous receiver transmitter (UART) chip that is incompatible with Windows NT. The way to test for this is to attach a known good serial device (such as a dumb terminal or another computer running a terminal program) using a null modem cable to the port, run the Windows terminal, select identical baud rates, parity settings, and word lengths on both ends of the connection, and then try typing on the Windows NT system's keyboard. If only one or two characters appear on the other screen and the port appears to hang up and refuse to transmit, you need a new UART chip. Machines known to have this problem include several models of DEC machines in the 300 and 400C series.

    Some systems with 16550 UART chips may be incompatible with NT's support for a FIFO buffer. If you have a 16550 and are experiencing COM port problems (you may note an event log entry saying, "A FIFO was detected and enabled"), try disabling it with the Ports control panel applet (select the port in question, click Settings, then Advanced, and you'll find a FIFO checkbox).

  • COM3, COM4, ... COMn Problems. On machines that don't include a Micro Channel Architecture (MCA) bus (virtually all machines except IBM PS/2 computers), COM3 and COM4 support is provided by sharing the interrupts of COM1 and COM2 at two different port addresses. That is, COM3 has the same interrupt number as COM1 but is at a different physical port address. COM4 is at the same interrupt level as COM2, but at a different port. This works fine until you try to use both COM1 and COM3 (or COM2 and COM4) at the same time. Windows NT supports interrupt sharing by the two sets of ports, but it cannot and will not permit devices to use the ports at the same time. As a result, you may find it impossible, for example, to attach modems to COM1 and COM3 and get two programs (for example, Remote Access Services and a terminal program or Microsoft Mail Remote) to work on both ports simultaneously. You can have one or the other, but not both.

  • Interrupt Conflicts. Just as Windows NT is intolerant of multiple applications or devices trying to share the same COM port, it is exceedingly intolerant of devices attempting to share an interrupt. The usual indication of an interrupt problem is the refusal of Windows NT to boot (on rare occasions, it can crash after booting correctly during an attempt to perform a network login). The major symptom will be the Windows NT "blue screen" displaying error number 0x0000000A: IRQ Expected To Be Less Than Or Equal. This indicates that two hardware devices in the system are set for the same interrupt level. It most probably will happen just after you've installed a network card or other physical device.

    Remove the card most recently installed and reboot the computer. Examine the hardware manufacturer's settings for the device and attempt to find an interrupt level that is not used by other devices. A common cause of this problem is cards that are preset to IRQ3, the interrupt used by COM2 and COM4. Therefore, if you have a second COM port in your machine, IRQ3 is automatically disallowed. Common interrupts in most systems include:

    • IRQ0 (timer)

    • IRQ1 (keyboard)

    • IRQ3 (COM2 and COM4)

    • IRQ4 (COM1 and COM3)

    • IRQ5 (LPT2)

    • IRQ6 (floppy controllers)

    • IRQ7 (LPT1)

    • IRQ8 (system clock)

    • IRQ13 (math coprocessor)

    • IRQ14 (hard disk controller)

    • IRQ15 (secondary disk controller)

    You will need to select an interrupt number not used by any of these devices installed in your system.

    Note: After making the change and restarting NT, start NT diagnostics, and check to see that the driver in question is actually using the IRQ you think it is!
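The procedure above (find devices set to the same interrupt, then pick one that nothing else uses) can be sketched in Python. The table below simply restates the list of common interrupts; the function names, the device names in the example, and the 0 through 15 range are illustrative assumptions. Treat the result only as a starting point: an interrupt missing from the common list may still be claimed by hardware in a particular machine, so confirm the choice with NT Diagnostics as the note above recommends.

```python
from collections import defaultdict

# Common assignments restated from the list above; a given machine may differ.
COMMON_IRQS = {
    0: "timer", 1: "keyboard", 3: "COM2 and COM4", 4: "COM1 and COM3",
    5: "LPT2", 6: "floppy controllers", 7: "LPT1", 8: "system clock",
    13: "math coprocessor", 14: "hard disk controller",
    15: "secondary disk controller",
}


def find_irq_conflicts(assignments):
    """Given {device name: IRQ}, return {IRQ: [devices]} for each shared IRQ."""
    by_irq = defaultdict(list)
    for device, irq in assignments.items():
        by_irq[irq].append(device)
    return {irq: sorted(devices) for irq, devices in by_irq.items() if len(devices) > 1}


def candidate_irqs(assignments, reserved=COMMON_IRQS):
    """Return IRQs 0-15 that are neither in the common list nor already assigned."""
    in_use = set(assignments.values())
    return [irq for irq in range(16) if irq not in reserved and irq not in in_use]


# Hypothetical machine: a network card left at its preset IRQ3 alongside COM2.
devices = {"COM2": 3, "network card": 3, "LPT1": 7}
print(find_irq_conflicts(devices))
print(candidate_irqs(devices))
```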

Malfunctioning Disk Drives

See the section on installation problems in Chapter 2 for information on the most common hard disk problems. Aside from the ones covered there, the problem that most frequently arises is failure to terminate a SCSI chain. Make sure that the last device in the chain is terminated and that there is terminating power. Failing this, if disk drives are misbehaving on Windows NT when they have been installed correctly and have been behaving themselves until now, check the BOOT.INI. Try reverting the configuration. Try using the emergency diskette. If none of that has any effect, you probably have a disk hardware problem and need to employ conventional hardware troubleshooting techniques (swap disk controllers, then disk drives) to isolate the bad component—then call your hardware vendor.

CD-ROM Problems

The most frequent CD-ROM difficulty with Windows NT is adding a CD-ROM into an installation that initially did not have a CD-ROM. Making Windows NT recognize the CD-ROM is fairly straightforward: from Control Panel/Devices, select the SCSI CD-ROM object and set the startup value to Automatic so that the service will start when the system boots. You may want to set the CD Audio entry to Automatic as well (for certain CD-ROMs, this may be required). Then shut down and restart NT. Other problems include the following:

  • CD-ROM Impacting Windows NT Performance. Certain CD-ROM players, including the NEC Intersect series, may have a dramatic impact on Windows NT performance when the CD-ROM is playing. This occurs because of the setting of a jumper switch on the CD-ROM reader that disables disconnects during read operations. Disk-read operations on CD-ROMs are very slow, and if a disconnect is not available, no other device has access to the SCSI interface card until the disk read is finished. Consult the hardware documentation for your CD-ROM reader and reset the jumper switch as necessary to enable disconnects during read operations.

  • Failure to Recognize Data on a CD. Windows NT supports the ISO9660 CD-ROM format but does not support any format extensions. A series of extensions known as the Rock Ridge CD-ROM format provides additional features used by CD-ROMs for some systems, in particular UNIX systems that require long filenames and a complex directory structure. Windows NT's CD file system does not recognize these extensions, nor, unfortunately, does it recognize the Macintosh Hierarchical File System (HFS) format.

Printing Problems

Windows NT suffers from one unique set of printing problems in common with its COM port problems, which again arises because only one device can own an interrupt. A number of sound cards, including the SoundBlaster Pro card, by default use interrupt 7, the same interrupt that is typically used by the Line Printer 1 port. If Windows NT refuses to recognize a printer attached to LPT1, start a command line prompt, and type:

mode LPT1:

If you see the message "Device Not Found," IRQ7 is being subverted by another hardware device. You can check this using NT Diagnostics: select the Resources tab, press the IRQ button, and look for IRQ 7. It's normally invisible until you click the Include HAL Resources check-box, because the NT Hardware Abstraction Layer uses it. If you find it has been taken by another device, you must remove the offending device, change the settings on the device, or otherwise make an adaptation so that the interrupt conflict is eliminated. Other problems could be the following:

  • Cross-Platform Network Printing. If RISC and Intel versions of Windows NT are mixed on a network, the usual Windows NT print driver approach in which the remote printer takes advantage of the print driver installed in the print server will fail because a MIPS RISC machine, for example, has no use for an Intel print driver. The indication will be an error message, when you attempt to connect to the printer, saying that the server does not have a suitable print driver installed. You then have the option to make a temporary print driver installation on the local machine or install print drivers for the other types on the print server.

    For instance, if the print server is a RISC machine, you could install the Intel print driver. Alternatively if the print server is an Intel machine, you could install one or more RISC drivers, as described in Chapter 2.

Network Problems

Difficulties involving the network could involve the following:

  • Disconnection. The most common symptom of a network card problem is that the user is unable to connect to the network. The most common cause is that the network cable is not plugged in to the card. So, the first thing to do if you suspect a network card error is to check the connection between the network cable and the computer and then the connection between the network cable and the wall. If it's a 10base2 (coax) Ethernet connection, make sure that the chain of connections isn't broken. The cable may be plugged in on the computer of the user who is reporting a problem, but it may be unplugged further down the line. Of course, this will usually be easy to spot: if such a break in a 10base2 cable exists, all users on that side of the break will be disconnected, not just one. In any case, the first step is to be sure everything's connected. The next step is to run the Windows NT Event Viewer and see if it's reporting any network errors.

  • Misconfigured network card. If the network connection appears to be good and the other systems on the subnet are up, check to see whether you have a hardware or software error. The easiest way to do this is with PING (on TCP/IP networks) or the NET SEND command (on NetBIOS networks). (Use of PING is covered under the next boldface heading.)

    In either case, you will want to determine whether the computer is in fact talking to the network at all. From this you can tell if you have a software problem with misconfigured networking software or a hardware problem in which the network is not working at all. In our experience the net send command is convenient for this because it operates at a very low level on the system. You can reliably expect a net send command to tell you if the network is properly installed. If the network is installed and network communications exist, but the computer is not being logged into the network properly, the net send will still reach the designated target system. For instance:

net send mips1 Can anyone hear me?

will print the message "Can anyone hear me?" in a pop-up window on the mips1 workstation or server. A second possibility is that net send will not give an indication on the target but will return with the message "The message was successfully sent to MIPS1." In this case, the low-level Windows NT software, driver, and transport are all working properly (they are getting proper indications from the card), but for some reason the transmission is not getting out on the network. This indicates that the network cable is bad, and the signal is being blocked somewhere outside the computer.

  • TCP/IP Misconfiguration. If DHCP is not in use, inability to "see" hosts on TCP/IP networks may indicate that the HOSTS or LMHOSTS database files (described in Chapter 6) contain bad information. Try accessing a local host (or router) using the TCP/IP "ping" utility, using the four-number IP address of the host (or router) in question. The syntax of the command is ping <ip-address>, as in the following example for a node with address 127.119.13.213:

ping 127.119.13.213

Do *not* use a ping to a DNS or HOSTS name (at least, not at first) because this may not be definitive. The command:

ping vax.cmp.com

will evaluate to the same command as ping 127.119.13.213 if and *only* if the vax.cmp.com DNS name properly evaluates to 127.119.13.213. By contrast, pinging "by the numbers" is an absolute: if it gives you no response, there is a very deep configuration problem.

If a "by the numbers" ping gives a response, try a ping to the name. If that doesn't work, check the Name Resolution settings in Control Panel/Networks TCP/IP Configuration to see whether DNS, WINS, or HOSTS naming is in use, and then check the status of the DNS server, WINS server (or HOSTS file) as appropriate.

If both pings work, but you still can't "see" the system in question using the built-in Windows NT networking, check the WINS and LMHOSTS file settings and the settings of any intervening routers. A useful diagnostic for systems that use Windows NT at each end may be to run the FTP Server Service on one end and attempt FTP client access from the other. If that works, the low-level linkage (and router, if any) are properly set up, and the problem *must* lie with Windows NT name resolution. See the troubleshooting section in Chapter 6 (particularly IPCONFIG and NBTSTAT).

  • Hardware (interrupt) Problems. It is quite common to experience network problems on Windows NT machines if the network card is set to interrupt level 3. Normally, Interrupt 3 is used by the COM2 port, and since Windows NT does not permit interrupt sharing, a network card designated to use Interrupt 3 produces one of two results. In the most severe case, you will see the infamous blue screen when NT boots up, with "Error 0x0000000A: IRQ expected to be less than or equal." In the other case, NT starts, but the network refuses to run. In either case, take the network card out and reset it to a new IRQ setting. You'll also have to change the IRQ setting for the card in question in Control Panel/Networks.

    It's possible that the computer will boot but the network card will refuse to function. In this case, again, you need to shut down the computer, take out the card, change the settings on the card, bring up the computer, change the settings on the Network Control Panel applet, and then shut down and restart Windows NT, and it should work. If it's not an interrupt problem and the network cables are believed to be good, you need to begin troubleshooting procedures to determine whether you, in fact, have any connectivity to the network card and try to determine where the break is occurring. You can use the PING application on TCP/IP networks or Net Send on NetBEUI and other SMB networks.

    On rare occasions, there are network cards with programmable interrupt and I/O settings in which the low-level network software can see what appears to be a perfectly good network connection, yet will not work initially. It may be worth trying a warm boot by shutting down Windows NT and selecting the Restart When Shutdown Is Complete option, and then try Net Send again. If it operates correctly after the reboot, you have a network card that requires two passes to set the software configuration. You may want to consider reconfiguring the card with a hardware configuration (if that's possible), or you may need to tell the user that when he starts up in the morning, he needs to do a warm boot before he can expect to see his network.

    If Net Send reports that the message is not being sent because of a network problem, this invariably indicates an error in the binding of the low-level network software to the network card. This should be accompanied by an entry in the System Event Log. (You did check the log, didn't you?) In any case, the problem is a low-level one. It indicates that, for whatever reason, the software is not recognizing the card. This may mean that you're using the wrong driver for the particular network card you have, or that the network card may be misconfigured. Take a close look at the network card to verify that the network settings match the settings in the Network Control Panel, and cross-check with NT diagnostics to verify that any resources such as IRQs and ports are in fact "owned" by the network adapter card driver (if not, you have a conflict!). Verify that you are using the correct driver, and try again.
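The ping sequence described under TCP/IP Misconfiguration above can be summarized as a small decision function. This Python sketch only encodes the order of the checks from the text; it does not run ping itself, and the function name, parameter names, and wording of the returned messages are illustrative assumptions.

```python
def diagnose_tcpip(ping_by_address_ok, ping_by_name_ok, nt_networking_ok):
    """Map the outcome of the three checks in the text to the next thing to inspect.

    ping_by_address_ok: a "by the numbers" ping to the IP address got a response.
    ping_by_name_ok:    a ping to the DNS or HOSTS name got a response.
    nt_networking_ok:   the system is visible through built-in NT networking.
    """
    if not ping_by_address_ok:
        # The numeric ping is the absolute test, so nothing further is meaningful yet.
        return "No response by address: deep configuration, card, or cabling problem."
    if not ping_by_name_ok:
        return "Address works but name fails: check DNS, WINS, or HOSTS name resolution."
    if not nt_networking_ok:
        return "Both pings work: check WINS and LMHOSTS settings and intervening routers."
    return "No fault found by these checks."


# Example: the numeric ping answers, but the name does not resolve.
print(diagnose_tcpip(True, False, False))
```

The ordering matters: a failed numeric ping makes the name and NT networking results uninformative, which is why the text advises against starting with a ping to a name.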

Sound Card Problems

As with network cards, the most common symptom of a soundboard problem is the user reporting that no sound comes out of the speakers, and as with network cards, the first thing to do is see that there is a speaker plugged in, that the speaker has power, that the speaker volume is turned up, and that in all other respects you have a situation in which sound should be coming from the computer. If it is not, then you may want to look at the following:

  • Is the sound driver installed? This may sound simple-minded, but Windows NT does not install sound drivers during installation by default, so you will very likely have to install a sound driver for each system. You do this through the Control Panel Sound Driver's applet. Make sure that you are using the right driver for the right sound card. In particular, with Creative Labs SoundBlaster cards, you must be careful because there are several different versions of the SoundBlaster, and the drivers are not interchangeable. For example, the driver for a SoundBlaster Pro will not work with a SoundBlaster version 1.

  • Do you have an interrupt conflict? Check the interrupt and port settings that are set in Control Panel/Multimedia's Devices tab (select your device and click the Properties button). Make sure those settings match any switches on the audio card. Use NT Diagnostics' Resources tab to verify that the interrupt and port settings are not in fact used by another device. Note that the original SoundBlaster uses IRQ7 by default. This is also the setting for LPT1, and as noted elsewhere, Windows NT does not tolerate interrupt overloading, so it is likely that if you've installed a SoundBlaster card and it refuses to work, you'll have to change the interrupt. If you can play .WAV files (an easy way to check this is with the Control Panel Sound applet, setting system sounds on and using the test button) but you can't play .MID (MIDI) files, you may need to install the Ad Lib MIDI driver. Because most sound boards have two independent audio chips on them, one for MIDI synthesis and one for wave audio, two drivers are typically required.

  • If you are using a Windows sound system and have upgraded from Windows 3.1 or Windows for Workgroups, you may see the message "SOUND.CPL is not a valid Windows NT Image" and find that the Control Panel is not working properly. That's because the SOUND.CPL file installed by Windows is incompatible with Windows NT.

  • As we noted in the section on CD-ROMs and SCSI, a number of sound card manufacturers incorporate a proprietary CD-ROM interface on the sound card. Windows NT supports most of the common ones, with either built-in drivers or ones from the installation CD's DRIVERS library. See Chapter 2 for details.

Video Problems

The most common video problem arises when a user changes the video settings using Windows NT Setup to try to get a higher resolution and is suddenly presented with an image that is either grossly unstable or completely blank. The solution in either case is the same. Restart Windows NT, going through the shutdown procedure if you can. (This is one case in which pressing the reset switch may be your only option.) When Windows NT starts, it will begin with the character-mode startup, which ordinarily will survive a change in video resolution, and will present you with a "Press Escape for Last Known Good Menu" option. Immediately press the Escape key, and select Last Known Good Configuration. If the user has not repeatedly modified the installation (which is almost impossible with a video problem), this will get you back to the working video.

Another common problem with video drivers occurs when a user installs a new video board without resetting the driver. In that case, the only remedy is to use the VGA mode boot option and then install the proper driver.

Finally, remember that Microsoft changed the video driver model in NT 4.0. Past versions of NT supported "downlevel" video drivers with reduced functionality; but in 4.0 you cannot use an older driver, period.

Conclusion

We've reviewed the basic principles of preventive maintenance (PM—covered in detail in Appendix 6), examined the steps necessary for performance tuning in a Windows NT system, reviewed the tools used for tuning and troubleshooting—including those that are new or updated for NT 4.0—and presented a list of the most likely problems and their solutions. With this information at your disposal, you'll have a good idea of how to proceed when you're presented (inevitably) with your first Windows NT system crash. However, we reiterate that it's far better to apply PM principles and avoid the crash altogether!

For More Information

Microsoft Staff (1996), Windows NT Server 4.0 Concepts and Planning. Redmond, WA: Microsoft Corp. Covers many aspects of installation and operation.

Microsoft Staff (1996), Windows NT Server 4.0 Network Supplement. Redmond, WA: Microsoft Corp. Detailed information on network operation, including troubleshooting.

Microsoft Staff, TechNet CD. Redmond, WA: Microsoft Product Support Services (PSS). TechNet is a monthly publication on CD-ROM containing a digest of topics from the Microsoft Knowledge Base, the Net News publication, Resource Kits, and other information. TechNet is available from Microsoft sales. A one-year subscription (12 CDs) costs $295 and is worth every penny.

Microsoft Staff (1993–96), Windows NT Resource Kit. Redmond, WA: Microsoft Corp. The only source for detailed information on the Windows NT configuration registry and the best source of information on topics like performance monitor counters.

Microsoft Staff (1995), Windows NT Training. Redmond, WA: Microsoft Corp. This is a two-volume set with a video and diskettes, covering Windows NT support and troubleshooting issues. It's marketed as a self-paced training guide for professionals studying to take the Microsoft Certified Professional (MCP) examinations.

Figure 5.1: Registry editor. NT's configuration registry editor (REGEDT32.EXE) provides an interface to the registry—a redundant database of configuration information for the system, software, and users.

Figure 5.2: Performance monitor. NT Performance Monitor gives administrators and support personnel the ability to observe, monitor, and record data on a wide variety of system (and application software) components.

Figure 5.3: Performance Monitor, working set. Performance Monitor can be very useful in diagnosing memory hogs. Now that NTBOMB has been identified as the errant process, it can be shut down.

Figure 5.4: Virtual memory. NT's Paging File and Registry settings are adjusted in the Control Panel.

Figure 5.5: Server object in control panel/network. The Server Configuration dialog allows you to control the memory optimization settings of NT's built-in network services. You reach this dialog from the Control Panel/Network Settings, by selecting the Server object and clicking the Configure... button.

Figure 5.6: Counter definition. Performance Monitor counters have definition information associated with them, which can be displayed by clicking the Explain>> button on the Add to Chart dialog.

Figure 5.7: Long-term log. Performance Monitor can be used to log counter values over an extended period—in this case 24 hours of server operation.

Figure 5.8: Windows NT diagnostics. NT includes a Windows-based system diagnostics application as a standard component. In NT 4.0 this tool can be used remotely as well as locally.

Figure 5.9: Drive properties. Selecting properties for a disk drive displays information about the drive, including its capacity and free space.

Figure 5.10: Service properties. Selecting properties for a service displays information about the service, including its startup type, security account, and service flags.

Figure 5.11: Resource properties. Selecting properties for a system resource (such as an IRQ) displays information about that resource, including the owning device driver, the interrupt vector, and whether the resource is shared.

Figure 5.12: Task Manager—Applications. By default, NT Task Manager shows a list of applications running in the system. Note that the list contains only nine entries, despite the fact that some 30 processes are running.

Figure 5.13: Task Manager—Processes. Selecting the Processes tab gives you a more detailed view of exactly what's going on in the system, including memory and CPU use broken down on a per-process basis.

Figure 5.14: Network Monitor—Capture window. The Capture Window, initially displayed empty when Network Monitor starts, is the top-level display from which capture statistics are available.

Figure 5.15: Capture filter. You can limit what data Network Monitor will capture through the use of a Capture Filter. This allows you to limit capture based on packet type, address, or even a particular text string or other byte pattern.

Figure 5.16: Capture detail. Once data is available, the detail view allows you to analyze it in depth, showing which transport protocol was used, what kind of packet was captured, and the actual packet data in both hex and ASCII (text) formats.

Figure 5.17: Blue screen crash. You should never see this display from NT under normal circumstances—if you do, then the system has become completely unstable, and will require a hardware reboot. The *** STOP 0x000000... message will identify the type of error involved, and it is followed by a register dump that can be helpful in identifying what's gone wrong with the system.

Figure 5.18: Recovery. NT 4.0 provides recovery options that may be used to control how an NT system behaves during and after a system crash. These options are set using the Startup/Shutdown tab in Control Panel/System.

Figure 5.19: Repair disk utility. NT's RDISK utility allows you to create (or update) an NT emergency repair diskette. Used in conjunction with a boot diskette, this allows recovery from a variety of serious system errors.

Figure 5.20: Registry editor. NT's system registry database is edited using the Registry Editor (REGEDT32.EXE). It allows registry "hive" files to be loaded, viewed, and modified. The particular registry "key" shown here displays information about COM port utilization on the system.

1 Sometimes even the Emergency Disk won't help. We recommend using the Resource Kit's REGBACK and REGREST utilities (see Appendix 4) to keep separate copies of registry data in a nice safe place. You'll never know how much you need it until it's way too late.

2 You can test for the presence of an FPU and profile its performance using WINDOWS Magazine's WINTUNE benchmark, available for download from https://www.winmag.com.

3 Or (on NT 3.51 and later systems) you may have a system that has a floating-point unit, but has been configured to emulate floating-point operation (e.g., an older model Intel Pentium chip, in which the FPU has been disabled because of the infamous divide flaw). You can check and change the emulation mode with the PENTNT command, covered in Appendix 5. Alternatively, you can edit the relevant registry entry: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\ForceNpxEmulation. This is a REG_DWORD that accepts values of 0 (hardware floating-point), 1 (Pentium-only, may emulate FP divide instructions if a defective Pentium CPU is installed), and 2 (emulates all floating-point instructions).

4 In the OS/2 1.x environment, rebooting servers nightly was a common practice because of a system-wide memory fragmentation problem. NT has no such problem, so although it may be necessary to shut down an ill-behaved application, it should never be necessary to reboot the computer.

5 You can reboot systems remotely using the NT Resource Kit's SHUTCMD.EXE utility. See Appendix 4.

6 In the Resource Kit. See Chapter 4, "Optimizing Windows NT."

7 Except in the Windows NT Resource Kit. See Appendix 4 for details.

8 For details, see the Windows NT Server Concepts and Planning Guide (included in NT Server Books Online) or Windows NT Workstation Resource Kit.

9 It was four keys in NT 3.x: HKEY_CURRENT_CONFIG, which stores data for the current hardware profile, is new for NT 4.0.

10 As with HKEY_CURRENT_CONFIG, the Hardware Profiles sub-key is new for NT 4.0.

11 NT 3.1 had a fixed maximum registry size of 8MB. NT 3.5 allowed the registry to be resized, but the ability to monitor the registry and set an alert if the maximum size was approached only appeared with NT 4.0.

12 Doing so, however, can sometimes be frustrating. One author operates an obsolete NCR 486/33 as a combination PDC and router, the latter requiring two network adapters. When this system was configured to add a second SCSI card to support a backup device (IOMEGA ZIPdrive), the resulting conflicts among the network cards, SCSI cards, and motherboard devices eventually required replacing one network card with a different model. It did finally work, and it was WINMSD that solved the problem!

13 Once again, the only place to find that information seems to be the NT Resource Kit.

14 Microsoft's Internet Explorer is a persistent example, in our experience.

15 On the order of 5:1 for a heavily fragmented NTFS partition. See the "Windows NT" column in the August 1995 WINDOWS Magazine for details.

16 You can reach the company at (818) 829-6468.

17 Diskeeper Light differs from the full-up versions in that it implements a single-pass defragmentation scheme rather than automatic defragmentation in the background. The latter is a much better choice, especially on file servers.

18 There is, of course, the Recycle Bin on the NT Desktop, but it stores only files that are deleted using the Windows Explorer. Files deleted from the command line or under program control are, in a word, gone!

19 Covered in Appendix 4.

20 This list is current as of August 1996.

21 Just $30 per incident when we last checked. Call (800) 328-0440.

22 On Intel systems only. On RISC systems, use the ARCS menu to execute NT setup directly from the CD, as described in Chapter 2.

23 Information in this section is from a variety of sources, including the Microsoft on-line Knowledge Base (go MSKB on CompuServe), the Microsoft TechNet CD-ROM, reports from Windows NT users, and our own experience with Windows NT over the past few years. We can't claim to have personally experienced every problem (or tested every fix) reported here, but we've had quite a few!

24 In NT 3.1, it was possible to use net send /BROADCAST text without designating a target. This functionality has been removed from NT 3.5 and later versions. Assuming that you know the name of any one machine on the net, you can achieve the same effect with net send MachineName text or net send /domain:domainname text. For instance: net send mips1 just testing should print "just testing" on \\mips1, assuming nothing's broken.
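
The two net send forms in note 24 differ only in the target argument, so if you script these tests it may help to build the command in one place. The following is a minimal sketch, not a Microsoft-supplied tool: the helper name net_send_command is hypothetical, and the sketch only assembles the argument list without running anything.

```python
def net_send_command(message, machine=None, domain=None):
    """Build the argument list for a `net send` call.

    Exactly one of `machine` or `domain` must be given, mirroring the two
    forms in note 24: `net send MachineName text` and
    `net send /domain:domainname text`.
    """
    if (machine is None) == (domain is None):
        raise ValueError("specify exactly one of machine or domain")
    target = machine if machine is not None else f"/domain:{domain}"
    return ["net", "send", target, message]


# Reproduces the example from note 24.
print(" ".join(net_send_command("just testing", machine="mips1")))
```

On a Windows NT system the resulting list could be handed to a process launcher such as Python's subprocess.run, assuming Python is available there. The sketch stops short of that so it can be checked on any machine.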