Chapter 39 - Windows NT Debugger

02/20/2014


This chapter first defines debugging terminology and provides an overview of debugging on Windows NT. Next, it describes setting up the computers for debugging. It then explains how to create a memory dump file, which utilities you can use to process the file, and how to interpret the information it contains.

For Windows NT versions 3.51 and 4.0, Windbg, the utility used for reading memory dump files in earlier Windows NT releases, was replaced with a set of utilities that automatically read and interpret memory dump files. These new utilities simplify the process of dealing with kernel memory dump files and aid in sending memory dump files to support personnel for advanced analysis.

New material about the debugger and information about using the output from the Dumpexam utility is also included in this chapter.

Debugging Terms

This section defines some common terms and procedures you need when you debug kernel STOP errors.

Kernel STOP Error, Blue Screen, or Trap

When Windows NT encounters hardware problems, inconsistencies within data necessary for its operation, or other similar errors, the operating system processes the error based upon the information entered in the Recovery dialog box. For information about the Recovery dialog box, see "Creating a Memory Dump File," later in this chapter.

If the user did not select Automatically reboot in the Recovery dialog box, Windows NT displays a blue screen containing error information, then stops.

Knowledge Base articles and other Windows NT documentation sometimes refer to this type of error as blue screen, kernel error, or even trap. This chapter uses the term kernel STOP error. However, if the context specifically refers to Windows NT stopping with the blue screen displayed, the term blue screen is used instead. The term trap is used in this chapter to mean that the kernel has detected an error and might write a memory dump file as part of its processing of the error.

Symbols and Symbol Trees

Usually, when code is compiled, one of two versions of the executable file can be created: a debug (also known as checked) version, or a nondebug (also known as free) version. The checked version contains extra code that enables a developer to debug problems, but this means a larger and possibly slower executable file. The free version of the executable file is smaller and runs at a normal speed, but cannot be debugged.

Windows NT combines the speed and smaller size of free versions with the debugging capabilities of the checked versions. All executable files, drivers, dynamic-link libraries, and other program files in Windows NT are the free versions. However, each program file has a corresponding symbol file, which contains the debug code that is normally part of the checked file. These symbol files are on the Windows NT Server product CD, in the Support\Debug\Platform\Symbols directories, where Platform is I386, Alpha, MIPS, or PowerPC. Within each Symbols directory, there is one directory for each type of file (such as .exe, .dll, and .sys). This structure is referred to as a symbol tree. Table 39.1 describes directories that exist in a standard symbol tree.

Table 39.1 Directories in a Standard Symbol Tree

Directory	Contains symbols for
ACM	Microsoft Audio Compression Manager files
COM	Executable files (.com)
CPL	Control Panel programs
DLL	Dynamic-link library files (.dll)
DRV	Driver files (.drv)
EXE	Executable files (.exe)
SCR	Screen-saver files
SYS	Driver files (.sys)

All of the utilities used to debug Windows NT or interpret memory dump files require a symbol tree containing the symbol files for the version of Windows NT you were running at the time of the kernel STOP error. With some utilities, you need the \Symbols directory to be on your hard drive, in the \Systemroot directory. With other utilities, you can specify the path to the \Symbols directory as a command-line option or in a dialog box.
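
As a rough illustration of the symbol tree layout, the following Python sketch maps a program file name to the place where its symbol file would be expected. The function name `symbol_file_path` and the example root paths are hypothetical. The .dbg extension follows the symbol file names used later in this chapter (for example, Ntoskrnl.dbg), and the sketch assumes that every directory in Table 39.1 is named after the file extension it serves.

```python
import ntpath  # Windows path rules, usable on any platform

# Directories of a standard symbol tree, taken from Table 39.1.
SYMBOL_DIRS = {"acm", "com", "cpl", "dll", "drv", "exe", "scr", "sys"}

def symbol_file_path(symbol_root, image_name):
    """Return the expected path of the .dbg file for a program file."""
    base, ext = ntpath.splitext(ntpath.basename(image_name))
    subdir = ext.lstrip(".").lower()
    if subdir not in SYMBOL_DIRS:
        raise ValueError("no symbol directory for %r files" % ext)
    return ntpath.join(symbol_root, subdir, base + ".dbg")

print(symbol_file_path(r"C:\WINNT\Symbols", "Ntoskrnl.exe"))
```

A helper like this is only a way to check that a copied tree contains the files a debugger will look for; the debuggers themselves locate symbols on their own.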

Target Computer

The term target computer refers to the computer on which the kernel STOP error occurs. This computer is the one that needs to be debugged. It can be a computer located within a few feet of the computer on which you run the debugger, or it can be a computer that you dial in to by using a modem.

Host Computer

The term host computer refers to the computer on which you run the debugger. This computer should run a version of Windows NT that is at least as recent as the one on the target computer.

Debugging Overview

There are three approaches you can take to finding the cause of kernel STOP errors:

Set up a remote debug session with the Microsoft Support Network. This process is needed if a memory dump file cannot be generated or if the target computer halts with a STOP screen. The connection process involves configuring your target computer for a connection (modem to modem) to a host computer located at Microsoft.
Set up a local debug session with Microsoft Support Network by using a Remote Access Service (RAS) server. This process is needed if a memory dump file cannot be generated or if the target computer halts with a STOP screen. The connection process involves using a null modem cable to configure both your target computer and your host computer. The host is then networked to a RAS server and the debugging information is sent to Microsoft over an asynchronous connection. You can also analyze the debugging information at your host computer.
Set up your target computer to write the contents of its RAM to a memory dump file when a kernel STOP error occurs. You can then use the dump analysis utilities to analyze the memory dump, or send the memory dump file to technical support personnel for their analysis.

Kernel Debuggers

The Windows NT kernel debuggers — I386kd.exe, Alphakd.exe, Mipskd.exe, and Ppckd.exe — are 32-bit executable files that are used on the host computer to debug the kernel on the target computer. Each host hardware platform has its own set of utilities, which are provided on the Windows NT product CD in the \Support\Debug directory.

The kernel debuggers can be used for either remote or local kernel debugging. If you use local kernel debugging, the host computer is located within a few feet of the target computer and the two computers communicate through a null modem serial cable. If you use remote kernel debugging, the host computer can be at any distance from the target computer because communication takes place through modems.

The host and target computers send debugging information back and forth through their communications ports. The ports on both computers must be configured to pass data at the same rate in bits per second (bps).

After a blue screen appears, record the important information in the message, then restart the computer. You might need to configure the target computer for local or remote debugging and reboot it a second time. You can then continue running Windows NT until the message is displayed again. After the blue screen is displayed the second time, call your technical support group and request assistance with the debugging. They can decide whether to debug the kernel STOP error locally or remotely and instruct you to configure your system appropriately.

Dump Analysis Utilities

To use the Windows NT dump analysis utilities, you must first configure your computer to write a memory dump file when it gets a kernel STOP error. Use the Recovery dialog box to configure the target computer to write the memory file, as described in the section "Creating a Memory Dump File" later in this chapter. This file preserves information about the state of the computer at the time of the kernel STOP error. The memory dump file can be used by the dump analysis utilities to troubleshoot the problem. If you use this option, you can run the dump analysis utilities on any Windows NT–based computer after you load the memory dump file, including the computer on which the kernel STOP error occurred.

This approach is usually the best for a computer running Windows NT Server because it minimizes the amount of time the server is unavailable. By default, a Windows NT Server-based computer writes an event to the system log, alerts administrators, dumps system memory to the Memory.dmp file, and then restarts automatically. Because each dump is written to the same Memory.dmp file, rename the newest file each time a kernel STOP error occurs if you want to preserve it. You can then run the dump analysis utilities and send the information to your technical support group for processing.

Setting Up for Debugging

If you decide to use the kernel debugger to analyze the kernel STOP error, you need to set up the host and connect your host and target computers. To do this, you use either a null modem cable for a local debug session or a modem cable for a remote debug session. Before you can start debugging, you must complete several steps.

To prepare for debugging

Set up the modem connection.
Configure the target system for debugging.
Set up a symbol tree on the host system.
Set up the debugger on the host system.
Start the debugger on the host system.

Note None of the procedures in this section are necessary if you use the Recovery dialog box to create a memory dump file. For information about that alternative, see "Creating a Memory Dump File," later in this chapter.

Setting Up a Remote Debugging Session on an Intel-Based Computer

If you enable the kernel debugger on your target computer, it sends debugging information to a host computer for a remote user to analyze. A support engineer often requests this to help analyze a fatal error in Windows NT that cannot be diagnosed from the Memory.dmp file, or when no Memory.dmp file is produced.

In remote debugging, the two computers are connected by modems over a phone line. The target and host computers then communicate by using a special debugging API and protocol.

The following figure shows the connection between the host and the target computer for a remote debugging session.

Figure 39.1 Remote Debugging

To configure a system for remote debugging, you change the boot options to set Windows NT to load the kernel debugger. On an x86-based platform, you do this by editing the Boot.ini file. On a RISC-based system (DEC Alpha, MIPS, and PowerPC processors), you change the boot options in the firmware menu. You must also connect an external modem to the appropriate COM port on the target computer and connect an inbound phone line to the modem.

Booting the Target Machine

If the target computer stops at a blue screen every time you boot it, or does not keep running long enough for you to edit the Boot.ini file to enable the debugger, you can try these options:

If your boot partition is FAT, you can start MS-DOS from a boot floppy disk and use the MS-DOS-based editor to edit Boot.ini.
If your boot partition is NTFS (or HPFS, if you are running Windows NT version 3.1 or 3.5), you can install Windows NT on a different partition and boot from that partition. (You must use this method because you cannot access files on an NTFS or HPFS partition from MS-DOS.)
If you previously created a Windows NT boot recovery disk for the workstation that has the problem, you can use this disk on another machine to edit the Boot.ini file, and then boot the target machine.

Setting Up the Modem on the Target Machine

To set up a remote debugger session, you must connect an external modem to the target machine and reconfigure the modem parameters to meet the requirements of the kernel debugger. To configure the modem, you must be able to run Terminal.exe or some other communications program. If you are unable to run these programs on the target machine, connect the modem to a computer that is close to the target machine. Make sure you can move the modem back to the target machine without losing power to the modem. An internal modem does not work because rebooting the system resets the configuration changes you have made to the modem.

The modem must be connected to a spare COM port and must be configured as shown in the following table:

Setting	Value
Auto answer mode	On
Hardware compression	Disabled
Error detection	Disabled
Flow control	Disabled
Baud rate	9600 bps for x86-based systems; 19200 bps for RISC-based systems

Consult your modem documentation for the correct string values to send to the modem during the configuration process. The following table gives an example of how to configure a USRobotics modem for a remote debugging session.

Function	String Value
Set Back to Factory Defaults	AT&F
Disable Transmit Data Flow Control	AT&H0
Disable Receive Data Flow Control	AT&I0
Disable Data Compression	AT&K0
Disable Error Control	AT&M0
Auto Answer On	ATS0=1
Disable Reset Modem on Loss of DTR	AT&D0
Write to NVRAM	AT&W
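
The USRobotics example can be kept as a small, ordered list so that the same sequence is sent every time the modem is prepared. The Python sketch below only builds the command lines; it does not open a serial port. The names `USR_DEBUG_SETUP` and `modem_setup_lines` are hypothetical, the commands are written in the usual compact AT form, and the carriage-return terminator is the conventional way to end an AT command rather than something this chapter specifies.

```python
# Example USRobotics settings from the table above, in the order listed.
# Other modems use different strings; check the modem documentation.
USR_DEBUG_SETUP = [
    ("Set back to factory defaults", "AT&F"),
    ("Disable transmit data flow control", "AT&H0"),
    ("Disable receive data flow control", "AT&I0"),
    ("Disable data compression", "AT&K0"),
    ("Disable error control", "AT&M0"),
    ("Auto answer on", "ATS0=1"),
    ("Disable reset modem on loss of DTR", "AT&D0"),
    ("Write to NVRAM", "AT&W"),
]

def modem_setup_lines(settings=USR_DEBUG_SETUP):
    """Return each command as bytes terminated by a carriage return."""
    return [command.encode("ascii") + b"\r" for _, command in settings]

for description, command in USR_DEBUG_SETUP:
    print("%-40s %s" % (description, command))
```

Keeping the write to NVRAM as the last entry matters, because it is the step that saves the earlier settings in the modem.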

To configure the modem

Connect the modem to an unused COM port on the target machine or on another computer that is close enough to the target machine to connect by using a standard modem cable.

Note If you connect the modem to a computer other than the target machine, make sure you can move the modem back to the target COM port without removing power from the modem.
Run Terminal.exe or some other communications program to configure the modem parameters.
Set the modem speed to 9600 bps. See your modem documentation to find out how to do this.
Turn off all hardware compression, flow control, and error detection.

How to do this varies widely from modem to modem. See your modem documentation for the correct strings to send to the modem.
Enable auto-answer by sending the string ATS0=1 to your modem. Consult your modem documentation to verify that this will work with your modem.
If the modem was configured on a machine other than the target computer, move it to the target computer without removing the power from the modem.

Editing the Boot.ini File on the Target Machine

To configure a target system for remote or local debugging, you edit the boot options in the Boot.ini file to tell Windows NT to load the kernel debugger.

Debugger Options

The following table lists the boot options that can be used to configure the system for debugging. These options are the same on Intel X86 and RISC platforms, but the slash (/) is not required when used on a RISC platform.

/Debug	Causes the kernel debugger to be loaded during boot and kept in memory at all times. This means that a support engineer can dial into the system being debugged and break into the debugger, even when the system is not suspended at a kernel STOP screen.
/Debugport	Specifies the serial port to be used by the kernel debugger. If no serial port is specified, the debugger will default to COM2 on Intel X86-based computers and to COM1 on RISC computers.
/Crashdebug	Causes the kernel debugger to be loaded during boot but swapped out to the pagefile after boot. As a result, a support engineer cannot break into the debugger unless Windows NT is suspended at a kernel STOP screen.
/Baudrate	Sets the speed that the kernel debugger will use in bits per second. The default rate is 19200 bps. A rate of 9600 bps is the normal rate for remote debugging over a modem.

When you use /Debugport or /Baudrate, you need not use /Debug, because Windows NT assumes that the computer will load in debug mode. You must use at least one of the options described in the preceding table to configure a computer for remote debugging. Otherwise, Windows NT does not load the debugger at all.
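
As a minimal sketch of how these options combine, the following Python function builds the option string for either platform family. The function name, the platform labels ("x86", "alpha", "mips", "ppc"), and the parameter names are hypothetical; only the option names and the slash rule come from the text above.

```python
RISC_PLATFORMS = {"alpha", "mips", "ppc"}  # hypothetical labels

def debug_load_options(platform, port=None, baud=None, crash_only=False):
    """Build the debugger boot options described in the table above.

    The leading slash is used on x86 (Boot.ini) and omitted on RISC
    platforms, as the text describes.
    """
    prefix = "" if platform.lower() in RISC_PLATFORMS else "/"
    options = ["crashdebug" if crash_only else "debug"]
    if port is not None:
        options.append("debugport=%s" % port.lower())
    if baud is not None:
        options.append("baudrate=%d" % baud)
    return " ".join(prefix + option for option in options)

print(debug_load_options("x86", port="COM1", baud=9600))
print(debug_load_options("alpha", port="COM2"))
```

Leaving `port` and `baud` unset produces a string that relies on the platform defaults listed in the table.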

To set up the target computer on an Intel x86-based computer, edit the Boot.ini file by using a standard ASCII text editor and add the appropriate debugger options to the file. The Boot.ini file is located in the root directory of the system partition (usually drive C) and has the Hidden, System, and Read-Only attributes set. You must clear these attributes before you can edit the file.

To Change the Attributes of the Boot.ini File

Type the following at a command prompt:

attrib -s -h -r c:\boot.ini
To restore the Read-Only, Hidden, and System attributes when you finish debugging the system, type the following at a command prompt:

attrib +h +r +s c:\boot.ini

To Configure the Boot Options in the Boot.ini File

To configure the target computer for remote or local debugging, add the /Debug and /Baudrate options to the Boot.ini file. If you cannot use the default COM port (COM2) for debugging, use /Debugport=COMx, where x is the COM port number. Use the MS-DOS-based Editor to edit the Boot.ini file.

At a command prompt, type:

edit boot.ini

The Boot.ini file appears in the MS-DOS Editor window. It looks similar to this:

[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows NT Version 4.0"
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows NT Version 4.0" [VGA mode] /BASEVIDEO
C:\="MS-DOS"
Select the startup option that you normally use and add the /Debug option at the end of the line.
To specify the communications port, add the option /Debugport=comx where x is the communications port that you want to use.
Add the option /Baudrate=9600.

This is the Boot.ini file after it has been modified by the preceding steps:

[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows NT Version 4.0" /debug /debugport=com1 /baudrate=9600
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows NT Version 4.0" [VGA mode] /BASEVIDEO
C:\="MS-DOS"
Save the Boot.ini file and quit the text editor or the MS-DOS Editor.
Restart the computer to run under Windows NT.

Your technical support group can now call the modem to establish the remote debugging session.
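
The manual edit described in this procedure can also be expressed as a small text transformation. The Python sketch below works on the contents of Boot.ini held in a string and appends the options to the first entry that matches the default path. The function name `add_debug_options` is hypothetical, the sample text mirrors the example in this section, and clearing and restoring the file attributes is left out. It is a sketch of the edit, not a replacement for checking the result by hand.

```python
SAMPLE = r'''[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows NT Version 4.0"
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows NT Version 4.0" [VGA mode] /BASEVIDEO
C:\="MS-DOS"
'''

def add_debug_options(boot_ini_text, options):
    """Append options to the first entry that matches the default path."""
    lines = boot_ini_text.splitlines()
    default_path = None
    section = None
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped.lower()          # entering a new section
        elif section == "[boot loader]" and stripped.lower().startswith("default="):
            default_path = stripped.split("=", 1)[1]
        elif (section == "[operating systems]" and default_path
              and stripped.lower().startswith(default_path.lower() + "=")):
            lines[index] = stripped + " " + options
            return "\n".join(lines) + "\n"
    raise ValueError("default entry not found in [operating systems]")

print(add_debug_options(SAMPLE, "/debug /debugport=com1 /baudrate=9600"))
```

Only the first matching entry is changed, so the VGA-mode entry remains available as a way to start the system without the debugger.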

Setting Up a Remote Debugging Session on a RISC-Based Computer

To prepare a RISC-based computer for a remote or local kernel debugging session, you also edit one line in a startup file, but you access that file through the firmware menus rather than with a text editor. The procedure for all Alpha systems is the same. The options you use to configure the PowerPC-based system are the same as the options you select to configure the MIPS-based system. However, the path to the firmware menus may vary for MIPS-based and PowerPC-based systems.

On RISC-based computers, the default COM port is always COM1, and the default speed is always 19200 bps.

Before you begin the procedure to configure the target machine, make sure you set it up properly for communication. If you cannot run Terminal.exe or any other communications programs on the target machine, connect the modem to a computer that is near the target machine. Make sure that you can move the modem back to the target machine without removing the power to the modem.

All modem parameters are configured for a RISC-based computer in the same way as they are for an X86-based system with the exception of the modem speed. The default speed is always 19.2 kbps for a RISC-based system. For more information, see "Setting up the Modem on the Target Machine," earlier in this chapter.

After you have set up your computer for communication, restart the computer. The ARC System screen appears, displaying the main menu, from which you can select an action. Now you are ready to configure.

To configure the target machine

On a MIPS RISC-based system, select Run Setup to display the Setup menu, then select Manage Startup. A menu of boot options appears.

On a Digital Alpha AXP RISC-system or a PowerPC RISC-based system, select the menu options listed in the following table to get to the Boot selections menu.

On Menu	Select
System Boot	Supplementary menu
Supplementary	Setup the system
Setup	Manage boot selections
On the Boot Selections menu, select Change a Boot Selection. A list of the operating systems that are installed on this computer appears.
From the list of operating systems, select the Windows NT operating system. If you have more than one version of Windows NT installed, select the version that you want to debug.

A two-part screen appears with options for changing the current settings of the environment variables used to start the RISC-based computer. The environment variable that controls whether or not the RISC-based computer starts up in debug mode is the OSLOADOPTIONS variable.
Select the OSLOADOPTIONS variable from the list of environment variables.

You edit the value of the OSLOADOPTIONS variable to control whether the RISC-based computer starts up in debug mode.

After you select OSLOADOPTIONS, it appears in the Name box at the top of the screen.
Press ENTER to display the Value box.
Type the options that you want to add in the Value box separated by spaces. Press ENTER to save them and to turn on the debug mode.

You can also add a value that explicitly sets the communications port, as in the following example:
```
OSLOADOPTIONS debug debugport=com2
```

If you do not specify the debug port, the default debug port is set to COM1. Because RISC-based computers allow only a default modem speed of 19.2 Kbps, you do not need to specify the baud rate.

Press Esc to stop editing.
Return to the ARC System screen by using the method for your system:

System	Procedure
MIPS RISC and PowerPC RISC	Select Return to Main Menu, then Exit.
Digital Alpha AXP	Select Supplementary Menu, save your changes, then select Boot Menu.

If this is the first time that you have debugged a Digital Alpha AXP RISC–based system, follow these steps after connecting the local host computer to the target:
- Shut down both computers.
- Restart the host (debugger) computer.
- Run Alphakd.exe on the local host.
- Restart the target (Digital Alpha AXP RISC-based) computer while Alphakd.exe is running on the host computer to set up configuration information on the target computer, and prepare it for either local or remote debugging.
Note After you complete these four steps, you can use either a local or a remote host to debug the target.
To run under Windows NT, restart the RISC-based computer.

You may now contact your technical support group or a trained technician and have them call the modem to establish a remote debugging session.

Setting Up a Local Debugging Session on a Host Computer

You need a local debug session for debugging in cases where a user-mode .dll or a device driver is causing server crashes. In such a case, you use a user-mode debugger (such as NTSD) and you build the server symbols on the host computer.

You can also use this setup if your Remote Access Service (RAS) account allows a Microsoft Support engineer to dial into your network and debug the computer. This debug option overcomes many modem-related issues.

You use a local debug setup in cases where:

You debug a user-mode component in Windows NT by using NTSD or CDB.
A live remote debug does not work because of modem connection issues.
The customer has worked with a senior ESS debug engineer, and the situation warrants a local debug session.

To debug a Windows NT–based target computer by using a local host system, you need to:

Connect the host and the target computers by using a null-modem serial cable.
Set up a symbol tree on the local host computer to match the version of Windows NT that resides on the target computer. If you are using NTSD or CDB, you will need to set up a symbol tree on the target computer, in the directory %SYSTEMROOT%\Symbols.
Set up the debugging files on the host computer.
Start the debugger on the host.

Figure 39.2 shows the connection between the host and the target computer for a local debugging session. It also shows how to use your RAS account to connect to the Microsoft Support Network for help in analyzing the debug information.

Figure 39.2 Local Debugging

Setting Up for Local Debugging

To set up for a local debugging session, you use a null-modem cable to connect the target and the host machines. For an x86-based system, the boot options in the Boot.ini file must be configured on the target machine to invoke the debugger and to set the data transfer rate between the target computer and the host computer. On a RISC-based system, the boot options are configured from a firmware menu.

For information on configuring the boot options for an x86-based system, see "Editing the Boot.ini File on the Target Machine," earlier in this chapter. For information on configuring a RISC-based system for a local debug session, see "Setting Up a Remote Debugging Session on a RISC-Based Computer," earlier in this chapter.

Be sure to start the host computer before restarting the target computer.

Setting Up a Null-Modem Connection

A modem is not used in a local debug session. Therefore, the procedure for setting up the null-modem cable is the same on both the host computer and target computer.

A standard, commercially available null-modem serial cable has this configuration:

Transmit Data connected to Receive Data
Receive Data connected to Transmit Data
Ground connected to Ground

For 9-pin and 25-pin D-subminiature connectors (known as db9 and db25, respectively), the cable connects as follows:

Pin 2 to pin 3
Pin 3 to pin 2
Signal ground to signal ground (pin 5 on a db9 connector, pin 7 on a db25 connector)

The debugger on the host does not depend on any control pins (such as Data Terminal Ready, Data Set Ready, Request To Send, or Clear To Send). However, you might need to put a jumper in the connectors on both ends of the cable from Data Terminal Ready to Data Set Ready and from Request To Send to Clear To Send, as follows:

Connector	Jumpers
db9	From pin 4 to pin 6 and from pin 7 to pin 8
db25	From pin 20 to pin 6 and from pin 4 to pin 5

Connect the null-modem cable to an unused serial port on both the host computer and the target computer.
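
When you build or check a cable, it can help to derive the wiring from signal names rather than from memorized pin numbers. The Python sketch below does that for the three required wires and the optional jumpers. The pin assignments in `PINS` are the conventional ones for PC serial ports and are stated here as an assumption; note that signal ground is pin 5 on a db9 connector and pin 7 on a db25 connector. The function names are hypothetical.

```python
# Conventional RS-232 pin numbers for PC serial ports (an assumption
# based on common practice; verify against your hardware documentation).
PINS = {
    "db9":  {"TxD": 3, "RxD": 2, "GND": 5, "DTR": 4,  "DSR": 6, "RTS": 7, "CTS": 8},
    "db25": {"TxD": 2, "RxD": 3, "GND": 7, "DTR": 20, "DSR": 6, "RTS": 4, "CTS": 5},
}

def null_modem_wiring(host, target):
    """Return (host pin, target pin) pairs for the three required wires."""
    h, t = PINS[host], PINS[target]
    return [(h["TxD"], t["RxD"]), (h["RxD"], t["TxD"]), (h["GND"], t["GND"])]

def local_jumpers(connector):
    """Return the optional jumpers made inside one connector."""
    p = PINS[connector]
    return [(p["DTR"], p["DSR"]), (p["RTS"], p["CTS"])]

print(null_modem_wiring("db9", "db25"))
print(local_jumpers("db25"))
```

The jumper pairs produced by `local_jumpers` agree with the jumper table above for both connector types.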

Setting Up the Symbol Tree on the Host

You set up the symbol tree on the host machine to match the version of Windows NT that you are running on the target computer.

The Windows NT Server and Windows NT Workstation product CDs come with symbol trees already created. They are in the Symbols directories on the CD under Support\Debug\platform, where platform is I386, Alpha, MIPS, or PowerPC. The platform specification must match your target computer.

If you have not installed any service packs or hot fixes and do not have a multiprocessor system, you might need to specify only the path to the correct Symbols directory on the CD, or copy that directory to \Systemroot and use this as the symbol path.

If you have installed service packs or hot fixes to Windows NT, or if you are using any HAL (Hardware Abstraction Layer) other than the standard, single-processor HAL, you must construct a symbol tree.

To construct a symbol tree

Copy the correct tree from the Support directory on the CD to your hard drive.
Copy the symbols into this tree for the updates you have applied in the same order in which you applied the updates, so that the later versions overwrite the earlier versions.
If you are using kernel debuggers to debug a multiprocessor system, or a single-processor system that is using a special HAL, you must rename some of the symbol files. The rest of this section discusses what to rename and how to rename it.

The kernel debuggers always load the files named Ntoskrnl.dbg for kernel symbols and Hal.dbg for HAL symbols. Therefore, you need to determine which kernel and HAL you are using, and rename the associated files to these filenames.

If you have a multiprocessor computer, you only need to rename Ntkrnlmp.dbg to Ntoskrnl.dbg. These files are in the \Exe subdirectory of the symbol tree.
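
The copy-then-overlay order is the part of this procedure that is easiest to get wrong, so here is a minimal Python sketch of it. The function name `build_symbol_tree` and its parameters are hypothetical, the lowercase directory name `exe` is an assumption about how the tree was copied, and `dirs_exist_ok` requires Python 3.8 or later. The sketch handles only the multiprocessor kernel rename, not special HALs.

```python
import os
import shutil

def build_symbol_tree(base_tree, update_trees, dest, multiprocessor=False):
    """Copy the CD symbol tree, then overlay update symbols in order."""
    shutil.copytree(base_tree, dest)        # dest must not exist yet
    for update in update_trees:             # oldest update first
        # Later updates overwrite files copied by earlier ones.
        shutil.copytree(update, dest, dirs_exist_ok=True)
    if multiprocessor:
        exe_dir = os.path.join(dest, "exe")
        # The kernel debuggers always load Ntoskrnl.dbg, so the
        # multiprocessor kernel symbols take over that name.
        os.replace(os.path.join(exe_dir, "Ntkrnlmp.dbg"),
                   os.path.join(exe_dir, "Ntoskrnl.dbg"))
    return dest
```

Pass the update trees in the same order in which the service packs and hot fixes were installed on the target computer.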

If your computer uses a special HAL, there are a number of possibilities. Tables 39.2-39.5 list the possible HAL files for each hardware platform. These tables list the actual name of the .dll file as it exists on the product CD and the uncompressed size of the file in bytes. Each .dll file has a corresponding .dbg file, which is in the \Dll subdirectory of the symbol tree. Determine which HAL you are using, and rename the associated .dbg file to Hal.dbg. If you are not sure which HAL you are using, compare the file size in the table with the Hal.dll file on the target system. The Hal.dll file can be found in \Systemroot\System32.

Table 39.2 HAL Files for Intel x86 Systems

Filename	Uncompressed size (bytes)	Description
Hal.dll	52,768	Standard HAL for Intel systems
Hal486c.dll	51,712	HAL for 486 c Step processor
Halapic.dll	68,096	Uniprocessor version of Halmps.dll
Halast.dll	49,328	HAL for AST® SMP systems
Halcbus.dll	87,328	HAL for Cbus systems
Halcbusm.dll	85,376
Halmca.dll	49,696	HAL for MCA-based systems (PS/2® and others)
Halmps.dll	70,240	HAL for most Intel multiprocessor systems
Halmpsm.dll	69,184
Halncr.dll	83,920	HAL for NCR® SMP computers
Haloli.dll	42,992	HAL for Olivetti® SMP computers
Halsp.dll	56,592	HAL for Compaq Systempro®
Halwyse7.dll	43,728	HAL for WYSE7 systems

Table 39.3 HAL Files for DEC Alpha Systems

Filename	Uncompressed size (bytes)	Description
Hal.dll	60,160	Standard HAL for DEC Alpha systems
Hal0jens.dll	60,160	Digital DECpc AXP 150 HAL
Halalcor.dll	69,120	Digital AlphaStation 600 Family
Halavant.dll	69,856	Digital AlphaStation 200/400 Family HAL
Haleb164.dll	84,768
Haleb64p.dll	76,320	Digital AlphaPC64 HAL
Halflex.dll	89,472
Halgammp.dll	82,560	Digital AlphaServer 2x00 5/xxx Family HAL
Halx3.dll	79,072
Halmikas.dll	73,184	Digital AlphaServer 1000 Family Uniprocessor HAL
Halnonme.dll	68,320	Digital AXPpci 33 HAL
Halqs.dll	68,000	Digital Multia MultiClient Desktop HAL
Halrawmp.dll	93,280
Halsabmp.dll	78,496	Digital AlphaServer 2x00 4/xxx Family HAL
Halxl.dll	81,568

Table 39.4 HAL Files for MIPS Systems

Filename	Uncompressed size (bytes)	Description
Hal.dll	41,856	Standard HAL for MIPS
Halacr.dll	42,496	ACER HAL
Haldti.dll	66,240	DESKStation Evolution
Halduomp.dll	41,536	Microsoft-designed dual MP HAL
Halflex.dll	96,640
Halfxs.dll	41,856	MTI with an R4000 or R4400
Halfxspc.dll	41,984	MTI with an R4600
Halnecmp.dll	47,040	NEC® dual MP
Halntp.dll	140,096	NeTpower FASTseries
Halr94a.dll	193,760
Halr96b.dll	194,432
Halr98mp.dll	108,608	NEC 4 processor MP
Halsni4x.dll	99,936	Siemens Nixdorf UP and MP
Halsnip.dll	116,864
Haltyne.dll	65,888	DESKStation Tyne

Table 39.5 HAL Files for PowerPC Systems

Filename	Uncompressed size (bytes)	Description
Halcaro.dll	234,240	HAL for IBM-6070
Haleagle.dll	211,232	HAL for Motorola PowerStack and Big Bend
Halfire.dll	292,384	HAL for Powerized_ES, Powerized_MX, and Powerized_MX MP
Halppc.dll	233,600	HAL for IBM-6015
Halps.dll	207,552
Halvict.dll	244,896
Halwood.dll	233,888	HAL for IBM-6020

In some cases, a HAL file might have been supplied by your computer manufacturer. If so, you need to obtain symbols for the file from the manufacturer, rename that symbol file to Hal.dbg, and place it in the \Dll subdirectory of the symbol tree. For example, Compaq provides updated HAL files for their Proliant™ systems. This also applies if you have drivers from third-party sources. Obtain symbols from your third-party vendor and put them in the appropriate directory.
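
The size comparison described earlier in this section can be sketched as a simple lookup. The Python example below uses the sizes from the Intel HAL table and returns every symbol file name whose .dll size matches, because a size is not always unique (in the DEC Alpha table, for instance, Hal.dll and Hal0jens.dll have the same size). The names `X86_HAL_SIZES` and `hal_candidates` are hypothetical, and the sizes apply only to the files on the product CD, so a HAL updated by a service pack or supplied by a manufacturer may not match.

```python
# File sizes for the Intel HALs, copied from the Intel HAL table above.
X86_HAL_SIZES = {
    "Hal.dll": 52768, "Hal486c.dll": 51712, "Halapic.dll": 68096,
    "Halast.dll": 49328, "Halcbus.dll": 87328, "Halcbusm.dll": 85376,
    "Halmca.dll": 49696, "Halmps.dll": 70240, "Halmpsm.dll": 69184,
    "Halncr.dll": 83920, "Haloli.dll": 42992, "Halsp.dll": 56592,
    "Halwyse7.dll": 43728,
}

def hal_candidates(installed_size, table=X86_HAL_SIZES):
    """Return the .dbg names whose .dll size matches the installed Hal.dll."""
    return sorted(name[:-4] + ".dbg"
                  for name, size in table.items() if size == installed_size)

print(hal_candidates(70240))
```

An empty result means the installed Hal.dll is not one of the listed files, and more than one result means the size alone cannot identify the HAL.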

Setting Up the Debugger Files on the Host

To set up the debugger on the host, first ensure that you have the correct files available. Copy these files from the Support\Debug\platform directory to a debug directory on the hard drive, where platform matches the platform of the host computer.

Some files that you copy from the directory must match the platform of the target computer, as described in the following table. These files are necessary for kernel debugging.

File	Source List
platformKd.exe*	Alphakd.exe I386kd.exe Mipskd.exe Ppckd.exe
Imagehlp.dll
Kdextplatform.dll*	Kdextalp.dll Kdextx86.dll Kdextmip.dll Kdextppc.dll

* platform matches the platform of the target computer

For instance, if your host computer is a 486 computer and the target computer is a MIPS RISC-based system, you copy the following files from the \Support\Debug\I386 directory:

Mipskd.exe
Imagehlp.dll
Kdextmip.dll
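
The file-selection rule above can be sketched as a small lookup. This is only an illustration in Python and is not part of the Windows NT tools; the function name debugger_files and the platform keys are assumptions made for this example, while the file names come from the preceding table.

```python
# Hypothetical helper: the file names come from the table above;
# the function name, dictionary name, and platform keys are illustrative only.
TARGET_FILES = {
    "alpha": ("Alphakd.exe", "Kdextalp.dll"),
    "i386": ("I386kd.exe", "Kdextx86.dll"),
    "mips": ("Mipskd.exe", "Kdextmip.dll"),
    "ppc": ("Ppckd.exe", "Kdextppc.dll"),
}


def debugger_files(target_platform):
    """Return the files to copy for debugging the given target platform."""
    debugger, extension_dll = TARGET_FILES[target_platform.lower()]
    # Imagehlp.dll is needed regardless of the target platform.
    return [debugger, "Imagehlp.dll", extension_dll]


print(debugger_files("mips"))
```

The files are still copied from the directory that matches the host platform; only the file names depend on the target.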

Once you have set up the symbol tree and copied the necessary files to it, use a batch file or command line to set the following environment variables on the host:

Variable	Purpose
_NT_DEBUG_PORT	COM port being used on host for debugging.
_NT_DEBUG_BAUD_RATE	Max baud rate for debug port. On x86-based computers, maximum is 9600 or 19200 bps for modems, 19200 bps for null-modem serial cables. On RISC-based computers, rate is always 19200 bps.
_NT_SYMBOL_PATH	Path to symbols directory
_NT_LOG_FILE_OPEN	Optional, the name of the file to which to write a log of the debug session

After these environment variables have been set, you can start the host debugger.

Note Setting the _NT_LOG_FILE_OPEN variable does not always result in a log file being written. You can also create the log file from the debugger. The command format is:

.logopen pathname

You might also need to issue the !reload command to get this to work.

Starting the Debugger on the Host

You can start the host debugger from the command line or a batch file by using the name of the executable file as the command. Each debugger supports the following command-line options:

Option	Action
-b	Causes the debugger to stop execution on the target computer as soon as possible by causing a debug breakpoint (INT 3).
-c	Causes the debugger to request a resync on connect. Resynchronization ensures that the host and target computers are communicating in sequence.
-m	Causes the debugger to monitor modem control lines. The debugger is only active when the carrier detect (CD) line is active; otherwise, the debugger is in terminal mode, and all commands are sent to the modem.
-n	Causes symbols to be loaded immediately, rather than in a deferred mode.
-v	Indicates verbose mode; displays more information about such things as when symbols are loaded.
-x	Causes the debugger to break in when an exception first occurs, rather than letting the application or module that caused the exception deal with it.

The most commonly used options are -v (verbose) and -m (for modem debugging).

Generally, the best way to start the debugger is to create a batch file with the necessary commands to set the environment variables, followed by the command to start the correct kernel debugger.

Using the Remote Utility to Start the Debugger

If the host computer is connected to a network, you can use the remote utility, included in the Windows NT Resource Kit, to start the debugger. Remote is a server/client utility that provides remote network access by means of named pipes to applications that use STDIN and STDOUT for input and output. Users at other computers on the network can then connect to your host debugger session and either view the debugging information or enter commands themselves. The syntax for starting the server (host) end of the remote session is as follows:

remote /s "command" Unique_Id [/f foreground_color|/b background_color]

For example:

REMOTE /S "i386kd -v" debug

You end the server session by entering the @K command.

To interact with this session from some other computer, use the remote /c command. The syntax of this command is as follows:

remote /c ServerName Unique_Id [/l lines_to_get|/f foreground_color|/b background_color]

To exit from the remote session on a client and leave the debugger running on the host computer, enter the @Q command.

For example, if a session with the ID debug was started on the host computer \\Server1 by using the remote /s command, you can connect to it with the command

REMOTE /C server1 debug

For more information on using the remote command, see the Rktools.hlp file on the Windows NT Resource Kit CD.

Examples

Assume the following:

Debugging needs to take place over a null-modem serial cable on COM2.
The symbols are on a CD on the E drive.
A log file called Debug.log is to be created in C:\Temp.

Note The log file holds a copy of everything you see on the debug screen during your debug session. All input from the person doing the debugging, and all output from the kernel debugger on the target system, is written to the log file.

A sample batch file for local debugging is:

REM Target computer is local
set _NT_DEBUG_PORT=com2
set _NT_DEBUG_BAUD_RATE=19200
set _NT_SYMBOL_PATH=e:\support\debug\i386\symbols
SET _NT_LOG_FILE_OPEN=c:\temp\debug.log
remote /s "i386kd -v" debug

The last line of the batch file uses the remote utility to start the host debugger. If you use this, users of Windows NT–based computers who are networked to the host computer (and who have a copy of the remote utility) can connect to the debug session by using the command:

remote /c computername debug

where computername is the name of the host computer.

To allow remote debugging, which requires the use of a modem, begin with the batch file in the previous example. Change the baud rate to 9600, and add the -m switch to the last line. The result is as follows:

REM Target computer is remote from the host
set _NT_DEBUG_PORT=com2
set _NT_DEBUG_BAUD_RATE=9600
set _NT_SYMBOL_PATH=e:\support\debug\i386\symbols
SET _NT_LOG_FILE_OPEN=c:\temp\debug.log
remote /s "i386kd -v -m" debug

You run the batch file from the directory that contains the debugger files.
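
The two batch files differ only in the baud rate and the -m switch, so they can be produced from one template. The following Python sketch shows this; the function name make_debug_batch and its parameters are hypothetical, and the generated lines simply mirror the examples above.

```python
def make_debug_batch(port, baud, symbol_path, log_file, modem=False):
    """Build batch-file text that sets the debugger variables and starts i386kd.

    Illustrative only: the variable names and the remote command line are
    copied from the sample batch files; nothing here runs the debugger.
    """
    options = "-v -m" if modem else "-v"
    lines = [
        f"set _NT_DEBUG_PORT={port}",
        f"set _NT_DEBUG_BAUD_RATE={baud}",
        f"set _NT_SYMBOL_PATH={symbol_path}",
        f"set _NT_LOG_FILE_OPEN={log_file}",
        f'remote /s "i386kd {options}" debug',
    ]
    return "\n".join(lines)


# Null-modem cable on COM2 (local debugging).
print(make_debug_batch("com2", 19200, r"e:\support\debug\i386\symbols",
                       r"c:\temp\debug.log"))
print()
# Modem connection (remote debugging): lower baud rate plus the -m switch.
print(make_debug_batch("com2", 9600, r"e:\support\debug\i386\symbols",
                       r"c:\temp\debug.log", modem=True))
```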

When you start the debugger, one of two screens appears, depending upon whether you are doing local debugging or remote debugging.

When doing local debugging, the following screen appears:

**************************************
********** REMOTE ***********
********** SERVER ***********
**************************************
To Connect: Remote /C BANSIDHE debug

Microsoft(R) Windows NT Kernel Debugger
Version 3.51
(C) 1991-1995 Microsoft Corp.

Symbol search path is:
KD: waiting to connect...

At this screen, you can press CTRL+C to gain access to the target computer, if it is still running. If the target is currently stopped at a blue screen, you will probably gain access automatically. If you have any problems, press CTRL+R to force a resync between the host computer and the target computer.

If you are doing remote debugging, the same screen as shown for local debugging appears, with the following extra line:

KD: No carrier detect - in terminal mode

In this case, the debugger is in terminal mode, and you can issue any of the standard AT commands to your modem. Begin by sending commands to disable hardware compression, flow control, and error correction. These commands will vary from modem to modem, so consult your modem documentation. Once you connect to the target system and have a carrier detect (CD) signal, you are returned to the debugger.

Creating a Memory Dump File

If you do not want to or are unable to do local or remote debugging, you can configure Windows NT Server or Windows NT Workstation to write a memory dump file each time it generates a kernel STOP error. This file contains all the information needed by the dumpexam utility to troubleshoot the kernel STOP error, as if you were connected to a live computer experiencing the problem.

Using the memory dump file enables you to examine the error at any time, so you can immediately restart the computer that failed. Thus, your target computer can be available while you are using the debugger. The only drawback to this method is that you must have sufficient space on a hard disk partition for the resulting memory dump file, which will be as large as the computer's physical memory (RAM). For example, whenever a kernel STOP error occurs, a computer with 32 MB of RAM produces a 32-MB memory dump file. You must also have a page file on your system root drive that is at least as large as the physical memory.
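
The two space requirements above can be expressed as a simple check. This is a minimal sketch under the assumption that you already know the three sizes in megabytes; the function name dump_prerequisites is hypothetical, and the code does not query the system.

```python
def dump_prerequisites(ram_mb, free_disk_mb, system_pagefile_mb):
    """Return a list of unmet requirements for writing a memory dump file.

    All sizes are in megabytes and are supplied by the caller.
    """
    problems = []
    # The dump file is as large as physical memory.
    if free_disk_mb < ram_mb:
        problems.append("not enough free disk space for the memory dump file")
    # The page file on the system root drive must be at least as large as RAM.
    if system_pagefile_mb < ram_mb:
        problems.append("page file on the system root drive is smaller than RAM")
    return problems


print(dump_prerequisites(ram_mb=32, free_disk_mb=100, system_pagefile_mb=32))
print(dump_prerequisites(ram_mb=32, free_disk_mb=16, system_pagefile_mb=16))
```

An empty list means both conditions described above are met.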

To configure Windows NT to save STOP information to a memory dump file

In Control Panel, double-click System.
In the System Properties dialog box, click the Startup/Shutdown tab.
Under Recovery, select the Write debugging information to check box. Either accept the default path and filename (C:\systemroot\Memory.dmp) or type a path in the text box.
If you want this memory dump file to overwrite any file of the same name, select the Overwrite any existing file check box. If you select this check box, rename or move each memory dump file after it is written, so that it is not overwritten before you have time to process it. If you clear this check box, Windows NT will not write a memory dump file if there is already a file by that name.


Using Utilities to Process Memory Dump Files

Included on the Windows NT Server and Windows NT Workstation version 3.51 CDs are three utilities for processing memory dump files: dumpflop, dumpchk, and dumpexam. All three utilities are on the product CDs in the Support\Debug\platform directories, where platform is I386, Alpha, MIPS, or PowerPC.

The primary purpose of these utilities is to create files on floppy disks or a text file that you can send to technical support personnel for analysis.

Dumpflop

Dumpflop is a command-line utility that you can use to write a memory dump file in segments to floppy disks, so it can be sent to a support engineer. This is rarely the most efficient way to send a memory dump file, but it is sometimes the only way. Dumpflop compresses the information it writes to the floppy disks, so a 32 MB memory dump file can fit onto 10 floppy disks, rather than 20 or more. Dumpflop does not require access to symbols.

To store the crash dump onto floppy disks, use dumpflop with the following command-line syntax:

dumpflop options CrashDumpFile Drive:

To assemble a crash dump from floppy disks, use dumpflop with the following command-line syntax:

dumpflop options Drive: CrashDumpFile

In either case, options can include:

Option	Action
-?	Displays the command syntax.
-p	Only prints the crash dump header on an assemble operation.
-v	Shows compression statistics.
-q	Formats the floppy disk, when necessary, before writing the memory dump file to the floppy disk. When reading the floppy disks to assemble the file, overwrites an existing memory dump file.

If executed with no parameters, dumpflop attempts to find a memory dump file in the \systemroot directory (the default location for creating a memory dump file) and writes it to floppy disks on the A drive.

Dumpchk

Dumpchk is a command-line utility that you can use to verify that a memory dump file has been created correctly. Dumpchk does not require access to symbols.

Dumpchk has the following command-line syntax:

dumpchk options CrashDumpFile

The Options can include:

Option	Action
-?	Displays the command syntax.
-p	Prints the header only (with no validation).
-v	Specifies verbose mode.
-q	Performs a quick test.

Dumpchk displays some basic information from the memory dump file and then verifies all the virtual and physical addresses in the file. If any errors are found in the memory dump file, it reports them. The following is an example of the output of a Dumpchk command:

Filename . . . . . . .memory.dmp
Signature. . . . . . .PAGE
ValidDump. . . . . . .DUMP
MajorVersion . . . . .free system
MinorVersion . . . . .807
DirectoryTableBase . .0x00030000
PfnDataBase. . . . . .0xffb7e000
PsLoadedModuleList . .0x80196d40
PsActiveProcessHead. .0x80196c38
MachineImageType . . .i386
NumberProcessors . . .1
BugCheckCode . . . . .0xc000021a
BugCheckParameter1 . .0xe17b7b68
BugCheckParameter2 . .0xc0000005
BugCheckParameter3 . .0x00000000
BugCheckParameter4 . .0x00000000

ExceptionCode. . . . .0x80000003
ExceptionFlags . . . .0x00000001
ExceptionAddress . . .0x8015f015


In this example, the most important information (from a debugging standpoint) is the following:

MajorVersion . . . . .free system
MinorVersion . . . . .807
MachineImageType . . .i386
NumberProcessors . . .1
BugCheckCode . . . . .0xc000021a
BugCheckParameter1 . .0xe17b7b68
BugCheckParameter2 . .0xc0000005
BugCheckParameter3 . .0x00000000
BugCheckParameter4 . .0x00000000

This information can be used to determine what kernel STOP error occurred and what version of Windows NT was in use.
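
If you save the Dumpchk output to a text file, the fields listed above can be picked out programmatically. The following Python sketch assumes the header keeps the "name, dot padding, value" layout shown in the example; the function names parse_dump_header and bug_check are hypothetical.

```python
import re

# Lines copied from the sample Dumpchk output above.
SAMPLE = """\
MajorVersion . . . . .free system
MinorVersion . . . . .807
MachineImageType . . .i386
NumberProcessors . . .1
BugCheckCode . . . . .0xc000021a
BugCheckParameter1 . .0xe17b7b68
BugCheckParameter2 . .0xc0000005
BugCheckParameter3 . .0x00000000
BugCheckParameter4 . .0x00000000
"""

# A field name, a run of dot/space padding, then the value.
FIELD = re.compile(r"^(\w+)[ .]+(\S.*)$")


def parse_dump_header(text):
    """Map each header field name to its value string."""
    header = {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            header[match.group(1)] = match.group(2).strip()
    return header


def bug_check(header):
    """Return the STOP code and its four parameters as integers."""
    code = int(header["BugCheckCode"], 16)
    params = [int(header[f"BugCheckParameter{i}"], 16) for i in range(1, 5)]
    return code, params


header = parse_dump_header(SAMPLE)
code, params = bug_check(header)
print(f"STOP 0x{code:08X} on {header['MachineImageType']}, "
      f"MinorVersion {header['MinorVersion']}")
```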

Dumpexam

Dumpexam is a command-line utility that examines a memory dump file, extracts information from it, and writes it to a text file. This text file can then be used by support personnel to determine the cause of the kernel STOP error. In many cases, the dumpexam analysis provides enough information for support personnel to determine the cause of the error without directly accessing the memory dump file.

Three files are required to run dumpexam, and they all must be in the same directory. You can find them on the Windows NT Server or Windows NT Workstation CD in the directory Support\Debug\platform, where platform is I386, Alpha, MIPS, or PowerPC. The first two files are:

Dumpexam.exe
Imagehlp.dll

The third file is one of the following, depending on the type of computer on which the memory dump file was generated:

Kdextx86.dll
Kdextalp.dll
Kdextmip.dll
Kdextppc.dll

You can run dumpexam directly from the product CD with no parameters if all of the following are true:

The computer on which the dump occurred was running Windows NT version 4.0.
You have not applied any hot fixes or service packs on that computer.
The memory dump file you want to examine is in the location specified in the Recovery dialog box.

Dumpexam creates a text file called Memory.txt, located in the same directory as the Memory.dmp file, that contains information extracted from the memory dump file.

You can also use dumpexam to examine memory dump files created on computers running earlier versions of Windows NT. However, you can run it only with Windows NT version 3.51 or 4.0. Therefore, if your memory dump file was created in an earlier version of Windows NT, you must move the memory dump file or access it over the network. In addition, you must replace the Kdext*.dll files listed above with copies from the version of Windows NT that was running on the computer on which the dump occurred. These files contain debug information specific to that version of Windows NT. You must also specify the path to the symbols for the operating system version that was running on that computer.

Syntax for Dumpexam

The syntax for dumpexam is:

dumpexam options CrashDumpFile

where options can include:

Option	Action
-?	Displays the command syntax.
-p	Prints the header only.
-v	Specifies verbose mode.
-f filename	Specifies the output filename and path
-y path	Sets the symbol search path.

You need to specify the memory dump file path only if you have moved the memory dump file.

You need to specify the symbol search path (using the -y option) only if you are using an alternative symbol path. The symbol path for dumpexam can contain several directories, separated by semicolons (;). Because these directories are searched in the order in which they are listed, list the directories with the most recently installed hot fixes or service packs first.

Examples

In the first example, the memory dump file was created on a computer running Windows NT Workstation version 3.51, and no service packs were installed. The symbols are all in the directory C:\Symbols. The memory dump file is in the directory C:\Dump and is called Machine1.dmp. The command line reads as follows:

dumpexam -y c:\symbols c:\dump\machine1.dmp

The results of the exam will be in \Systemroot\Memory.txt.

In the next example, the memory dump file was created on a DEC Alpha computer running Windows NT Server version 3.5, with Service Pack 2 installed. The Service Pack 2 symbols are in D:\Sp2\Symbols. The Windows NT Server 3.5 symbols are on the product CD, which is in the E drive. The memory dump file Memory.dmp is in D:\Temp. The output file is to be put in the same directory as the memory dump file. The command line reads as follows:

dumpexam -y d:\sp2\symbols;e:\support\debug\alpha -f d:\temp\memory.txt d:\temp\memory.dmp
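
The command lines in these examples follow one pattern: an optional symbol path (most recent symbols first), an optional output file, and the memory dump file. The following Python sketch assembles such a command line as a string; the function name dumpexam_command is hypothetical, and the code only builds text from the options documented above.

```python
def dumpexam_command(dump_file=None, symbol_dirs=(), output_file=None):
    """Assemble a dumpexam command line from the documented options.

    symbol_dirs must already be ordered with the most recently installed
    hot fix or service pack symbols first, because the directories are
    searched in the order listed.
    """
    parts = ["dumpexam"]
    if symbol_dirs:
        parts += ["-y", ";".join(symbol_dirs)]
    if output_file:
        parts += ["-f", output_file]
    if dump_file:
        parts.append(dump_file)
    return " ".join(parts)


# Second example above: Service Pack 2 symbols first, then the product CD.
print(dumpexam_command(
    dump_file=r"d:\temp\memory.dmp",
    symbol_dirs=[r"d:\sp2\symbols", r"e:\support\debug\alpha"],
    output_file=r"d:\temp\memory.txt",
))
```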

Using the Dumpexam Output File

Dumpexam reads a memory dump file, executes debugger commands on it, and writes the output in a text file, called Memory.txt, by default. The same debugger commands are executed on each memory dump file.

A full interpretation of the output requires knowledge of Windows NT kernel processes and the ability to read assembly language; however, there are some guidelines you can follow to get an idea of what the output means. This section first describes each part of the memory dump file output, giving sample output and a description. Then several common traps are discussed, along with guidelines on which sections of the Memory.txt file can help you determine what caused the kernel STOP error.

Because the primary purpose of the dumpexam utility is to create a text file to send to support personnel, the descriptions in this section do not provide complete details of the contents of the Memory.txt file.

The following sections of the Memory.txt file each occur once, as they include information that applies to the whole system. These sections are listed in the order in which they appear in Memory.txt.

Windows NT Crash Dump Analysis

The first section of output is Windows NT Crash Dump Analysis, which looks like the following:

****************************************************************
** Windows NT Crash Dump Analysis
****************************************************************
Filename . . . . . . .c:\temp\dumps\mac.dmp
Signature. . . . . . .PAGE
ValidDump. . . . . . .DUMP
MajorVersion . . . . .free system
MinorVersion . . . . .1057
DirectoryTableBase . .0x0006f005
PfnDataBase. . . . . .0x83fce000
PsLoadedModuleList . .0x800ee5c0
PsActiveProcessHead. .0x800ee590
MachineImageType . . .alpha
NumberProcessors . . .2
BugCheckCode . . . . .0x0000002e
BugCheckParameter1 . .0x00000000
BugCheckParameter2 . .0x00000000
BugCheckParameter3 . .0x00000000
BugCheckParameter4 . .0x00000000
ExceptionCode. . . . .0x80000003
ExceptionFlags . . . .0x00000001
ExceptionAddress . . .0x800bc140

Most of the information here is useful only for determining whether the memory dump file is corrupted. The following items are most important, especially if you did not record any information from the blue screen generated when the computer trapped:

Parameter	Meaning
BugCheckCode	This code lists the number of the stop that occurred. The stop code can be used by support personnel to determine what trap occurred. For information on bug check codes, see Chapter 4, "Message Reference," in Windows NT Messages. Descriptions of the STOP code message start on page 441 in chapter 4 and are in numerical order. In the preceding example, the code was 0x0000002e, which is a DATA_BUS_ERROR.
BugCheckParameters	These are the four parameters that are normally included with each STOP code. The description of the STOP code in Windows NT Messages includes the meaning of the parameters for some of the kernel STOP errors.

Symbol File Load Log

This section of the Memory.txt file includes any errors that were generated when the symbols were loaded. If no errors were generated, this section will be blank.

!drivers

The !drivers command is a debugger command that you use to list information on all the device drivers loaded on the system. The information for the device drivers looks like this:

****************************************************************
** !drivers 
****************************************************************

Loaded System Driver Summary

Base Code Size Data Size Driver Name Creation Time
80080000 f76c0 (989 kb) 1f100 (124 kb) ntoskrnl.exe Fri May 26 15:13:00 1995
80400000 d980 ( 54 kb) 4040 ( 16 kb) hal.dll Tue May 16 16:50:34 1995
80654000 3f00 ( 15 kb) 1060 ( 4 kb) ncrc810.sys Fri May 05 20:07:04 1995
8065a000 a460 ( 41 kb) 1e80 ( 7 kb) SCSIPORT.SYS Fri May 05 20:08:05 1995

The following information can be determined from the above output:

Parameter	Meaning
Base	The starting address of the device driver code, in hexadecimal. When the code that causes a trap falls between the base address for a driver and the base address for the next driver in the list, then that driver is frequently the cause of the fault. For instance, the base for Ncrc810.sys is 0x80654000. Any address between that and 0x8065a000 belongs to this driver.
Code Size	The size of the driver code, shown in bytes (hexadecimal) and in kilobytes (decimal).
Data Size	The amount of space allocated to the driver for data, shown in bytes (hexadecimal) and in kilobytes (decimal).
Driver Name	The driver filename.
Creation Time	The link date of the driver. Do not confuse this with the file date of the driver, which can be set by external utilities. The link date is set by the linker when a driver or executable file is built. It should be close to the file date, but it will not always be the same.
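
The Base rule described in the table can be sketched as a sorted lookup: the owning driver is the one with the highest base address that does not exceed the trapping address. The Python below is an illustration only; the function name owning_driver is hypothetical, and the base addresses are copied from the sample output above.

```python
from bisect import bisect_right

# (base address, driver name) pairs copied from the sample !drivers output.
DRIVERS = [
    (0x80080000, "ntoskrnl.exe"),
    (0x80400000, "hal.dll"),
    (0x80654000, "ncrc810.sys"),
    (0x8065A000, "SCSIPORT.SYS"),
]


def owning_driver(address, drivers=DRIVERS):
    """Return the driver whose base is the highest one not above address."""
    ordered = sorted(drivers)
    bases = [base for base, _ in ordered]
    index = bisect_right(bases, address) - 1
    if index < 0:
        return None  # The address is below every listed driver.
    return ordered[index][1]


print(owning_driver(0x80656000))  # ncrc810.sys
```

This follows the guideline exactly as stated, so an address beyond the end of the last driver in the list is still attributed to that driver. Comparing the address against the base plus the code size would be a stricter test, but that refinement is an assumption and is not described in this chapter.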

!locks

The !locks command is a debugger command that displays all locks held on resources by threads. A lock can be shared or exclusive; an exclusive lock means that no other thread can access the resource. This information is useful when a deadlock occurs on a system, because a deadlock can occur when a nonexecuting thread holds an exclusive lock on a resource needed by an executing thread.

****************************************************************
** !locks -p -v -d
****************************************************************
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks.................

Resource @ 0xffb6ed14 Shared 2 owning threads
Threads: ffb3bb70-01 
0012fb50: Unable to read ThreadCount for resource

Resource @ 0xffb6ecdc Shared 2 owning threads
Threads: ffb3bb70-02 
0012fb50: Unable to read ThreadCount for resource

!memusage

The !memusage command gives a short description of the current memory use of the system. Then it gives a much longer listing of the memory usage summary. The output looks something like this:

****************************************************************
** !memusage 
****************************************************************

loading PFN database...................................................

Zeroed: 405 ( 3240 kb)
Free: 0 ( 0 kb)
Standby: 3242 ( 25936 kb)
Modified: 135 ( 1080 kb)
ModifiedNoWrite: 0 ( 0 kb)
Active/Valid: 4410 ( 35280 kb)
Transition: 0 ( 0 kb)
Unknown: 0 ( 0 kb)
TOTAL: 8192 ( 65536 kb)

Usage Summary in KiloBytes (Kb):
Control Valid Standby Dirty Shared Locked PageTables name
80975548 0 56 0 0 0 0 mapped_file(oemnxpip.inf)
80975248 0 16 0 0 0 0 mapped_file(oemnxpnb.inf)
8096aa68 0 160 0 0 0 0 mapped_file(SFMATALK.SY_)
80974f48 0 104 0 0 0 0 mapped_file(oemnxpsm.inf)
809758e8 0 96 0 0 0 0 mapped_file(utility.inf)

This section provides information for some memory leak issues, but it is more useful to refer to the !vm section for memory information for most common kernel STOP errors.
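
As a quick consistency check on the summary at the top of the !memusage section, the page counts for the individual states can be added up and compared with the TOTAL line. The following Python sketch assumes the "name: pages ( kilobytes kb)" layout shown in the sample; the function names are hypothetical.

```python
import re

# Summary lines copied from the sample !memusage output above.
SAMPLE = """\
Zeroed: 405 ( 3240 kb)
Free: 0 ( 0 kb)
Standby: 3242 ( 25936 kb)
Modified: 135 ( 1080 kb)
ModifiedNoWrite: 0 ( 0 kb)
Active/Valid: 4410 ( 35280 kb)
Transition: 0 ( 0 kb)
Unknown: 0 ( 0 kb)
TOTAL: 8192 ( 65536 kb)
"""

# State name, page count, then the size in kilobytes in parentheses.
ROW = re.compile(r"^([\w/]+):\s+(\d+)\s+\(\s*(\d+)\s+kb\)$")


def parse_page_counts(text):
    """Map each page state (and TOTAL) to its page count."""
    counts = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def totals_agree(counts):
    """Check that the per-state page counts add up to the TOTAL line."""
    listed = sum(pages for name, pages in counts.items() if name != "TOTAL")
    return listed == counts["TOTAL"]


counts = parse_page_counts(SAMPLE)
print(totals_agree(counts))
```

In this sample each page is 8 kb (3240 kb divided by 405 pages).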

!vm

The !vm command lists the system's virtual memory usage. The output of !vm looks like this:

****************************************************************
** !vm 
****************************************************************
*** Virtual Memory Usage ***
Physical Memory: 32784 (131136 Kb)
Available Pages: 27435 (109740 Kb)
Modified Pages: 33 ( 132 Kb)
NonPagedPool Usage: 461 ( 1844 Kb)
PagedPool 0 Usage: 1519 ( 6076 Kb)
PagedPool 1 Usage: 125 ( 500 Kb)
PagedPool 2 Usage: 149 ( 596 Kb)
PagedPool Usage: 1793 ( 7172 Kb)
Shared Commit: 173 ( 692 Kb)
Process Commit: 254 ( 1016 Kb)
PagedPool Commit: 1793 ( 7172 Kb)
Driver Commit: 321 ( 1284 Kb)
Committed pages: 4261 ( 17044 Kb)
Commit limit: 80792 (323168 Kb)

All memory usage is listed in pages and in kilobytes. The most useful information in the !vm section for diagnosing problems is:

Parameter	Meaning
Physical Memory	The total physical memory in the system.
Available Pages	The number of pages of memory available on the system, both virtual and physical. If this is low, it might indicate a problem with a process allocating too much virtual memory.
NonPagedPool Usage	The number of pages allocated to the nonpaged pool. The nonpaged pool is memory that cannot be swapped out to the pagefile, so it must always occupy physical memory. This number should rarely be larger than 10% of the total physical memory. If it is larger, this is usually an indication that there is a memory leak somewhere in the system.
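
The 10% guideline for nonpaged pool can be checked with simple arithmetic on the page counts. The following Python sketch uses the figures from the sample !vm output above; the function names are hypothetical.

```python
def nonpaged_pool_percent(nonpaged_pages, physical_pages):
    """Nonpaged pool usage as a percentage of total physical memory."""
    return 100.0 * nonpaged_pages / physical_pages


def page_size_kb(pages, kilobytes):
    """Derive the page size from a 'pages (Kb)' pair in the !vm output."""
    return kilobytes // pages


# Figures from the sample output: Physical Memory and NonPagedPool Usage.
percent = nonpaged_pool_percent(461, 32784)
print(f"Nonpaged pool: {percent:.2f}% of physical memory")
print(f"Page size: {page_size_kb(32784, 131136)} Kb")
if percent > 10:
    print("Above the 10% guideline: possible memory leak")
```

For the sample system the nonpaged pool is well under the guideline, so this section gives no sign of a leak.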

!errlog

The debugger sometimes keeps track of kernel errors logged by the system when a problem occurs. The !errlog section contains a dump of this log. In most cases, the error log is empty. If it is not empty, you can sometimes use it to determine the component or process that caused the blue screen.

!irpzone full

An I/O request packet (IRP) is a data structure used by device drivers and other kernel mode modules to communicate information to each other. The !irpzone full command displays a list of all the pending IRPs on the system. The following information is displayed in this section:

****************************************************************
** !irpzone full
****************************************************************
Small Irp list
Irp is from zone and active with 1 stacks 1 is current
No Mdl System buffer = fb564000 Thread fb5688a0: Irp stack trace. 
cmd flg cl Device File Completion-Context
> d 0 1 fb56a030 fb56cd48 00000000-00000000 pending
\FileSystem\MacSrv
Args: 00001000 00000000 00121020 00000000
Large Irp list
Irp is from zone and active with 4 stacks 5 is current
No Mdl Thread fb4b6860: Irp is completed. Pending has been returned
cmd flg cl Device File Completion-Context
0 0 0 00000000 00000000 00000000-00000000 

Args: 00000000 00000000 00000000 00000000
0 0 0 00000000 00000000 00000000-00000000 

Args: 00000000 00000000 00000000 00000000
0 0 0 00000000 00000000 00000000-00000000 

Args: 00000000 00000000 00000000 00000000
d 0 0 fb5e3020 00000000 f8a8c711-fb48df10 
\FileSystem\Ntfs SrvCompleteRfcbClose
Args: 00000000 00000000 00000000 00000000

Each entry lists information about a different IRP and points to the driver that currently owns the IRP. This information can be useful when the trap analysis (which occurs later in the Memory.txt file) points to a problem with a corrupted or bad IRP. The IRP listing usually contains several entries in both the small and large IRP lists.

!process 0 0

This command lists all processes and their headers. The process header list will contain entries like the following:

****************************************************************
** !process 0 0
****************************************************************
**** NT ACTIVE PROCESS DUMP ****
PROCESS fb667a00 Cid: 0002 Peb: 00000000 ParentCid: 0000
DirBase: 00030000 ObjectTable: e1000f88 TableSize: 112.
Image: System

PROCESS fb5edde0 Cid: 0018 Peb: 7ffdf000 ParentCid: 0002
DirBase: 01587000 ObjectTable: e11d59a8 TableSize: 48.
Image: SMSS.EXE

The important information in the !process 0 0 section is:

Parameter	Meaning
Process ID	The 8-character hexadecimal number after the word PROCESS is the process ID. This is used by the system to track the process. For the first process in the example, this is fb667a00.
Image	The name of the module that owns the process. In the above example, the first process is owned by System, the second by Smss.exe.
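
To pull these two items out of a long !process 0 0 section, the listing can be scanned for PROCESS and Image lines. The following Python sketch assumes the three-line layout shown in the sample; the function name list_processes and the dictionary keys are hypothetical.

```python
import re

# Entries copied from the sample !process 0 0 output above.
SAMPLE = """\
PROCESS fb667a00 Cid: 0002 Peb: 00000000 ParentCid: 0000
DirBase: 00030000 ObjectTable: e1000f88 TableSize: 112.
Image: System

PROCESS fb5edde0 Cid: 0018 Peb: 7ffdf000 ParentCid: 0002
DirBase: 01587000 ObjectTable: e11d59a8 TableSize: 48.
Image: SMSS.EXE
"""

# The 8-character hexadecimal number after PROCESS, then the Cid value.
PROCESS_LINE = re.compile(r"^PROCESS\s+([0-9a-fA-F]{8})\s+Cid:\s+([0-9a-fA-F]+)")


def list_processes(text):
    """Return one dictionary per process with its number, Cid, and image."""
    processes = []
    for line in text.splitlines():
        line = line.strip()
        match = PROCESS_LINE.match(line)
        if match:
            processes.append(
                {"id": match.group(1), "cid": match.group(2), "image": None}
            )
        elif line.startswith("Image:") and processes:
            # The Image line belongs to the most recent PROCESS line.
            processes[-1]["image"] = line.split(":", 1)[1].strip()
    return processes


for process in list_processes(SAMPLE):
    print(process["id"], process["image"])
```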

!process 0 7

This command also lists process information. But instead of just listing the process header, the !process 0 7 command lists all information about the process, including all threads owned by each process. This is a very long listing because each system has a large number of processes and each process has one or more threads. In addition, if the stack from a thread is resident in kernel memory (as opposed to swapped to the page file), it is listed after the thread information. Most process and thread listings look like the following:

****************************************************************
** !process 0 7
****************************************************************
**** NT ACTIVE PROCESS DUMP ****


The following entries in the process information can be important:

Parameter	Meaning
UserTime	Lists the amount of time the process has been running in user mode. If the value for UserTime is exceptionally high, it might identify a process that is taking up all the resources and starving the system.
KernelTime	Lists the amount of time the process has been running in kernel mode. If the value for KernelTime is exceptionally high, it might identify a process that is taking up all the resources and starving the system.
Working Set Size	Lists the current, minimum, and maximum working set size for the process, in pages. An exceptionally large working set size can also be a sign of a process that is leaking memory or using too many system resources.
QuotaPoolUsage Entries	List the paged and nonpaged pool used by the process. On a system with a memory leak, looking for excessive nonpaged pool usage on all the processes can tell you which process has the memory leak.

In addition to the process list information, the thread information also contains a list of the resources on which the thread has locks. This information is listed right after the thread header. In this example, the thread has a lock on one resource, a SynchronizationEvent with an address of 80144fc0. By comparing this address to the list of locks shown in the !locks section, you can determine which threads have exclusive locks on resources.

Processor-Specific Information in Memory.txt

The following sections in the Memory.txt file occur once for each processor on the system. In a four-processor system, these sections will be repeated for processors 0 through 3. In addition, some traps, such as STOP 0x0000001E, generate a few extra sections.

Register Dump for Processor #x

A dump of the state of all registers at the time of the trap is included in this section. For an x86-based system, it appears as follows:

****************************************************************
** Register Dump For Processor #0
****************************************************************
eax=ffdff13c ebx=00000000 ecx=00000000 edx=fb5a7db4 esi=00000d31 edi=00000d31
eip=8013b446 esp=f88b6de4 ebp=f88b6df8 iopl=0 nv up di pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000286
cr0=8001003b cr2=00000d31 cr3=00030000 dr0=00000000 dr1=00000000 dr2=00000000
dr3=00000000 dr6=ffff0ff0 dr7=00000400 cr4=00000000
gdtr=80036000 gdtl=03ff idtr=80036400 idtl=07ff tr=0028 ldtr=0000

For a RISC-based system, the register dump varies from processor type to processor type. The following example is from a DEC Alpha system:

v0=80006000 t0=00000000 t1=00000000 t2=800ef538
t3=00000008 t4=00000000 t5=800ec440 t6=00000000
t7=00000000 s0=c53f2000 s1=00000002 s2=00000001
s3=00000000 s4=00000001 s5=0018da83 fp=fc90f940
a0=00000002 a1=c53f2000 a2=c53f2000 a3=00000000
a4=00000000 a5=00000002 t8=800ed580 t9=80a4752c
t10=c53f2000 t11=80a4752c ra=8009b0bc t12=80a61ecc
at=a0000000 gp=800ed430 sp=fc90f890 zero=00000000
pcr=0000000008000000 softfpcr=0000000000000000 fir=800bf2fc
psr=0000000a
mode=0 ie=1 irql=2

In general, the register dump is valuable only if you are skilled in reading assembly language on the system you are debugging.

Stack Trace for Processor x

The next section includes a trace of the stack for that processor. The stack trace is important because it tells you what functions were called. You can use it to trace back from a trap to determine why it happened. Included right after each stack trace is a section of disassembled code from the area in memory around the last instruction in the stack. This information also looks different, depending on platform.

The first example is an excerpt from an x86-based computer on which a STOP 0x0000000A occurred:

****************************************************************
** Stack Trace
****************************************************************
ChildEBP RetAddr Args to Child
f88b6e00 f89805b0 fb55ea88 fb55e988 fb55ea88 KiTrap0E+0x252 (FPO: [0,0,0])
f88b6df8 fb4a71a0 fb4a6028 f89805b0 fb55ea88 NTSend+0x142

8013B430: 8B 4D 64 mov ecx,dword ptr [ebp+64h]
8013B433: 83 E1 02 and ecx,2
8013B436: D1 E9 shr ecx,1
8013B438: 8B 75 68 mov esi,dword ptr [ebp+68h]
8013B43B: 56 push esi
8013B43C: 51 push ecx
8013B43D: 50 push eax
8013B43E: 57 push edi
8013B43F: 6A 0A push 0Ah
8013B441: E8 00 C6 FD FF call KiTrap0E+24Eh
--->8013B446: F7 45 70 00 00 02 test dword ptr [ebp+70h],offset KiTrap0E+255h
00
8013B44D: 74 0D je KiTrap0E+268h
8013B44F: 83 3D EC 05 14 80 cmp dword ptr [KiTrap0E+25Dh],0
00
8013B456: 0F 85 29 FE FF FF jne KiTrap0E+264h
8013B45C: 83 3D 38 49 14 80 cmp dword ptr [KiTrap0E+26Ah],0
00
8013B463: 0F 85 1C FE FF FF jne KiTrap0E+271h
8013B469: 83 3D C0 4D 14 80 cmp dword ptr [KiTrap0E+277h],0
00
8013B470: 0F 85 0F FE FF FF jne KiTrap0E+27Eh
8013B476: B8 FF 00 00 00 mov eax,offset KiTrap0E+283h
8013B47B: EB AC jmp KiTrap0E+235h
8013B47D: A1 52 F0 DF FF mov eax,[KiTrap0E+28Ah]
8013B482: C6 05 52 F0 DF FF mov byte ptr [KiTrap0E+290h],0

The arrow (--->) indicates the line in the assembly code at which the system trap occurred.

The most important information here is the stack trace at the top, which tells you in which part of the code the system trapped. Each line of a stack trace represents one function call whose return information has been pushed on the stack, with the first line being the most recent call. The following information is included in each line of an x86 stack trace:

Parameter	Meaning
ChildEBP	The base pointer. This is an address on the stack.
RetAddr	The return address. This is the address that the processor returns to when it finishes executing the current function. It is also the code location identified by the function name and offset on the next line of the stack trace.
Args to Child	The first three arguments passed to the function when it was called. These are usually pointers, but can also be other values.
Function name and offset	The final piece of information is a function name and an offset into that function that identifies the location, in code, whose address was pushed on the stack.
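
The four fields in the preceding table can be pulled out of a stack trace line mechanically. The following Python sketch is an illustration only and is not part of Dumpexam; the names FRAME_RE, Frame, and parse_frame are hypothetical, and the regular expression assumes the exact x86 layout shown in the examples in this chapter (two 8-digit hexadecimal addresses, three 8-digit arguments, then a function name and offset).

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed layout (taken from the x86 examples in this chapter):
#   ChildEBP RetAddr Arg1 Arg2 Arg3 Function+0xOffset [optional FPO note]
FRAME_RE = re.compile(
    r"^(?P<child_ebp>[0-9a-fA-F]{8})\s+"
    r"(?P<ret_addr>[0-9a-fA-F]{8})\s+"
    r"(?P<args>(?:[0-9a-fA-F]{8}\s+){3})"
    r"(?P<symbol>[^\s+]+)\+0x(?P<offset>[0-9a-fA-F]+)"
)


class Frame(NamedTuple):
    """One parsed line of an x86 stack trace (hypothetical helper type)."""
    child_ebp: int
    ret_addr: int
    args: Tuple[int, ...]
    symbol: str
    offset: int


def parse_frame(line: str) -> Optional[Frame]:
    """Return the parsed fields, or None for headers and other lines."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    args = tuple(int(arg, 16) for arg in match.group("args").split())
    return Frame(
        child_ebp=int(match.group("child_ebp"), 16),
        ret_addr=int(match.group("ret_addr"), 16),
        args=args,
        symbol=match.group("symbol"),
        offset=int(match.group("offset"), 16),
    )
```

Applied to the first line of the preceding STOP 0x0000000A excerpt, parse_frame reports the function KiTrap0E at offset 0x252. The column header line does not match the pattern, so it yields None.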

The next example is from a DEC Alpha system that experienced STOP 0x0000002E:

Callee-SP Arguments to Callee Call Site
fc8e4f90 80403e08 : 80ae1060 00000000 00000000 00000000 KeBugCheckEx+0x58
fc8e5290 800c3ce8 : 80ae1060 00000000 00000000 00000000 HalMachineCheck+0x198
fc8e52d0 800c33b8 : 80ae1060 00000000 00000000 00000000 KiMachineCheck+0x28
fc8e52e0 800c1c20 : 80ae1060 00000000 00000000 00000000 KiDispatchException+0x68
fc8e55e0 800c1bcc : 80ae1060 00000000 00000000 00000000 KiExceptionDispatch+0x50
fc8e5680 80409d4c : 80ae1060 00000000 00000000 00000000 KiGeneralException+0x4
fc8e5880 f7361344 : 80ae1060 00000000 00000000 00000000 READ_REGISTER_UCHAR+0x6c
fc8e5880 f71313c4 : 80ae1060 00000000 00000000 00000000 AtalkReceiveIndication+0x654
fc8e5930 f71361a4 : 80ae1060 00000000 00000000 00000000 EthFilterDprIndicateReceive+0x234
fc8e5990 f713218c : 80ae1060 00000000 00000000 00000000 MiniportSendLoopback+0xb14
fc8e5a30 f71308d8 : 80ae1060 00000000 00000000 00000000 MiniportSyncSend+0x20c
fc8e5a70 f73628c0 : 80ae1060 00000000 00000000 00000000 NdisMSend+0x158

800BC12C: B21DF170 stl a0,KeBugCheckEx+80x4(gp)
800BC130: 0000001C call_pal rdpcr
800BC134: A0000CA0 ldl v0,KeBugCheckEx+80x4(v0)
800BC138: 22000060 lda a0,KeBugCheckEx+80x5(v0)
800BC13C: D3406778 bsr ra,RtlCaptureContext
--->800BC140: 0000001C call_pal rdpcr
800BC144: A0000CA0 ldl v0,KeBugCheckEx+t0x5(v0)
800BC148: 22000060 lda a0,KeBugCheckEx+t0x6(v0)
800BC14C: D34006DC bsr ra,KiSaveProcessorControlState
800BC150: 0000001C call_pal rdpcr
800BC154: 45299801 xor s0,76,t0
800BC158: 221E00D0 lda a0,KeBugCheckEx+o0x7(sp)
800BC15C: A0000CA0 ldl v0,KeBugCheckEx+o0x7(v0)
800BC160: 223F0230 mov KeBugCheckEx+E0x78,a1
800BC164: 22400060 lda a2,KeBugCheckEx+o0x7(v0)
800BC168: D340803D bsr ra,OtsMove
800BC16C: 47EB0402 mov s2,t1

In an Alpha stack trace, the Callee-SP parameter serves the same purpose as the ChildEBP parameter in the x86 stack. The number right after the Callee-SP is the return address, and the next four numbers are the arguments that were pushed on the stack. The values for these are usually 0 because a RISC-based system uses special registers and does not pass arguments on the stack.

!process

A !process command without any parameters lists information on the process currently running on the active processor. Its output looks exactly the same as the output in the !process 0 7 section, except that it is only for one process, and no thread information is listed.

!thread

A !thread command without any parameters behaves exactly as a !process command without any parameters, and lists the thread that is currently running. The thread output looks exactly the same as the output in the !process 0 7 section.

Note: The same information is presented in three very similar forms so that it is easier to find which threads are currently executing. A !process 0 7 command lists all process and thread information, which results in 10–15 pages of data just for the process and thread output. Picking out the process or thread that is currently running from this long list can be difficult.

Dump Analysis Heuristics for Bugcode

This section appears only in the dump for the processor on which the trap occurred. It includes information specific to the STOP code and can be very important. The exact information varies for different STOP codes, but it lists the address at which the STOP occurred and any additional information that is available.

This is an example from STOP 0x0000000A:

****************************************************************
** Dump Analysis Heuristics for Bugcode IRQL_NOT_LESS_OR_EQUAL
****************************************************************
Invalid Address Referenced: 0x00000020
IRQL: 2
Access Type: Write
Code Address: 0xfa6325a5

This example is from a STOP 0x0000001E:

****************************************************************
** Dump Analysis Heuristics for Bugcode KMODE_EXCEPTION_NOT_HANDLED
****************************************************************
Exception Code: 0xc0000005
Address of Exception: 0x801704a7
Parameter #0: 0x00000001
Parameter #1: 0x00000001

Common STOP Codes

By looking through the Memory.txt output of common STOP codes, you can sometimes identify the module or driver that caused the problem. Given this information, you might be able to determine whether a service pack or update to Windows NT will fix the problem. In many cases, you will still need to contact support personnel, but looking at the Memory.txt output gives you an idea about what is wrong.

STOP 0x0000000A IRQL_NOT_LESS_OR_EQUAL

STOP 0x0000000A indicates that a kernel mode process or driver attempted to access a memory address that it did not have permission to access. The most common cause of this error is a bad or corrupted pointer to an incorrect location in memory. A pointer is a variable used by a program to refer to a block of memory. If the variable has an incorrect value in it, then the program tries to access memory that it should not be using.

When this occurs in a user-mode application, it generates an access violation.

When it occurs in kernel mode, it generates a STOP 0x0000000A message. This trap can be caused by either hardware or software. Contact support personnel to determine the exact cause.

To determine the general cause of a STOP 0x0000000A message, look at the Stack Trace for Processor X section of the Memory.txt file. If you have a multiprocessor system, check the output for all processors and look for a stack trace that has a line similar to the following at the top of the stack:

ChildEBP RetAddr Args to Child
f88b6e00 f89805b0 fb55ea88 fb55e988 fb55ea88 KiTrap0E+0x252 (FPO: [0,0,0])

This is the processor on which the trap occurred. After the stack trace section, additional information on the trap appears in the Dump Analysis Heuristics section. To determine the module that caused the trap, look at the line on the stack trace occurring immediately after the line in the preceding example. This line is usually the line of code that caused the trap. From this information, you can identify the module in which the trap occurred. For example, the top lines of the stack trace can read:

ChildEBP RetAddr Args to Child
fa679758 fa6325a5 fcdb0b58 fccd3770 02611e6c KiTrap0E+0x252
fa6797e0 fa63ae8e fcc37528 fa67992e fccd3770 FindNameOrQuery+0x141
fa679838 fa6444a5 fa679854 fa6a33d0 fa6798d0 NbtConnect+0x3ae
fa679860 fa630393 fccd3770 fcdb2e08 fa679900 NTConnect+0x2b

The first line of the stack trace contains the reference to KiTrap0E and the second line contains FindNameOrQuery+0x141, which means that the processor trap occurred in the function FindNameOrQuery.
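
The same lookup can be sketched in Python. This is a minimal illustration, not a Dumpexam feature: the helper names symbol_of and frame_after_trap are hypothetical, and the code assumes that every frame line carries a Function+0xOffset token as in the preceding excerpt. As noted above, the frame it returns is only usually the code that caused the trap.

```python
from typing import Iterable, Optional


def symbol_of(line: str) -> Optional[str]:
    """Return the 'Function+0xNN' token of a stack trace line, if present."""
    for token in line.split():
        if "+0x" in token:
            return token
    return None


def frame_after_trap(lines: Iterable[str],
                     trap_name: str = "KiTrap0E") -> Optional[str]:
    """Return the symbol on the line that follows the trap handler frame."""
    seen_trap = False
    for line in lines:
        symbol = symbol_of(line)
        if symbol is None:
            continue  # header or blank line
        if seen_trap:
            return symbol
        # Drop the offset and any "module!" prefix before comparing names.
        function = symbol.split("+")[0].split("!")[-1]
        if function == trap_name:
            seen_trap = True
    return None


trace = [
    "ChildEBP RetAddr Args to Child",
    "fa679758 fa6325a5 fcdb0b58 fccd3770 02611e6c KiTrap0E+0x252",
    "fa6797e0 fa63ae8e fcc37528 fa67992e fccd3770 FindNameOrQuery+0x141",
    "fa679838 fa6444a5 fa679854 fa6a33d0 fa6798d0 NbtConnect+0x3ae",
]
print(frame_after_trap(trace))  # FindNameOrQuery+0x141
```

On a multiprocessor system, the function could be run against the stack trace of each processor in turn; a result of None means that the trace contains no KiTrap0E frame, or that nothing follows it.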

STOP 0x0000001E KMODE_EXCEPTION_NOT_HANDLED

Like STOP 0x0000000A, STOP 0x0000001E can be caused by either hardware or software, but it is caused by hardware more often than STOP 0x0000000A is.

When looking at Dumpexam output from STOP 0x0000001E, you see two stack trace listings for the processor on which the STOP occurred. The first listing is the stack after the trap occurred, which shows only the kernel calls made to handle the trap and does not include any information about what code caused the trap.

The second listing shows the stack just before the trap occurred. This is the listing you use for your analysis. The register dump for the processor is also duplicated, with the first dump showing the status of the registers after the trap and the second showing the state of the registers when the trap occurred. These two sets of information are separated by a section that looks like the following:

****************************************************************
** !exr fca49c20
****************************************************************
Exception Record @ FCA49C20:

ExceptionCode: c0000005
ExceptionFlags: 00000000
Chained Record: 00000000
ExceptionAddress: 801704a7
NumberParameters: 00000002

Parameter[0]: 00000001
Parameter[1]: 00000001

This section includes the following information:

Parameter	Meaning
ExceptionCode	A status code that identifies what type of exception occurred. In this case, the code is c0000005, which indicates an access violation. To find out what a particular status code means, contact support personnel.
ExceptionAddress	The address of the instruction that caused the STOP.
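
The fields of an !exr section can be read into a dictionary with a few lines of Python. The sketch below is an illustration under stated assumptions: the names parse_exr and describe_access_violation are hypothetical, and the interpretation of the two parameters (Parameter[0] is 0 for a read and 1 for a write, Parameter[1] is the address that was accessed) comes from the Win32 EXCEPTION_RECORD convention for access violations rather than from this chapter.

```python
from typing import Dict, Optional

STATUS_ACCESS_VIOLATION = 0xC0000005  # the code shown in the example above


def parse_exr(text: str) -> Dict[str, int]:
    """Collect 'Name: hexvalue' lines from an !exr section."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        value = value.strip()
        if not separator or not value:
            continue  # banner lines and the 'Exception Record @ ...:' line
        try:
            fields[key.strip()] = int(value, 16)
        except ValueError:
            continue  # value is not a hexadecimal number
    return fields


def describe_access_violation(fields: Dict[str, int]) -> Optional[str]:
    """Summarize an access violation; return None for other exceptions."""
    if fields.get("ExceptionCode") != STATUS_ACCESS_VIOLATION:
        return None
    if "Parameter[0]" not in fields or "Parameter[1]" not in fields:
        return None
    # Assumed convention: 0 = read, 1 = write.
    kind = {0: "read", 1: "write"}.get(fields["Parameter[0]"], "access")
    return f"attempted {kind} at address 0x{fields['Parameter[1]']:08x}"
```

For the record shown above, this reading gives an attempted write at address 0x00000001, by the instruction at ExceptionAddress 801704a7.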

The first stack trace from STOP 0x0000001E, the one that does not provide any useful information, looks like the following:

ChildEBP RetAddr Args to Child
fca49968 8013387e fca49990 801367ab fca49998 PspUnhandledExceptionInSystemThread+0x18 (FPO: [0,0,0])
fca49970 801367ab fca49998 00000000 fca49998 PspSystemThreadStartup+0x4a (FPO: [0,0,0])
fca49f7c 8013e452 fca54bae 00000001 00000000 _except_handler3+0x47
00000000 00000000 00000000 00000000 00000000 KiThreadStartup+0x16

To determine where the trap occurred, ignore this stack and look at the second listing, after the !exr entry. The first line in this listing indicates the location in code that caused the trap.

With STOP 0x0000001E, it is also useful to compare the exception address listed in the !exr section to the list of device drivers in the !drivers section of the Memory.txt file. If the trap was caused by a specific driver, this address falls into the address range in the drivers list. If this is the case, it can indicate a problem either with the device that the driver controls or with the driver itself. Here is an example:

FramePtr RetAddr Param1 Param2 Param3 Function Name
fa1bcda4 8010e244 fcff3940 00000000 00000220 NT!PsReturnPoolQuota+0xe
fa1bcdd4 80117085 fcbee668 fcddf648 fcbff020 NT!ExFreePool+0x16c
fa1bce24 8011c60b fcddf648 fa1bce58 fa1bce54 NT!IopCompleteRequest+0xbd
fa1bce5c 8013de15 00000000 00000000 00000000 NT!KiDeliverApc+0x83
fa1bce7c 8011a1ce 00000000 00000000 80179a01 NT!@KiSwapThread@0+0x15d
fa1bcea0 80179b3f fcc4bf60 00000006 80179a01 NT!KeWaitForSingleObject+0x1c2
fa1bcef0 80139b09 00000114 00000001 00000000 NT!NtWaitForSingleObject+0xaf
fa1bcef0 77f893eb 00000114 00000001 00000000 NT!KiSystemService+0xa9
00000000 00000000 00000000 00000000 00000000 NTDLL!ZwWaitForSingleObject+0xb
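
The comparison of the exception address against the drivers list amounts to a range check. The following Python sketch shows the idea under loud assumptions: the function find_driver is hypothetical, and the driver names, base addresses, and sizes in the sample list are invented for illustration. In practice, the values would be copied from the !drivers section of your own Memory.txt file.

```python
from typing import Iterable, Optional, Tuple

# (driver name, base address, size in bytes)
Driver = Tuple[str, int, int]


def find_driver(address: int, drivers: Iterable[Driver]) -> Optional[str]:
    """Return the name of the driver whose address range contains address."""
    for name, base, size in drivers:
        if base <= address < base + size:
            return name
    return None


# Hypothetical entries; real values come from the !drivers section.
drivers = [
    ("ntoskrnl.exe", 0x80100000, 0x000C0000),
    ("example.sys", 0xFA630000, 0x00020000),
]

exception_address = 0x801704A7  # ExceptionAddress from the !exr example
print(find_driver(exception_address, drivers))  # ntoskrnl.exe
```

A result of None means that the address does not fall inside any listed driver, in which case the second stack trace remains the main source of information.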

STOP 0x0000007F UNEXPECTED_KERNEL_MODE_TRAP

STOP 0x0000007F usually occurs in the processor itself and almost always indicates a hardware fault. There are several kinds of STOP 0x0000007F, which you can determine by the first parameter of the STOP code, found in the Windows NT Crash Dump Analysis section at the beginning of the Memory.txt file.

The following are common kernel mode traps:

First Parameter	Meaning
0x00000000	Divide by zero error
0x00000004	Arithmetic overflow
0x00000006	Invalid opcode
0x00000008	Double fault
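
If you process many Memory.txt files, the preceding table can be kept as a small lookup. The Python sketch below is illustrative only; the names TRAP_MEANINGS and describe_trap are hypothetical, and the table holds just the four values listed in this chapter.

```python
# First parameter of STOP 0x0000007F mapped to its meaning, as listed above.
TRAP_MEANINGS = {
    0x00000000: "Divide by zero error",
    0x00000004: "Arithmetic overflow",
    0x00000006: "Invalid opcode",
    0x00000008: "Double fault",
}


def describe_trap(first_parameter: str) -> str:
    """Translate a first parameter written as hexadecimal text."""
    value = int(first_parameter, 16)  # accepts "0x00000008" or "00000008"
    return TRAP_MEANINGS.get(value, "Not listed in this chapter")


print(describe_trap("0x00000008"))  # Double fault
```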

A divide by zero error occurs when a DIV instruction is executed and the divisor is 0. Possible causes, which need to be investigated further, include memory corruption, hardware problems, and software failures.

Here's an example of a divide by zero error:

ChildEBP RetAddr Args to Child
8019d778 8013cdcc fe483688 00000000 00000000 NT!_KiSystemFatalException+0xe (FPO: [0,0] TrapFrame @ 8019d778)
8019d7e8 fbb053be 0001440d 000004a9 000004a9 NT!_RtlEnlargedUnsignedDivide+0xc (FPO: [4,0,0])
8019d80c 8010f613 0001440d 000004a9 fe482bd0 bhnt!_BhStationQueryTimeout+0x44 (FPO: [4,0,1])
8019d820 fb910aa6 fe50a000 fe44255a fe44254c NT!_KeSetTimer+0x8f
8019d85c fb9409b3 fe4820c8 fe44255a fe44254c NDIS!_EthFilterDprIndicateReceive+0x111
8019d894 fb94044a fe482b98 fe483688 ffdff401 netflx!NetFlexProcessEthRcv+0x85
8019d8ac fb910ba1 fe482aa8 fb910b30 00000001 netflx!_NetFlexHandleInterrupt+0x4a
8019d8c4 80137c06 fe482bac fe482b98 00000000 NDIS!_NdisMDpc+0x71 (FPO: [EBP 0xfb910b30] [4,0,4])
fb910b30 18247c8b 8b34778b 4e8d106f d015ff30 NT!_KiIdleLoop+0x5a
kd> !trap 8019d778
eax=0001440d ebx=00000003 ecx=8019d81c edx=000004a9 esi=fe4820c8 edi=fe46a188
eip=8013cdcc esp=8019d7ec ebp=8019d820 iopl=0 nv up ei pl zr na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
ErrCode = 00000000
8013cdcc f774240c div dword ptr [esp+0xc]

An arithmetic overflow error occurs when the result of a multiplication operation is larger than a 32-bit integer. This error can be caused by a software failure, but it is also frequently a hardware problem.

An invalid opcode error occurs when the processor attempts to execute an instruction that is not defined. This error is almost always caused by hardware memory corruption. If you receive this error, run memory diagnostics on your regular memory and both L1 and L2 cache memory.

A double fault trap occurs when two kernel-mode traps occur simultaneously and the processor is unable to handle them. This trap is almost always caused by hardware failure.

If a particular trap can be caused by either software or hardware, more analysis is required to determine which is the cause. If you suspect a hardware problem, try the following hardware troubleshooting steps:

Run diagnostic software to test the RAM in the computer. Replace any RAM reported to be bad. Also, make sure that all the RAM in the computer is the same speed.
Try removing or swapping controllers, cards, or other peripherals.
Try a different motherboard on the computer.
