Chapter 39 - Windows NT Debugger
Archived content. No warranty is made as to technical accuracy. Content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.
This chapter first defines debugging terminology and provides an overview of debugging on Windows NT. Next, it describes setting up the computers for debugging. It then explains how to create a memory dump file, describes the utilities that you can use to process the memory dump file, and shows how to interpret the information in it.
For Windows NT versions 3.51 and 4.0, Windbg, the utility used for reading memory dump files in earlier Windows NT releases, was replaced with a set of utilities that automatically read and interpret memory dump files. These new utilities simplify the process of dealing with kernel memory dump files and aid in sending memory dump files to support personnel for advanced analysis.
New material about the debugger and information about using the output from the Dumpexam utility is also included in this chapter.
Debugging Terms
This section defines some common terms and procedures you need when you debug kernel STOP errors.
Kernel STOP Error, Blue Screen, or Trap
When Windows NT encounters hardware problems, inconsistencies within data necessary for its operation, or other similar errors, the operating system processes the error based upon the information entered in the Recovery dialog box. For information about the Recovery dialog box, see "Creating a Memory Dump File," later in this chapter.
If the user did not select Automatically reboot in the Recovery dialog box, Windows NT displays a blue screen containing error information, then stops.
Knowledge Base articles and other Windows NT documentation sometimes refer to this type of error as blue screen, kernel error, or even trap. This chapter uses the term kernel STOP error. However, if the context specifically refers to Windows NT stopping with the blue screen displayed, the term blue screen is used instead. The term trap is used in this chapter to mean that the kernel has detected an error and might write a memory dump file as part of its processing of the error.
Symbols and Symbol Trees
Usually, when code is compiled, one of two versions of the executable file can be created: a debug (also known as checked) version, or a nondebug (also known as free) version. The checked version contains extra code that enables a developer to debug problems, but this means a larger and possibly slower executable file. The free version of the executable file is smaller and runs at a normal speed, but cannot be debugged.
Windows NT combines the speed and smaller size of free versions with the debugging capabilities of the checked versions. All executable files, drivers, dynamic-link libraries, and other program files in Windows NT are the free versions. However, each program file has a corresponding symbol file, which contains the debugging information (symbols) that would otherwise be built into a checked file. These symbol files are on the Windows NT Server product CD, in the Support\Debug\Platform\Symbols directories, where Platform is I386, Alpha, MIPS, or PowerPC. Within each Symbols directory, there is one directory for each type of file (such as .exe, .dll, and .sys). This structure is referred to as a symbol tree. Table 39.1 describes directories that exist in a standard symbol tree.
Directory | Contains symbols for
---|---
ACM | Microsoft Audio Compression Manager files
COM | Executable files (.com)
CPL | Control Panel programs
DLL | Dynamic-link library files (.dll)
DRV | Driver files (.drv)
EXE | Executable files (.exe)
SCR | Screen-saver files
SYS | Driver files (.sys)
All of the utilities used to debug Windows NT or interpret memory dump files require a symbol tree containing the symbol files for the version of Windows NT you were running at the time of the kernel STOP error. With some utilities, you need the \Symbols directory to be on your hard drive, in the \Systemroot directory. With other utilities, you can specify the path to the \Symbols directory as a command-line option or in a dialog box.
Target Computer
The term target computer refers to the computer on which the kernel STOP error occurs. This computer is the one that needs to be debugged. It can be a computer located within a few feet of the computer on which you run the debugger, or it can be a computer that you dial in to by using a modem.
Host Computer
The term host computer refers to the computer on which you run the debugger. This computer should run a version of Windows NT that is at least as recent as the one on the target computer.
Debugging Overview
There are three approaches you can take to finding the cause of kernel STOP errors:
Set up a remote debug session with the Microsoft Support Network. This process is needed if a memory dump file cannot be generated or if the target computer halts with a STOP screen. The connection process involves configuring your target computer for a connection (modem to modem) to a host computer located at Microsoft.
Set up a local debug session with Microsoft Support Network by using a Remote Access Service (RAS) server. This process is needed if a memory dump file cannot be generated or if the target computer halts with a STOP screen. The connection process involves using a null modem cable to configure both your target computer and your host computer. The host is then networked to a RAS server and the debugging information is sent to Microsoft over an asynchronous connection. You can also analyze the debugging information at your host computer.
Set up your target computer to write the contents of its RAM to a memory dump file when a kernel STOP error occurs. You can then use the dump analysis utilities to analyze the memory dump, or send the memory dump file to technical support personnel for their analysis.
Kernel Debuggers
The Windows NT kernel debuggers — I386kd.exe, Alphakd.exe, Mipskd.exe, and Ppckd.exe — are 32-bit executable files that are used on the host computer to debug the kernel on the target computer. Each host hardware platform has its own set of utilities, which are provided on the Windows NT product CD in the \Support\Debug directory.
The kernel debuggers can be used for either remote or local kernel debugging. If you use local kernel debugging, the host computer is located within a few feet of the target computer and the two computers communicate through a null modem serial cable. If you use remote kernel debugging, the host computer can be at any distance from the target computer because communication takes place through modems.
The host and target computers send debugging information back and forth through their communications ports. The ports on both computers must be configured to pass data at the same rate in bits per second (bps).
After a blue screen appears, record the important information in the message, then restart the computer. You might need to configure the target computer for local or remote debugging and reboot it a second time. You can then continue running Windows NT until the message is displayed again. After the blue screen is displayed the second time, call your technical support group and request assistance with the debugging. They can decide whether to debug the kernel STOP error locally or remotely and instruct you to configure your system appropriately.
Dump Analysis Utilities
To use the Windows NT dump analysis utilities, you must first configure your computer to write a memory dump file when it gets a kernel STOP error. Use the Recovery dialog box to configure the target computer to write the memory dump file, as described in the section "Creating a Memory Dump File" later in this chapter. This file preserves information about the state of the computer at the time of the kernel STOP error. The memory dump file can be used by the dump analysis utilities to troubleshoot the problem. If you use this option, you can run the dump analysis utilities on any Windows NT–based computer after you load the memory dump file, including the computer on which the kernel STOP error occurred.
This approach is usually the best for a computer running Windows NT Server because it minimizes the amount of time the server is unavailable. By default, a Windows NT Server–based computer writes an event to the system log, alerts administrators, dumps system memory to the Memory.dmp file, and then restarts automatically. Because a later kernel STOP error can overwrite the existing Memory.dmp file, rename the newest dump file each time a kernel STOP error occurs to preserve it. You can then run the dump analysis utilities and send the information to your technical support group for processing.
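Renaming the dump by hand works; the following Python sketch shows one way to automate it. It assumes you pass in the dump path shown in your Recovery dialog box, and the timestamped naming scheme (Memory-YYYYMMDD-HHMMSS.dmp) is purely illustrative, not a Windows NT convention.

```python
from datetime import datetime
from pathlib import Path


def preserve_memory_dump(dump_path, when=None):
    """Rename an existing dump file so the next kernel STOP error cannot overwrite it.

    Returns the new path, or None if there is no dump file to preserve.
    """
    dump = Path(dump_path)
    if not dump.exists():
        return None
    stamp = (when or datetime.now()).strftime("%Y%m%d-%H%M%S")
    # For example, Memory.dmp becomes Memory-19970315-142200.dmp in the same directory.
    target = dump.with_name(f"{dump.stem}-{stamp}{dump.suffix}")
    counter = 1
    while target.exists():  # never overwrite an earlier preserved dump
        target = dump.with_name(f"{dump.stem}-{stamp}-{counter}{dump.suffix}")
        counter += 1
    dump.rename(target)
    return target
```

Running the function after each restart, before you start the dump analysis utilities, keeps every dump. The counter suffix covers the unlikely case of two renames within the same second.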
Setting Up for Debugging
If you decide to use the kernel debugger to analyze the kernel STOP error, you need to set up the host and connect your host and target computers. To do this, you use either a null modem cable for a local debug session or a modem cable for a remote debug session. Before you can start debugging, you must complete several steps.
To prepare for debugging
Set up the modem connection.
Configure the target system for debugging.
Set up a symbol tree on the host system.
Set up the debugger on the host system.
Start the debugger on the host system.
Note None of the procedures in this section are necessary if you use the Recovery dialog box to create a memory dump file. For information about that alternative, see "Creating a Memory Dump File," later in this chapter.
Setting Up a Remote Debugging Session on an Intel-Based Computer
If you enable the kernel debugger on your target computer, it sends debugging information to a host computer for a remote user to analyze. A support engineer often requests this to help analyze a fatal error in Windows NT that cannot be diagnosed from the Memory.dmp file or if a Memory.dmp file is not produced.
In remote debugging, the target and host computers are connected by modems over a telephone line, and they communicate by using a special debugging API and protocol.
The following figure shows the connection between the host and the target computer for a remote debugging session.
Figure 39.1 Remote Debugging
To configure a system for remote debugging, you change the boot options to set Windows NT to load the kernel debugger. On an X86–based platform, you do this by editing the Boot.ini file. On a RISC-based system (DEC Alpha, MIPS and PowerPC processors), you change the boot options in the firmware menu. You must also connect an external modem to the appropriate COM port on the target computer and connect an inbound phone line to the modem.
Booting the Target Machine
If the target computer stops at a blue screen every time you boot it, or does not keep running long enough for you to edit the Boot.ini file to enable the debugger, you can try these options:
If your boot partition is FAT, you can start MS-DOS from a boot floppy disk and use the MS-DOS-based editor to edit Boot.ini.
If your boot partition is NTFS (or HPFS, if you are running Windows NT version 3.1 or 3.5), you can install Windows NT on a different partition and boot from that partition. (You must use this method because you cannot access files on an NTFS or HPFS partition from MS-DOS.)
If you previously created a Windows NT boot recovery disk for the workstation that has the problem, you can use this disk on another machine to edit the Boot.ini file, and then boot the target machine.
Setting Up the Modem on the Target Machine
To set up a remote debugger session, you must connect an external modem to the target machine and reconfigure the modem parameters to meet the requirements of the kernel debugger. To configure the modem, you must be able to run Terminal.exe or some other communications program. If you are unable to run these programs on the target machine, connect the modem to a computer that is close to the target machine. Make sure you can move the modem back to the target machine without losing power to the modem. An internal modem does not work because rebooting the system resets the configuration changes you have made to the modem.
The modem must be connected to a spare COM port and must be configured as shown in the following table:
Setting | Required value
---|---
Auto answer mode | On
Hardware compression | Disabled
Error detection | Disabled
Flow control | Disabled
Baud rate | 9600 bps for x86-based systems; 19200 bps for RISC-based systems
Consult your modem documentation for the correct string values to send to the modem during the configuration process. The following table gives an example of how to configure a USRobotics modem for a remote debugging session.
Function | String value
---|---
Set Back to Factory Defaults | AT&F
Disable Transmit Data Flow Control | AT&H0
Disable Receive Data Flow Control | AT&I0
Disable Data Compression | AT&K0
Disable Error Control | AT&M0
Auto Answer On | ATS0=1
Disable Reset Modem on Loss of DTR | AT&D0
Write to NVRAM | AT&W
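The order of the USRobotics strings matters: the factory reset comes first so that it does not undo the later settings, and AT&W comes last so that the finished configuration is saved to NVRAM. The following Python sketch only builds the byte strings that you would type or send from your communications program. It assumes each AT command is terminated by a carriage return, as is usual for Hayes-style modems; the names USR_DEBUG_SETUP and modem_setup_bytes are hypothetical.

```python
# (function, command) pairs copied from the USRobotics example table above.
# Other modem brands use different strings; check your modem documentation.
USR_DEBUG_SETUP = [
    ("Set Back to Factory Defaults", "AT&F"),
    ("Disable Transmit Data Flow Control", "AT&H0"),
    ("Disable Receive Data Flow Control", "AT&I0"),
    ("Disable Data Compression", "AT&K0"),
    ("Disable Error Control", "AT&M0"),
    ("Auto Answer On", "ATS0=1"),
    ("Disable Reset Modem on Loss of DTR", "AT&D0"),
    ("Write to NVRAM", "AT&W"),
]


def modem_setup_bytes(commands=USR_DEBUG_SETUP):
    """Return one byte string per command, each terminated by a carriage return."""
    # Assumption of this sketch: a factory reset sent later would undo the
    # other settings, and AT&W must come last to save them.
    if commands[0][1] != "AT&F" or commands[-1][1] != "AT&W":
        raise ValueError("sequence must start with AT&F and end with AT&W")
    return [command.encode("ascii") + b"\r" for _, command in commands]
```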
To configure the modem
Connect the modem to an unused COM port on the target machine or on another computer that is close enough to the target machine to connect by using a standard modem cable.
Note If you connect the modem to a computer other than the target machine, make sure you can move the modem back to the target COM port without removing power from the modem.
Run Terminal.exe or some other communications program to configure the modem parameters.
Set the modem speed to 9600 bps. See your modem documentation to find out how to do this.
Turn off all hardware compression, flow control, and error detection.
How to do this varies widely from modem to modem. See your modem documentation for the correct strings to send to the modem.
Enable auto-answer by sending the string ATS0=1 to your modem. Consult your modem documentation to verify that this will work with your modem.
If the modem was configured on a machine other than the target computer, move it to the target computer without removing the power from the modem.
Editing the Boot.ini File on the Target Machine
To configure a target system for remote or local debugging, you edit the boot options in the Boot.ini file to tell Windows NT to load the kernel debugger.
Debugger Options
The following table lists the boot options that can be used to configure the system for debugging. These options are the same on Intel X86 and RISC platforms, but the slash (/) is not required when used on a RISC platform.
Option | Description
---|---
/Debug | Causes the kernel debugger to be loaded during boot and kept in memory at all times. This means that a support engineer can dial into the system being debugged and break into the debugger, even when the system is not suspended at a kernel STOP screen.
/Debugport | Specifies the serial port to be used by the kernel debugger. If no serial port is specified, the debugger will default to COM2 on Intel X86-based computers and to COM1 on RISC computers.
/Crashdebug | Causes the kernel debugger to be loaded during boot but swapped out to the pagefile after boot. As a result, a support engineer cannot break into the debugger unless Windows NT is suspended at a kernel STOP screen.
/Baudrate | Sets the speed that the kernel debugger will use in bits per second. The default rate is 19200 bps. A rate of 9600 bps is the normal rate for remote debugging over a modem.
When you use Debugport or Baudrate, you need not use Debug, because Windows NT assumes that the computer will load in Debug mode. You must use at least one of the options described in the preceding table to configure a computer for remote debugging. Otherwise, Windows NT does not load the debugger at all.
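To make these rules and defaults concrete, here is a Python sketch that interprets a set of boot options the way the preceding table and paragraph describe. The DebugSettings type and the function name are hypothetical. The text does not say how /Crashdebug combines with /Debugport or /Baudrate, so the sketch simply lets /Crashdebug decide the mode in that case; treat that as an assumption.

```python
from dataclasses import dataclass


@dataclass
class DebugSettings:
    loaded: bool     # False: Windows NT does not load the kernel debugger at all
    resident: bool   # True: stays in memory (/Debug); False: paged out (/Crashdebug)
    port: str
    baud_rate: int


def effective_debug_settings(options, risc=False):
    """Interpret debugger boot options using the defaults in the table above.

    options is the option text from an x86 Boot.ini line, such as
    "/debug /debugport=com1 /baudrate=9600", or a RISC OSLOADOPTIONS value,
    such as "debug debugport=com2" (the slash is optional).
    """
    flags = {}
    for token in options.split():
        name, _, value = token.lstrip("/").lower().partition("=")
        flags[name] = value
    crash = "crashdebug" in flags
    # /Debugport or /Baudrate implies /Debug (assumption: unless /Crashdebug is given).
    debug = "debug" in flags or (
        not crash and ("debugport" in flags or "baudrate" in flags))
    port = flags.get("debugport") or ("com1" if risc else "com2")
    if risc:
        baud = 19200  # RISC-based systems always use 19200 bps
    else:
        baud = int(flags.get("baudrate") or 19200)
    return DebugSettings(loaded=debug or crash, resident=debug,
                         port=port.upper(), baud_rate=baud)
```

For example, "/baudrate=9600" alone yields a resident debugger on COM2 at 9600 bps, while "/basevideo" alone leaves the debugger unloaded.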
To set up the target computer on an Intel X86-based computer, edit the Boot.ini file by using a standard ASCII text editor and add the appropriate debugger options to the file. The Boot.ini file is located in the system root directory (usually the C drive) and has the Hidden, System, and Read-Only attributes set. These attributes must be changed.
To Change the Attributes of the Boot.ini File
Type the following at a command prompt:
attrib -s -h -r c:\boot.ini
To restore the Read-Only, Hidden, and System attributes when you finish debugging the system, type the following at a command prompt:
attrib +h +r +s c:\boot.ini
To Configure the Boot Options in the Boot.ini File
To configure the target computer for remote or local debugging, add the /Debug and /Baudrate options to the Boot.ini file. If you cannot use the default COM port (COM2) for debugging, use /Debugport=COMx, where x is the COM port number. Use the MS-DOS-based Editor to edit the Boot.ini file.
At a command prompt, type:
edit boot.ini
The Boot.ini file appears in the MS-DOS Editor window. It looks similar to this:
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows NT Version 4.0"
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows NT Version 4.0 [VGA mode]" /BASEVIDEO
C:\="MS-DOS"
Select the startup option that you normally use and add the /Debug option at the end of the line.
To specify the communications port, add the option /Debugport=comx where x is the communications port that you want to use.
Add the option /Baudrate=9600.
After it has been modified by steps 1-4, the Boot.ini file looks like this:
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows NT Version 4.0" /debug /debugport=com1 /baudrate=9600
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows NT Version 4.0 [VGA mode]" /BASEVIDEO
C:\="MS-DOS"
Save the Boot.ini file and quit the text editor or the MS-DOS Editor.
Restart the computer to run under Windows NT.
Your technical support group can now call the modem to establish the remote debugging session.
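The Boot.ini edit in steps 2-4 is a small text transformation. The following Python sketch performs it on the file's contents, assuming a well-formed Boot.ini in which the default= path matches an entry in the [operating systems] section. It edits only the first matching entry and leaves the [VGA mode] entry alone. Reading and writing the file, and clearing and restoring its attributes with attrib, are still done as described above. The function name and default option string are illustrative.

```python
def add_debug_options(boot_ini, options="/debug /debugport=com1 /baudrate=9600"):
    """Return Boot.ini text with debugger options appended to the default entry.

    The first [operating systems] entry whose ARC path matches default= is
    edited; if it already carries a /debug... option, the text is unchanged.
    """
    lines = boot_ini.splitlines()
    default_path = None
    for line in lines:
        if line.strip().lower().startswith("default="):
            default_path = line.split("=", 1)[1].strip().lower()
    in_os_section = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("["):
            in_os_section = stripped.lower() == "[operating systems]"
            continue
        if not in_os_section or "=" not in stripped:
            continue
        path, _, rest = stripped.partition("=")
        if path.strip().lower() == default_path:
            if "/debug" not in rest.lower():
                lines[i] = f"{line.rstrip()} {options}"
            break  # leave later entries, such as the [VGA mode] one, alone
    # Python's text mode on Windows writes "\n" as CR LF when the file is saved.
    return "\n".join(lines) + "\n"
```

Because an entry that already contains /debug is left untouched, running the function twice produces the same result as running it once.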
Setting Up a Remote Debugging Session on a RISC-Based Computer
To prepare a RISC-based computer for a remote or local kernel debugging session, you also edit one line of startup information, but you access it in a different way: through the firmware menus. The procedure is the same for all Alpha systems. The options you use to configure a PowerPC-based system are the same as those you use to configure a MIPS-based system; however, the path to the firmware menus may vary between MIPS-based and PowerPC-based systems.
On RISC-based computers, the default COM port is always COM1, and the default speed is always 19200 bps.
Before you begin the procedure to configure the target machine, make sure the modem is set up for communication. If you cannot run Terminal.exe or any other communications program on the target machine, connect the modem to a computer that is near the target machine. Make sure that you can move the modem back to the target machine without removing the power to the modem.
All modem parameters are configured for a RISC-based computer in the same way as for an x86-based system, except for the modem speed: on a RISC-based system, the speed is always 19.2 kbps. For more information, see "Setting Up the Modem on the Target Machine," earlier in this chapter.
After you have set up your computer for communication, restart the computer. The ARC System screen appears, displaying the main menu, from which you can select an action. You are now ready to configure the boot options.
To configure the target machine
On a MIPS RISC-based system, select Run Setup to display the Setup menu, then select Manage Startup. A menu of boot options appears.
On a Digital Alpha AXP RISC-based system or a PowerPC RISC-based system, select the menu options listed in the following table to get to the Boot Selections menu.
On menu | Select
---|---
System Boot | Supplementary menu
Supplementary | Setup the system
Setup | Manage boot selections
On the Boot Selections menu, select Change a Boot Selection. A list of the operating systems that are installed on this computer appears.
From the list of operating systems, select the Windows NT operating system. If you have more than one version of Windows NT installed, select the version that you want to debug.
A two-part screen appears with options for changing the current settings of the environment variables used to start the RISC-based computer. The environment variable that controls whether or not the RISC-based computer starts up in debug mode is the OSLOADOPTIONS variable.
Select the OSLOADOPTIONS variable from the list of environment variables.
You edit the value of the OSLOADOPTIONS variable to control whether the RISC-based computer starts up in debug mode.
After you select OSLOADOPTIONS, it appears in the Name box at the top of the screen.
Press ENTER to display the Value box.
Type the options that you want to add in the Value box separated by spaces. Press ENTER to save them and to turn on the debug mode.
You can also add a value that explicitly sets the communications port, as in the following example:
OSLOADOPTIONS debug debugport=com2
If you do not specify the debug port, the default debug port is COM1. Because RISC-based computers use a fixed speed of 19.2 kbps, you do not need to specify the baud rate.
Press Esc to stop editing.
Return to the ARC System screen by using the method for your system:
System | Procedure
---|---
MIPS RISC and PowerPC RISC | Select Return to Main Menu, then Exit.
Digital Alpha AXP | Select Supplementary Menu, save your changes, then select Boot Menu.
If this is the first time that you have debugged a Digital Alpha AXP RISC-based system, follow these steps after connecting the local host computer to the target:
Shut down both computers.
Restart the host (debugger) computer.
Run Alphakd.exe on the local host.
Restart the target (Digital Alpha AXP RISC-based) computer while Alphakd.exe is running on the host computer to set up configuration information on the target computer, and prepare it for either local or remote debugging.
Note After you complete steps 1-4, you can use either a local or a remote host to debug the target.
To run under Windows NT, restart the RISC-based computer.
You may now contact your technical support group or a trained technician and have them call the modem to establish a remote debugging session.
Setting Up a Local Debugging Session on a Host Computer
You need a local debug session for debugging in cases where a user-mode .dll or a device driver is causing server crashes. In such a case, you use a user-mode debugger (such as NTSD) and you build the server symbols on the host computer.
You can also use this setup if your Remote Access Service (RAS) account allows a Microsoft Support engineer to dial into your network and debug the computer. This debug option overcomes many modem-related issues.
You use a local debug setup in cases where:
You debug a user-mode component in Windows NT by using NTSD or CDB.
A live remote debug does not work because of modem connection issues.
You have worked with a senior ESS debug engineer, and the situation warrants a local debug session.
To debug a Windows NT–based target computer by using a local host system, you need to:
Connect the host and the target computers by using a null-modem serial cable.
Set up a symbol tree on the local host computer to match the version of Windows NT that resides on the target computer. If you are using NTSD or CDB, you will need to set up a symbol tree on the target computer, in the directory %SYSTEMROOT%\Symbols.
Set up the debugging files on the host computer.
Start the debugger on the host.
Figure 39.2 shows the connection between the host and the target computer for a local debugging session. It also shows how to use your RAS account to connect to the Microsoft Support Network for help in analyzing the debug information.
Figure 39.2 Local Debugging
Setting Up for Local Debugging
To set up for a local debugging session, you use a null-modem cable to connect the target and the host machines. For an x86-based system, the boot options in the Boot.ini file must be configured on the target machine to invoke the debugger and to set the data transfer rate between the target computer and the host computer. On a RISC-based system, the boot options are configured from a firmware menu.
For information on configuring the boot options for an x86-based system, see "Editing the Boot.ini File on the Target Machine," earlier in this chapter. For information on configuring a RISC-based system for a local debug session, see "Setting Up a Remote Debugging Session on a RISC-Based Computer," earlier in this chapter.
Be sure to start the host computer before restarting the target computer.
Setting Up a Null-Modem Connection
A modem is not used in a local debug session. Therefore, the procedure for setting up the null-modem cable is the same on both the host computer and target computer.
A standard, commercially available null-modem serial cable has this configuration:
Transmit Data connected to Receive Data
Receive Data connected to Transmit Data
Ground connected to Ground
For 9-pin and 25-pin D-subminiature connectors (known as db9 and db25, respectively), the cable connects as follows:
Pin 2 to pin 3
Pin 3 to pin 2
Signal ground to signal ground: pin 5 to pin 5 on db9 connectors, pin 7 to pin 7 on db25 connectors
The debugger on the host does not depend on any control pins (such as Data Terminal Ready, Data Set Ready, Request To Send, or Clear To Send). However, you might need to put a jumper in the connectors on both ends of the cable from Data Terminal Ready to Data Set Ready and from Request To Send to Clear To Send, as follows:
Connector | Jumpers
---|---
db9 | From pin 4 to pin 6 and from pin 7 to pin 8
db25 | From pin 20 to pin 6 and from pin 4 to pin 5
Connect the null-modem cable to an unused serial port on both the host computer and the target computer.
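The pin numbers above follow from the standard RS-232 signal assignments for each connector type. The following Python sketch derives the cable wiring and the jumpers from those assignments, which also shows why ground is pin 5 on a db9 connector but pin 7 on a db25 connector. The table and function names are hypothetical; only the pin assignments are standard.

```python
# RS-232 signal-to-pin assignments for the two connector types.
PINOUTS = {
    "db9": {"RXD": 2, "TXD": 3, "DTR": 4, "GND": 5, "DSR": 6, "RTS": 7, "CTS": 8},
    "db25": {"TXD": 2, "RXD": 3, "RTS": 4, "CTS": 5, "DSR": 6, "GND": 7, "DTR": 20},
}


def null_modem_wires(end_a, end_b=None):
    """Return (pin on end A, pin on end B) pairs for a three-wire null-modem cable."""
    a, b = PINOUTS[end_a], PINOUTS[end_b or end_a]
    # Each side's transmit line feeds the other side's receive line; grounds are joined.
    return sorted([(a["TXD"], b["RXD"]), (a["RXD"], b["TXD"]), (a["GND"], b["GND"])])


def loopback_jumpers(connector):
    """Return the in-connector jumpers: DTR to DSR, and RTS to CTS."""
    p = PINOUTS[connector]
    return [(p["DTR"], p["DSR"]), (p["RTS"], p["CTS"])]
```

For a cable with a db9 connector on one end and a db25 connector on the other, null_modem_wires("db9", "db25") gives pin 2 to pin 2, pin 3 to pin 3, and pin 5 to pin 7, because the transmit and receive pins are swapped between the two connector types.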
Setting Up the Symbol Tree on the Host
You set up the symbol tree on the host machine to match the version of Windows NT that you are running on the target computer.
The Windows NT Server and Windows NT Workstation product CDs come with symbol trees already created. They are in the Symbols directories on the CD under Support\Debug\platform, where platform is I386, Alpha, MIPS, or PowerPC. The platform specification must match your target computer.
If you have not installed any service packs or hot fixes and do not have a multiprocessor system, you might need to specify only the path to the correct Symbols directory on the CD, or copy that directory to \Systemroot and use this as the symbol path.
If you have installed service packs or hot fixes to Windows NT, or if you are using any HAL (Hardware Abstraction Layer) other than the standard, single-processor HAL, you must construct a symbol tree.
To construct a symbol tree
Copy the correct tree from the Support directory on the CD to your hard drive.
Copy the symbols into this tree for the updates you have applied in the same order in which you applied the updates, so that the later versions overwrite the earlier versions.
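The two steps above amount to copying the base tree and then overlaying each update's symbols in order. The following Python sketch does this, assuming you have the update symbols as directories laid out like the CD tree and Python 3.8 or later (for the dirs_exist_ok argument of shutil.copytree). The function name and arguments are hypothetical.

```python
import shutil
from pathlib import Path


def build_symbol_tree(base_symbols, updates, destination):
    """Copy the CD symbol tree, then overlay update symbols in the order applied.

    base_symbols: the Symbols directory under Support\\Debug\\platform on the CD.
    updates: symbol directories for service packs and hot fixes, oldest first.
    Later copies overwrite earlier files, so the newest symbols win.
    """
    dest = Path(destination)
    shutil.copytree(base_symbols, dest, dirs_exist_ok=True)
    for update in updates:
        shutil.copytree(update, dest, dirs_exist_ok=True)
    return dest
```

Passing the updates in the wrong order would leave older symbols on top, which is exactly the mismatch the ordering rule is meant to prevent.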
If you are using kernel debuggers to debug a multiprocessor system, or a single-processor system that is using a special HAL, you must rename some of the symbol files. The rest of this section discusses what to rename and how to rename it.
The kernel debuggers always load the files named Ntoskrnl.dbg for kernel symbols and Hal.dbg for HAL symbols. Therefore, you need to determine which kernel and HAL you are using, and rename the associated files to these filenames.
If you have a multiprocessor computer, you only need to rename Ntkrnlmp.dbg to Ntoskrnl.dbg. These files are in the \Exe subdirectory of the symbol tree.
If your computer uses a special HAL, there are a number of possibilities. Tables 39.2-39.5 list the possible HAL files for each hardware platform. These tables list the actual name of the .dll file as it exists on the product CD and the uncompressed size of the file in bytes. Each .dll file has a corresponding .dbg file, which is in the \Dll subdirectory of the symbol tree. Determine which HAL you are using, and rename the associated .dbg file to Hal.dbg. If you are not sure which HAL you are using, compare the file size in the table with the Hal.dll file on the target system. The Hal.dll file can be found in \Systemroot\System32.
Table 39.2 HAL files for Intel-based systems
Filename | Uncompressed size (bytes) | Description
---|---|---
Hal.dll | 52,768 | Standard HAL for Intel systems
Hal486c.dll | 51,712 | HAL for 486 C-step processor
Halapic.dll | 68,096 | Uniprocessor version of Halmps.dll
Halast.dll | 49,328 | HAL for AST® SMP systems
Halcbus.dll | 87,328 | HAL for Cbus systems
Halcbusm.dll | 85,376 |
Halmca.dll | 49,696 | HAL for MCA-based systems (PS/2® and others)
Halmps.dll | 70,240 | HAL for most Intel multiprocessor systems
Halmpsm.dll | 69,184 |
Halncr.dll | 83,920 | HAL for NCR® SMP computers
Haloli.dll | 42,992 | HAL for Olivetti® SMP computers
Halsp.dll | 56,592 | HAL for Compaq Systempro®
Halwyse7.dll | 43,728 | HAL for WYSE7 systems
Table 39.3 HAL files for Digital Alpha-based systems
Filename | Uncompressed size (bytes) | Description
---|---|---
Hal.dll | 60,160 | Standard HAL for DEC Alpha systems
Hal0jens.dll | 60,160 | Digital DECpc AXP 150 HAL
Halalcor.dll | 69,120 | Digital AlphaStation 600 Family
Halavant.dll | 69,856 | Digital AlphaStation 200/400 Family HAL
Haleb164.dll | 84,768 |
Haleb64p.dll | 76,320 | Digital AlphaPC64 HAL
Halflex.dll | 89,472 |
Halgammp.dll | 82,560 | Digital AlphaServer 2x00 5/xxx Family HAL
Halx3.dll | 79,072 |
Halmikas.dll | 73,184 | Digital AlphaServer 1000 Family Uniprocessor HAL
Halnonme.dll | 68,320 | Digital AXPpci 33 HAL
Halqs.dll | 68,000 | Digital Multia MultiClient Desktop HAL
Halrawmp.dll | 93,280 |
Halsabmp.dll | 78,496 | Digital AlphaServer 2x00 4/xxx Family HAL
Halxl.dll | 81,568 |
Table 39.4 HAL files for MIPS-based systems
Filename | Uncompressed size (bytes) | Description
---|---|---
Hal.dll | 41,856 | Standard HAL for MIPS
Halacr.dll | 42,496 | ACER HAL
Haldti.dll | 66,240 | DESKStation Evolution
Halduomp.dll | 41,536 | Microsoft-designed dual MP HAL
Halflex.dll | 96,640 |
Halfxs.dll | 41,856 | MTI with an R4000 or R4400
Halfxspc.dll | 41,984 | MTI with an R4600
Halnecmp.dll | 47,040 | NEC® dual MP
Halntp.dll | 140,096 | NeTpower FASTseries
Halr94a.dll | 193,760 |
Halr96b.dll | 194,432 |
Halr98mp.dll | 108,608 | NEC 4 processor MP
Halsni4x.dll | 99,936 | Siemens Nixdorf UP and MP
Halsnip.dll | 116,864 |
Haltyne.dll | 65,888 | DESKStation Tyne
Table 39.5 HAL files for PowerPC-based systems
Filename | Uncompressed size (bytes) | Description
---|---|---
Halcaro.dll | 234,240 | HAL for IBM-6070
Haleagle.dll | 211,232 | HAL for Motorola PowerStack and Big Bend
Halfire.dll | 292,384 | HAL for Powerized_ES, Powerized_MX, and Powerized_MX MP
Halppc.dll | 233,600 | HAL for IBM-6015
Halps.dll | 207,552 |
Halvict.dll | 244,896 |
Halwood.dll | 233,888 | HAL for IBM-6020
In some cases, a HAL file might have been supplied by your computer manufacturer. If so, you need to obtain symbols for the file from the manufacturer, rename that symbol file to Hal.dbg, and place it in the \Dll subdirectory of the symbol tree. For example, Compaq provides updated HAL files for their Proliant™ systems. This also applies if you have drivers from third-party sources. Obtain symbols from your third-party vendor and put them in the appropriate directory.
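The size comparison and the renaming described in this section can be sketched in Python. The following example assumes the Intel sizes from Table 39.2 and a symbol tree with Exe and Dll subdirectories (written in lowercase here; Windows file names are not case-sensitive). Instead of renaming, it copies Ntkrnlmp.dbg to Ntoskrnl.dbg and the chosen HAL's .dbg file to Hal.dbg, so the original files stay in the tree. All names in the code are hypothetical.

```python
import shutil
from pathlib import Path

# Uncompressed sizes in bytes, from the Intel HAL table (Table 39.2).
INTEL_HAL_SIZES = {
    "Hal.dll": 52768, "Hal486c.dll": 51712, "Halapic.dll": 68096,
    "Halast.dll": 49328, "Halcbus.dll": 87328, "Halcbusm.dll": 85376,
    "Halmca.dll": 49696, "Halmps.dll": 70240, "Halmpsm.dll": 69184,
    "Halncr.dll": 83920, "Haloli.dll": 42992, "Halsp.dll": 56592,
    "Halwyse7.dll": 43728,
}


def candidate_hals(hal_dll_size, table=INTEL_HAL_SIZES):
    """Return every HAL whose listed size matches the target's Hal.dll size."""
    return sorted(name for name, size in table.items() if size == hal_dll_size)


def prepare_kernel_and_hal_symbols(symbol_root, hal_name, multiprocessor):
    """Give the kernel and HAL symbols the fixed names the kernel debuggers load."""
    root = Path(symbol_root)
    if multiprocessor:
        # Copy rather than rename so that the original Ntkrnlmp.dbg is kept.
        shutil.copyfile(root / "exe" / "ntkrnlmp.dbg", root / "exe" / "ntoskrnl.dbg")
    hal_dbg = root / "dll" / (Path(hal_name).stem.lower() + ".dbg")
    if hal_dbg.name != "hal.dbg":  # the standard HAL already has the right name
        shutil.copyfile(hal_dbg, root / "dll" / "hal.dbg")
```

Get hal_dll_size from the Hal.dll file in \Systemroot\System32 on the target, for example with Path(...).stat().st_size. candidate_hals returns a list because sizes are not always unique: in Table 39.3, Hal.dll and Hal0jens.dll are both 60,160 bytes, so a size match alone cannot always identify the HAL.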
Setting Up the Debugger Files on the Host
To set up the debugger on the host, first ensure that you have the correct files available. Copy these files from the Support\Debug\platform directory to a debug directory on the hard drive, where platform matches the platform of the host computer.
Some files that you copy from the directory must match the platform of the target computer, as described in the following table. These files are necessary for kernel debugging.
File | Example for an Alpha-based target
---|---
platformKd.exe* | Alphakd.exe
Imagehlp.dll | Imagehlp.dll
Kdextplatform.dll* | Kdextalp.dll
* platform matches the platform of the target computer
For instance, if your host computer is a 486 computer and the target computer is a MIPS RISC-based system, you copy the following files from the \Support\Debug\I386 directory:
Mipskd.exe
Imagehlp.dll
Kdextmip.dll
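The host/target rule above can be expressed as a small lookup. The following Python sketch is illustrative only. The `debugger_files` helper and its dictionaries are hypothetical, and the filenames are the ones named in this chapter. PowerPC is left out because this section does not name its kernel debugger executable.

```python
# Hypothetical helper: which files to copy for kernel debugging.
# The source directory follows the HOST platform; the debugger and the
# extension DLL follow the TARGET platform. PowerPC is omitted because
# this section does not name its kernel debugger executable.
KD_EXE = {"I386": "I386kd.exe", "Alpha": "Alphakd.exe", "MIPS": "Mipskd.exe"}
KDEXT_DLL = {"I386": "Kdextx86.dll", "Alpha": "Kdextalp.dll", "MIPS": "Kdextmip.dll"}


def debugger_files(host: str, target: str) -> tuple[str, list[str]]:
    """Return (source directory, files to copy) for a host/target pair."""
    source_dir = "\\Support\\Debug\\" + host
    files = [KD_EXE[target], "Imagehlp.dll", KDEXT_DLL[target]]
    return source_dir, files


# The 486-host / MIPS-target case described above.
print(debugger_files("I386", "MIPS"))
```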
Once you have set up the symbol tree and copied the debugger files to the debug directory, use a batch file or command line to set the following environment variables on the host:
Variable | Purpose |
---|---|
_NT_DEBUG_PORT | COM port used on the host for debugging. |
_NT_DEBUG_BAUD_RATE | Maximum baud rate for the debug port. On x86-based computers, the maximum is 9600 or 19200 bps for modems and 19200 bps for null-modem serial cables. On RISC-based computers, the rate is always 19200 bps. |
_NT_SYMBOL_PATH | Path to the symbols directory. |
_NT_LOG_FILE_OPEN | Optional. The name of the file to which a log of the debug session is written. |
After these environment variables have been set, you can start the host debugger.
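Before you start the debugger, you can encode the baud-rate rules from the table as a quick sanity check. This Python sketch is hypothetical. It assumes that the rates the table names are the only acceptable values, and the platform names follow the Support\Debug directory names.

```python
# Hypothetical check of _NT_DEBUG_BAUD_RATE against the rules in the
# table above. Assumption: only the rates the table names are accepted.
RISC_PLATFORMS = {"Alpha", "MIPS", "PowerPC"}


def allowed_baud_rates(platform: str, link: str) -> set[int]:
    """Rates for a platform ("I386", "Alpha", ...) and a link type
    ("modem" or "null-modem")."""
    if link not in ("modem", "null-modem"):
        raise ValueError(f"unknown link type: {link!r}")
    if platform in RISC_PLATFORMS:
        return {19200}            # RISC-based computers: always 19200 bps
    if link == "modem":
        return {9600, 19200}      # x86 over a modem: 9600 or 19200 bps
    return {19200}                # x86 over a null-modem cable


def check_baud_rate(platform: str, link: str, rate: int) -> bool:
    return rate in allowed_baud_rates(platform, link)
```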
Note Setting the _NT_LOG_FILE_OPEN variable does not always result in a log file being written. You can also create the log file from the debugger. The command format is:
.logopen pathname
You might also need to issue the !reload command to get this to work.
Starting the Debugger on the Host
You can start the host debugger from the command line or a batch file by using the name of the executable file as the command. Each debugger supports the following command-line options:
Option | Action |
---|---|
-b | Causes the debugger to stop execution on the target computer as soon as possible by causing a debug breakpoint (INT 3). |
-c | Causes the debugger to request a resync on connect. Resynchronization ensures that the host and target computers are communicating in sequence. |
-m | Causes the debugger to monitor modem control lines. The debugger is active only when the carrier detect (CD) line is active; otherwise, the debugger is in terminal mode, and all commands are sent to the modem. |
-n | Causes symbols to be loaded immediately, rather than in a deferred mode. |
-v | Indicates verbose mode; displays more information about such things as when symbols are loaded. |
-x | Causes the debugger to break in when an exception first occurs, rather than letting the application or module that caused the exception deal with it. |
The most commonly used options are -v (verbose) and -m (for modem debugging).
Generally, the best way to start the debugger is to create a batch file with the necessary commands to set the environment variables, followed by the command to start the correct kernel debugger.
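Such a batch file follows a fixed pattern, so it can be generated. The following Python sketch is a hypothetical helper, not a Resource Kit tool. It writes the same lines as the batch files in the Examples section of this chapter. The function name and parameters are illustrative.

```python
from typing import Optional


# Hypothetical generator for a host-debugger batch file: set the
# environment variables, then start the kernel debugger, optionally
# wrapped in a remote server session.
def make_debug_batch(comment: str, port: str, baud: int, symbol_path: str,
                     kd_command: str, log_file: Optional[str] = None,
                     remote_id: Optional[str] = None) -> str:
    lines = [
        f"REM {comment}",
        f"set _NT_DEBUG_PORT={port}",
        f"set _NT_DEBUG_BAUD_RATE={baud}",
        f"set _NT_SYMBOL_PATH={symbol_path}",
    ]
    if log_file:
        lines.append(f"set _NT_LOG_FILE_OPEN={log_file}")
    if remote_id:
        # Run the debugger under remote so networked users can connect.
        lines.append(f'remote /s "{kd_command}" {remote_id}')
    else:
        lines.append(kd_command)
    # Batch files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"
```

For modem debugging, pass a kd_command such as "i386kd -v -m" and a baud rate of 9600.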
Using the Remote Utility to Start the Debugger
If the host computer is connected to a network, you can use the remote utility, included in the Windows NT Resource Kit, to start the debugger. Remote is a server/client utility that provides remote network access by means of named pipes to applications that use STDIN and STDOUT for input and output. Users at other computers on the network can then connect to your host debugger session and either view the debugging information or enter commands themselves. The syntax for starting the server (host) end of the remote session is as follows:
remote /s "command" Unique_Id [/f foreground_color|/b background_color]
For example:
REMOTE /S "i386kd -v" debug
You end the server session by entering the @K command.
To interact with this session from another computer, use the remote /c command. The syntax of this command is as follows:
remote /c ServerName Unique_Id [/l lines_to_get|/f foreground_color|/b background_color]
To exit from the remote session on a client and leave the debugger running on the host computer, enter the @Q command.
For example, if a session with the ID debug was started on the host computer \\Server1 by using the remote /s command, you can connect to it with the following command:
REMOTE /C server1 debug
For more information on using the remote command, see the Rktools.hlp file on the Windows NT Resource Kit CD.
Examples
Assume the following:
Debugging needs to take place over a null-modem serial cable on COM2.
The symbols are on a CD on the E drive.
A log file called Debug.log is to be created in C:\Temp.
Note The log file holds a copy of everything you see on the debug screen during your debug session. All input from the person doing the debugging, and all output from the kernel debugger on the target system, is written to the log file.
A sample batch file for local debugging is:
REM Target computer is local
set _NT_DEBUG_PORT=com2
set _NT_DEBUG_BAUD_RATE=19200
set _NT_SYMBOL_PATH=e:\support\debug\i386\symbols
set _NT_LOG_FILE_OPEN=c:\temp\debug.log
remote /s "i386kd -v" debug
The last line of the batch file uses the remote utility to start the host debugger. If you use this, users of Windows NT–based computers who are networked to the host computer (and who have a copy of the remote utility) can connect to the debug session by using the following command:
remote /c computername debug
where computername is the name of the host computer.
To allow remote debugging, which requires the use of a modem, begin with the batch file in the previous example. Change the baud rate to 9600, and add the -m switch to the last line. The result is as follows:
REM Target computer is remote from the host
set _NT_DEBUG_PORT=com2
set _NT_DEBUG_BAUD_RATE=9600
set _NT_SYMBOL_PATH=e:\support\debug\i386\symbols
set _NT_LOG_FILE_OPEN=c:\temp\debug.log
remote /s "i386kd -v -m" debug
You run the batch file from the directory that contains the debugger files.
When you start the debugger, one of two screens appears, depending upon whether you are doing local debugging or remote debugging.
When doing local debugging, the following screen appears:
**************************************
********** REMOTE ***********
********** SERVER ***********
**************************************
To Connect: Remote /C BANSIDHE debug

Microsoft(R) Windows NT Kernel Debugger
Version 3.51
(C) 1991-1995 Microsoft Corp.

Symbol search path is:
KD: waiting to connect...
At this screen, you can press CTRL+C to gain access to the target computer, if it is still running. If the target is currently stopped at a blue screen, you will probably gain access automatically. If you have any problems, press CTRL+R to force a resync between the host computer and the target computer.
If you are doing remote debugging, the same screen as shown for local debugging appears, with the following extra line:
KD: No carrier detect - in terminal mode
In this case, the debugger is in terminal mode, and you can issue any of the standard AT commands to your modem. Begin by sending commands to disable hardware compression, flow control, and error correction. These commands will vary from modem to modem, so consult your modem documentation. Once you connect to the target system and have a carrier detect (CD) signal, you are returned to the debugger.
Creating a Memory Dump File
If you cannot or do not want to do local or remote debugging, you can configure Windows NT Server or Windows NT Workstation to write a memory dump file each time it generates a kernel STOP error. This file contains all the information that the dumpexam utility needs to troubleshoot the kernel STOP error, as if you were connected to a live computer experiencing the problem.
Because a memory dump file lets you examine the error at any time, you can restart the failed computer immediately and keep the target computer available while you use the debugger. The only drawback to this method is that you must have enough space on a hard disk partition for the memory dump file, which is as large as the computer's RAM. For example, whenever a kernel STOP error occurs, a computer with 32 MB of RAM produces a 32-MB memory dump file. You must also have a page file on your system root drive that is at least as large as your RAM. Windows NT first writes the dump information to that page file and copies it to the memory dump file when the computer restarts.
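The sizing rules above reduce to two comparisons. The following Python sketch uses a hypothetical function name. It applies the rules exactly as stated in this section and does not allow for any extra headroom that a particular system might need.

```python
def dump_space_check(ram_mb: int, free_disk_mb: int, pagefile_mb: int) -> dict:
    """Apply this section's sizing rules to one computer (sizes in MB)."""
    return {
        # Memory.dmp is as large as the computer's RAM.
        "dump_size_mb": ram_mb,
        # The target partition must have room for the whole dump file.
        "enough_disk_space": free_disk_mb >= ram_mb,
        # The page file on the system root drive must be at least RAM-sized.
        "page_file_large_enough": pagefile_mb >= ram_mb,
    }


print(dump_space_check(ram_mb=32, free_disk_mb=100, pagefile_mb=24))
```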
To configure Windows NT to save STOP information to a memory dump file
In Control Panel, double-click System.
In the System Properties dialog box, click the Startup/Shutdown tab.
Under Recovery, select the Write debugging information to check box. Either accept the default path and filename (%SystemRoot%\Memory.dmp) or type a path in the text box.
If you want this memory dump file to overwrite any file of the same name, select the Overwrite any existing file check box. If you select this option, rename or move each memory dump file as soon as it is written, so that a later STOP error does not overwrite it before you have processed it. If you clear this check box, Windows NT does not write a memory dump file when a file by that name already exists.
Using Utilities to Process Memory Dump Files
Included on the Windows NT Server and Windows NT Workstation version 3.51 CDs are three utilities for processing memory dump files: dumpflop, dumpchk, and dumpexam. All three utilities are on the product CDs in the Support\Debug\platform directories, where platform is I386, Alpha, MIPS, or PowerPC.
The primary purpose of these utilities is to create files on floppy disks or a text file that you can send to technical support personnel for analysis.
Dumpflop
Dumpflop is a command-line utility that you can use to write a memory dump file in segments to floppy disks, so it can be sent to a support engineer. This is rarely the most efficient way to send a memory dump file, but it is sometimes the only way. Dumpflop compresses the information it writes to the floppy disks, so a 32 MB memory dump file can fit onto 10 floppy disks, rather than 20 or more. Dumpflop does not require access to symbols.
To store the crash dump onto floppy disks, use dumpflop with the following command-line syntax: dumpflop options CrashDumpFile Drive:
To assemble a crash dump from floppy disks, use dumpflop with the following command-line syntax: dumpflop options Drive: CrashDumpFile
In either case, Options can include:
Option | Action |
---|---|
-? | Displays the command syntax. |
-p | Prints only the crash dump header during an assemble operation. |
-v | Shows compression statistics. |
-q | Formats the floppy disk, when necessary, before writing the memory dump file to it. When reading the floppy disks to assemble the file, overwrites an existing memory dump file. |
If executed with no parameters, dumpflop attempts to find a memory dump file in the \systemroot directory (the default location for creating a memory dump file) and writes it to floppy disks on the A drive.
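The two dumpflop forms differ only in the order of the operands. The dump file comes before the drive when storing and after it when assembling. The following Python sketch builds either command line. The helper and the example paths are hypothetical, and paths containing spaces would also need quoting, which this sketch does not handle.

```python
def dumpflop_command(action: str, dump_file: str, drive: str = "A:",
                     options: tuple[str, ...] = ()) -> str:
    """Build a dumpflop command line (hypothetical helper).

    store:    dumpflop options CrashDumpFile Drive:
    assemble: dumpflop options Drive: CrashDumpFile
    """
    if action == "store":
        operands = [dump_file, drive]     # dump file first, then drive
    elif action == "assemble":
        operands = [drive, dump_file]     # drive first, then dump file
    else:
        raise ValueError(f"unknown action: {action!r}")
    return " ".join(["dumpflop", *options, *operands])


print(dumpflop_command("store", r"c:\winnt\memory.dmp", options=("-q",)))
```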
Dumpchk
Dumpchk is a command-line utility that you can use to verify that a memory dump file has been created correctly. Dumpchk does not require access to symbols.
Dumpchk has the following command-line syntax: dumpchk options CrashDumpFile
The Options can include:
Option | Action |
---|---|
-? | Displays the command syntax. |
-p | Prints the header only (with no validation). |
-v | Specifies verbose mode. |
-q | Performs a quick test. |
Dumpchk displays some basic information from the memory dump file and then verifies all the virtual and physical addresses in the file. If any errors are found in the memory dump file, it reports them. The following is an example of the output of a Dumpchk command:
Filename . . . . . . .memory.dmp
Signature. . . . . . .PAGE
ValidDump. . . . . . .DUMP
MajorVersion . . . . .free system
MinorVersion . . . . .807
DirectoryTableBase . .0x00030000
PfnDataBase. . . . . .0xffb7e000
PsLoadedModuleList . .0x80196d40
PsActiveProcessHead. .0x80196c38
MachineImageType . . .i386
NumberProcessors . . .1
BugCheckCode . . . . .0xc000021a
BugCheckParameter1 . .0xe17b7b68
BugCheckParameter2 . .0xc0000005
BugCheckParameter3 . .0x00000000
BugCheckParameter4 . .0x00000000
ExceptionCode. . . . .0x80000003
ExceptionFlags . . . .0x00000001
ExceptionAddress . . .0x8015f015
In this example, the most important information (from a debugging standpoint) is the following:
MajorVersion . . . . .free system
MinorVersion . . . . .807
MachineImageType . . .i386
NumberProcessors . . .1
BugCheckCode . . . . .0xc000021a
BugCheckParameter1 . .0xe17b7b68
BugCheckParameter2 . .0xc0000005
BugCheckParameter3 . .0x00000000
BugCheckParameter4 . .0x00000000
This information can be used to determine what kernel STOP error occurred and what version of Windows NT was in use.
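When you handle many dump files, you can extract these key fields from saved dumpchk output with a short script. This Python sketch assumes the "Name . . . .value" layout shown above; the function names are hypothetical. A value that itself begins with a dot would be misread, which does not occur in the samples here.

```python
import re

# "Name . . . .value": a word, then a run of dots and spaces, then the value.
FIELD_RE = re.compile(r"(\w+)\s*(?:\.\s*)+(\S.*)")


def parse_dump_header(text: str) -> dict[str, str]:
    """Collect the header fields printed by dumpchk."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.fullmatch(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).rstrip()
    return fields


def key_fields(fields: dict[str, str]) -> dict[str, object]:
    """Pull out the items this section singles out as most important."""
    return {
        "major_version": fields["MajorVersion"],
        "minor_version": fields["MinorVersion"],
        "machine": fields["MachineImageType"],
        "processors": int(fields["NumberProcessors"]),
        "bugcheck": int(fields["BugCheckCode"], 16),
        "parameters": [int(fields[f"BugCheckParameter{i}"], 16) for i in range(1, 5)],
    }
```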
Dumpexam
Dumpexam is a command-line utility that examines a memory dump file, extracts information from it, and writes it to a text file. This text file can then be used by support personnel to determine the cause of the kernel STOP error. In many cases, the dumpexam analysis provides enough information for support personnel to determine the cause of the error without directly accessing the memory dump file.
Three files are required to run dumpexam, and they all must be in the same directory. You can find them on the Windows NT Server or Windows NT Workstation CD in the directory Support\Debug\platform, where platform is I386, Alpha, MIPS, or PowerPC. The first two files are:
Dumpexam.exe
Imagehlp.dll
The third file is one of the following, depending on the type of computer on which the memory dump file was generated:
Kdextx86.dll
Kdextalp.dll
Kdextmip.dll
Kdextppc.dll
You can run dumpexam directly from the product CD with no parameters, if
The computer on which the dump occurred was running Windows NT version 4.0.
You have not applied any hot fixes or service packs on that computer.
The memory dump file you want to examine is in the location specified in the Recovery dialog box.
Dumpexam creates a text file called Memory.txt, located in the same directory as the Memory.dmp file, that contains information extracted from the memory dump file.
You can also use dumpexam to examine memory dump files created on computers running earlier versions of Windows NT. However, you can run it only with Windows NT version 3.51 or 4.0. Therefore, if your memory dump file was created in an earlier version of Windows NT, you must move the memory dump file or access it over the network. In addition, you must replace the Kdext*.dll files listed above with copies from the version of Windows NT that was running on the computer on which the dump occurred. These files contain debug information specific to that version of Windows NT. You must also specify the path to the symbols for the operating system version that was running on that computer.
Syntax for Dumpexam
The syntax for dumpexam is: dumpexam options CrashDumpFile
where options can include:
Option | Action |
---|---|
-? | Displays the command syntax. |
-p | Prints the header only. |
-v | Specifies verbose mode. |
-f filename | Specifies the output filename and path. |
-y path | Sets the symbol search path. |
You need to specify the memory dump file path only if you have moved the memory dump file.
You need to specify the symbol search path (using the -y option) only if you are using an alternative symbol path. The symbol path for dumpexam can contain several directories, separated by semicolons (;). Because these directories are searched in the order in which they are listed, list the directories with the most recently installed hot fixes or service packs first.
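The ordering rule can be captured in a pair of hypothetical helpers. The caller is assumed to pass the update directories already sorted from most recent to oldest. Paths with spaces would need quoting, which is not handled here. With the inputs from the Alpha example that follows, the helpers produce the same command line as that example.

```python
def symbol_search_path(product_dirs: list[str], update_dirs: list[str]) -> str:
    """Semicolon-separated path with hot fix / service pack directories
    first (most recent first), because dumpexam searches in listed order."""
    return ";".join([*update_dirs, *product_dirs])


def dumpexam_command(dump_file: str, symbol_path: str = "",
                     output_file: str = "") -> str:
    """Build a dumpexam command line (hypothetical helper)."""
    args = ["dumpexam"]
    if symbol_path:
        args += ["-y", symbol_path]
    if output_file:
        args += ["-f", output_file]
    args.append(dump_file)   # the memory dump file comes last
    return " ".join(args)
```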
Examples
In the first example, the memory dump file was created on a computer running Windows NT Workstation version 3.51, and no service packs were installed. The symbols are all in the directory C:\Symbols. The memory dump file is in the directory C:\Dump and is called Machine1.dmp. The command line reads as follows:
dumpexam -y c:\symbols c:\dump\machine1.dmp
The results of the exam will be in \Systemroot\Memory.txt.
In the next example, the memory dump file was created on a DEC Alpha computer running Windows NT Server version 3.5, with Service Pack 2 installed. The Service Pack 2 symbols are in D:\Sp2\Symbols. The Windows NT Server 3.5 symbols are on the product CD, which is in the E drive. The memory dump file Memory.dmp is in D:\Temp. The output file is to be put in the same directory as the memory dump file. The command line reads as follows:
dumpexam -y d:\sp2\symbols;e:\support\debug\alpha -f d:\temp\memory.txt d:\temp\memory.dmp
Using the Dumpexam Output File
Dumpexam reads a memory dump file, executes debugger commands on it, and writes the output in a text file, called Memory.txt, by default. The same debugger commands are executed on each memory dump file.
A full interpretation of the output requires knowledge of Windows NT kernel processes and the ability to read assembly language; however, there are some guidelines you can follow to get an idea of what the output means. This section first describes each part of the memory dump file output, giving sample output and a description. Then several common traps are discussed, along with guidelines on which sections of the Memory.txt file can help you determine what caused the kernel STOP error.
Because the primary purpose of the dumpexam utility is to create a text file to send to support personnel, the descriptions in this section do not provide complete details of the contents of the Memory.txt file.
The following sections of the Memory.txt file each occur once, as they include information that applies to the whole system. These sections are listed in the order in which they appear in Memory.txt.
Windows NT Crash Dump Analysis
The first section of output is Windows NT Crash Dump Analysis, which looks like the following:
****************************************************************
** Windows NT Crash Dump Analysis
****************************************************************
Filename . . . . . . .c:\temp\dumps\mac.dmp
Signature. . . . . . .PAGE
ValidDump. . . . . . .DUMP
MajorVersion . . . . .free system
MinorVersion . . . . .1057
DirectoryTableBase . .0x0006f005
PfnDataBase. . . . . .0x83fce000
PsLoadedModuleList . .0x800ee5c0
PsActiveProcessHead. .0x800ee590
MachineImageType . . .alpha
NumberProcessors . . .2
BugCheckCode . . . . .0x0000002e
BugCheckParameter1 . .0x00000000
BugCheckParameter2 . .0x00000000
BugCheckParameter3 . .0x00000000
BugCheckParameter4 . .0x00000000
ExceptionCode. . . . .0x80000003
ExceptionFlags . . . .0x00000001
ExceptionAddress . . .0x800bc140
Most of the information here is useful only for determining whether the memory dump file is corrupted. The following items are most important, especially if you did not record any information from the blue screen generated when the computer trapped:
Parameter | Meaning |
---|---|
BugCheckCode | The number of the STOP code that occurred. Support personnel can use the STOP code to determine which trap occurred. For information on bug check codes, see Chapter 4, "Message Reference," in Windows NT Messages. Descriptions of the STOP code messages start on page 441 in Chapter 4 and are in numerical order. In the preceding example, the code was 0x0000002e, which is a DATA_BUS_ERROR. |
BugCheckParameters | The four parameters that are normally included with each STOP code. For some kernel STOP errors, the STOP code description in Windows NT Messages explains the meaning of these parameters. |
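When you record a trap, it helps to keep the code and its four parameters together in one line. The following Python sketch is a hypothetical formatter. The name for 0x2E comes from the text above. The names for 0x0A and 0x1E are the standard bug check names for the other STOP codes that this chapter mentions. For any other code, the helper defers to Windows NT Messages rather than guessing.

```python
# Names for a few STOP codes; everything else: see Windows NT Messages.
STOP_NAMES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x0000002E: "DATA_BUS_ERROR",
}


def describe_stop(bugcheck_code: str, parameters: list[str]) -> str:
    """One-line summary of BugCheckCode and BugCheckParameter1-4,
    given as the hexadecimal strings printed in Memory.txt."""
    code = int(bugcheck_code, 16)
    name = STOP_NAMES.get(code, "see Windows NT Messages")
    params = ", ".join(f"0x{int(p, 16):08X}" for p in parameters)
    return f"STOP 0x{code:08X} ({params}) {name}"


print(describe_stop("0x0000002e", ["0x00000000"] * 4))
```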
Symbol File Load Log
This section of the Memory.txt file includes any errors that were generated when the symbols were loaded. If no errors were generated, this section will be blank.
!drivers
The !drivers command is a debug command that you use to list information on all the device drivers loaded on the system. The information for the device drivers looks like this:
****************************************************************
** !drivers
****************************************************************
Loaded System Driver Summary

Base Code Size Data Size Driver Name Creation Time
80080000 f76c0 (989 kb) 1f100 (124 kb) ntoskrnl.exe Fri May 26 15:13:00 1995
80400000 d980 ( 54 kb) 4040 ( 16 kb) hal.dll Tue May 16 16:50:34 1995
80654000 3f00 ( 15 kb) 1060 ( 4 kb) ncrc810.sys Fri May 05 20:07:04 1995
8065a000 a460 ( 41 kb) 1e80 ( 7 kb) SCSIPORT.SYS Fri May 05 20:08:05 1995
The following information can be determined from the above output:
Parameter | Meaning |
---|---|
Base | The starting address of the device driver code, in hexadecimal. When the address of the code that causes a trap falls between the base address of one driver and the base address of the next driver in the list, that driver is frequently the cause of the fault. For instance, the base for Ncrc810.sys is 0x80654000, so any address between that and 0x8065a000 (the base of the next driver) belongs to this driver. |
Code Size | The size of the driver code, in bytes (hexadecimal) and in kilobytes (decimal). |
Data Size | The amount of space allocated to the driver for data, in bytes (hexadecimal) and in kilobytes (decimal). |
Driver Name | The driver filename. |
Creation Time | The link date of the driver. Do not confuse this with the file date of the driver, which can be set by external utilities. The link date is set by the linker when a driver or executable file is built. It should be close to the file date, but it will not always be the same. |
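You can automate the base-address rule in the Base row. Sort the drivers by base address and pick the highest base that does not exceed the faulting address. This Python sketch assumes the !drivers row layout shown above and uses hypothetical function names. Like the rule itself, it points to a likely suspect rather than proving which driver caused the fault.

```python
import bisect
import re

# One !drivers data row: base, code size (kb), data size (kb), driver name, ...
ROW_RE = re.compile(
    r"([0-9a-fA-F]{8})\s+[0-9a-fA-F]+\s+\(\s*\d+ kb\)\s+"
    r"[0-9a-fA-F]+\s+\(\s*\d+ kb\)\s+(\S+)")


def parse_drivers(text: str) -> list[tuple[int, str]]:
    """Return (base address, driver name) pairs sorted by base."""
    rows = [ROW_RE.match(line.strip()) for line in text.splitlines()]
    return sorted((int(m.group(1), 16), m.group(2)) for m in rows if m)


def driver_for_address(drivers: list[tuple[int, str]], address: int):
    """Driver with the highest base address <= address, or None if the
    address is below every base."""
    bases = [base for base, _ in drivers]
    i = bisect.bisect_right(bases, address) - 1
    return drivers[i][1] if i >= 0 else None
```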
!locks
The !locks command is a debugger command that displays all locks that threads hold on resources. A lock can be shared or exclusive; an exclusive lock means that no other thread can access the resource. This information is useful when a deadlock occurs on a system, because a deadlock is caused when one nonexecuting thread holds an exclusive lock on a resource needed by an executing thread.
****************************************************************
** !locks -p -v -d
****************************************************************
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks.................

Resource @ 0xffb6ed14 Shared 2 owning threads
Threads: ffb3bb70-01
0012fb50: Unable to read ThreadCount for resource

Resource @ 0xffb6ecdc Shared 2 owning threads
Threads: ffb3bb70-02
0012fb50: Unable to read ThreadCount for resource
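To cross-check lock owners against the thread listings in the !process 0 7 section, you can first turn this output into a map from resource to owning threads. The following Python sketch assumes the layout shown above, and the function name is hypothetical. It keeps only the thread address before the dash in entries such as ffb3bb70-01 and does not interpret the suffix.

```python
import re


def parse_locks(text: str) -> dict[int, list[int]]:
    """Map each resource address in !locks output to its owning threads."""
    owners: dict[int, list[int]] = {}
    current = None
    for line in text.splitlines():
        resource = re.search(r"Resource @ (0x[0-9a-fA-F]+)", line)
        if resource:
            current = int(resource.group(1), 16)
            owners[current] = []
        threads = re.search(r"Threads:\s+(.*)", line)
        if threads and current is not None:
            # Entries look like "ffb3bb70-01"; keep the 8-digit thread address.
            for address in re.findall(r"([0-9a-fA-F]{8})-[0-9a-fA-F]+", threads.group(1)):
                owners[current].append(int(address, 16))
    return owners
```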
!memusage
The !memusage command gives a short description of the current memory use of the system. Then it gives a much longer listing of the memory usage summary. The output looks something like this:
****************************************************************
** !memusage
****************************************************************
loading PFN database...................................................
Zeroed: 405 ( 3240 kb)
Free: 0 ( 0 kb)
Standby: 3242 ( 25936 kb)
Modified: 135 ( 1080 kb)
ModifiedNoWrite: 0 ( 0 kb)
Active/Valid: 4410 ( 35280 kb)
Transition: 0 ( 0 kb)
Unknown: 0 ( 0 kb)
TOTAL: 8192 ( 65536 kb)

Usage Summary in KiloBytes (Kb):
Control Valid Standby Dirty Shared Locked PageTables name
80975548 0 56 0 0 0 0 mapped_file(oemnxpip.inf)
80975248 0 16 0 0 0 0 mapped_file(oemnxpnb.inf)
8096aa68 0 160 0 0 0 0 mapped_file(SFMATALK.SY_)
80974f48 0 104 0 0 0 0 mapped_file(oemnxpsm.inf)
809758e8 0 96 0 0 0 0 mapped_file(utility.inf)
This section provides information for some memory leak issues, but it is more useful to refer to the !vm section for memory information for most common kernel STOP errors.
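A quick consistency check on the page counts in the !memusage summary is easy to script. The page lists should add up to TOTAL, and dividing kilobytes by pages gives the page size. The following Python sketch uses hypothetical helper names and assumes the "Label: pages ( n kb)" layout shown above. For the sample, the lists add up to 8192 pages, and each page is 8 KB.

```python
import re

# "Label: pages ( n kb)" entries from the !memusage summary.
COUNT_RE = re.compile(r"([A-Za-z/]+):\s+(\d+)\s+\(\s*(\d+) kb\)")


def memusage_summary(text: str) -> dict[str, tuple[int, int]]:
    """Map each label to (pages, kilobytes)."""
    return {m.group(1): (int(m.group(2)), int(m.group(3)))
            for m in COUNT_RE.finditer(text)}


def check_totals(summary: dict[str, tuple[int, int]]) -> tuple[bool, int]:
    """Return (do the page lists add up to TOTAL?, page size in KB)."""
    total_pages, total_kb = summary["TOTAL"]
    listed = sum(pages for label, (pages, _) in summary.items() if label != "TOTAL")
    return listed == total_pages, total_kb // total_pages
```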
!vm
The !vm command lists the system's virtual memory usage. The output of !vm looks like this:
****************************************************************
** !vm
****************************************************************
*** Virtual Memory Usage ***
Physical Memory: 32784 (131136 Kb)
Available Pages: 27435 (109740 Kb)
Modified Pages: 33 ( 132 Kb)
NonPagedPool Usage: 461 ( 1844 Kb)
PagedPool 0 Usage: 1519 ( 6076 Kb)
PagedPool 1 Usage: 125 ( 500 Kb)
PagedPool 2 Usage: 149 ( 596 Kb)
PagedPool Usage: 1793 ( 7172 Kb)
Shared Commit: 173 ( 692 Kb)
Process Commit: 254 ( 1016 Kb)
PagedPool Commit: 1793 ( 7172 Kb)
Driver Commit: 321 ( 1284 Kb)
Committed pages: 4261 ( 17044 Kb)
Commit limit: 80792 (323168 Kb)
All memory usage is listed in pages and in kilobytes. The most useful information in the !vm section for diagnosing problems is:
Parameter | Meaning |
---|---|
Physical Memory | The total physical memory in the system. |
Available Pages | The number of pages of memory available on the system, both virtual and physical. If this is low, it might indicate a problem with a process allocating too much virtual memory. |
NonPagedPool Usage | The number of pages allocated to the nonpaged pool. The nonpaged pool is memory that cannot be paged out to the page file, so it must always occupy physical memory. This number should rarely be larger than 10% of the total physical memory. If it is larger, there is usually a memory leak somewhere in the system. |
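You can apply the 10% rule of thumb directly to the !vm output. The following Python sketch assumes one "Label: pages (kb Kb)" entry per line, as shown above, and its function names are hypothetical. For the sample, the nonpaged pool is 461 of 32784 pages, about 1.4% of physical memory, well under the threshold.

```python
import re


def vm_pages(vm_text: str, label: str) -> int:
    """Page count from one line of !vm output, e.g. "Physical Memory"."""
    match = re.search(rf"^\s*{re.escape(label)}:\s*(\d+)", vm_text, re.MULTILINE)
    if match is None:
        raise KeyError(label)
    return int(match.group(1))


def nonpaged_pool_percent(vm_text: str) -> float:
    nonpaged = vm_pages(vm_text, "NonPagedPool Usage")
    physical = vm_pages(vm_text, "Physical Memory")
    return 100.0 * nonpaged / physical


def possible_pool_leak(vm_text: str, threshold: float = 10.0) -> bool:
    # Rule of thumb from the table above: rarely more than 10% of RAM.
    return nonpaged_pool_percent(vm_text) > threshold
```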
!errlog
The debugger sometimes keeps track of kernel errors logged by the system when a problem occurs. The !errlog section contains a dump of this log. In most cases, the error log is empty. If it is not empty, you can sometimes use it to determine the component or process that caused the blue screen.
!irpzone full
An I/O Request Packet (IRP) is a data structure that device drivers and other kernel-mode modules use to communicate information to each other. The !irpzone full command displays a list of all the pending IRPs on the system. The following information is displayed in this section:
****************************************************************
** !irpzone full
****************************************************************
Small Irp list
Irp is from zone and active with 1 stacks 1 is current
No Mdl System buffer = fb564000 Thread fb5688a0: Irp stack trace.
   cmd flg cl Device File Completion-Context
>  d 0 1 fb56a030 fb56cd48 00000000-00000000 pending
      \FileSystem\MacSrv
      Args: 00001000 00000000 00121020 00000000

Large Irp list
Irp is from zone and active with 4 stacks 5 is current
No Mdl Thread fb4b6860: Irp is completed. Pending has been returned
   cmd flg cl Device File Completion-Context
   0 0 0 00000000 00000000 00000000-00000000
      Args: 00000000 00000000 00000000 00000000
   0 0 0 00000000 00000000 00000000-00000000
      Args: 00000000 00000000 00000000 00000000
   0 0 0 00000000 00000000 00000000-00000000
      Args: 00000000 00000000 00000000 00000000
   d 0 0 fb5e3020 00000000 f8a8c711-fb48df10
      \FileSystem\Ntfs SrvCompleteRfcbClose
      Args: 00000000 00000000 00000000 00000000
Each entry lists information about a different IRP and points to the driver that currently owns the IRP. This information can be useful when the trap analysis (which occurs later in the Memory.txt file) points to a problem with a corrupted or bad IRP. The IRP listing usually contains several entries in both the small and large IRP lists.
!process 0 0
This command lists all processes and their headers. The process header list will contain entries like the following:
****************************************************************
** !process 0 0
****************************************************************
**** NT ACTIVE PROCESS DUMP ****
PROCESS fb667a00 Cid: 0002 Peb: 00000000 ParentCid: 0000
    DirBase: 00030000 ObjectTable: e1000f88 TableSize: 112.
    Image: System

PROCESS fb5edde0 Cid: 0018 Peb: 7ffdf000 ParentCid: 0002
    DirBase: 01587000 ObjectTable: e11d59a8 TableSize: 48.
    Image: SMSS.EXE
The important information in the !process 0 0 section is:
Parameter | Meaning |
---|---|
Process ID | The 8-character hexadecimal number after the word PROCESS. This is the address of the process object, which the debugger uses to identify the process; the numeric process ID itself follows Cid. For the first process in the example, this is fb667a00. |
Image | The name of the executable image for the process. In the above example, the first process is owned by System and the second by Smss.exe. |
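To produce a quick inventory of the processes in a dump, you can pull the PROCESS address, the Cid, and the image name out of each entry. This Python sketch uses a hypothetical function name. It matches the entry layout shown above even when an entry spans several lines, and it converts Cid from hexadecimal.

```python
import re

# PROCESS <address> Cid: <hex id> ... Image: <name>, possibly across lines.
PROCESS_RE = re.compile(
    r"PROCESS\s+([0-9a-fA-F]{8})\s+Cid:\s+([0-9a-fA-F]+).*?Image:\s+(\S+)",
    re.DOTALL)


def parse_process_list(text: str) -> list[tuple[str, int, str]]:
    """Return (process address, Cid as an integer, image name) tuples."""
    return [(address, int(cid, 16), image)
            for address, cid, image in PROCESS_RE.findall(text)]
```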
!process 0 7
This command also lists process information. But instead of just listing the process header, the !process 0 7 command lists all information about the process, including all threads owned by each process. This is a very long listing because each system has a large number of processes and each process has one or more threads. In addition, if the stack from a thread is resident in kernel memory (as opposed to swapped to the page file), it is listed after the thread information. Most process and thread listings look like the following:
****************************************************************
** !process 0 7
****************************************************************
**** NT ACTIVE PROCESS DUMP ****
The following entries in the process information can be important:
Parameter | Meaning |
---|---|
UserTime | Lists the amount of time the process has been running in user mode. If the value for UserTime is exceptionally high, it might identify a process that is taking up all the resources and starving the system. |
KernelTime | Lists the amount of time the process has been running in kernel mode. If the value for KernelTime is exceptionally high, it might identify a process that is taking up all the resources and starving the system. |
Working Set Size | Lists the current, minimum, and maximum working set size for the process, in pages. An exceptionally large working set size can also be a sign of a process that is leaking memory or using too many system resources. |
QuotaPoolUsage Entries | List the paged and nonpaged pool used by the process. On a system with a memory leak, looking for excessive nonpaged pool usage across all the processes can tell you which process has the memory leak. |
In addition to the process list information, the thread information also contains a list of the resources on which the thread has locks. This information is listed right after the thread header. In this example, the thread has a lock on one resource, a SynchronizationEvent with an address of 80144fc0. By comparing this address to the list of locks shown in the !locks section, you can determine which threads have exclusive locks on resources.
Processor-Specific Information in Memory.txt
The following sections in the Memory.txt file occur once for each processor on the system. In a four-processor system, for example, these sections are repeated for processors 0 through 3. In addition, some traps, such as STOP 0x0000001E, generate a few extra sections.
Register Dump for Processor #x
A dump of the state of all registers at the time of the trap is included in this section. For an x86-based system, it appears as follows:
****************************************************************
** Register Dump For Processor #0
****************************************************************
eax=ffdff13c ebx=00000000 ecx=00000000 edx=fb5a7db4 esi=00000d31 edi=00000d31
eip=8013b446 esp=f88b6de4 ebp=f88b6df8 iopl=0 nv up di pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000286
cr0=8001003b cr2=00000d31 cr3=00030000 dr0=00000000 dr1=00000000 dr2=00000000
dr3=00000000 dr6=ffff0ff0 dr7=00000400 cr4=00000000
gdtr=80036000 gdtl=03ff idtr=80036400 idtl=07ff tr=0028 ldtr=0000
For a RISC-based system, the register dump varies from processor type to processor type. The following example is from a DEC Alpha system:
v0=80006000 t0=00000000 t1=00000000 t2=800ef538
t3=00000008 t4=00000000 t5=800ec440 t6=00000000
t7=00000000 s0=c53f2000 s1=00000002 s2=00000001
s3=00000000 s4=00000001 s5=0018da83 fp=fc90f940
a0=00000002 a1=c53f2000 a2=c53f2000 a3=00000000
a4=00000000 a5=00000002 t8=800ed580 t9=80a4752c
t10=c53f2000 t11=80a4752c ra=8009b0bc t12=80a61ecc
at=a0000000 gp=800ed430 sp=fc90f890 zero=00000000
pcr=0000000008000000 softfpcr=0000000000000000 fir=800bf2fc
psr=0000000a mode=0 ie=1 irql=2
In general, the register dump is valuable only if you are skilled in reading assembly language on the system you are debugging.
Stack Trace for Processor x
The next section includes a trace of the stack for that processor. The stack trace is important because it tells you what functions were called. You can use it to trace back from a trap to determine why it happened. Included right after each stack trace is a section of disassembled code from the area in memory around the last instruction in the stack. This information also looks different, depending on platform.
The first example is an excerpt from an x86-based computer on which a STOP 0x0000000A occurred:
****************************************************************
** Stack Trace
****************************************************************
ChildEBP RetAddr  Args to Child
f88b6e00 f89805b0 fb55ea88 fb55e988 fb55ea88 KiTrap0E+0x252 (FPO: [0,0,0])
f88b6df8 fb4a71a0 fb4a6028 f89805b0 fb55ea88 NTSend+0x142

8013B430: 8B 4D 64            mov   ecx,dword ptr [ebp+64h]
8013B433: 83 E1 02            and   ecx,2
8013B436: D1 E9               shr   ecx,1
8013B438: 8B 75 68            mov   esi,dword ptr [ebp+68h]
8013B43B: 56                  push  esi
8013B43C: 51                  push  ecx
8013B43D: 50                  push  eax
8013B43E: 57                  push  edi
8013B43F: 6A 0A               push  0Ah
8013B441: E8 00 C6 FD FF      call  KiTrap0E+24Eh
--->8013B446: F7 45 70 00 00 02  test  dword ptr [ebp+70h],offset KiTrap0E+255h
              00
8013B44D: 74 0D               je    KiTrap0E+268h
8013B44F: 83 3D EC 05 14 80   cmp   dword ptr [KiTrap0E+25Dh],0
              00
8013B456: 0F 85 29 FE FF FF   jne   KiTrap0E+264h
8013B45C: 83 3D 38 49 14 80   cmp   dword ptr [KiTrap0E+26Ah],0
              00
8013B463: 0F 85 1C FE FF FF   jne   KiTrap0E+271h
8013B469: 83 3D C0 4D 14 80   cmp   dword ptr [KiTrap0E+277h],0
              00
8013B470: 0F 85 0F FE FF FF   jne   KiTrap0E+27Eh
8013B476: B8 FF 00 00 00      mov   eax,offset KiTrap0E+283h
8013B47B: EB AC               jmp   KiTrap0E+235h
8013B47D: A1 52 F0 DF FF      mov   eax,[KiTrap0E+28Ah]
8013B482: C6 05 52 F0 DF FF   mov   byte ptr [KiTrap0E+290h],0
The arrow (--->) indicates the line in the assembly code at which the system trap occurred.
The most important information here is the stack trace at the top, which tells you in which part of the code the system trapped. Each line of a stack trace represents one function call (one stack frame). The first line is the most recent call, that is, the last frame pushed on the stack. Each line of an x86 stack trace includes the following information:
Parameter | Meaning
---|---
ChildEBP | The base pointer (EBP) for the frame. This is an address on the stack.
RetAddr | The return address. This is the address at which the processor resumes execution when the function on this line returns. It is also the address of the instruction on the next line of the stack.
Args to Child | The first three arguments passed to the function when it was called. These are usually pointers, but can also be other values.
Function name and offset | The function name and the offset into that function that together identify the location in code for this frame.
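The columns described in the table can be split apart with a script. This Python sketch defines a hypothetical parse_x86_frame helper. It assumes the layout shown in the example above: ChildEBP, then RetAddr, then three argument values, then Function+0xOffset. Any trailing FPO annotation is ignored. Lines that do not match, such as the column header, return None.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class X86Frame:
    child_ebp: int
    ret_addr: int
    args: list[int]  # only the first three arguments are shown in the dump
    function: str    # may carry a module prefix, such as "NT!"
    offset: int


# ChildEBP, RetAddr, three "Args to Child" values, then "Function+0xOffset".
# Text after the offset, such as "(FPO: [0,0,0])", is ignored.
_FRAME = re.compile(
    r"^([0-9a-f]{8})\s+([0-9a-f]{8})\s+((?:[0-9a-f]{8}\s+){3})(\S+?)\+0x([0-9a-f]+)",
    re.IGNORECASE,
)


def parse_x86_frame(line: str) -> X86Frame | None:
    """Split one stack-trace line into its columns (hypothetical helper)."""
    m = _FRAME.match(line.strip())
    if m is None:
        return None  # the column header, or a line in another format
    ebp, ret, args, func, off = m.groups()
    return X86Frame(int(ebp, 16), int(ret, 16),
                    [int(a, 16) for a in args.split()], func, int(off, 16))


frame = parse_x86_frame("f88b6df8 fb4a71a0 fb4a6028 f89805b0 fb55ea88 NTSend+0x142")
print(frame.function, hex(frame.offset), hex(frame.ret_addr))
# prints: NTSend 0x142 0xfb4a71a0
```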
The next example is from a DEC Alpha system that experienced STOP 0x0000002E:
Callee-SP Arguments to Callee                          Call Site
fc8e4f90 80403e08 : 80ae1060 00000000 00000000 00000000 KeBugCheckEx+0x58
fc8e5290 800c3ce8 : 80ae1060 00000000 00000000 00000000 HalMachineCheck+0x198
fc8e52d0 800c33b8 : 80ae1060 00000000 00000000 00000000 KiMachineCheck+0x28
fc8e52e0 800c1c20 : 80ae1060 00000000 00000000 00000000 KiDispatchException+0x68
fc8e55e0 800c1bcc : 80ae1060 00000000 00000000 00000000 KiExceptionDispatch+0x50
fc8e5680 80409d4c : 80ae1060 00000000 00000000 00000000 KiGeneralException+0x4
fc8e5880 f7361344 : 80ae1060 00000000 00000000 00000000 READ_REGISTER_UCHAR+0x6c
fc8e5880 f71313c4 : 80ae1060 00000000 00000000 00000000 AtalkReceiveIndication+0x654
fc8e5930 f71361a4 : 80ae1060 00000000 00000000 00000000 EthFilterDprIndicateReceive+0x234
fc8e5990 f713218c : 80ae1060 00000000 00000000 00000000 MiniportSendLoopback+0xb14
fc8e5a30 f71308d8 : 80ae1060 00000000 00000000 00000000 MiniportSyncSend+0x20c
fc8e5a70 f73628c0 : 80ae1060 00000000 00000000 00000000 NdisMSend+0x158

800BC12C: B21DF170  stl      a0,KeBugCheckEx+80x4(gp)
800BC130: 0000001C  call_pal rdpcr
800BC134: A0000CA0  ldl      v0,KeBugCheckEx+80x4(v0)
800BC138: 22000060  lda      a0,KeBugCheckEx+80x5(v0)
800BC13C: D3406778  bsr      ra,RtlCaptureContext
--->800BC140: 0000001C  call_pal rdpcr
800BC144: A0000CA0  ldl      v0,KeBugCheckEx+t0x5(v0)
800BC148: 22000060  lda      a0,KeBugCheckEx+t0x6(v0)
800BC14C: D34006DC  bsr      ra,KiSaveProcessorControlState
800BC150: 0000001C  call_pal rdpcr
800BC154: 45299801  xor      s0,76,t0
800BC158: 221E00D0  lda      a0,KeBugCheckEx+o0x7(sp)
800BC15C: A0000CA0  ldl      v0,KeBugCheckEx+o0x7(v0)
800BC160: 223F0230  mov      KeBugCheckEx+E0x78,a1
800BC164: 22400060  lda      a2,KeBugCheckEx+o0x7(v0)
800BC168: D340803D  bsr      ra,OtsMove
800BC16C: 47EB0402  mov      s2,t1
In an Alpha stack trace, the Callee-SP parameter serves the same purpose as the ChildEBP parameter in an x86 stack trace. The number right after the Callee-SP is the return address, and the next four numbers are the arguments that were pushed on the stack. These values are usually 0 or not meaningful, because RISC-based systems pass arguments in registers (on Alpha, a0 through a5) rather than on the stack.
!process
A !process command without any parameters lists information about the process currently running on the active processor. Its output looks exactly the same as the output in the !process 0 7 section, except that it covers only one process and lists no thread information.
!thread
A !thread command without any parameters works like a !process command without any parameters, except that it lists the thread that is currently running. The thread output looks exactly the same as the output in the !process 0 7 section.
Note The dump presents the same information in three similar forms (!process 0 7, !process, and !thread) so that it is easier to find the thread or threads that are currently executing. A !process 0 7 command lists all process and thread information, which can fill 10–15 pages for the process and thread output alone. Picking out the currently running process or thread from that long list can be difficult.
Dump Analysis Heuristics for Bugcode
This section appears only in the dump for the processor that caused the trap. It contains information specific to the STOP code and can be very important. The exact information varies by STOP code, but it always lists the address at which the STOP occurred, along with any additional information that is available.
This is an example from STOP 0x0000000A:
****************************************************************
** Dump Analysis Heuristics for Bugcode IRQL_NOT_LESS_OR_EQUAL
****************************************************************
Invalid Address Referenced: 0x00000020
IRQL: 2
Access Type: Write
Code Address: 0xfa6325a5
This example is from a STOP 0x0000001E:
****************************************************************
** Dump Analysis Heuristics for Bugcode KMODE_EXCEPTION_NOT_HANDLED
****************************************************************
Exception Code: 0xc0000005
Address of Exception: 0x801704a7
Parameter #0: 0x00000001
Parameter #1: 0x00000001
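Both heuristics examples are lists of Name: value lines. If you collect this section from many dumps, you can gather its fields into a table. The Python sketch below is a minimal example that uses a hypothetical parse_heuristics helper. It assumes one Name: value pair per line, as in the examples above. The field names vary by STOP code, so the helper stores whatever names it finds.

```python
from __future__ import annotations


def parse_heuristics(text: str) -> dict[str, str]:
    """Collect 'Name: value' lines into a dict (hypothetical helper)."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        # Banner lines start with "*" and hold no "Name: value" pair.
        if sep and not name.lstrip().startswith("*"):
            fields[name.strip()] = value.strip()
    return fields


report = """\
****************************************************************
** Dump Analysis Heuristics for Bugcode IRQL_NOT_LESS_OR_EQUAL
****************************************************************
Invalid Address Referenced: 0x00000020
IRQL: 2
Access Type: Write
Code Address: 0xfa6325a5"""

fields = parse_heuristics(report)
print(fields["Access Type"], int(fields["Invalid Address Referenced"], 16))
# prints: Write 32
```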
Common STOP Codes
By examining the Memory.txt output for common STOP codes, you can sometimes identify the module or driver that caused the problem. With this information, you might be able to determine whether a service pack or update to Windows NT will fix the problem. In many cases you will still need to contact support personnel, but the Memory.txt output gives you an idea of what is wrong.
STOP 0x0000000A IRQL_NOT_LESS_OR_EQUAL
STOP 0x0000000A indicates that a kernel-mode process or driver tried to access a memory address that it did not have permission to access. It can also mean that the process or driver accessed memory at an interrupt request level (IRQL) that was too high for that access. The most common cause of this error is a bad or corrupted pointer that refers to an incorrect location in memory. A pointer is a variable that a program uses to refer to a block of memory. If the variable holds an incorrect value, the program tries to access memory that it should not be using.
When this occurs in a user-mode application, it generates an access violation.
When it occurs in kernel mode, it generates a STOP 0x0000000A message. This trap can be caused by either hardware or software. Contact support personnel to determine the exact cause.
To determine the general cause of a STOP 0x0000000A message, look at the Stack Trace for Processor X section of the Memory.txt file. If you have a multiprocessor system, check the output for all processors and look for a stack trace that has a line similar to the following at the top of the stack:
ChildEBP RetAddr  Args to Child
f88b6e00 f89805b0 fb55ea88 fb55e988 fb55ea88 KiTrap0E+0x252 (FPO: [0,0,0])
This is the processor on which the trap occurred. Additional information about the trap appears in the Dump Analysis Heuristics section, which follows the stack trace section. To determine the module that caused the trap, look at the stack trace line immediately after the KiTrap0E line. That line usually identifies the code that caused the trap, and from it you can identify the module in which the trap occurred. For example, the top lines of the stack trace might read:
ChildEBP RetAddr  Args to Child
fa679758 fa6325a5 fcdb0b58 fccd3770 02611e6c KiTrap0E+0x252
fa6797e0 fa63ae8e fcc37528 fa67992e fccd3770 FindNameOrQuery+0x141
fa679838 fa6444a5 fa679854 fa6a33d0 fa6798d0 NbtConnect+0x3ae
fa679860 fa630393 fccd3770 fcdb2e08 fa679900 NTConnect+0x2b
The first line of the stack trace contains the reference to KiTrap0E and the second line contains FindNameOrQuery+0x141, which means that the processor trap occurred in the function FindNameOrQuery.
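The rule described above is to find the KiTrap0E line and then read the function on the next line. The following Python sketch expresses that rule. faulting_function and function_name are hypothetical helpers. They assume the x86 column layout shown above, where the sixth whitespace-separated field is Function+0xOffset. Before comparing names, they strip an optional module prefix such as NT!. The RetAddr on the KiTrap0E line (fa6325a5) matches the Code Address in the STOP 0x0000000A Dump Analysis Heuristics example shown earlier, so comparing the two is a useful cross-check.

```python
from __future__ import annotations


def function_name(line: str) -> str | None:
    """Return the function from the 'Function+0xOffset' column, or None."""
    fields = line.split()
    if len(fields) < 6:
        return None  # the column header or a blank line
    return fields[5].split("+")[0]


def faulting_function(stack: str, trap_handler: str = "KiTrap0E") -> str | None:
    """Return the function on the line after the trap-handler line (hypothetical helper)."""
    names = [n for n in map(function_name, stack.splitlines()) if n]
    for i, name in enumerate(names[:-1]):
        # Ignore a module prefix such as "NT!" when comparing.
        if name.split("!")[-1] == trap_handler:
            return names[i + 1]
    return None


stack = """\
ChildEBP RetAddr  Args to Child
fa679758 fa6325a5 fcdb0b58 fccd3770 02611e6c KiTrap0E+0x252
fa6797e0 fa63ae8e fcc37528 fa67992e fccd3770 FindNameOrQuery+0x141
fa679838 fa6444a5 fa679854 fa6a33d0 fa6798d0 NbtConnect+0x3ae
fa679860 fa630393 fccd3770 fcdb2e08 fa679900 NTConnect+0x2b"""

print(faulting_function(stack))  # prints: FindNameOrQuery
```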
STOP 0x0000001E KMODE_EXCEPTION_NOT_HANDLED
Like STOP 0x0000000A, STOP 0x0000001E can be caused by either hardware or software, but it is caused by hardware more often than STOP 0x0000000A is.
In Dumpexam output for STOP 0x0000001E, you see two stack trace listings for the processor on which the STOP occurred. The first listing is the stack after the trap occurred. It shows only the kernel calls made to handle the trap and includes no information about what code caused the trap.
The second listing shows the stack just before the trap occurred. This is the listing you use for your analysis. The register dump for the processor is also duplicated, with the first dump showing the status of the registers after the trap and the second showing the state of the registers when the trap occurred. These two sets of information are separated by a section that looks like the following:
****************************************************************
** !exr fca49c20
****************************************************************
Exception Record @ FCA49C20:
ExceptionCode: c0000005
ExceptionFlags: 00000000
Chained Record: 00000000
ExceptionAddress: 801704a7
NumberParameters: 00000002
Parameter[0]: 00000001
Parameter[1]: 00000001
This section includes the following information:
Parameter | Meaning
---|---
ExceptionCode | A status code that identifies what type of exception occurred. In this case, the code is c0000005, which indicates an access violation. To find out what a particular status code means, contact support personnel.
ExceptionAddress | The address of the instruction that caused the STOP.
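Once you know the exception code, you can interpret the two parameters in this record. The Win32 EXCEPTION_RECORD documentation covers the access violation case (c0000005, STATUS_ACCESS_VIOLATION). It defines the first parameter as the access type (0 for a read, 1 for a write) and the second as the address that was accessed. The record above therefore describes an attempt to write to address 00000001. The Python sketch below encodes that rule. describe_exception and its small status table are hypothetical, and the table covers only a few NTSTATUS values.

```python
from __future__ import annotations

# A small, deliberately incomplete subset of NTSTATUS codes.
STATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
}
ACCESS_TYPES = {0: "read", 1: "write"}


def describe_exception(code: int, params: list[int]) -> str:
    """Summarize an exception record (hypothetical helper)."""
    name = STATUS_NAMES.get(code, f"unknown status {code:#010x}")
    if code == 0xC0000005 and len(params) >= 2:
        # Access violation: params[0] is the access type, params[1] the address.
        kind = ACCESS_TYPES.get(params[0], f"access type {params[0]}")
        return f"{name}: {kind} of address {params[1]:#010x}"
    return name


# ExceptionCode, Parameter[0], and Parameter[1] from the !exr record above.
print(describe_exception(0xC0000005, [0x00000001, 0x00000001]))
# prints: STATUS_ACCESS_VIOLATION: write of address 0x00000001
```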
The first stack trace from STOP 0x0000001E, the one that does not provide any useful information, looks like the following:
ChildEBP RetAddr  Args to Child
fca49968 8013387e fca49990 801367ab fca49998 PspUnhandledExceptionInSystemThread+0x18 (FPO: [0,0,0])
fca49970 801367ab fca49998 00000000 fca49998 PspSystemThreadStartup+0x4a (FPO: [0,0,0])
fca49f7c 8013e452 fca54bae 00000001 00000000 _except_handler3+0x47
00000000 00000000 00000000 00000000 00000000 KiThreadStartup+0x16
To determine where the trap occurred, ignore this stack and look at the second listing, after the !exr entry. The first line in this listing indicates the location in code that caused the trap.
With STOP 0x0000001E, it is also useful to compare the exception address listed in the !exr section to the list of device drivers in the !drivers section of the Memory.txt file. If the trap was caused by a specific driver, this address falls into the address range in the drivers list. If this is the case, it can indicate a problem either with the device that the driver controls or with the driver itself. Here is an example:
FramePtr RetAddr  Param1   Param2   Param3   Function Name
fa1bcda4 8010e244 fcff3940 00000000 00000220 NT!PsReturnPoolQuota+0xe
fa1bcdd4 80117085 fcbee668 fcddf648 fcbff020 NT!ExFreePool+0x16c
fa1bce24 8011c60b fcddf648 fa1bce58 fa1bce54 NT!IopCompleteRequest+0xbd
fa1bce5c 8013de15 00000000 00000000 00000000 NT!KiDeliverApc+0x83
fa1bce7c 8011a1ce 00000000 00000000 80179a01 NT!@KiSwapThread@0+0x15d
fa1bcea0 80179b3f fcc4bf60 00000006 80179a01 NT!KeWaitForSingleObject+0x1c2
fa1bcef0 80139b09 00000114 00000001 00000000 NT!NtWaitForSingleObject+0xaf
fa1bcef0 77f893eb 00000114 00000001 00000000 NT!KiSystemService+0xa9
00000000 00000000 00000000 00000000 00000000 NTDLL!ZwWaitForSingleObject+0xb
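Comparing the exception address with the drivers list is a range check. An address belongs to a driver if it lies between the driver's base address and that base address plus the driver's size. The Python sketch below assumes you have copied a base address and size for each driver from the !drivers section. module_for_address is a hypothetical helper. The module list in the example is invented for illustration and does not come from a real dump. With that invented list, the exception address 801704a7 from the !exr example falls within the range given for ntoskrnl.exe.

```python
from __future__ import annotations

from typing import NamedTuple


class LoadedModule(NamedTuple):
    name: str
    base: int  # load address of the image
    size: int  # size of the image in bytes


def module_for_address(address: int, modules: list[LoadedModule]) -> str | None:
    """Return the module whose [base, base + size) range holds address (hypothetical helper)."""
    for module in modules:
        if module.base <= address < module.base + module.size:
            return module.name
    return None


# Invented values for illustration only; use the base addresses and sizes
# from the !drivers section of your own Memory.txt.
modules = [
    LoadedModule("ntoskrnl.exe", 0x80100000, 0x00100000),
    LoadedModule("example.sys", 0xFA630000, 0x00020000),
]

print(module_for_address(0x801704A7, modules))  # prints: ntoskrnl.exe
```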
STOP 0x0000007F UNEXPECTED_KERNEL_MODE_TRAP
STOP 0x0000007F usually originates in the processor itself and almost always indicates a hardware fault. There are several kinds of STOP 0x0000007F. You can tell them apart by the first parameter of the STOP code, which appears in the Windows NT Crash Dump Analysis section at the beginning of the Memory.txt file.
The following are common kernel mode traps:
First Parameter | Meaning
---|---
0x00000000 | Divide by zero error
0x00000004 | Arithmetic overflow
0x00000006 | Invalid opcode
0x00000008 | Double fault
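A divide by zero error (trap 0) occurs when a DIV instruction executes and the divisor is 0. On x86 processors, the same trap also occurs when the quotient is too large to fit in the destination register. Either condition can be caused by problems that need further investigation, such as memory corruption, hardware problems, or software failures.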
Here's an example of a divide by zero error:
ChildEBP RetAddr  Args to Child
8019d778 8013cdcc fe483688 00000000 00000000 NT!_KiSystemFatalException+0xe (FPO: [0,0] TrapFrame @ 8019d778)
8019d7e8 fbb053be 0001440d 000004a9 000004a9 NT!_RtlEnlargedUnsignedDivide+0xc (FPO: [4,0,0])
8019d80c 8010f613 0001440d 000004a9 fe482bd0 bhnt!_BhStationQueryTimeout+0x44 (FPO: [4,0,1])
8019d820 fb910aa6 fe50a000 fe44255a fe44254c NT!_KeSetTimer+0x8f
8019d85c fb9409b3 fe4820c8 fe44255a fe44254c NDIS!_EthFilterDprIndicateReceive+0x111
8019d894 fb94044a fe482b98 fe483688 ffdff401 netflx!NetFlexProcessEthRcv+0x85
8019d8ac fb910ba1 fe482aa8 fb910b30 00000001 netflx!_NetFlexHandleInterrupt+0x4a
8019d8c4 80137c06 fe482bac fe482b98 00000000 NDIS!_NdisMDpc+0x71 (FPO: [EBP 0xfb910b30] [4,0,4])
fb910b30 18247c8b 8b34778b 4e8d106f d015ff30 NT!_KiIdleLoop+0x5a

kd> !trap 8019d778
eax=0001440d ebx=00000003 ecx=8019d81c edx=000004a9 esi=fe4820c8 edi=fe46a188
eip=8013cdcc esp=8019d7ec ebp=8019d820 iopl=0 nv up ei pl zr na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
ErrCode = 00000000
8013cdcc f774240c div dword ptr [esp+0xc]
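On x86 processors, a DIV instruction raises trap 0 in two cases. The first is when the divisor is 0. The second is when the quotient is too large for the destination register. For a 32-bit DIV, the 64-bit dividend is EDX:EAX and the quotient must fit in EAX. The register values in the !trap output above (edx=000004a9, eax=0001440d) suggest the second case. This reading assumes that the divisor at [esp+0xc] is the third Args to Child value on the RtlEnlargedUnsignedDivide line (000004a9). If so, the divisor is not 0, but the quotient needs more than 32 bits. The following Python sketch checks both conditions using a hypothetical div32_faults helper.

```python
def div32_faults(edx: int, eax: int, divisor: int) -> bool:
    """Would a 32-bit unsigned x86 DIV of EDX:EAX by divisor raise trap 0?"""
    if divisor == 0:
        return True  # divide by zero
    dividend = (edx << 32) | eax  # 64-bit dividend formed from EDX:EAX
    return dividend // divisor > 0xFFFFFFFF  # quotient would not fit in EAX


# edx and eax come from the !trap output; the divisor is assumed to be the
# third "Args to Child" value on the RtlEnlargedUnsignedDivide line.
print(div32_faults(edx=0x000004A9, eax=0x0001440D, divisor=0x000004A9))  # prints: True
```

A quick shortcut follows from the same arithmetic: a nonzero divisor overflows the quotient exactly when EDX is greater than or equal to the divisor, which is the case here.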
An arithmetic overflow trap occurs when the processor executes an INTO instruction while the overflow flag is set, for example after a multiplication whose signed result is larger than a 32-bit integer. This error can be caused by a software failure, but it is also frequently a hardware problem.
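The following Python sketch shows when the overflow flag is set for a multiplication. It checks whether a signed 32-bit multiplication produces a result that does not fit in 32 bits, which is the condition under which the x86 IMUL instruction sets the overflow flag. imul32_overflows is a hypothetical helper. Trap 4 itself is raised only if the code then executes an INTO instruction while the flag is still set.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def imul32_overflows(a: int, b: int) -> bool:
    """True if the signed product a * b does not fit in a 32-bit integer."""
    product = a * b  # Python integers do not overflow, so this is exact
    return not (INT32_MIN <= product <= INT32_MAX)


print(imul32_overflows(0x10000, 0x10000))  # prints: True (2**32 is out of range)
print(imul32_overflows(-(2**16), 2**15))   # prints: False (exactly -2**31)
```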
An invalid opcode error occurs when the processor attempts to execute an instruction that is not defined. This error is almost always caused by hardware memory corruption. If you receive this error, run memory diagnostics on the main system memory (RAM) and on both the L1 and L2 cache memory.
A double fault trap occurs when the processor detects a second exception while trying to call the handler for a prior exception, and cannot handle the two exceptions one after the other. This trap is almost always caused by hardware failure.
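The table of first parameters and the trap descriptions in this section can be combined into one lookup. That is convenient when you triage many STOP 0x0000007F dumps. The Python sketch below uses a hypothetical describe_trap helper and summarizes only what this section states. It assumes you have already read the first parameter, as a hex string, from the Windows NT Crash Dump Analysis section of Memory.txt.

```python
# STOP 0x0000007F first parameter -> (trap, usual cause), summarized from
# the table and descriptions in this section.
KNOWN_TRAPS = {
    0x00: ("Divide by zero error", "hardware or software; investigate further"),
    0x04: ("Arithmetic overflow", "software failure, but frequently hardware"),
    0x06: ("Invalid opcode", "almost always hardware memory corruption"),
    0x08: ("Double fault", "almost always hardware failure"),
}


def describe_trap(first_parameter: str) -> str:
    """Map a first parameter such as "0x00000008" to a summary (hypothetical helper)."""
    value = int(first_parameter, 16)
    if value not in KNOWN_TRAPS:
        return f"Unlisted trap {value:#010x}: more analysis required"
    name, cause = KNOWN_TRAPS[value]
    return f"{name}: {cause}"


print(describe_trap("0x00000008"))  # prints: Double fault: almost always hardware failure
```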
If a particular trap can be caused by either software or hardware, more analysis is required to determine which is the cause. If you suspect a hardware problem, try the following hardware troubleshooting steps:
Run diagnostic software to test the RAM in the computer. Replace any RAM reported to be bad. Also, make sure that all the RAM in the computer is the same speed.
Try removing or swapping controllers, cards, or other peripherals.
Try a different motherboard on the computer.