Demystifying the 'Blue Screen of Death'

Archived content. No warranty is made as to technical accuracy. Content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

By Brien M. Posey, MCSE for TechRepublic.com

Have you ever had that 3:00 AM phone call from someone saying that one of your servers is displaying the Blue Screen error, affectionately known among IT pros as the "Blue Screen of Death"? In such a situation, your first instinct would probably be to tell them to reboot the server and let you go back to sleep. However, as you've probably already found out, rebooting isn't always the magic cure-all. Staring at the incomprehensible blue screen, with all its numbers and codes, can be gut-wrenching, but the experience doesn't have to be so traumatic. The secret is knowing how to read the Blue Screen of Death. In this article, I'll show you how to do that. I'll also discuss some of the errors that you're likely to encounter and provide some techniques for eliminating them.

On This Page

Anatomy of a Blue Screen
The Error Message
Modules That Have Loaded
Modules That Were About to Load
Kernel Debugger
An Easier Way
Memory Dump
Last Known Good Configuration
Conclusion

Anatomy of a Blue Screen

There are four basic sections that you should be aware of on a Blue Screen of Death:

  • The first section lists the actual error message.

  • The second section lists the Microsoft® Windows NT® modules that are already loaded into memory.

  • The third section lists the modules that were about to be loaded had the error not occurred.

  • The fourth section lists the current status of the Kernel Debugger.

I'll cover each of these sections in detail.

The Error Message

The section circled (with a white box) in Figure A shows the actual error message. This message contains an error code number, the addresses where the error occurred, and a text code indicating the type of error. Below, I've listed some of the more common error codes and their causes.

Figure A: This is the actual error message.

DIVIDE_BY_ZERO_ERROR

This error is caused by an application trying to divide by zero. If you receive this error and don't know which application caused it, you might try examining the memory dump.

IRQL_NOT_LESS_OR_EQUAL

The IRQL_NOT_LESS_OR_EQUAL error is caused by a buggy device driver or an actual hardware conflict. If you've recently added new hardware to your system, try removing it and see if the error goes away. Likewise, if you've recently loaded a new device driver, you might try using ERD Commander Professional Edition, by Winternals Software, to temporarily disable the new driver and see if the problem goes away.

KMODE_EXCEPTION_NOT_HANDLED

An incorrectly configured device driver usually causes this type of error. As I'll explain later, you can use another section of the blue screen to figure out which driver is causing the problem.

REGISTRY_ERROR

Such an error indicates a catastrophic failure in the system's registry. However, it can sometimes be caused by a failure to read the registry from the hard disk rather than by corruption of the registry itself. Most of the time, though, if you get this error, you'll have to restore from backup.

INACCESSIBLE_BOOT_DEVICE

Just as the name implies, this error indicates that Windows NT is having trouble reading from the hard disk. It can be caused by a faulty device driver or a bad Small Computer System Interface (SCSI) terminator. If you've checked for these problems but are still receiving the error, make sure that a virus hasn't destroyed your boot sector.

UNEXPECTED_KERNEL_MODE_TRAP

This error message is almost always caused by a problem with your computer's memory. If you receive this error, check that all of your single inline memory modules (SIMMs) are the same type and speed. You should also check that your computer's CMOS (complementary metal-oxide semiconductor) setup is configured for the correct amount of RAM. If all of these check out, try replacing the memory in the computer.

BAD_POOL_HEADER

This is, perhaps, the most obscure error message. In most cases, if you receive this error, it's related to the most recent change you've made on your system. Try undoing the change to get rid of the error.

NTFS_FILE_SYSTEM

An NTFS_FILE_SYSTEM error indicates hard disk corruption. If your system is bootable, run CHKDSK /F on all of your partitions immediately. If your system isn't bootable, try installing a new copy of Windows NT in a different directory. You can use that copy to run the CHKDSK program. When you're done with the second copy, you can edit your BOOT.INI file to make your computer start your original copy of Windows NT.

KERNEL_DATA_INPAGE_ERROR

This error indicates that Windows NT wasn't able to read a page of kernel data from the page file. Bad memory, a bad processor, incorrectly terminated SCSI devices, or a corrupt PAGEFILE.SYS file may cause this situation. The first step in correcting such an error is to recreate the PAGEFILE.SYS file and see if you can bring your system back online.

NMI_HARDWARE_FAILURE

This is a generic error message in which the hardware abstraction layer can't report on the true cause of the error. In such a situation, Microsoft recommends calling the hardware vendor. This error can sometimes be caused by mixing parity and non-parity SIMMs, or by bad SIMMs.
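The error names above lend themselves to a simple lookup table. The following Python sketch is only an illustration: the first-check summaries are paraphrased from the descriptions in this section, and the names FIRST_CHECKS and first_check are hypothetical, not part of any Windows NT tool.

```python
# Stop-error names mapped to the first thing to check,
# paraphrased from the descriptions in this section.
FIRST_CHECKS = {
    "DIVIDE_BY_ZERO_ERROR": "Examine the memory dump to find the application.",
    "IRQL_NOT_LESS_OR_EQUAL": "Remove recently added hardware or disable the new driver.",
    "KMODE_EXCEPTION_NOT_HANDLED": "Look at the modules that were about to load.",
    "REGISTRY_ERROR": "Be prepared to restore from backup.",
    "INACCESSIBLE_BOOT_DEVICE": "Check drivers, SCSI termination, and the boot sector.",
    "UNEXPECTED_KERNEL_MODE_TRAP": "Check SIMM type and speed and the CMOS RAM setting.",
    "BAD_POOL_HEADER": "Undo the most recent change to the system.",
    "NTFS_FILE_SYSTEM": "Run CHKDSK /F on all partitions.",
    "KERNEL_DATA_INPAGE_ERROR": "Recreate the PAGEFILE.SYS file.",
    "NMI_HARDWARE_FAILURE": "Call the hardware vendor and check the SIMMs.",
}

NOT_LISTED = "No suggestion listed in this article."


def first_check(stop_name: str) -> str:
    """Return the suggested first check for a stop-error name."""
    # Normalize case and whitespace so hand-typed names still match.
    key = stop_name.strip().upper()
    return FIRST_CHECKS.get(key, NOT_LISTED)


print(first_check("ntfs_file_system"))
```

A table like this is no substitute for reading the full description, but it can be handy when someone reads an error name to you over the phone.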

Modules That Have Loaded

The section that I've circled in Figure B shows the modules that Windows NT has already loaded into memory. Use this section mainly to see which modules are already loaded; you can be somewhat confident that none of the modules listed is causing your problem.

Figure B: These are the modules that Windows NT has already loaded into memory.

Modules That Were About to Load

The section that I've circled in Figure C shows which modules were about to load when the error occurred. Many times, this section can give you an idea of which module is causing your problem. This is especially true if you're receiving a KMODE_EXCEPTION_NOT_HANDLED error. For example, suppose that the next module on the stack to load was tcpip.sys. In such a situation, it's likely that an incorrect network card driver is causing your problem. If you happen to own ERD Commander Professional Edition by Winternals Software, you could disable the network card driver and try booting your system again. If the system boots, you can then correct the driver problem.

Figure C: These are the modules that were next to load, had the error not occurred.

Kernel Debugger

The section circled in Figure D indicates the current status of the kernel debugger. The kernel debugger enables you to link two computers running Windows NT via a RAS connection or a null modem cable. When a Blue Screen of Death occurs, the crash dump information is sent to the functional computer for diagnosis.

Figure D: This section lists the status of the kernel debugger.

To use the kernel debugger, both computers must be running the same version of Windows NT and must have the symbol set installed. You must also install the debugging software from the \SUPPORT\DEBUG\I386 directory on your Windows NT CD-ROM.

Next, you must add environment variables to both computers, as shown in Table A:

Table A: Add these environment variables.

Variable               Value
_NT_DEBUG_PORT         COM1 or COM2
_NT_DEBUG_BAUD_RATE    baud rate
_NT_SYMBOL_PATH        location of symbol files
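As a quick sanity check, you can confirm that the three variables from Table A are set before you start a debugging session. The following is a minimal Python sketch, not part of the Windows NT debugging tools; the function name missing_debug_vars is hypothetical, and the sample values are illustrative only.

```python
import os

# Variable names taken from Table A.
REQUIRED_DEBUG_VARS = ("_NT_DEBUG_PORT", "_NT_DEBUG_BAUD_RATE", "_NT_SYMBOL_PATH")


def missing_debug_vars(env=None):
    """Return the Table A variables that are unset or empty in env."""
    if env is None:
        env = os.environ  # default to the current process environment
    return [name for name in REQUIRED_DEBUG_VARS if not env.get(name)]


# Illustrative values only; use the port and baud rate for your own setup.
sample = {"_NT_DEBUG_PORT": "COM1", "_NT_DEBUG_BAUD_RATE": "19200"}
print(missing_debug_vars(sample))
```

With the sample values shown, the function reports that _NT_SYMBOL_PATH still needs to be set.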

At this point, you need to modify the BOOT.INI file on the computer that is experiencing the blue screen, because the /CRASHDEBUG switch loads the kernel debugger on the machine being debugged. To do so, add /CRASHDEBUG to the end of the line that you use to boot Windows NT. Reboot Windows NT before continuing.
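The BOOT.INI change amounts to appending one switch to a boot entry. The Python sketch below illustrates the edit on a single line of text; the helper name add_crashdebug and the sample boot entry are assumptions made for illustration, so check them against your own BOOT.INI file before changing anything.

```python
def add_crashdebug(entry: str) -> str:
    """Append /CRASHDEBUG to a BOOT.INI boot entry unless it is already there."""
    stripped = entry.rstrip()
    # Compare case-insensitively so an existing /crashdebug is not duplicated.
    if "/CRASHDEBUG" in stripped.upper().split():
        return stripped
    return stripped + " /CRASHDEBUG"


# Hypothetical boot entry; the path and description on your system will differ.
entry = r'multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows NT Server Version 4.00"'
print(add_crashdebug(entry))
```

The function only returns the modified line. Writing it back to BOOT.INI, which is normally a read-only system file, is left as a manual step.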

When both machines are set up, you must run the REMOTE program before triggering the blue screen. On the PC having the problem, type the following command:

REMOTE /s "I386KD -v" DEBUG

In this command, the /s indicates that this computer will act as a server and send the crash dump file to the client. The -v indicates verbose logging mode.

On the computer that you plan to use to examine the crash dump, type the following command:

REMOTE /C computername DEBUG

In this command, the /C indicates that this computer will function as a client and receive the crash dump file from the server. The computername is the name of the computer having problems.

An Easier Way

As you can tell, setting up the kernel debugger can be complicated. If you don't want to go through all of this, there are a couple of other things you can try first.

Memory Dump

If your computer is bootable, you can set Windows NT to create a memory dump file when a Blue Screen of Death occurs. To do so, open the System Properties dialog box from Control Panel and go to the Startup/Shutdown tab. Next, set the options shown in Figure E. Keep in mind that the partition where you store the memory dump file must have at least enough free space to store your page file, plus your physical RAM space, plus 1 MB. For example, if your machine has 128 MB of RAM, the partition must have enough free space for the page file, plus an extra 129 MB.

Figure E: Use these options to create a memory dump file.
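The free-space rule described above (page file, plus physical RAM, plus 1 MB) is simple arithmetic. Here is a minimal Python sketch of it; the function names and the 139 MB page file size are hypothetical and used only for illustration.

```python
def required_free_mb(pagefile_mb: int, ram_mb: int) -> int:
    """Free space needed on the dump partition: page file + RAM + 1 MB."""
    return pagefile_mb + ram_mb + 1


def has_room_for_dump(free_mb: int, pagefile_mb: int, ram_mb: int) -> bool:
    """Return True if the partition has enough free space for the dump."""
    return free_mb >= required_free_mb(pagefile_mb, ram_mb)


# 128 MB of RAM as in the example above; the 139 MB page file is an assumed value.
print(required_free_mb(pagefile_mb=139, ram_mb=128))
```

With these assumed values, the partition would need 268 MB free: the 139 MB page file plus the extra 129 MB mentioned in the example.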

Once you've created a memory dump file, you can use the DUMPEXAM.EXE program in the \SUPPORT\DEBUG\I386 directory of your Windows NT CD-ROM to create a report of the crash. You can see an example of such a report in Figure F.

Figure F: You can use the DUMPEXAM.EXE program to create a report similar to this one.

Last Known Good Configuration

You have undoubtedly heard the phrase, "If it ain't broke, don't fix it." In the world of Windows NT, this can be especially true. Blue screens don't occur without reason. If you have a blue screen that you can't seem to figure out and you've ruled out a hardware failure, chances are that it's related to a change that you or someone else has recently made. In such a situation, you could try using the Last Known Good Configuration as a last resort. Using this option will sometimes bring your system back to life, but it will undo the configuration changes that you've made since the last time the system booted successfully.

Conclusion

In this article, I've discussed the various pieces of information displayed on the notorious Blue Screen of Death. Along the way, I explained what each of these items means and provided several steps that you can take to correct the error.

Brien M. Posey is an MCSE who works as a freelance writer. He also works as a systems engineer for the United States Department of Defense. You can contact him at Brien_Posey@xpressions.com. Because of the high volume of e-mail that he receives, it's impossible for him to respond to each message, although he does read them all.

We at Microsoft Corporation hope that the information in this work is valuable to you. Your use of the information contained in this work, however, is at your sole risk. All information in this work is provided "as is", without any warranty, whether express or implied, of its accuracy, completeness, fitness for a particular purpose, title or non-infringement, and none of the third-party products or information mentioned in the work are authored, recommended, supported or guaranteed by Microsoft Corporation. Microsoft Corporation shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, even if it has been advised of the possibility of such damages.