Soussan DAS Computer Consultants



A BSOD Recovery how-to document

Executive summary: How to get out of a BSOD boot loop when none of the recovery or startup options available at boot time work. Written specifically for Windows XP; the general approach may carry over to Windows 2000 (Win2K) and Vista, though Windows 2000 has no System Restore and Vista stores its restore points differently. Here is the trail of how I tracked the problem down, got the system booting, then recovered it back to its normal working state.

My goal is to give a technically competent person enough tools, pointers to resources, and how-to information to try bringing a system back from the dead before wiping it and re-installing the operating system and all the applications. I do not recommend what is described here to the average computer user. When I first hit this problem, I found nothing on the web that described these steps, so I blazed the trail and left breadcrumbs for others to follow. The second time, I took detailed screen shots and notes, which should make it easier for you.

There are MANY different reasons Windows can be stuck in a BSOD reboot loop where you can't get anything to start correctly enough to investigate further. The two cases shown here were completely different problems with similar symptoms, and similar techniques got each system booting again. Both dealt with specific startup errors, so check your error before trying this! Your BSOD boot loop might have an easier fix than these did.

If this helped you, please send me a note! I love hearing success stories! Also, I've got a lot of other interesting bits of information in the 'Cool Stuff' section of my website.

While you are here, please check out a brand new article on recovering from various malware, viruses, and spyware.


Details

There are now two cases here, with similar symptoms and different root causes, both causing the customer the exact same complaint. Both were fixed with the same method -- manually restoring a corrupted registry file from a System Restore copy of that file when you can't get to System Restore's native console.

QUICK UPDATE:

Apparently there are known issues with XP SP3 on AMD processor machines that have certain OEM installations of Windows XP going into a Blue Screen reboot loop. So if you found this page and have that problem, perhaps you'll instead want to look here:

http://msinfluentials.com/blogs/jesper/archive/2008/05/07/does-your-amd-based-computer-boot-after-installing-xp-sp3.aspx

------

Case #1: In this case, it wasn't a client -- it was my godson. There is a lot of background information in this case.

He had a notebook computer he used for school, email, games, chatting, ... all the things a 13-year-old boy uses a computer for in these modern times. One day it just stopped booting -- it came up with the Windows XP logo screen, then blue-screened into a reboot loop. None of the other modes worked -- no safe mode, no safe mode with command prompt, no last known good configuration, etc. Everything resulted in the BSOD shortly after the initial Windows XP splash screen appeared.

Over the years I've had the unfortunate experience of talking with many manufacturers' technical support people. This one happened to be an HP/Compaq system, but the tech support lines at Dell, Gateway, Toshiba, eMachines, and all the others pretty much stink too. I don't know the exact reason why, but it is likely a combination of things -- they hire unknowledgeable people who read from a database of problems, those who do know what is going on and can think their way out of a paper bag get sucked into product development, and sometimes I just can't understand the person's words through their thick accent.

Most often their solution is "Can you reinstall from scratch? Great! That fixed the problem. Thank you, come again!"

That is like saying "My stairs squeak when I walk on them," and the repair guy comes over, tears your house completely down, and builds it back with all new materials. Kind of overkill. But believe it or not, sometimes that is the most cost effective way to get rid of that squeak -- wipe it out and start over. At least it is cost effective for them. They don't care about the client, the data on the system, or the time it will take to get all those programs re-installed and re-configured so they work right. I'm sure within the bowels of Microsoft there are people who are very adept at bringing systems back from death without data loss, but none of us mere mortals will ever talk to them via the usual technical support channels.

My godson gave me his computer and let me have my way with it.

This particular system came with an HP Recovery Disk, which unfortunately was only good for reinstalling the operating system. Most standard XP Install disks allow you to boot into a 'Recovery Console', which lets you boot a system and look around at various files. So I grabbed one of my other XP disks, booted into the recovery console, and had a look at various files.

If you look on your system in c:\windows\minidump, you'll see a file from every BSOD that your computer has had. Starting in December 2007, his system had been getting them pretty regularly.

6 times in December
8 times in January
22 times in February
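
A monthly tally like the one above can be produced from the file names alone. This is a minimal sketch, assuming Python is available on whatever machine holds the copied dumps and that the files follow XP's usual MiniMMDDYY-NN.dmp naming; the function name crashes_per_month is made up for illustration.

```python
import re
from collections import Counter

# XP normally names minidumps MiniMMDDYY-NN.dmp: crash date, then a
# sequence number for that day. This pattern is an assumption; check it
# against the names actually present in the minidump directory.
MINIDUMP_NAME = re.compile(r"^Mini(\d{2})(\d{2})(\d{2})-(\d{2})\.dmp$", re.IGNORECASE)


def crashes_per_month(filenames):
    """Return a Counter mapping 'YYYY-MM' to the number of minidump files."""
    counts = Counter()
    for name in filenames:
        match = MINIDUMP_NAME.match(name)
        if match is None:
            continue  # not a minidump file name; skip it
        month, _day, year, _sequence = match.groups()
        # Two-digit years are assumed to fall in the 2000s.
        counts[f"20{year}-{month}"] += 1
    return counts
```

Feeding it a directory listing, for example crashes_per_month(os.listdir(r"C:\WINDOWS\Minidump")) after an import os, shows at a glance whether the crashes are getting more frequent.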

Here is the view of the minidump directory on my notebook:



Mine (a Gateway) has recurring problems with the video drivers. Gateway is long done developing drivers for this model, so I get to live with the bugs and various reboots.

Most of the time BSODs are due to bad or buggy driver code. Drivers are the software that makes specific hardware devices work (like your screen, sound, Bluetooth adapter, etc.). Increasing frequency points to something that is getting worse and worse, perhaps a hardware problem or new software that is buggy and causing the BSOD. I needed to do a crash dump analysis on the dump files to see what the cause was, and whether it was the same thing over and over again or something different each time. But the recovery console doesn't let you copy files off to anything but a floppy disk, which this notebook doesn't have.

(side note: I could have removed the hard drive and put it into another computer as a secondary drive, but I was trying to be as minimally invasive as possible.)

So I built up a BartPE bootable CD -- this would let me boot off a CD, read / write the hard drive, and read / write an external USB drive all at the same time. After booting BartPE, I copied a bunch of the minidumps to my USB drive.

(build a BartPE disk: http://www.nu2.nu/pebuilder/ )

With the dumps copied off, I could then use the Windows Debugging Tools on another system to analyze the crash dump files.

What is a crash dump?

Back in the old days, we wrote our programs by punching cards, wrapping some JCL (Job Control Language) cards around our program deck, and gave it to an operator through a cubby hole into the computer room. They took our program deck, ran it, and gave it back to us wrapped up with a printout of our program's execution output.

Sometimes our program went completely haywire and crashed badly. When it did, the computer not knowing exactly what to do dumped out all the memory as hex or octal numbers in a list to the line printer. This was called a 'core dump' and by looking over what was in memory when the program did its bad crash we could see exactly what it was doing at the time of the crash and hopefully figure out where our program went haywire.

Everything old is new again. The crash dump file is similar in concept to that memory dump of those old days. In fact, options will let you save different sized memory dumps in case you need more to analyze. See the box 'Write debugging information' in the screen shot here:



What is really nice is Microsoft has automated much of the manual analysis of these dump files -- you can load up a dump file into the debugger, ask it to analyze, and sometimes it will tell you which driver or program caused the problem.

In this case, the system crashed consistently when running NV4_DISP.DLL. This is the Nvidia display driver for this particular notebook.
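
To check whether it is the same culprit over and over or something different each time, the debugger's verdicts can be tallied. The sketch below assumes the analysis text for each dump has been saved somewhere and that it contains a line beginning "Probably caused by :", which is how the debugger's automated analysis has commonly reported its guess; that line format and the name tally_culprits are assumptions, so adjust the pattern to whatever your debugger version prints.

```python
import re
from collections import Counter

# Assumed format of the debugger's summary line, for example:
#   Probably caused by : nv4_disp.dll ( ... )
# Only the first token after the colon is captured.
CULPRIT_LINE = re.compile(r"^Probably caused by\s*:\s*(\S+)", re.MULTILINE)


def tally_culprits(analysis_texts):
    """Count how often each module is blamed across saved analysis outputs."""
    counts = Counter()
    for text in analysis_texts:
        match = CULPRIT_LINE.search(text)
        # Dumps without a recognizable summary line are counted as 'unknown'.
        counts[match.group(1).lower() if match else "unknown"] += 1
    return counts
```

One module dominating the tally points at a driver; a scattering of unrelated modules leans more toward failing hardware.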

(want to learn about crash dump analysis and using the Windows Debugging Tools? See:
http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?culture=en-US&EventID=1032298076&CountryCode=US
Warning: ** THIS IS VERY ADVANCED STUFF ** The link might wrap, if so you'll have to unwrap it.)

I tried copying another version of that DLL file to this notebook but got the same results, so I restored the NV4_DISP.DLL file back to how it was.

The other thing I noticed was that new minidump files weren't being created as the system went through its reboot loop. The existing ones were recent, but one wasn't being written for every crash. So whatever was causing the reboots now was happening too early in the boot process for the system to create new minidump files. The system also rebooted pretty quickly -- from the Windows XP splash screen and moving bar at the bottom it would flash the blue screen and immediately reboot. There is an option -- see the screen shot above -- to not automatically restart, but since the system couldn't be booted there was no easy way to uncheck that box so you could read the screen.

In cases like this, you need to freeze time to read the screen. I've had good luck filming it with a small digital camera in movie mode and then pausing the playback. Failing that, stare intently, read as much of the screen as you possibly can on each pass, and piece the error message together.

The message read something like "Windows XP could not start because the following file is missing or corrupt: \Windows\system32\config\software" -- this is the system's software registry file.

Read "How to recover from a corrupted registry that prevents Windows XP from starting" at:
http://support.microsoft.com/kb/307545

This article talks about copying older registry copies from the repair directory on the hard drive to the main registry. Unfortunately, the repair directory holds the registry files that were created when the operating system was first installed. I'm unaware of anything that updates that repair directory with a newer copy.

Ok, so the software registry hive is corrupted.
Following the article's recommendations, I copied a backup of the registry from when the system was first installed from c:\windows\repair\software to c:\windows\system32\config\software.

The good news: Copying this older software registry file let the system boot. The bad news: None of the software installation keys that were created since the initial install on 5/4/2007 were there. The repaired registry was 8,356KB big. The bad registry was 28,928KB big. 20MB of registry keys were gone.

But at least the notebook would boot now.

All kinds of programs that were attempting to start weren't working right, nor should they be expected to work right -- their critical information had been lobotomized out of the registry.

This was 1:00 AM on Friday. I started a disk scan to see if there were bad sectors and went to bed. I'd been poking at the system for a little over an hour.

Friday 7:30 AM I resumed working on the notebook. The disk scan seemed OK, but the old registry file still gave CRC errors when I tried to copy it, so it was likely a 'soft' error on disk -- one that would be corrected if the bad block were overwritten with zeroes. These are often caused by abruptly powering off while the disk is writing.

So, how do we get the software registry back? I searched around on the web and the only suggestions involved System Restore. I ran it, but System Restore found no restore points to restore from except the one just taken that morning when the system booted.
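
A copy that fails with a CRC error only says that something in the file is unreadable. A rough sketch of narrowing that down is to read the file block by block and note which reads fail. It assumes Python is available on the machine doing the reading (for example, with the drive attached to another computer) and that the operating system reports the bad read as an ordinary I/O error; find_unreadable_blocks is a made-up name.

```python
import os


def find_unreadable_blocks(path, block_size=4096):
    """Read a file block by block and return the byte offsets that fail to read."""
    bad_offsets = []
    size = os.path.getsize(path)
    # buffering=0 keeps Python from reading ahead, so each failure maps to one block.
    with open(path, "rb", buffering=0) as handle:
        for offset in range(0, size, block_size):
            try:
                handle.seek(offset)
                handle.read(block_size)
            except OSError:
                bad_offsets.append(offset)  # this block could not be read
    return bad_offsets
```

An empty result means every block was readable; a short list of offsets is consistent with a small damaged area rather than a failing drive, though it proves nothing on its own.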

Some interesting system restore reading:
http://www.microsoft.com/windowsxp/using/helpandsupport/learnmore/systemrestore.mspx
http://support.microsoft.com/kb/306084

Apparently this was a chicken-and-egg problem -- you need the software registry to know what the system restore points are, and the only registry file that knows about your restore points is stored inside a restore point!

In my opinion, System Restore was the "killer feature" included in XP that made upgrading from Windows 2000 a must-do for most people. Prior to System Restore, it was easy to turn your system into a brick that required reinstallation, and System Restore lets most users take a step back in time and get their system working again after one of these bad program installations, without major file surgery.

I searched the web for how to manually restore a file out of a restore point but found nothing on the topic anywhere. But I did find where the restore points are saved... Hmmmmm -- another puzzle to solve -- how to manually restore a registry hive out of a system restore point.

So again booting BartPE I poked around in the hidden location that stores the restore points (c:\System Volume Information, but you can't look inside from a booted Windows XP system -- if you want to explore it, boot off a BartPE disk), found the structure of the restore points, and located a file "_REGISTRY_MACHINE_SOFTWARE" size 28,218KB dated 2/29/2008. Can you say "Close enough!" ??
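
The same hunt can be sketched in code: walk the restore-point folder and list every saved copy of the hive, newest first, with its size, so a "close enough" candidate stands out. This is only an illustration; it assumes Python on a machine that can read the offline drive, and find_hive_copies is a hypothetical helper, not part of any Windows tool.

```python
import os
from datetime import datetime


def find_hive_copies(restore_root, copy_name):
    """List (path, size_in_bytes, modified) for every file named copy_name, newest first."""
    found = []
    for folder, _subfolders, files in os.walk(restore_root):
        for name in files:
            if name.upper() == copy_name.upper():
                path = os.path.join(folder, name)
                info = os.stat(path)
                modified = datetime.fromtimestamp(info.st_mtime)
                found.append((path, info.st_size, modified))
    # Newest copy first, since the most recent good copy is usually the one wanted.
    found.sort(key=lambda item: item[2], reverse=True)
    return found
```

For Case #1 the call would look something like find_hive_copies(r"C:\System Volume Information", "_REGISTRY_MACHINE_SOFTWARE"), with the drive letter adjusted to wherever the damaged disk is mounted.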

I renamed the first restored copy of the software registry file (the one from the repair directory) and copied this version into the software registry hive's correct location, in essence manually restoring just that one file from the system restore point that System Restore couldn't see because it was looking at an old registry file.

Rebooted.

And the notebook started normally! All the programs were installed as they were before, his system is still happy and healthy, and he didn't have to wipe it out and reinstall everything from scratch.

End to end, I spent 2 hours on this notebook. The clock time was longer, but I don't count the time overnight while the disk is scanning. In the interest of a shorter article, I skipped the blind alleys like trying to start with debug logging and stuff.

The important take-away: Almost any other tech support group on the planet Earth would re-install this system from scratch to bring it back to life, tearing down and rebuilding the house to get rid of the squeak.

Which is a perfectly valid solution to the problem. It's just not my solution.

I prefer to think through the problem and with a little effort actually fix what was wrong instead of tearing the whole house down. That's just my philosophy. I could be wrong.

Follow up: I did update his NVidia display drivers, but the system still BSODed. But now he paid attention to when it happened -- only when playing World of Warcraft. Apparently WoW is notorious for BSODs in the video subsystems. Perhaps vendors should use that to test their drivers? Anyway, he is still tweaking settings to try to get it stable while playing the game.

-----

Case #2: Same symptoms (mostly....)

I'd already written Case #1 and in fact used that as a template to bring the system in this case back to life. I have photos of various screens for this scenario. Curiously, this belonged to a client's teenage daughters. Must be something with teenagers this year.

This system's boot loop came up with the following blue screen error message:

STOP: c0000218 {Registry File Failure}
The registry cannot load the hive (file):
\SystemRoot\System32\Config\DEFAULT
or its log or alternate.
It is corrupt, absent, or not writable.
Beginning dump of physical memory


From this error message, the system can't read one of its important registry files. Having seen this type of error before in Case #1, the next step was to boot into a BartPE disk.
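
The hive named in the error decides which saved copy to look for in the restore point. Here is a small sketch that pulls the hive name out of the message text and looks up the matching copy. The copy names follow the list in Microsoft KB 307545 (cited in Case #1); treat them as assumptions and confirm them against what is actually on disk. The function name snapshot_copy_for is made up.

```python
import re

# Hive file name -> name of its saved copy inside a restore point.
# Names as listed in Microsoft KB 307545; verify them on the actual disk.
SNAPSHOT_COPY = {
    "DEFAULT": "_REGISTRY_USER_.DEFAULT",
    "SAM": "_REGISTRY_MACHINE_SAM",
    "SECURITY": "_REGISTRY_MACHINE_SECURITY",
    "SOFTWARE": "_REGISTRY_MACHINE_SOFTWARE",
    "SYSTEM": "_REGISTRY_MACHINE_SYSTEM",
}

# Matches the file name that follows "\config\" in the error text.
HIVE_PATH = re.compile(r"\\config\\(\w+)", re.IGNORECASE)


def snapshot_copy_for(error_text):
    """Return the restore-point copy name for the hive named in error_text, or None."""
    match = HIVE_PATH.search(error_text)
    if match is None:
        return None  # no hive path found in the message
    return SNAPSHOT_COPY.get(match.group(1).upper())
```

Typing in just the path line from the blue screen is enough; anything the table does not recognize comes back as None rather than a guess.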

Navigating into the place where Windows XP's System Restore function saves copies of the registry, I was digging for a recent copy of the default file that appeared unreadable.



Here you can see all the restore points. Picking a recent one and drilling inside, I found where the registry files had been saved.

Grab the _REGISTRY_USER_.DEFAULT file and copy it (you might need a different file if your problem was not in the default file -- for example, in Case #1 I needed _REGISTRY_MACHINE_SOFTWARE).

Navigate out of the "C:\System Volume Information\_restore"... directory and into c:\windows\system32\config, then paste the file there (see the arrow in the photo below).

Now rename the other file -- the one that appears corrupt -- so you can rename the new file to the original name and thus replace it.

A curious thing happened here -- BartPE's file browser froze up on me while renaming the file. If you look carefully at the screen, the line with 'default-OldSaved' (the name I gave the old default file) is stuck and hasn't redrawn its gray bar on the file name yet. This was after waiting a few minutes.

So I rebooted BartPE and initiated a chkdsk disk scan.

About 15 minutes later, look what came up:

See the 2nd and 3rd lines from the bottom? They say "Windows replaced bad clusters in file 54037 of name \WINDOWS\system32\config\DEFAUL~1"

DEFAUL~1 is the "8.3" form of the filename I was trying to rename. Apparently there was a bad cluster in the part of the disk that held the file name, and that prevented both the boot code from finding the file and BartPE from renaming it.

With the bad cluster replaced, I could successfully rename the default file to default-OldSaved and the recovered file _REGISTRY_USER_.DEFAULT to default. The files renamed without any issues, and the system then booted as it had before the bad cluster developed on the hard drive.

This technique is pretty generic for replacing any corrupted registry file from a system restore's copy of that file.
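
The rename-and-replace step can be written down as a short sketch. It is an illustration under stated assumptions, not a tool I used: it must be run against an offline Windows installation (booted from something else, as with BartPE), it needs Python wherever it runs, and replace_hive and its parameters are hypothetical names.

```python
import os
import shutil


def replace_hive(config_dir, hive_name, snapshot_copy, backup_suffix="-OldSaved"):
    """Swap a corrupt hive for a restore-point copy, keeping the old file as a backup.

    Returns the backup path, or None if there was no existing hive to set aside.
    """
    if not os.path.isfile(snapshot_copy):
        # Check first, so the suspect hive is never renamed without a replacement.
        raise FileNotFoundError(f"restore-point copy not found: {snapshot_copy}")
    hive_path = os.path.join(config_dir, hive_name)
    backup_path = hive_path + backup_suffix
    if os.path.exists(backup_path):
        raise FileExistsError(f"backup already exists: {backup_path}")
    renamed = False
    if os.path.exists(hive_path):
        os.rename(hive_path, backup_path)  # set the suspect hive aside, never delete it
        renamed = True
    shutil.copy2(snapshot_copy, hive_path)  # the copy takes the hive's original name
    return backup_path if renamed else None
```

Keeping the old file under a new name is the important design choice: if the restored copy turns out to be wrong, renaming the files back undoes the whole operation.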

Unfortunately, this system also suffered from a non-trivial bit of malware. Cleaning that would be the subject of a different article.
-----

If you found this helpful or not, please send me a brief email -- one line will more than do. If I see that people need, want, and / or use this kind of information, that will encourage me to keep creating this kind of content. Whereas if I never hear from anyone, then why bother?

I can be reached at:

das (at-sign) dascomputerconsultants (dot) com

Enjoy!

David Soussan
(C) 2008 DAS Computer Consultants, LTD. All Rights Reserved.
