Details
There are now two cases here. Both had similar symptoms and drew the
exact same complaint from the customer, but they had different root
causes. Both were fixed with the same method -- manually restoring a
corrupted registry file from System Restore's copy of that file when you
can't get to System Restore's native console.
QUICK UPDATE:
Apparently there are known issues with XP SP3 on AMD processor machines
that have certain OEM installations of Windows XP going into a Blue
Screen reboot loop. So if you found this page and have that problem,
perhaps you'll instead want to look here:
http://msinfluentials.com/blogs/jesper/archive/2008/05/07/does-your-amd-based-computer-boot-after-installing-xp-sp3.aspx
------
Case #1:
In this case, it wasn't a client -- it was my godson. There is a lot of
background information in this case.
He had a notebook computer he used for school, email, games, chatting,
... all the things a 13 year old boy uses a computer for in these modern
times. One day it just stopped booting -- came up with the Windows XP
logo screen, then blue-screened into a reboot loop. None of the other
modes worked -- no safe mode, no safe mode command prompt only, no last
known good configuration, etc. Everything resulted in the BSOD shortly
after the initial Windows XP splash screen appeared.
Over the years I've had the misfortune of talking with many
manufacturers' technical support people. This one happened to be an
HP/Compaq system, but Dell, Gateway, Toshiba, eMachines, and all the
other tech support lines pretty much stink. I don't know the exact
reason why, but it is likely a combination of things -- they hire
unknowledgeable people that read from a database of problems, those that
do know what is going on and can think their way out of a paper bag get
sucked into product development, and sometimes I just can't understand
the person's words through their thick accent.
Most often their solution is "Can you reinstall from scratch? Great!
That fixed the problem. Thank you, come again!"
That is like saying "My stairs squeak when I walk on them" and the repair guy
comes over, tears your house completely down, and builds it back with
all new materials. Kind of overkill. But believe it or not, sometimes
that is the most cost effective way to get rid of that squeak -- wipe it
out and start over. At least it is cost effective for them. They don't
care about the client, the data on the system, or the time it will take
to get all those programs re-installed and re-configured so it will work
right. I'm sure within the bowels of Microsoft there are people that are
very adept at bringing systems back from death without data loss, but
none of us mere mortals will ever talk to them via the usual technical
support channels.
My godson gave me his computer and let me have my way with it.
This particular system came with an HP Recovery Disk, which
unfortunately was only good for reinstalling the operating system. Most
standard XP Install disks allow you to boot into a 'Recovery Console',
which lets you boot a system and look around at various files. So I
grabbed one of my other XP disks, booted into the recovery console, and
had a look at various files.
If you look on your system in c:\windows\minidump, you'll see files from
every BSOD that your computer has done. Starting in December 2007, his
system had been getting them pretty regularly.
6 times in December
8 times in January
22 times in February
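A monthly tally like the one above can be produced with a short script once the dump files have been copied somewhere readable. This is only a minimal sketch: it assumes the dumps keep their original modification times after copying and that they end in .dmp, and the function name is made up for illustration.

```python
from collections import Counter
from datetime import datetime
from pathlib import Path


def count_minidumps_by_month(minidump_dir):
    """Return {(year, month): count} using each dump file's modification time."""
    counts = Counter()
    for dump in Path(minidump_dir).glob("*.dmp"):
        stamp = datetime.fromtimestamp(dump.stat().st_mtime)
        counts[(stamp.year, stamp.month)] += 1
    # Sort by (year, month) so the trend reads top to bottom.
    return dict(sorted(counts.items()))
```

Pointing it at a copy of the minidump folder and printing the result shows at a glance whether the crashes are getting more frequent.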
Here is the view of the minidump directory on my notebook:
Mine (a Gateway) has recurring problems with the video drivers. Gateway
is long done developing drivers for this model, so I get to live with
the bugs and various reboots.
Most of the time BSODs are due to bad or buggy driver code. Drivers are
the software that makes specific hardware devices work (like your
screen, sound, Bluetooth adapter, etc.). Increasing frequency points to
something that is getting worse and worse, perhaps a hardware problem or
new software that is buggy and causing the BSOD. I needed to do a crash
dump analysis on the dump file to see what the cause was, and if it was
the same thing over and over again or something different each time. But
the recovery console doesn't let you copy files off to anything but a
floppy disk, which this notebook doesn't have.
(side note: I could have removed the hard drive and put it into another
computer as a secondary drive, but I was trying to be as minimally invasive
as possible.)
So I built up a BartPE bootable CD -- this would let me boot off a CD,
read / write the hard drive, and read / write an external USB drive all
at the same time. I booted BartPE and copied a bunch of the
minidumps to my USB drive.
(build a BartPE disk: http://www.nu2.nu/pebuilder/ )
With the dumps copied off, I could then use the Windows Debugging Tools
on another system to analyze the crash dump files.
What is a crash dump?
Back in the old days, we wrote our programs by punching cards, wrapping
some JCL (Job Control Language) cards around our program deck, and gave
it to an operator through a cubby hole into the computer room. They took
our program deck, ran it, and gave it back to us wrapped up with a
printout of our program's execution output.
Sometimes our program went completely haywire and crashed badly. When it
did, the computer not knowing exactly what to do dumped out all the
memory as hex or octal numbers in a list to the line printer. This was
called a 'core dump' and by looking over what was in memory when the
program did its bad crash we could see exactly what it was doing at the
time of the crash and hopefully figure out where our program went
haywire.
Everything old is new again. The crash dump file is similar in concept
to that memory dump of those old days. In fact, options will let you
save different sized memory dumps in case you need more to analyze. See
the box 'Write debugging information' in the screen shot here:
What is really nice is Microsoft has automated much of the manual
analysis of these dump files -- you can load up a dump file into the
debugger, ask it to analyze, and sometimes it will tell you which driver
or program caused the problem.
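If there are many dumps to go through, the "load and analyze" step can be scripted. The sketch below only builds a command line for kd, the console debugger that ships with the Windows Debugging Tools, using its -z (open dump file), -y (symbol path) and -c (run commands) options together with the !analyze -v debugger command. The function name and the default debugger location are assumptions for illustration, and the article does not say which debugger front end was actually used.

```python
def build_analyze_command(dump_path, symbol_path=None, debugger="kd.exe"):
    """Build an argument list that opens a dump, runs !analyze -v, then quits."""
    command = [debugger, "-z", str(dump_path)]
    if symbol_path:
        # Optional symbol path; without symbols the analysis is less precise.
        command += ["-y", str(symbol_path)]
    # Run the automated analysis, then 'q' so the debugger exits by itself.
    command += ["-c", "!analyze -v; q"]
    return command
```

The returned list can be handed to subprocess.run on a machine that has the debugging tools installed, and the output searched for the module the debugger blames.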
In this case, the system crashed consistently when running NV4_DISP.DLL.
These are the Nvidia Display drivers for this particular notebook.
(want to learn about crash dump analysis and using the Windows Debugging
Tools? See:
http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?culture=en-US&EventID=1032298076&CountryCode=US
Warning: ** THIS IS VERY ADVANCED STUFF ** The link might wrap, if so
you'll have to unwrap it.)
I tried copying another version of that DLL file to this notebook but
got the same results, so I restored the NV4_DISP.DLL file back to how it
was.
The other thing I noticed was that no new minidump files were being
created while the system was in its reboot loop. The existing ones were
recent, but there wasn't one per reboot. So whatever was causing the
reboots now happened before the boot process got far enough along for the
system to create new minidump files. The system also rebooted pretty quickly -- from the
Windows XP splash screen and moving bar at the bottom it would flash the
blue screen and immediately reboot. There is an option -- see the screen
shot above -- to not automatically restart, but given the system
couldn't be booted you had no easy way to uncheck that box so you could
read the screen.
In cases like this, freezing the moment lets you read the screen. I've
had good luck using a small digital camera with a movie mode. Either
that, or stare intently, read as much of the screen as you possibly can
on each pass, and piece together what the error message is.
The message read something like "Windows XP could not start because the
following file is missing or corrupt: \Windows\system32\config\software"
-- this is the system's software registry file.
Read "How to recover from a corrupted registry that prevents Windows XP
from starting" at:
http://support.microsoft.com/kb/307545
This article talks about copying older registry copies from the repair
directory on the hard drive over the main registry files. Unfortunately,
the repair directory holds the registry files that were created when the
operating system was first installed. I'm unaware of anything that
updates that repair directory with a newer copy.
Ok, so the software registry hive is corrupted.
Following the article's recommendations, I copied a backup of the
registry from when the system was first installed from
c:\windows\repair\software to c:\windows\system32\config\software.
The good news: Copying this older software registry file let the system
boot. The bad news: None of the software installation keys that were
created since the initial install on 5/4/2007 were there. The repair
copy of the registry was 8,356KB; the bad registry was 28,928KB. About
20MB of registry keys were gone.
But at least the notebook would boot now.
All kinds of programs that were attempting to start weren't working
right, nor should they be expected to work right -- their critical
information had been lobotomized out of the registry.
This was 1:00 AM on Friday. I started a disk scan to see if there were
bad sectors and went to bed. I'd been poking at the system for a little
over an hour.
Friday 7:30 AM I resumed working on the notebook. Disk scan seemed OK,
but the old registry file still gave CRC errors when I tried to copy it,
so it was likely a 'soft' error on disk -- one that would be corrected if
the bad block were overwritten with zeroes. These are often caused by
abruptly powering off while the disk is writing.
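To see where in a file the read errors sit, one option is to read it in fixed-size chunks and note which chunks fail. This is a minimal sketch, assuming the damaged drive is readable from a machine with Python (for example, attached as a secondary drive); the function name and chunk size are arbitrary choices, not anything from the tools used here.

```python
import os


def find_unreadable_ranges(path, chunk_size=64 * 1024):
    """Read a file chunk by chunk; return (offset, length) pairs that raised an I/O error."""
    bad = []
    size = os.path.getsize(path)
    # buffering=0 so each read goes to the OS instead of a Python-side buffer.
    with open(path, "rb", buffering=0) as handle:
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            try:
                handle.seek(offset)
                handle.read(length)
            except OSError:
                # Record the failing region and keep going with the next chunk.
                bad.append((offset, length))
            offset += length
    return bad
```

An empty list means every chunk was readable. A short list of offsets suggests a few damaged spots rather than a file that is bad throughout.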
So, how do we get the software registry back? I searched around on the
web and the only suggestions involved System Restore. I ran it, but
System Restore found no restore points to restore from except the one
just taken that morning when the system booted.
Some interesting system restore reading:
http://www.microsoft.com/windowsxp/using/helpandsupport/learnmore/systemrestore.mspx
http://support.microsoft.com/kb/306084
Apparently this was a chicken-and-egg problem -- you need the
software registry to know what the system restore points are, and the
only registry file that knows about your restore points is stored
inside the restore point!
In my opinion, System Restore was the "killer feature" included in XP
that made upgrading from Windows 2000 a must-do for most people. Prior
to System Restore, it was easy to turn your system into a brick that
required reinstallation, and System Restore lets most users take a step
back in time and get their system working again after one of these bad
program installations, without major file surgery.
I searched the web for how to manually restore a file out of a restore
point but found nothing on the topic anywhere. But I did find where the
restore points are saved... Hmmmmm -- another puzzle to solve -- how to
manually restore a registry hive out of a system restore point.
So I booted BartPE again and poked around in the hidden location that
stores the restore points (c:\System Volume Information; you can't look
inside it from a booted Windows XP system, so if you want to explore it,
boot off a BartPE disk). I found the structure of the restore points and
located a file "_REGISTRY_MACHINE_SOFTWARE", size 28,218KB, dated
2/29/2008. Can you say "Close enough!" ??
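The hunt through the restore points can also be scripted. The sketch below walks a folder tree and lists every saved copy of a given hive with its size and date, newest first. It assumes the restore-point folder is reachable from a machine with Python (for instance with the drive attached to another computer); the function name is hypothetical and the root folder is left as a parameter.

```python
import os
from datetime import datetime


def find_hive_snapshots(restore_root, hive_name="_REGISTRY_MACHINE_SOFTWARE"):
    """Return (path, size_kb, modified) for each saved hive copy, newest first."""
    found = []
    for folder, _subdirs, files in os.walk(restore_root):
        for name in files:
            # Compare case-insensitively, as Windows file names are.
            if name.upper() == hive_name.upper():
                full = os.path.join(folder, name)
                info = os.stat(full)
                modified = datetime.fromtimestamp(info.st_mtime)
                found.append((full, info.st_size // 1024, modified))
    found.sort(key=lambda item: item[2], reverse=True)
    return found
```

Comparing the sizes and dates in the result against the damaged file is a quick way to pick a copy that is "close enough".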
I renamed the software registry file from my first restore attempt and
copied this version into the software registry hive's correct location,
in essence manually restoring just that one file from the system restore
point that System Restore couldn't see because it was looking at an old
registry file.
Rebooted.
And the notebook started normally! All the programs were installed as
they were before, his system is still happy and healthy, and he didn't
have to wipe it out and reinstall everything from scratch.
End to end, I spent 2 hours on this notebook. The clock time was longer,
but I don't count the time overnight while the disk is scanning. In the
interest of a shorter article, I skipped the blind alleys like trying to
start with debug logging and stuff.
The important take-away: Almost any other tech support group on the
planet Earth would re-install this system from scratch to bring it back
to life, tearing down and rebuilding the house to get rid of the squeak.
Which is a perfectly valid solution to the problem. It's just not my
solution.
I prefer to think through the problem and with a little effort actually
fix what was wrong instead of tearing the whole house down. That's just my
philosophy. I could be wrong.
Follow up: I did update his NVidia display drivers, but the system still
BSODed. But now he paid attention to when it happened -- only when
playing World of Warcraft. Apparently WoW is notorious for BSODs in the
video subsystems. Perhaps vendors should use that to test their drivers?
Anyway, he is still tweaking settings to try to get it stable while
playing the game.
-----
Case #2:
Same symptoms (mostly....)
I'd already written Case #1 and in fact used that as a template to bring
the system in this case back to life. I have photos of various screens
for this scenario. Curiously, this belonged to a client's teenage
daughters. Must be something with teenagers this year.
This system's boot loop came up with the following blue screen error
message:
STOP: c0000218 {Registry File Failure}
The registry cannot load the hive (file):
\SystemRoot\System32\Config\DEFAULT
or its log or alternate.
It is corrupt, absent, or not writable.
Beginning dump of physical memory
This error message says the system can't read one of its important
registry files. Having seen this type of error before in Case #1, my
next step was to boot from a BartPE disk.
I navigated to the place where Windows XP's System Restore function
saves copies of the registry and dug for a recent copy of the
default file that appeared unreadable.
Here you can see all the restore points. Picking a recent one and
drilling inside, I found where the registry files had been saved.
Grab the _REGISTRY_USER_DEFAULT file and copy it (you might need a
different file if your problem was not in the default file -- for
example, in Case #1 I needed _REGISTRY_MACHINE_SOFTWARE).
Navigate out of the "C:\System Volume Information\_restore"...
directory and into c:\windows\system32\config, then paste the file there
(see the arrow in the photo below).
Now rename the other file -- the one that appears corrupt -- so you
can rename the new file to the original name and thus replace it.
A curious thing happened here -- BartPE's file browser froze up on me
while renaming the file. If you look carefully at the screen, the line
with 'default-OldSaved' (the name I gave the corrupt default file) is
stuck and hasn't redrawn its gray bar on the file name yet. This was
after waiting a few minutes.
So I rebooted BartPE and initiated a chkdsk disk scan.
About 15 minutes later, look what came up:
See the 2nd and 3rd lines from the bottom? They say "Windows replaced
bad clusters in file 54037 of name \WINDOWS\system32\config\DEFAUL~1"
DEFAUL~1 is the "8.3" form of the filename I was trying to rename.
Apparently a bad cluster on the disk held the file name, and that
prevented both the boot code from finding the file and BartPE from
renaming it.
With the bad cluster replaced, I could rename the default file to
default-OldSaved and the recovered file _REGISTRY_USER_DEFAULT to
default. Both renames went through without any issues, and the system
then booted as it had before the bad cluster appeared on the hard drive.
This technique is pretty generic for replacing any corrupted registry
file from a system restore's copy of that file.
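For anyone who would rather script the swap than click through a file browser, here is a minimal Python sketch of the same rename-then-copy sequence. It assumes the XP disk is mounted somewhere Python can reach it (such as the drive attached to another computer) and that the disk has already been scanned for bad clusters. The function name is made up, and the "-OldSaved" suffix simply mirrors the name used in Case #2.

```python
import os
import shutil


def swap_in_hive(config_dir, hive_name, recovered_copy, saved_suffix="-OldSaved"):
    """Rename the damaged hive aside, then copy the recovered file into its place."""
    live = os.path.join(config_dir, hive_name)
    saved = live + saved_suffix
    if os.path.exists(saved):
        # Never clobber a backup left by an earlier attempt.
        raise FileExistsError(f"refusing to overwrite earlier backup: {saved}")
    if os.path.exists(live):
        # Keep the damaged file rather than deleting it.
        os.rename(live, saved)
    # The recovered copy takes over the original name.
    shutil.copy2(recovered_copy, live)
    return saved
```

For Case #2 the call would look something like swap_in_hive(config_dir, "default", path_to_recovered_file), with both paths pointing at the mounted XP disk. Keeping the damaged file under a new name means the change can be undone if the recovered copy turns out to be no better.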
Unfortunately, this system also suffered from a non-trivial bit of
malware. Cleaning that would be the subject of a different article.
-----
If you found this helpful or not, please send me a brief email -- one
line will more than do. If I see people need, want, and / or use this
kind of information, that will encourage me to keep creating this kind
of content. Whereas if I never hear from anyone, then why bother?
I can be reached at:
das (at-sign) dascomputerconsultants (dot) com
Enjoy!
David Soussan
(C) 2008 DAS Computer Consultants, LTD. All Rights Reserved.