Saturday, June 21, 2014

Debugging a BSOD

A few posts back I mentioned the ongoing battle with periodic BSODs on our Win 7 x64 system at the church house.

So I was finally able to find the time to pull the MEMORY.DMP file and the minidump files for closer and more thoughtful review.

First I loaded up the minidump files in BlueScreenView from NirSoft.

Turns out there were a whole lot more “MEMORY_MANAGEMENT” crashes than I realized!

Having watched enough Channel 9 and TechEd presentations lately…more than a few of them on BSOD/WinDbg troubleshooting…my confidence was up enough to toss the MEMORY.DMP file at WinDbg and let it analyze the dump to see if that gave any clues.

So I had to get it updated/loaded on my home system.  That took a bit of work in itself.

I went to download the latest version with WDK 8.1 from the "Windows 8.1: Download kits and tools" page.

However every single time I tried to install it, it failed.

After about a half-hour I gave up and hit the Google.

And found this: Why does the SDK 7.1 installation fail with an "Installation Failed" message on my Windows system? - MATLAB Answers - MATLAB Central

I was using SDK 8.1 but the result was the same…as was the solution: from a comment in that post by the MathWorks Support Team:

This is an issue with Microsoft Windows SDK 7.1. It may occur under two scenarios:

1. If you have Microsoft Visual C++ 2010 SP1 (Express or Professional) installed.

2. If you have Microsoft Visual C++ 2010 redistributable packages (x64 or x86) installed.

The details on the issue from Microsoft are below:

http://support.microsoft.com/kb/2717426

http://support.microsoft.com/kb/2519277

To avoid this issue:

1. Uninstall the Microsoft Visual C++ 2010 redistributable packages (both x86 as well as x64) from “Control Panel” > “Programs and Features”. If you have trouble uninstalling them, see related solution 1-NBI41W at the bottom.

2. Install the Windows SDK 7.1. During installation, under the "Installation Options" menu, UNCHECK the "Visual C++ Compilers" and "Microsoft Visual C++ 2010" components.

3. Apply the SDK 7.1 patch from below:

http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=4422

4. Reinstall the Microsoft Visual C++ 2010 redistributable packages.

x64:

http://www.microsoft.com/en-us/download/details.aspx?id=14632

x86:

http://www.microsoft.com/en-us/download/details.aspx?id=5555

OK. Got it on! Uninstalling the previous Visual C++ packages was the trick.

Next, when I tried to run WinDbg, it kept throwing symbol errors, even though I thought I had the symbol path configured properly.

I vaguely remember covering this ground before…but I was rusty. All the guides said to use this path:

SRV*c:\WINDOWS\symbols*http://msdl.microsoft.com/download/symbols

But it didn’t like it even though it looked perfect.

Eventually, I found a “space” tacked on to the end of the string (user select/copy error I suppose) and got it cleaned up. Then OK.
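A trailing space like that is easy to miss by eye, so here is a minimal Python sketch of how one might sanity-check a pasted symbol path before handing it to WinDbg. The function name `clean_symbol_path` is my own invention for illustration, and the check only assumes the `SRV*<local cache>*<server URL>` layout shown above; it is not something WinDbg itself provides.

```python
def clean_symbol_path(raw: str) -> str:
    """Strip stray whitespace from a pasted symbol path and check its shape.

    Hypothetical helper: it only knows the SRV*<cache>*<url> layout used in
    this post, not every symbol path form WinDbg accepts.
    """
    path = raw.strip()  # drops the copy/paste space at either end
    parts = path.split("*")
    if len(parts) != 3 or parts[0].upper() != "SRV":
        raise ValueError(f"unexpected symbol path layout: {path!r}")
    if any(ch.isspace() for ch in parts[2]):
        raise ValueError("symbol server URL contains whitespace")
    return path


# The path from the guides, with the stray trailing space I picked up.
pasted = r"SRV*c:\WINDOWS\symbols*http://msdl.microsoft.com/download/symbols "
print(repr(clean_symbol_path(pasted)))
```

Printing with `repr` makes any leftover whitespace visible inside the quotes, which is the part that tripped me up.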

The default Bugcheck Analysis came back:

Probably caused by : memory_corruption

Followup: memory_corruption

Next I used !analyze -v to get detailed debugging information, which netted me this:

MODULE_NAME: memory_corruption

IMAGE_NAME:  memory_corruption

FOLLOWUP_NAME:  memory_corruption

DEBUG_FLR_IMAGE_TIMESTAMP:  0

MEMORY_CORRUPTOR:  LARGE

STACK_COMMAND:  .cxr 0xfffff88005105ee0 ; kb

FAILURE_BUCKET_ID:  X64_MEMORY_CORRUPTION_LARGE

BUCKET_ID:  X64_MEMORY_CORRUPTION_LARGE

Followup: memory_corruption
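If you end up comparing several dumps, it can help to pull those fields into something searchable. The following is a rough Python sketch that assumes you have pasted the `!analyze -v` text into a string; `parse_analyze_fields` is a made-up name, and the parsing rule (upper-case key, colon, value) is just my reading of the output above, not a documented format.

```python
def parse_analyze_fields(text: str) -> dict:
    """Collect lines shaped like 'UPPER_CASE_NAME: value' into a dict.

    Hypothetical helper: mixed-case lines such as 'Followup: ...' are
    skipped on purpose, and later duplicates overwrite earlier ones.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key and key.isupper() and key.replace("_", "").isalnum():
            fields[key] = value.strip()
    return fields


# Sample text copied from the WinDbg output in this post.
sample = """\
MODULE_NAME: memory_corruption
IMAGE_NAME:  memory_corruption
MEMORY_CORRUPTOR:  LARGE
STACK_COMMAND:  .cxr 0xfffff88005105ee0 ; kb
BUCKET_ID:  X64_MEMORY_CORRUPTION_LARGE
Followup: memory_corruption
"""

fields = parse_analyze_fields(sample)
for name, value in fields.items():
    print(f"{name} = {value}")
```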

And pretty much hit the limit of my current mad-crazy debugging skill…but!

I had one other clue still to process.

Although rounds of Memtest86+ and MemTest86 came back clean, I did recently note several boots where the BIOS reported the amount of installed memory shifting between several different sizes.

Because of my DIMM sets, that did give me a clue.  I had two smaller OEM DIMMs and two newer, larger DIMM sticks. The lower RAM total matched the two newer/larger sticks on their own, and the missing RAM matched the two OEM sticks.
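The same arithmetic can be sketched in a few lines of Python. The slot names and the stick sizes (1 GB OEM, 4 GB newer) below are purely hypothetical stand-ins, since the real sizes are not the point; the idea is just to list which sticks would have to be missing to explain whatever total the BIOS reports.

```python
from itertools import combinations


def missing_dimm_sets(dimms: dict, reported_gb: int) -> list:
    """Return every group of slots whose absence explains the reported total."""
    shortfall = sum(dimms.values()) - reported_gb
    names = sorted(dimms)
    matches = []
    # Try every possible group of sticks, from none up to all of them.
    for count in range(len(names) + 1):
        for combo in combinations(names, count):
            if sum(dimms[name] for name in combo) == shortfall:
                matches.append(combo)
    return matches


# Hypothetical sizes in GB: two small OEM sticks, two larger newer sticks.
dimms = {"oem_a": 1, "oem_b": 1, "new_a": 4, "new_b": 4}

# BIOS shows only the two larger sticks' worth of RAM.
print(missing_dimm_sets(dimms, 8))
```

With sizes like these only one group fits the shortfall, which is what pointed me at the OEM pair. If the stick sizes happened to add up the same way in more than one combination, the list would show every candidate and the total alone would not settle it.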

So I shut the system down, opened up the case, and reseated all the DIMMs.

Rebooted…still lower value.

Shut down again and popped them all out, then reseated them all again, firmly seating them in the slots and making sure they clicked in.

Rebooted…now RAM fully back up.

So far, after several weeks, the BSODs have stopped.  I suspect (at this time) that at least one of the OEM DIMMs was seated poorly in its slot and, when the system got hot, lost contact, causing the BSOD and memory management error. Time will tell.

Here are some more tools and tips:

Cheers.

--Claus V.
