[lug] sig 11 errors with Linux

Gary Hodges hodges at srrb.noaa.gov
Wed Jun 28 16:55:50 MDT 2000

I just sent this email to AMD.  Feel free to have a read and let me know 
if I'm in the ballpark about what is wrong with my machine.



System is:

AMD K6-III 450
Tyan S1598S
Corsair 256 MB CAS2 RAM
Mylex 958 SCSI adapter
SCSI hard drives
Jaz drive
Plextor SCSI CDR
Soundblaster 16 PCI

Red Hat Linux v6.2

I'm pretty sure my K6-III 450 must be flaky.  I originally bought this
processor with a FIC 2013/2 motherboard and 128 MB of Corsair CAS2 RAM.  I
was getting signal 11 errors during kernel compiles, and suspected the FIC
board was causing the troubles.  I had the vendor exchange the FIC for a
Tyan S1598S (which, BTW, had very good reviews by consumers).  I'm still
getting signal 11 and other errors at random points during kernel
compiles.  As part of the FIC-for-Tyan exchange, I also purchased another
128 MB of Corsair CAS2 RAM.  Note that not every kernel compile results in
a signal 11 error, but the frequency does increase with repeated compiles.
I also get signal 11 errors while compiling Perl modules during their
installation.

I've just finished a battery of tests to determine which piece of hardware
is causing the failures.  I wrote a script that compiles kernels
repeatedly to facilitate the tests.  To determine if the problem was RAM,
I ran the machine with a single 128 MB memory stick, first in memory slot
1 and then in slot 2, and got signal 11 errors in both configurations.  I
repeated the tests with the other 128 MB stick, and got the same results.
Next I turned off all the cache, and still got signal 11 errors during
kernel compiles.  I've also tried replacing the AGP video card with a PCI
card.  As a final test, I set the RAM timing and latency settings to
different (slower) values, and still got the signal 11 errors. 
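
A minimal sketch of such a repeat-compile script follows.  It is not the
script used for these tests; it is written in Python for illustration, and
the function names, the example build command, and the source directory
are all assumptions.  It also assumes the compiler reports the crash with
the phrase "signal 11" somewhere in its output.

```python
import subprocess


def build_once(command, workdir=None):
    """Run one build and return (exit_status, saw_signal_11)."""
    result = subprocess.run(
        command,
        shell=True,                # command is a shell string (assumed form)
        cwd=workdir,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout so one scan covers both
        universal_newlines=True,
        errors="replace",          # tolerate odd bytes in compiler output
    )
    # Assumption: a crashed compiler mentions "signal 11" in its message.
    saw_signal_11 = "signal 11" in result.stdout.lower()
    return result.returncode, saw_signal_11


def repeat_builds(command, runs, workdir=None):
    """Run the build `runs` times; return (run, exit_status, saw_signal_11) per run."""
    results = []
    for run in range(1, runs + 1):
        status, sig11 = build_once(command, workdir)
        results.append((run, status, sig11))
    return results


def summarize(results):
    """Count total runs, failed runs, and runs that reported signal 11."""
    return {
        "runs": len(results),
        "failed": sum(1 for _, status, _ in results if status != 0),
        "signal_11": sum(1 for _, _, sig11 in results if sig11),
    }
```

It could be called as, for example,
summarize(repeat_builds("make clean && make bzImage", 20, "/usr/src/linux")),
where both the make targets and the path are placeholders.  Keeping the
per-run results makes it easy to see whether failures cluster in the later
runs.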

Throughout these tests, I was checking the CPU and system temperatures 
periodically, and they were always within a couple of degrees of:

CPU    31 degrees C
System 55 degrees C

Now for the interesting thing...  All the above tests were run with the 
case cover(s) installed.  When I remove the side and front panels, I do 
not get signal 11 errors.  The temperatures as reported by the BIOS with 
the covers removed are:

CPU    23 degrees C
System 43 degrees C
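
Put side by side, the two sets of BIOS readings differ as sketched below.
The variable and function names are chosen only for illustration, and the
covers-on figures are approximate (within a couple of degrees, as noted
above).

```python
# BIOS readings quoted above, in degrees C
covers_on = {"CPU": 31, "System": 55}
covers_off = {"CPU": 23, "System": 43}


def temperature_drop(before, after):
    """Return how many degrees each sensor reading fell from before to after."""
    return {sensor: before[sensor] - after[sensor] for sensor in before}


drop = temperature_drop(covers_on, covers_off)
for sensor, degrees in drop.items():
    print(f"{sensor}: {degrees} degrees C cooler with the covers off")
```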

Obviously there is a heat issue with something.  After all I've done and
replaced, I think the CPU gets flaky at some temperature above 23 degrees
C.  I read on www.amd.com that the operating temperature range is 0 to 65
degrees C.  I'm guessing the small wire located under the CPU is the
thermistor that is measuring CPU temperature.  If so, I'm aware that it is
not located where AMD suggests it should be located, but even so, I don't
think it would read ~30 degrees C less than the actual temp.  Any
suggestions or comments from AMD? 

I'd rather not deal with the vendor any more, and I also thought AMD might
be interested in this problem I'm having, which is why I'm sending this.

Gary Hodges

