[lug] Dual 1Ghz SMP Server Takes "Long Pause"

Rob Riggs rriggs at doubleclick.net
Fri Nov 17 18:19:10 MST 2000

I have a couple of Dell 2450 1GHz SMP servers running Oct. KRUD.
Both seem to have the really odd problem of taking 40-minute pauses
when under high load and swapping. The servers respond to pings,
but that is it. Activity that normally generates multiple log entries
per second ceases, leaving a 40+ minute gap in the syslog. There is no
response from the console (either tty0 or ttyS0), nor from ssh sessions.
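
For anyone wanting to pin down exactly when such pauses start and end,
the gaps can be pulled out of the log mechanically. What follows is only
a rough sketch, assuming the traditional syslog line format with a
leading "Mon DD HH:MM:SS" timestamp. The function name find_gaps, the
5-minute threshold, and the year default are all made up for
illustration (classic syslog timestamps carry no year, so one has to be
supplied).

```python
from datetime import datetime, timedelta


def find_gaps(lines, year=2000, threshold=timedelta(minutes=5)):
    """Yield (previous, current) timestamp pairs that lie further apart
    than `threshold`. Assumes each log line starts with a traditional
    syslog timestamp such as 'Nov 17 18:19:10'."""
    previous = None
    for line in lines:
        try:
            # The first 15 characters hold the timestamp; the year is an
            # assumption because the log format does not record it.
            current = datetime.strptime(
                f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None and current - previous > threshold:
            yield previous, current
        previous = current
```

It could be run over the system log (commonly /var/log/messages on Red
Hat style systems, though the path is an assumption here) by passing an
open file object as `lines` and printing each pair it yields. Log files
that span a year boundary would need extra handling.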

At first I thought these machines were locking up... until I left one in
its locked-up state while going out to lunch. The machine was back
to normal when I returned. It is truly the oddest behaviour I have
ever witnessed from my Linux servers.

Now, I have a number of dual 866MHz systems configured exactly
the same as the dual 1GHz boxes, and they do not have this
problem. Sure, response slows down when running at 1+ load
average and 256MB swapped out, but nothing seizes up for
40 minutes.

Has anyone seen or heard of this phenomenon before? Any
ideas on what could cause this and, preferably, a known fix?

Rob Riggs
Unix System Administrator
DoubleClick/DARTmail - Broomfield, CO

