[lug] Dual 1Ghz SMP Server Takes "Long Pause"
stimits at idcomm.com
Sat Nov 18 10:20:28 MST 2000
Rob Riggs wrote:
> I have a couple of Dell 2450 1GHz SMP servers running Oct. KRUD.
> Both seem to have the really odd problem of taking 40 minute pauses
> when under high load and swapping. The servers respond to pings,
> but that is it. Activity that generates multiple log entries per second
> ceases, with a 40+ minute gap showing in the syslog. No response
> from console (either tty0 or ttyS0), nor from ssh sessions.
> At first I thought these machines were locking up... until I left one in
> its locked-up state while going out to lunch. The machine was back
> to normal when I returned. It is truly the oddest behaviour I have
> ever witnessed from my Linux servers.
> Now, I have a number of dual 866MHz systems configured exactly
> the same as the dual 1GHz boxes, and they do not have this
> problem. Sure, response slows down when running at 1+ load
> average and 256MB swapped out, but nothing seizes up for
> 40 minutes.
> Has anyone seen or heard of this phenomenon before? Any
> ideas on what could cause this and, preferably, a known fix?
> Rob Riggs
> Unix System Administrator
> DoubleClick/DARTmail - Broomfield, CO
> Web Page: http://lug.boulder.co.us
> Mailing List: http://lists.lug.boulder.co.us/mailman/listinfo/lug
Leave top running in a console that is visible at all times, and note which
process is the top CPU user at the moment the pause begins. If you can't
leave a console open on the server itself, open one on a remote box, telnet
in, su to root, and run top there, so the other box keeps a "last snapshot"
on screen when the server stops responding. Try running
tail -f -n 30 /var/log/messages in a second session as well.
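If watching a screen isn't practical, the same "last snapshot" idea can be
roughly approximated with a small logger. This is only a sketch under some
assumptions: Python 3 and a procps-style ps are available on the server, the
log path and the names top_cpu_lines and snapshot_loop are made up for
illustration, and ps reports %CPU averaged over each process's lifetime, so
the ranking is coarser than what top shows.

```python
import os
import subprocess
import time


def top_cpu_lines(ps_output, count=5):
    """Return the `count` busiest process lines from `ps` output, by %CPU."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)  # PID, %CPU, %MEM, command line
        if len(fields) < 4:
            continue
        try:
            cpu = float(fields[1])
        except ValueError:
            continue  # ignore rows whose %CPU column is not a number
        rows.append((cpu, line.strip()))
    rows.sort(key=lambda row: row[0], reverse=True)
    return [text for _, text in rows[:count]]


def snapshot_loop(log_path="/tmp/cpu-snapshots.log", interval=5):
    """Append a timestamped snapshot every `interval` seconds.

    The default log path is a hypothetical choice, not a convention.
    """
    while True:
        ps = subprocess.run(
            ["ps", "-eo", "pid,pcpu,pmem,args"],
            capture_output=True, text=True, check=False,
        )
        with open(log_path, "a") as log:
            log.write(time.strftime("%Y-%m-%d %H:%M:%S") + "\n")
            for line in top_cpu_lines(ps.stdout):
                log.write("  " + line + "\n")
            log.flush()
            os.fsync(log.fileno())  # push the snapshot to disk before sleeping
        time.sleep(interval)
```

Calling snapshot_loop() as root starts the logging. After a pause, the last
entry written before the gap in timestamps shows what was busiest just before
the box stopped responding. The logger itself may stall once the machine is
swapping heavily, so treat a missing entry as part of the evidence.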
Do you know what chipset the servers have?