[wplug] killing process that will not die

Keir Josephson kjoseph at stargate.net
Wed Aug 6 12:31:42 EDT 2003


> > for future reference. what would i do in the case where the process is
> > waiting for disk io?
> 
> Wait? :-)
> 
> Theoretically I don't see why, but in practice I've had to do that
> sometimes. Perhaps the priority assigned to a kill signal is higher than
> anything, when sent by root, but is "queueable" when sent by a mortal.
> Speculating here, someone with firm knowledge of signals please shed some
> light. :-)

I don't know if this applies here, but I've seen many instances on
Unix-based OSes (e.g. HP-UX, Solaris, Linux) where an IO operation
occasionally hangs, causing the process itself to freeze, and the process
in question cannot be killed even with a -9. This seems to start a chain
reaction that eventually affects the entire system. The load average
slowly climbs as hung processes pile up, even though they take up very
little CPU time while they wait (it may take anywhere from 2 to 8 hours
before it's noticeable). Eventually even simple commands like cp, rm,
or df will hang. I can't give you a whole lot of detail as to why it
happens, but it seems to be related to the NFS subsystem in the kernel and
the role it plays in handling IO functions (a role that I'm, honestly, not
quite clear on).

Also, please note that I've only seen this once on a Linux server. Most of
my experience with these is on HP-UX & Solaris. Unfortunately, the only
fix that I'm aware of to date is a reboot.
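
On Linux, processes stuck like this normally show up with state "D"
(uninterruptible sleep) in ps, and a signal, including kill -9, is not
acted on until the process leaves that state. Below is a minimal sketch
for spotting them, assuming a Linux box with /proc mounted. It is
Linux-specific and will not work as written on HP-UX or Solaris. The
function names are made up for illustration.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    stuck = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return stuck  # no procfs here (not Linux, or not mounted)
    for entry in entries:
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or unexpected format
        if state == "D":
            stuck.append((pid, comm))
    return stuck


if __name__ == "__main__":
    for pid, comm in find_uninterruptible():
        print(pid, comm)
```

A process that briefly appears in the output is normal, since ordinary
disk IO passes through state D too. The ones to worry about are those
that stay in the list across repeated runs.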

Hope this helps,

-Keir


> 
> -A



