[wplug] crash! ext2, fsck, and duplicate blocks

Bill Moran wmoran at potentialtech.com
Mon Nov 1 09:17:45 EST 2004


Brandon Kuczenski <brandon at 301south.net> wrote:
> I have a few questions about system administration practices.
> 
> I am running a FreeBSD server (okay, okay, should I send it to wplug-bsd?
> I think these questions are of general interest, though)

It is rather BSD-specific.

The ext2 drivers in FreeBSD are not as well maintained as the UFS drivers
(since there's generally no reason to use ext2 on FreeBSD).

If this is a production system, I'd take the time and effort to move these
partitions to UFS.

Looking through the CVS tree, there hasn't been any work on the 4.x
version of the ext2 drivers in almost 2 years.  The code in 5.x has
seen a bit more activity than that.  That's a sign of either an extremely
mature driver or a driver that developers aren't interested in.  I
suspect the latter.

If there weren't any events that could have triggered this (e.g. crashes
or hardware problems), then you may have tripped over a bug in the ext2
code.

> and I had a
> peculiar problem.  A configuration file for one of my scripts got
> suddenly and unexpectedly filled with garbage.  The garbage looked like
> this:
> 
> ...
> t 1099176433 N:N.N.N
> t 1099176433 greetings
> t 1099176433 HTime-Received:Oct
> t 1099176433 H*F:D*com.br
> t 1099176433 H*F:D*br
> ...
> 
> When I fixed the file, it soon became garbage again.  I disabled the
> script and decided to puzzle over it for a while.
> 
> But wait! There's more!
> 
> My server just crashed.  When it restarted, I fsck'ed the disks and found
> "duplicate blocks" -- shared by my Bayes tokens database and that
> configuration file.  Aha!  So my config file got overwritten by spam
> data.  fsck fixed the problem.
> 
> So, given all this, I have three questions: One, how in blazes (I wanted
> to say something more R-rated) did this happen?  As I said, the OS was
> FreeBSD, the filesystem was ext2 (FreeBSD doesn't support ext3, and these
> disks were migrated from a Linux system).
> 
> Two: should I run fsck on a routine (i.e. cron) basis, to catch glitches
> like this?  How often do they happen?  Or should I just wait for
> random reboots to check the disks? What is the "Right thing to do"?
> 
> Three: when my server crashes and leaves no helpful information in
> /var/log/messages (in fact, even the startup log is missing), am I just
> supposed to pretend like nothing happened?  How do I find bugs if there
> are no logs?
> 
> Here's my /var/log/messages surrounding the time of the reboot (the first
> several lines are just IP filter data):
> 
> 
> Oct 31 14:27:49 ocean ipmon[83]: 14:27:48.985723 rl0 @0:17 b 209.195.143.195,2077 -> 209.195.172.207,3127 PR tcp len 20 48 -S IN
> Oct 31 14:28:46 ocean ipmon[83]: 14:28:45.874701 rl0 @0:17 b 209.195.87.230,2728 -> 209.195.172.207,6129 PR tcp len 20 48 -S IN
> Oct 31 14:28:49 ocean ipmon[83]: 14:28:48.785276 rl0 @0:17 b 209.195.87.230,2728 -> 209.195.172.207,6129 PR tcp len 20 48 -S IN
> Oct 31 14:28:55 ocean ipmon[83]: 14:28:54.896598 rl0 @0:17 b 209.195.87.230,2728 -> 209.195.172.207,6129 PR tcp len 20 48 -S IN
> Oct 31 14:34:27 ocean ipmon[83]: 14:34:27.219536 rl0 @0:17 b 63.205.221.242,4325 -> 209.195.172.207,1433 PR tcp len 20 48 -S IN
> Oct 31 14:34:31 ocean ipmon[83]: 14:34:30.216077 rl0 @0:17 b 63.205.221.242,4325 -> 209.195.172.207,1433 PR tcp len 20 48 -S IN
> Oct 31 17:03:37 ocean /kernel: e
> Oct 31 17:03:37 ocean /kernel: dscheck(#ad/0x3000a): b_bcount 1 is not on a sector boundary (ssize 512)
> Oct 31 17:03:37 ocean last message repeated 11 times
> Oct 31 17:03:37 ocean /kernel: IP Filter: v3.4.31 initialized.  Default = pass all, Logging = enabled
> Oct 31 17:03:40 ocean ipmon[84]: 17:03:40.513362 rl0 @0:17 b 209.195.138.37,2091 -> 209.195.172.207,3127 PR tcp len 20 48 -S IN
> Oct 31 17:03:44 ocean ntpd[116]: ntpd 4.1.0-a Tue May 25 21:15:34 GMT 2004 (1)
> Oct 31 17:03:44 ocean ntpd[116]: kernel time discipline status 2040
> Oct 31 17:04:02 ocean login: ROOT LOGIN (root) ON ttyv0
> 
> Sometime before the line that reads "/kernel: e", the system rebooted.  I
> mean, WTF?
> 
> Little help?
> -Brandon

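As for the dscheck line in that log: as far as I know, that message comes from the kernel's disk-slice code refusing an I/O request whose byte count (b_bcount) isn't a whole number of sectors. Here a 1-byte request against 512-byte sectors. By itself it doesn't tell you what crashed the box. The arithmetic is just a modulus check; a minimal sketch, assuming the 512-byte sector size reported in the message (the function name is hypothetical, not the kernel's code):

```python
# Hypothetical illustration of the sector-boundary test behind the
# "b_bcount N is not on a sector boundary (ssize 512)" message.
SECTOR_SIZE = 512  # assumed from "ssize 512" in the log


def on_sector_boundary(byte_count: int, sector_size: int = SECTOR_SIZE) -> bool:
    """Return True if a transfer of byte_count bytes covers whole sectors."""
    return byte_count % sector_size == 0


if __name__ == "__main__":
    for b_bcount in (1, 512, 1024, 1500):
        status = "ok" if on_sector_boundary(b_bcount) else "not on a sector boundary"
        print(f"b_bcount {b_bcount}: {status}")
```

A 1-byte request fails the check, which matches the "b_bcount 1" in your log.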

-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com

