[wplug] crashh! ext2, fsck, and duplicate blocks

Brandon Kuczenski brandon at 301south.net
Sun Oct 31 17:33:30 EST 2004


I have a few questions about system administration practices.

I am running a FreeBSD server (okay, okay, should I send it to wplug-bsd?
I think these questions are of general interest, though) and I had a
peculiar problem.  A configuration file for one of my scripts was
suddenly and unexpectedly filled with garbage.  The garbage looked like
this:

...
t 1099176433 N:N.N.N
t 1099176433 greetings
t 1099176433 HTime-Received:Oct
t 1099176433 H*F:D*com.br
t 1099176433 H*F:D*br
...
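
(An aside: the second field on each of those lines looks like a Unix
epoch timestamp.  Here is a minimal Python sketch to decode it, assuming
that guess is right.  The helper name decode_token_line is made up for
illustration and is not part of any tool.)

```python
from datetime import datetime, timezone

def decode_token_line(line):
    """Split a 't <epoch> <token>' line and convert the epoch to UTC.

    Assumes the second field is a Unix timestamp. That is a guess based
    on its magnitude, not something the file format is known to promise.
    """
    tag, epoch, token = line.split(None, 2)
    when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return tag, when, token

if __name__ == "__main__":
    tag, when, token = decode_token_line("t 1099176433 greetings")
    print(when.isoformat(), token)
```

If the guess holds, 1099176433 works out to 2004-10-30 22:47:13 UTC, the
evening before the crash.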

When I fixed the file, it soon became garbage again.  I disabled the
script and decided to puzzle over it for a while.

But wait! There's more!

My server just crashed.  When it restarted, I fsck'ed the disks and found
"duplicate blocks" -- shared by my Bayes tokens database and that
configuration file.  Aha!  So my config file got overwritten by spam
data.  fsck fixed the problem.

So, given all this, I have three questions: One, how in blazes (I wanted
to say something more R-rated) did this happen?  As I said, the OS was
FreeBSD, the filesystem was ext2 (FreeBSD doesn't support ext3, and these
disks were migrated from a Linux system).

Two: should I run fsck on a routine basis (e.g. from cron) to catch
glitches like this?  How often do they happen?  Or should I just wait for
random reboots to check the disks?  What is the "Right Thing" to do?

Three: when my server crashes and leaves no helpful information in
/var/log/messages (in fact, even the startup log is missing), am I just
supposed to pretend like nothing happened?  How do I find bugs if there
are no logs?

Here's my /var/log/messages surrounding the time of the reboot (the first
several lines are just IP filter data):


Oct 31 14:27:49 ocean ipmon[83]: 14:27:48.985723 rl0 @0:17 b 209.195.143.195,2077 -> 209.195.172.207,3127 PR tcp len 20 48 -S IN
Oct 31 14:28:46 ocean ipmon[83]: 14:28:45.874701 rl0 @0:17 b 209.195.87.230,2728 -> 209.195.172.207,6129 PR tcp len 20 48 -S IN
Oct 31 14:28:49 ocean ipmon[83]: 14:28:48.785276 rl0 @0:17 b 209.195.87.230,2728 -> 209.195.172.207,6129 PR tcp len 20 48 -S IN
Oct 31 14:28:55 ocean ipmon[83]: 14:28:54.896598 rl0 @0:17 b 209.195.87.230,2728 -> 209.195.172.207,6129 PR tcp len 20 48 -S IN
Oct 31 14:34:27 ocean ipmon[83]: 14:34:27.219536 rl0 @0:17 b 63.205.221.242,4325 -> 209.195.172.207,1433 PR tcp len 20 48 -S IN
Oct 31 14:34:31 ocean ipmon[83]: 14:34:30.216077 rl0 @0:17 b 63.205.221.242,4325 -> 209.195.172.207,1433 PR tcp len 20 48 -S IN
Oct 31 17:03:37 ocean /kernel: e
Oct 31 17:03:37 ocean /kernel: dscheck(#ad/0x3000a): b_bcount 1 is not on a sector boundary (ssize 512)
Oct 31 17:03:37 ocean last message repeated 11 times
Oct 31 17:03:37 ocean /kernel: IP Filter: v3.4.31 initialized.  Default = pass all, Logging = enabled
Oct 31 17:03:40 ocean ipmon[84]: 17:03:40.513362 rl0 @0:17 b 209.195.138.37,2091 -> 209.195.172.207,3127 PR tcp len 20 48 -S IN
Oct 31 17:03:44 ocean ntpd[116]: ntpd 4.1.0-a Tue May 25 21:15:34 GMT 2004 (1)
Oct 31 17:03:44 ocean ntpd[116]: kernel time discipline status 2040
Oct 31 17:04:02 ocean login: ROOT LOGIN (root) ON ttyv0

Sometime between the last ipmon entry at 14:34:31 and the line that reads
"/kernel: e" at 17:03:37, the system rebooted.  I mean, WTF?
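
With no crash message in the log, one way to at least bound the reboot
window is to look for the largest gap between consecutive timestamps.
Below is a minimal Python sketch.  It assumes the usual "Mon DD HH:MM:SS"
syslog prefix and an English locale.  The function largest_gap is a
hypothetical helper, not a system tool, and the sample lines are
truncated copies of the entries above.

```python
from datetime import datetime

def largest_gap(lines, year=2004):
    """Return (gap, line_before, line_after) for the biggest time gap.

    Syslog timestamps carry no year, so the caller supplies one.
    Lines that do not start with a 'Mon DD HH:MM:SS' stamp are skipped.
    Returns None if fewer than two lines could be parsed.
    """
    stamped = []
    for line in lines:
        try:
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue
        stamped.append((ts, line))

    best = None
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best

if __name__ == "__main__":
    sample = [
        "Oct 31 14:34:27 ocean ipmon[83]: (truncated)",
        "Oct 31 14:34:31 ocean ipmon[83]: (truncated)",
        "Oct 31 17:03:37 ocean /kernel: e",
        "Oct 31 17:03:40 ocean ipmon[84]: (truncated)",
    ]
    gap, before, after = largest_gap(sample)
    print("largest gap:", gap)
    print("last line before:", before)
    print("first line after:", after)
```

On the excerpt above this reports a gap of 2:29:06 between the 14:34:31
and 17:03:37 entries, which only brackets the reboot and says nothing
about its cause.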

Little help?
-Brandon


