[wplug] FileSystem problems

Mike Griffin mike at dmrnetworks.com
Tue Jul 15 14:07:19 EDT 2003


I understand the points you're making, and they're good ones. What I'm 
getting at is this:
What happened to the first HDD happened to the second HDD one day 
later. I know the data on the second HDD is gone, and even after a 
format the drive's integrity can't be trusted, so I should just get new 
HDDs to replace the ones I have. But I don't know if I want to stick new 
drives into this system, seeing how both drives went "bad" less than 24 
hours apart.
As far as I can tell without sticking an oscilloscope in the power 
socket, the power on the system is the same as it's always been. I have 
the mail server running from the same circuit (the two machines are 
identical, hardware- and OS-wise), and the second machine hasn't shown 
any sign of a hitch. Neither machine gets moved or touched except via 
ssh.
Perhaps the power supply is going bad. I've had that happen before, 
where a failing supply started killing HDDs. Hmm, thanks for reminding me!
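One way to test the "common cause" theory before buying new drives is to check which drives the kernel logged errors for, and which IDE channel they sit on. Under the Linux IDE naming scheme, hda and hdb are the master and slave on the first channel (ide0) and share one ribbon cable, while hdc and hdd are on ide1. If the failed drives were hda and hdb (an assumption; only /dev/hdb1 is named in this thread), errors on both would fit the cable, controller, or power explanations better than two unrelated drive failures. The sketch below is a minimal Python example; it assumes the kernel messages land in a text log such as /var/log/messages and use the 2.4-era `dma_intr` wording shown in the comments, which varies between kernel versions. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed message format (2.4-era IDE driver); wording differs between
# kernel versions, so adjust the pattern to match your own log, e.g.
#   hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
#   hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=..., sector=...
IDE_ERROR = re.compile(
    r"\b(hd[a-z]): .*(?:status=0x[0-9a-f]+ \{[^}]*Error|error=0x[0-9a-f]+)"
)


def ide_channel(dev):
    """Map an IDE device name to its channel: hda/hdb -> ide0, hdc/hdd -> ide1."""
    return "ide%d" % ((ord(dev[2]) - ord("a")) // 2)


def count_ide_errors(lines):
    """Count IDE error lines per device and per channel.

    A single failed request often logs both a status= and an error= line,
    so these are line counts, not a count of distinct failures.
    """
    per_device = Counter()
    for line in lines:
        match = IDE_ERROR.search(line)
        if match:
            per_device[match.group(1)] += 1
    per_channel = Counter()
    for dev, count in per_device.items():
        per_channel[ide_channel(dev)] += count
    return per_device, per_channel
```

Passing an open log file works because `count_ide_errors` accepts any iterable of lines, e.g. `with open("/var/log/messages") as log: devices, channels = count_ide_errors(log)`. Errors spread across both drives on one channel point toward shared hardware; errors on a single device look more like one bad drive.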

Mike


On Tuesday, July 15, 2003, at 01:31  PM, Vanco, Donald wrote:

> Mike Griffin <mailto:mike at dmrnetworks.com> wrote:
>> I seem to be having some problems with one of my servers and was
>> wondering where to start troubleshooting the machine. I would imagine
>> that it's at the hardware level.
>>
>> A fileserver crashed yesterday with a kernel panic. This machine has
>> been running solid for nearly a year. I started getting errors on the
>> hard drive during writes, and access was very slow. I ran the Maxtor
>> utilities on the drive, and they found a few problems that the
>> software fixed. I reinstalled the server OS (RH7.3) and restored my
>> data from backups. My backups are stored on the same system but on a
>> different HDD, which gets mounted by a script every night and has
>> tarballs written to it; I also ran fsck -t ext3 on this drive
>> (/dev/hdb1). I checked for a new backup this morning, and all was
>> well. I just tried mounting the drive a few minutes ago and got a ton
>> of bad sector attempt timeouts saying it cannot find a valid FAT
>> partition. When I try to run fsck on this partition, I get this as a
>> result:
>>
>> [root@fileserver root]# fsck /dev/hdb1
>> fsck 1.27 (8-Mar-2002)
>> e2fsck 1.27 (8-Mar-2002)
>> fsck.ext2: Attempt to read block from filesystem resulted in short
>> read while trying to open /dev/hdb1
>> Could this be a zero-length partition?
>> [root@fileserver root]# fsck -t ext3 /dev/hdb1
>> fsck 1.27 (8-Mar-2002)
>> e2fsck 1.27 (8-Mar-2002)
>> fsck.ext3: Attempt to read block from filesystem resulted in short
>> read while trying to open /dev/hdb1
>> Could this be a zero-length partition?
>>
>> These are two different ATA drives, one 20G and one 10G. I
>> thought it was kind of weird that this would happen to both drives one
>> day apart. Possibly a controller problem on the motherboard?
>
> 	Are all your cables (power and data) in good shape and well seated?
> 	Any chance there's been a change in quality of the power?
> 	Do BIOS and kernel see the drive geometry in like fashion? (there
> was a time, circa RH6.1 or .2, that fdisk added an "extra" cylinder to
> the drive - fun!)
> 	Do you have a "like system" you can swap the drives into to see how
> they behave?
>
> 	IMHO - using IDE HDD tools to "fix" a drive that's reporting media
> errors is like putting a band-aid on a leper.  Underneath, it's still
> rotten.  If you've got a drive spitting bad sector errors it's time to
> $h!tcan the drive and get a new one... MaxTools may delay death, but it's
> still terminal, and data loss is almost assured.
>
> Don
> _______________________________________________
> wplug mailing list
> wplug at wplug.org
> http://www.wplug.org/mailman/listinfo/wplug
>
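The nightly backup described in the quoted message writes tarballs to a drive in the same machine, and that drive later failed, so a backup can look fine right up until it is needed. Below is a minimal Python sketch of a separate write step and verify step using the standard `tarfile` module; the function names and any paths are hypothetical, not part of the original script. One caveat: reading an archive back immediately after writing it may be served from the kernel's page cache rather than the disk, so running the verify step as a separate job, after the backup drive has been unmounted and remounted, is a stronger test.

```python
import os
import tarfile


def write_backup(src_dir, archive_path):
    """Write src_dir into a gzip-compressed tarball at archive_path."""
    name = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=name)  # recurses into subdirectories


def verify_backup(archive_path):
    """Read every regular file in the archive back in full.

    Returns the number of files read. A truncated, corrupt, or unreadable
    archive will usually raise an exception here (for example
    tarfile.ReadError or EOFError) rather than return a count.
    """
    files_read = 0
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if member.isfile():
                data = tar.extractfile(member)
                # Read in 1 MiB chunks so large files need not fit in memory.
                while data.read(1 << 20):
                    pass
                files_read += 1
    return files_read
```

Comparing the returned count against the number of files expected in the source tree adds a cheap sanity check on top of the read-back itself.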



