[wplug] FileSystem problems

Mike Griffin mike at dmrnetworks.com
Tue Jul 15 14:42:13 EDT 2003


First, to address Brandon: the machines are in an air-conditioned room,
so heat isn't much of an issue.

As for the OS: the machine is now running stock RH7.3, installed from the
CDs without any updates (it had to be accessible again yesterday as fast
as possible). Before the first drive failed, it was running RH7.3 with
all the updates from RH's security mailing list applied.
I can't really say what versions of all the utilities were installed
before, but it was running kernel 2.4.20. The server is now running, as
a base:
kernel-2.4.18-3
fileutils-4.1-10

This machine is not accessible from the internet. Its data is actually
served via NFS to another machine, which is accessed from within the
LAN.

Is there any other information you would like or need to know?

Mike

On Tuesday, July 15, 2003, at 02:19  PM, Vanco, Donald wrote:

> Mike Griffin <mailto:mike at dmrnetworks.com> wrote:
>> I don't know if I
>> want to stick new drives into this system seeing how both drives went
>> "bad" less than 24 hours apart.
> 	I can understand that.  You can try using Check-it Pro or another
> bootable utility to test the IDE HW.  But if you want us to look at the
> OS as a possible issue we'll need to know what kernel you're running
> (and possibly what version of .... ??  fstool?  fileutils?  the name
> escapes me...)  It looks like you're running EXT3.
>
>> The power on the system is the same as it's been, short of sticking an
>> oscilloscope in the power socket.
> 	I was really referring more to the power supply in the system - they
> die frequently too.  I've had several drive issues on HPT IDE controllers
> and they were resolved, oddly enough, by going to a much larger PS.  A
> voltmeter would help here....
>
>> Perhaps the power supply could be going bad. I've had that happen
>> before, where it started killing HDDs. Hmm, thanks for reminding me!
> 	:)
>
> Don
>
>> On Tuesday, July 15, 2003, at 01:31  PM, Vanco, Donald wrote:
>>
>>> Mike Griffin <mailto:mike at dmrnetworks.com> wrote:
>>>> I seem to be having some problems with one of my servers and was
>>>> wondering where to start troubleshooting the machine. I would
>>>> imagine that it's at the hardware level.
>>>>
>>>> A fileserver crashed yesterday with a kernel panic. This machine has
>>>> been running solidly for nearly a year. I started getting errors on
>>>> the hard drive during writes, and access was very slow. I ran the
>>>> Maxtor utilities on the drive and they found a few problems, which
>>>> the software fixed. I reinstalled the server OS (RH7.3) and
>>>> recovered my data from backups. My backups are stored on the same
>>>> system but on a different HDD, which gets mounted by a script every
>>>> night and has tarballs written to it. I also ran fsck -t ext3 on
>>>> this drive (/dev/hdb1). I checked for a new backup this morning, and
>>>> all was well. I just tried mounting the drive a few minutes ago and
>>>> got a ton of bad sector timeout errors, saying it cannot find a
>>>> valid FAT partition. When I try to run fsck on this partition, I get
>>>> this as a result:
>>>>
>>>> [root at fileserver root]# fsck /dev/hdb1
>>>> fsck 1.27 (8-Mar-2002)
>>>> e2fsck 1.27 (8-Mar-2002)
>>>> fsck.ext2: Attempt to read block from filesystem resulted in short
>>>> read while trying to open /dev/hdb1
>>>> Could this be a zero-length partition?
>>>> [root at fileserver root]# fsck -t ext3 /dev/hdb1
>>>> fsck 1.27 (8-Mar-2002)
>>>> e2fsck 1.27 (8-Mar-2002)
>>>> fsck.ext3: Attempt to read block from filesystem resulted in short
>>>> read while trying to open /dev/hdb1
>>>> Could this be a zero-length partition?
>>>>
>>>> These are two different ATA drives. One is a 20G and one is a 10G. I
>>>> thought it was kind of weird that this would happen to both drives
>>>> one day apart. Possibly a controller problem on the motherboard?
>>>
>>> 	Are all your cables (power and data) in good shape and well seated?
>>> 	Any chance there's been a change in quality of the power?
>>> 	Do the BIOS and the kernel see the drive geometry the same way?
>>> (There was a time, circa RH6.1 or .2, when fdisk added an "extra"
>>> cylinder to the drive - fun!)
>>> 	Do you have a "like system" you can swap the drives into to see how
>>> they behave?
>>>
>>> 	IMHO - using IDE HDD tools to "fix" a drive that's reporting media
>>> errors is like putting a band-aid on a leper.  Underneath, it's still
>>> rotten.  If you've got a drive spitting bad sector errors it's time
>>> to $h!tcan the drive and get a new one... MaxTools may delay death,
>>> but it's still terminal, and data loss is almost assured.
>>>
>>> Don
> _______________________________________________
> wplug mailing list
> wplug at wplug.org
> http://www.wplug.org/mailman/listinfo/wplug
>