[wplug] NFS mount problem

Gentgeen gentgeen at linuxmail.org
Sat Jul 3 10:44:13 EDT 2004


This is fun - ain't it ;-}  Ready for the next step --

OK, I VNCed to kingpin (remember that kingpin is headless, and putting
a monitor on it requires some work) and mounted an NFS share on linuxbox.
I started with rsize and wsize both at 8192, then tried to copy an entire
directory from kingpin to linuxbox, i.e.:

   [kevin@kingpin:~] mount /mnt/linuxbox
   [kevin@kingpin:~] cp -r ~/duality.divx.avi /mnt/linuxbox

It then froze.  (duality is a 25MB file) Looking at 'netstat -i' from
kingpin showed nothing out of the ordinary.  When I looked at 'netstat
-i' from linuxbox, I found the RX-ERRs again. 
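
The counter check described above can be sketched as a pair of small shell helpers. This is only a rough sketch: it assumes the net-tools 'netstat -i' layout with a header row containing RX-OK and RX-ERR, and the function names rx_counts and rx_delta, as well as the interface name eth0, are made up for illustration and do not come from the thread.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the thread).

# rx_counts IFACE: reads `netstat -i` style text on stdin and prints
# "RX-OK RX-ERR" for IFACE. Columns are located by header name, since
# the exact column layout varies between netstat versions.
rx_counts() {
    awk -v ifc="$1" '
        /RX-OK/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "RX-OK")  ok  = i
                if ($i == "RX-ERR") err = i
            }
            next
        }
        ok && $1 == ifc { print $ok, $err }
    '
}

# rx_delta OK1 ERR1 OK2 ERR2: prints how much RX-OK and RX-ERR grew
# between two samples.
rx_delta() {
    echo "$(( $3 - $1 )) $(( $4 - $2 ))"
}

# Example use on the receiving box (eth0 is an assumed interface name):
#   before=$(netstat -i | rx_counts eth0)
#   sleep 5
#   after=$(netstat -i | rx_counts eth0)
#   rx_delta $before $after
```

Taking two samples a few seconds apart and comparing the growth in RX-ERR against the growth in RX-OK gives a rough error rate for the interval, which is easier to compare across runs than the raw totals.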

So I started fiddling with the rsize and wsize on kingpin:
  * When rsize was 8192, the copy froze. The RX-ERRs were showing up
    quickly on linuxbox, with nothing on kingpin.
  * When rsize was 4096, the copy completed, but took a while. The
    RX-ERRs were slowly showing up on linuxbox, with nothing on kingpin.
  * Once rsize was 2048, the copy completed, but seemed slower than at
    4096. The RX-ERRs did not show up on either box.
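
For reference, the size changes above correspond to the rsize and wsize NFS mount options. The sketch below only prints the mount command for each size rather than running it, since the real settings here live in /etc/fstab. The helper name nfs_mount_cmd and the export path linuxbox:/export are placeholders, not the actual configuration.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the mount command for one
# transfer size. rsize and wsize are standard NFS mount options; the
# export path passed in below is a placeholder.
nfs_mount_cmd() {
    printf 'mount -t nfs -o rsize=%s,wsize=%s %s %s\n' "$1" "$1" "$2" "$3"
}

# The three sizes tried above, largest first.
for size in 8192 4096 2048; do
    nfs_mount_cmd "$size" linuxbox:/export /mnt/linuxbox
done
```

Between runs the share would need to be unmounted (umount /mnt/linuxbox) before mounting again with the next size, so that the new options actually take effect.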

So to try something, I copied the "duality" file over to my laptop
('thinkpad'), and tried all the same things.  Copying from 'thinkpad' to
'linuxbox' and from 'linuxbox' to 'thinkpad' produced NO ERRORS on
either side.

If it helps any, linuxbox is a dual boot, with win98.  When I tried
copying the same file from kingpin to linuxbox while in Windows, no
problem seemed to show up.  I booted back into Linux, and did the same
thing with samba, and everything seemed OK, but I did get 1 RX-ERR.

As of now, the only thing (I think) that I haven't done is trade out the
cards, but the problem seems to be isolated only to linuxbox, and only
when linuxbox is connected to kingpin via NFS.



On Fri, 02 Jul 2004 20:06:00 -0400
Michael Skowvron <skowvron at verizon.net> wrote:

> Gentgeen wrote:
> 
> > I guess I have used the wrong term and/or have the wrong
> > understanding. When I did consecutive 'ifconfig' while NFS was
> > 'frozen' I could watch the RX errors going up and up and up on
> > 'linuxbox' but stayed at zero on 'kingpin'.
> > So is this still a hardware problem, or something else? 
> 
> You've got yourself a real interesting problem. Receive errors almost
> always indicate a hardware problem. But you're not getting a solid
> hardware failure because we know that small, interactive traffic works
> fine even when NFS is "hung." Knowing that there are receive errors is
> much better than just knowing packets are dropped. They're being
> dropped because they are bad.
> 
> The problem is definitely related to the "amount" of data being passed
> because small NFS request sizes show no errors and larger request
> sizes get progressively worse. An important thing to remember is that
> kingpin is going to send "rsize" amount of data as fast as it can when
> it transmits. Smaller "rsize"-s mean fewer packets that
> linuxbox has to ingest at one time.
> 
> >    When rsize=8192, netstat showed RX-ERR increasing by 5 for every
> >         increase of 7 in RX-OK   
> >    When rsize=4096, netstat showed RX-ERR increasing by 1 for every
> >         increase of 2 in RX-OK
> >    When rsize=2048, netstat showed RX-ERR remaining at 0, and RX-OK
> >         increasing.
> > 
> > As a note, the above is not very scientific, just a matter of
> > continuously running 'netstat -i' and doing some quick
> > math.
> 
> That's as scientific as I think you need to be. Netstat can be a
> diagnostic tool just like anything else. The numbers you are showing
> are absolutely horrible. On my network of 12 or so hosts I've got
> packet counts of over 400 million and I have 0 errors. I use a lot of
> NFS with 32K request sizes.
> 
> The problem appears to be isolated to linuxbox receiving packets. To
> confirm, you should also test kingpin for heavy receive load. Instead
> of reading data from your NFS filesystem, write a lot of data to it
> from linuxbox. Use an rsize of 4k or 8k and see if rx errors start to
> show up on kingpin.
> 
> If kingpin still doesn't show errors, the problem must be isolated to
> linuxbox. At this point, I would begin to suspect that the ethernet
> card is bad or there is some sort of interrupt servicing problem. If
> kingpin does start to show errors, it could be a bug in the receive
> packet handling of the tulip driver.
> 
> I notice that you always seem to have the problem when playing audio
> files. It could be that you don't do anything else on your NFS
> filesystem, or it could be that the audio board is affecting the PCI
> bus and corrupting data coming from the ethernet card. Maybe it's a
> funny interaction between the audio driver and the tulip driver.
> 
> Do large file copies also hang NFS? If they do, then swap the ethernet
> cards between kingpin and linuxbox. Do the receive errors follow the
> card? If they do, bad card. If not, maybe it's an IRQ problem or some
> other resource conflict on linuxbox.
> 
> On the other hand, if NFS only hangs when playing audio files, look
> for the problem to be related in some way to the audio board or the
> audio driver in the kernel.
> 
> Whew! I'll be anxious to see what you post next!
> 
> Michael


******************************************************************
Associate yourself with men of good quality if you esteem your own
reputation; for 'tis better to be alone than in bad company.
                    - George Washington, Rules of Civility        


