[wplug] File systems and Defrag

Gentgeen gentgeen at linuxmail.org
Sun Sep 19 10:23:49 EDT 2004


That was a really cool read. I know I am reading it 4 days after the fact, but I just wanted to second Robert's comment -- thanks.

That is another benefit (to me, at least) of Linux: I have learned more while running Linux than I ever did in the Windows world.


On Thu, 16 Sep 2004 18:10:07 -0400
Bill Moran <wmoran at potentialtech.com> wrote:

> "Teodorski, Chris" <cteodorski at mahoningcountyoh.gov> wrote:
> 
> > After a coworker read the article about Sun's new file system ZFS
> > (http://www.sun.com/2004-0914/feature/) a conversation about file
> > systems started.  The meat of the conversation revolved around why
> > Microsoft's OSes all have file systems that need a defragmenter and how come
> > *nix file systems do not.  From what I've read it is not so much that
> > *nix file systems do not fragment, but that they are virtually
> > unaffected by this fragmentation.  Can anyone shed any additional light
> > on this?  The information I've found online seems to contradict itself.
> 
> Believe it or not, this has been a subject of interest and independent
> study of mine for a while now.
> 
> First off, I understand Unix-like filesystems better than I do MS-based,
> so I may be off a bit on how MS filesystems work.  Secondly, my study
> has been of BSD's FFS, which (to my understanding) is _very_ similar to
> ext2fs ... 
> 
> FFS is based on locality.  Data is intentionally spread across the disk, but
> always kept close to similar data (i.e. a directory is always kept close
> to the data of the files it contains, whereas an unrelated directory is
> stored somewhere else on disk).
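> 
> To make the locality idea concrete, here is a rough sketch in Python of
> how such an allocator might pick a spot (purely illustrative -- the real
> FFS allocator is in C and far more involved; all names and numbers here
> are made up):
> 
>     # Toy model: the disk is split into cylinder groups.  Unrelated
>     # directories are spread across groups, while a file's data is
>     # kept in the same group as the directory that contains it.
>     NUM_GROUPS = 16
> 
>     def group_for_new_directory(parent_group, free_blocks):
>         # New directories go to a *different*, lightly used group,
>         # which is what spreads unrelated data across the disk.
>         candidates = [g for g in range(NUM_GROUPS) if g != parent_group]
>         return max(candidates, key=lambda g: free_blocks[g])
> 
>     def group_for_file_data(directory_group, free_blocks):
>         # File data prefers its directory's own group, so reading the
>         # directory and then the file needs only a short seek.
>         if free_blocks[directory_group] > 0:
>             return directory_group
>         return max(range(NUM_GROUPS), key=lambda g: free_blocks[g])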
> 
> Additionally, a _shitload_ of research at Berkeley demonstrated that small
> files are more common than large files, and that large files are seldom
> accessed all at once.  As a result, files are stored contiguously up to
> a certain size, then intentionally fragmented.  This has a number of
> effects: First, the disk near a directory entry keeps free space, so new
> file data added to the directory stays close to the directory data ... thus,
> when you read a directory to find the file, then read the file, the
> head doesn't have to do a long seek.  Second, when a large file is
> created, the initial part of the file is close by.  If the file is a
> program, it's likely that the on-demand pager won't need the whole
> thing right away, thus, the fact that the whole file isn't in one
> spot isn't a problem.  Additionally, if the file grows, there is a
> pattern of filesystem usage that allows the file to grow, while still
> being laid out on the disk in a predictable manner.  Finally, if
> a small file is added to the directory containing the large file, there
> will still be space close to the directory to store the new file's data.
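> 
> A rough sketch of that "contiguous up to a point, then deliberately
> fragmented" policy, again in illustrative Python (the threshold and the
> helper here are made up, not FFS's real parameters):
> 
>     # Keep the first MAXCONTIG blocks of a file near its directory,
>     # then deliberately switch regions so a big file doesn't crowd
>     # out the small files and directories around it.
>     MAXCONTIG = 12
> 
>     def place_file_blocks(num_blocks, home_region, pick_new_region):
>         layout = []
>         region = home_region
>         for i in range(num_blocks):
>             if i > 0 and i % MAXCONTIG == 0:
>                 region = pick_new_region(region)   # intentional "fragment"
>             layout.append((region, i))
>         return layout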
> 
> Additionally, FFS allocates space in big blocks that can be broken into
> smaller pieces (fragments) on demand.  As a result, when a big file
> does fragment, it fragments into (for example) 8K fragments, which the
> disk can retrieve pretty quickly, since it's doing 8K at a time.  NTFS
> et al. use 512-byte blocks, so a big file can turn into a real mess if
> it fragments, causing seeks all over the disk.  Additionally, since
> NTFS tries to keep everything contiguous all the time, a large file
> can separate other files that are normally accessed together, thus
> causing lots of disk seeks.  Think of it this way: the worst case for
> an 800K file on FFS is 100 long seeks, whereas a maximally fragmented
> 800K file on NTFS can mean roughly 1,600 long seeks.  Thus, fragmentation on FFS
> doesn't cause as much performance degradation as is possible on NTFS.
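> 
> For what it's worth, the arithmetic behind that comparison (taking "K"
> as 1024 and using the block sizes mentioned above):
> 
>     FILE_SIZE = 800 * 1024    # an 800K file
>     FFS_FRAG  = 8 * 1024      # FFS hands out 8K fragments
>     NTFS_BLK  = 512           # 512-byte blocks
> 
>     print(FILE_SIZE // FFS_FRAG)   # 100 pieces  -> at most ~100 seeks
>     print(FILE_SIZE // NTFS_BLK)   # 1600 pieces -> up to ~1600 seeks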
> 
> Also, in practice, 100% contiguous files don't really improve disk
> performance noticeably.  Since 100% contiguous files are difficult to
> maintain, it just becomes a management issue.  This may not be
> intuitive, but the research at Berkeley determined that the work
> involved in keeping files 100% contiguous was not justified by the
> performance improvement, and they came up with a better scheme.
> 
> It boils down to this:
> Tons of research into disk speed has identified which considerations
> are important and which are negligible.  Fact is, certain types of
> fragmentation cause negligible performance degradation, while other
> types of fragmentation cause major performance degradation.  FFS is
> carefully designed to avoid the very bad kind by deliberately
> fragmenting in a manner that doesn't cause noticeable performance
> problems, and it allows the data on the disk to stay organized in an
> efficient manner.  The FFS code itself optimizes disk layout
> during normal use.  It's kind of a Zen thing, I think ... if you
> resist something 100%, it will overcome you, but if you flow with
> it, you can manage it.
> Microsoft grabbed a crappy FAT filesystem and threw it together to
> be easy to use, then they upgraded it without much thought (into
> NTFS) and never really thought through how to make it self-maintaining.
> As a result, NTFS does not optimize disk layout to any degree, thus
> a third-party program (a defragmenter) has to be used regularly to
> maintain some semblance of an optimized layout.
> 
> Why Microsoft chose this route when there was a crapload of published
> literature on all the research that was done at Berkeley is beyond me.
> It's possible that nobody at Microsoft is smart enough to understand
> the work that McKusick did, but that's just my theory.
> 
> -- 
> Bill Moran
> Potential Technologies
> http://www.potentialtech.com
> _______________________________________________
> wplug mailing list
> wplug at wplug.org
> http://www.wplug.org/mailman/listinfo/wplug


******************************************************************
Associate yourself with men of good quality if you esteem your own 
reputation; for 'tis better to be alone then in bad company.
                    - George Washington, Rules of Civility        

