[wplug] File systems and Defrag
Gentgeen
gentgeen at linuxmail.org
Sun Sep 19 10:23:49 EDT 2004
That was a really cool read. I know I am reading it 4 days after the fact, but I just wanted to second Robert's comment -- Thanks.
To me that is another benefit of Linux: I have learned more things while running Linux than I ever did in the Windows world.
On Thu, 16 Sep 2004 18:10:07 -0400
Bill Moran <wmoran at potentialtech.com> wrote:
> "Teodorski, Chris" <cteodorski at mahoningcountyoh.gov> wrote:
>
> > After a coworker read the article about Sun's new file system ZFS
> > (http://www.sun.com/2004-0914/feature/) a conversation about file
> > systems started. The meat of the conversation revolved around why
> > Microsoft's OSes all have file systems that need a defragmenter and how come
> > *nix file systems do not. From what I've read it is not so much that
> > *nix file systems do not fragment, but that they are virtually
> > unaffected by this fragmentation. Can anyone shed any additional light
> > on this? The information I've found online seems to contradict itself.
>
> Believe it or not, this has been a subject of interest and independent
> study of mine for a while now.
>
> First off, I understand Unix-like filesystems better than I do MS-based,
> so I may be off a bit on how MS filesystems work. Secondly, my study
> has been of BSD's FFS, which (to my understanding) is _very_ similar to
> ext2fs ...
>
> FFS is based on locality. Data is intentionally spread across the disk, but
> always kept close to similar data (i.e. a directory is always kept close
> to the data of the files it contains, whereas an unrelated directory is
> stored somewhere else on disk).
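[Editor's note: a minimal sketch of the locality idea above. This is illustrative only, not the actual FFS allocator; the region count and the hash-based directory placement are made-up simplifications.]

```python
# Illustrative sketch of locality-based allocation (NOT real FFS code).
# The disk is divided into regions ("cylinder groups"); each directory
# gets a home region, and its files' data blocks are allocated there,
# while unrelated directories are spread across other regions.

NUM_GROUPS = 8  # hypothetical number of cylinder groups

class Disk:
    def __init__(self):
        # next free block offset within each cylinder group
        self.next_free = [0] * NUM_GROUPS

    def group_for_dir(self, dir_name):
        # Spread unrelated directories across different groups.
        return hash(dir_name) % NUM_GROUPS

    def alloc_block(self, dir_name):
        # Allocate file data in the same group as its directory, so a
        # directory lookup followed by a file read needs only a short seek.
        g = self.group_for_dir(dir_name)
        block = (g, self.next_free[g])
        self.next_free[g] += 1
        return block

disk = Disk()
a1 = disk.alloc_block("/home/alice")
a2 = disk.alloc_block("/home/alice")
b1 = disk.alloc_block("/var/log")
# Blocks for files under the same directory land in the same group:
assert a1[0] == a2[0]
```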
>
> Additionally, a _shitload_ of research at Berkeley demonstrated that small
> files are more common than large files, and that large files are seldom
> accessed all at once. As a result, files are stored contiguously up to
> a certain size, then intentionally fragmented. This has a number of
> effects: First, the disk near a directory entry has room to add new file
> data in the directory that stays close to the directory data ... thus,
> when you read a directory to find the file, then read the file, the
> head doesn't have to do a long seek. Second, when a large file is
> created, the initial part of the file is close by. If the file is a
> program, it's likely that the on-demand pager won't need the whole
> thing right away, thus, the fact that the whole file isn't in one
> spot isn't a problem. Additionally, if the file grows, there is a
> pattern of filesystem usage that allows the file to grow, while still
> being laid out on the disk in a predictable manner. Additionally, if
> a small file is added to the directory with the large file in it, there
> will still be space close by the directory to store the new file data.
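[Editor's note: the "contiguous up to a certain size, then intentionally fragmented" policy described above can be sketched like this. The threshold and fragment sizes here are hypothetical round numbers, not FFS's actual constants.]

```python
# Illustrative sketch: store a file contiguously up to a threshold,
# then deliberately split the remainder into fixed-size fragments.
# (Sizes are hypothetical, not FFS's real tuning constants.)

CONTIG_LIMIT = 96 * 1024   # first 96 KB stays in one contiguous run
FRAG_SIZE = 8 * 1024       # the rest is split into 8 KB pieces

def layout(file_size):
    """Return the list of extent sizes used to store a file."""
    extents = []
    head = min(file_size, CONTIG_LIMIT)
    extents.append(head)
    remaining = file_size - head
    while remaining > 0:
        piece = min(remaining, FRAG_SIZE)
        extents.append(piece)
        remaining -= piece
    return extents

# A small file stays in one piece; a big one is split after the head,
# leaving room near the directory for new small files.
assert layout(10 * 1024) == [10 * 1024]
big = layout(200 * 1024)
assert big[0] == 96 * 1024 and all(e == 8 * 1024 for e in big[1:])
```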
>
> Additionally, FFS allocates space in big blocks that can be broken into
> smaller pieces (fragments) on demand. As a result, when a big file
> does fragment, it fragments into (for example) 8K fragments, which the
> disk can retrieve pretty quickly, since it's doing 8K at a time. NTFS,
> et al., use 512-byte blocks, so a big file can turn into a real mess if
> it fragments, causing seeks all over the disk. Additionally, since
> NTFS tries to keep everything contiguous all the time, a large file
> can separate other files that are normally accessed together, thus
> causing lots of disk seeks. Think of it this way: the worst case for
> an 800K file on FFS is 100 long seeks, whereas a maximally fragmented
> file on NTFS can be 1500 long seeks. Thus, fragmentation on FFS
> doesn't cause as much performance degradation as is possible on NTFS.
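[Editor's note: the seek-count arithmetic above can be checked directly. This is just file size divided by piece size, treating each piece as one potential seek in the worst case; it gives 100 pieces at 8 KB, and 1600 pieces at 512 bytes, in the same ballpark as the rough "1500 long seeks" figure quoted above.]

```python
# Worst-case piece counts for an 800 KB file, assuming each separate
# fragment costs one seek: 8 KB FFS fragments vs 512-byte blocks.
file_size = 800 * 1024

ffs_pieces = file_size // (8 * 1024)    # 8 KB fragments
ntfs_pieces = file_size // 512          # 512-byte blocks

assert ffs_pieces == 100
assert ntfs_pieces == 1600
```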
>
> Also, in practice, 100% contiguous files don't really improve disk
> performance noticeably. Since 100% contiguous files are difficult to
> maintain, it just becomes a management issue. This may not be
> intuitive, but the research at Berkeley determined that the work
> involved in keeping files 100% contiguous was not justified by the
> performance improvement, and they came up with a better scheme.
>
> It boils down to this:
> Tons of research into disk speed has identified which considerations
> are important, and which are negligible. Fact is, certain types of
> fragmentation cause negligible performance degradation, while other
> types of fragmentation cause major performance degradation. FFS is
> carefully designed to minimize very bad fragmentation by carefully
> fragmenting in a manner that doesn't cause noticeable performance
> problems, and allows the data on the disk to stay organized in a
> manner that is efficient. The FFS code itself optimizes disk layout
> during normal use. It's kind of a Zen thing, I think ... if you
> resist something 100%, it will overcome you, but if you flow with
> it, you can manage it.
> Microsoft grabbed a crappy FAT filesystem and threw it together to
> be easy to use, then they upgraded it without much thought (into
> NTFS) and never really thought through how to make it self-maintaining.
> As a result, NTFS does not optimize disk layout to any degree, thus
> a third-party program (defragmenter) has to be used regularly to
> maintain some semblance of an optimized layout.
>
> Why Microsoft chose this route when there was a crapload of published
> literature on all the research that was done at Berkeley, is beyond me.
> It's possible that nobody at Microsoft is smart enough to understand
> the work that McKusick did, but that's just my theory.
>
> --
> Bill Moran
> Potential Technologies
> http://www.potentialtech.com
> _______________________________________________
> wplug mailing list
> wplug at wplug.org
> http://www.wplug.org/mailman/listinfo/wplug
******************************************************************
Associate yourself with men of good quality if you esteem your own
reputation; for 'tis better to be alone then in bad company.
- George Washington, Rules of Civility