[wplug] Block device buffers and queuing -- WAS: cpu load?

Bryan J. Smith b.j.smith at ieee.org
Thu Aug 16 15:19:58 EDT 2007


Brandon Poyner <bpoyner at gmail.com> wrote:
> Just jumping in here to say that some disks are better than
> others at handling I/O requests.

To start there are several levels of buffering and queuing in any
block device operation.

- Host Kernel buffers and queuing
- Host Controller buffers and queuing
- End Device buffers and queuing

First off, the Linux kernel does a fairly good job at filling its
buffers and scheduling writes.
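To see that first level in action, here is a minimal sketch (my own illustration, not anything from the kernel source): a normal write() lands in the kernel's page cache and gets flushed later by the writeback scheduler, while os.fsync() forces the flush down to the block device.

```python
# A write() lands in the kernel's page cache first; the writeback
# scheduler flushes it later.  os.fsync() forces the flush immediately.
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"buffered in the page cache first\n")
    os.fsync(fd)          # force the kernel to push the pages to disk
finally:
    os.close(fd)
    with open(path, "rb") as f:
        data = f.read()   # durable on the device by this point
    os.remove(path)
```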

Secondly, if you have an "intelligent" host bus adapter (HBA) -- i.e.,
one with _true_, _onboard_ hardware intelligence, e.g., Intel
X-Scale/IOP, AMCC PowerPC, etc... -- then it handles most of the
buffering/queuing off-load.  Furthermore, even some entry-level
host controllers (e.g., SAS, Serial Attached SCSI) are now sporting
more and more IOP (I/O Processing) with SRAM on-IC, and are more of a
"storage switch" at the bus-speed performance level (i.e., they do
RAID-1 or RAID-10 "for free").

[ NOTE:  Do _not_ confuse "intelligent" HBAs with Fake RAID (FRAID)
controllers, which dominate mainboard-integrated ATA RAID.  They are
little more than the 16-bit PC/AT bus on steroids -- see below. ]

Lastly, depending on the storage bus, more and more devices are now
adding queuing intelligence on-ASIC in addition to buffering.  SCSI
targets have always had this built into their command set.  But
nowadays, even Integrated Drive Electronics (IDE) end devices on AT
Attachment (ATA) can do command queuing at the end device in newer
3Gbps Serial ATA implementations.
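The win from queuing at the end device is seek reordering.  A toy sketch of the concept (my own illustration, not any drive's actual firmware algorithm): given a queue of outstanding requests, servicing them in a one-pass "elevator" sweep cuts total head travel versus first-in-first-out order.

```python
# Toy model of why command queuing helps: compare total head travel
# for FIFO service versus a simple one-pass "elevator" sweep.
# This illustrates the concept only -- real firmware is more subtle.

def head_travel(start, blocks):
    """Total distance the head moves servicing `blocks` in order."""
    travel, pos = 0, start
    for b in blocks:
        travel += abs(b - pos)
        pos = b
    return travel

def elevator_order(start, blocks):
    """One sweep upward from `start`, then downward for the rest."""
    up = sorted(b for b in blocks if b >= start)
    down = sorted((b for b in blocks if b < start), reverse=True)
    return up + down

queue = [700, 20, 650, 30, 600]   # outstanding requests (cylinder numbers)
fifo = head_travel(100, queue)                        # 3100 cylinders
ncq = head_travel(100, elevator_order(100, queue))    # 1280 cylinders
```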

More on this follows ...

> Parallel ATA (EIDE) drives are particularly bad at handling
> requests that require many seeks by the drive head. SCSI and
> SATA II have tagged command queuing and native command queuing,
> respectively, that permit the drive to make smarter decisions
> about seeking.

First off, not trying to be "anal," but I want to break some common
myths and lay down some technical concepts ...

1.  Integrated Drive Electronics (IDE) is a generic concept.  The
idea removes the separate disk controller (e.g., MFM, RLL) and puts
the controller right on the drive.  IDE has two end-points -- the
system bus and the end device -- and nothing else.

2.  IDE begins with the Enhanced Small Device Interface (ESDI).  ESDI
was designed for the 6-8MHz, 16-bit IBM PC/AT aka "16-bit ISA"
bus, eventually allowing up to 16MBps (2 bytes @ 8MHz).  ESDI was
still programmed I/O (PIO) -- i.e., it leveraged the main CPU
interconnect for data transfer.

3.  Enhanced IDE(R) -- EIDE -- is a trademark of Western Digital.  It
was a hack to the ESDI-IDE model.  Western Digital introduced the
master/slave concept with EIDE, and some new "Programmed I/O" modes. 
Since the CPU controls the bus in PIO, it was a hack at the IDE end
to allow multiple devices to share the bus.

4.  Many vendor developments on various IDE implementations
eventually resulted in the PC/AT Attachment, or simply AT (Advanced
Technology) Attachment (ATA).  ATA retains its legacy 16-bit ISA
compatibility, but augments the original block transfer command set
-- including Direct Memory Access (DMA) modes.  The end device
negotiates a direct, burst transfer into system memory with the bus
itself -- direct memory access -- without requiring the CPU
interconnect to be involved.

[ SIDE NOTE:  As you may have guessed, the end-bus to end-device
nature of ATA, and the concept of DMA, is thrown a serious wrench
when you use master/slave configurations, which were never designed
for DMA, only PIO.  But that is another discussion. ]

5.  All modern ATA implementations, including Serial ATA, offer what
are known as "Ultra" DMA modes.  Although Ultra DMA is commonly known
for its DDR/QDR signaling (e.g., Ultra33 is UltraDMA mode 2, or 8MHz
at 16-bit using DDR), the biggest benefit of the "Ultra" modes is CRC
checking.  Earlier DMA modes lacked any parity/CRC checking, so
adding CRC drastically improves integrity.
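The CRC point can be seen with any checksum.  A quick sketch (it uses CRC-32 from Python's zlib purely for convenience -- Ultra DMA itself uses a 16-bit CRC that host and device each compute and compare at the end of a burst):

```python
# Illustration of burst-level CRC checking: sender and receiver each
# compute a CRC over the transferred data; a single flipped bit is
# caught.  (CRC-32 here for convenience; Ultra DMA uses a 16-bit CRC.)
import zlib

burst = bytes(range(256))             # data as sent by the host
sent_crc = zlib.crc32(burst)

corrupted = bytearray(burst)
corrupted[17] ^= 0x01                 # one bit flipped on the cable
received_crc = zlib.crc32(bytes(corrupted))

error_detected = (sent_crc != received_crc)
```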

The "Ultra" DMA modes include ...
  Mode 0    8MBps      
  Mode 1   16MBps
  Mode 2   33MBps
  Mode 3   50MBps -- requires "+40 ground" (80 conductor) cable
  Mode 4   66MBps
  Mode 5  100MBps
  Mode 6  133MBps/150MBps (133MBps ATA and 1.5Gbps SATA)
  Mode 7  300MBps (3.0Gbps SATA-II or "native PHY" SATA)
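Back-of-the-envelope check of the Ultra33 example above: a 16-bit (2-byte) bus strobed at roughly 8.33MHz, transferring on both clock edges (DDR), works out to about 33MBps.

```python
# Ultra33 (UltraDMA mode 2) arithmetic: 16-bit bus, ~8.33MHz strobe,
# double data rate (both clock edges) -> roughly 33 MB/s.
bus_width_bytes = 2        # 16-bit ATA data bus
clock_mhz = 8.33           # strobe frequency
edges_per_cycle = 2        # double data rate

mbps = bus_width_bytes * clock_mhz * edges_per_cycle   # ~33.3
```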

6.  The "ATA controller" on the mainboard or add-in card is still
rather "dumb" -- _always_ has been, _always_ will be.  ;)  It's a
glorified bus arbiter.  It has some registers for control, which
are first set up by the system firmware (e.g., 16-bit PC BIOS Extended
Int13h Disk Services) and then by the OS driver (e.g., in the
32/64-bit OS).  Most problems arise from differences in ATA
implementations and/or setup by the firmware and/or OS driver, versus
the IDE (remember, intelligent drive electronics -- where the actual
"brains" of the controller are) at the end device itself.

Again, to re-clarify: the "ATA controller" is just a "bus arbiter"
that allows the IDE to do block DMA to/from system memory.  It's up
to the firmware/OS to set it up correctly for the IDE device in use.

7.  At the data link "protocol" and higher layers, there is _no_
difference between any ATA implementations -- parallel (ATA) or
serial (SATA).  In fact, nearly all 1.5Gbps SATA (150MBps)
implementations are a legacy parallel ATA controller with a SATA PHY
added.  It is only newer 3.0Gbps SATA-II controllers that have a
"native PHY" (physical interface), but they still rely on the _exact_
same protocol.

8.  Now with all that said, there _are_ newer ATA controllers with a
new option called Native Command Queuing (NCQ).  To date, NCQ has
really only been implemented on these new, native 3.0Gbps
(300MBps/UltraDMA mode 7) SATA controllers.  NCQ on the SATA device
by itself doesn't give you anything: you still have to have
firmware/an OS that sets it up and drives its DMA correctly,
otherwise it will fall back to "dumb" buffering.  In fact, because
many OS drivers do not support NCQ, or have various issues
(especially when the 16-bit PC BIOS sets them up improperly), many
newer UltraDMA mode 7 (300MBps) devices come with an option to "slow
down" to legacy UltraDMA mode 6 (133/150MBps), which often disables
NCQ and other, newer options.
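On a Linux box you can check whether NCQ actually got set up by reading the device's queue depth from sysfs -- a depth of 1 means the kernel fell back to one-request-at-a-time dispatch.  A small sketch (the sysfs path is the standard block-device layout, but treat this as illustrative):

```python
# Sketch: read a disk's command queue depth from sysfs.  A depth > 1
# generally means NCQ/TCQ is active; 1 means one request at a time.
from pathlib import Path

def queue_depth(disk="sda", sysfs_root="/sys"):
    """Return the queue depth for `disk`, or None if it cannot be read."""
    path = Path(sysfs_root) / "block" / disk / "device" / "queue_depth"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None
```

E.g., `queue_depth("sda")` on a working NCQ setup typically returns 31.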

9.  The Advanced Host Controller Interface (AHCI) is a 100%
software-based approach that gives ATA a more uniform, single,
vendor-agnostic management interface -- up to 32 devices.
I.e., you only need one (1) AHCI driver to control _all_ ATA devices
in a system, even if some ATA channels are in the chipset, others on
add-in cards, etc...  To date, most AHCI implementations have
still been heavily vendor-centric (under any and all OSes).

[ Additional Note:  You may see some SATA-like devices labeled "SAS."
That is actually a SCSI-2 device, using a SATA-like physical layer.
They are intelligent target devices, like any SCSI-2 device, that
communicate with an intelligent host bus adapter via SCSI-2
(unlike SATA, which is "dumb" at the bus arbiter).  Most SAS
controllers are backward compatible with SATA devices (though not the
reverse), but with all the limitations of SATA.  Again, AHCI is not
intelligent hardware (it's 100% software, with some 16-bit PC BIOS
firmware -- at least until 32/64-bit EFI or other replacement
firmware takes hold -- which does nothing to help a 32/64-bit OS),
unlike a real SAS HBA. ]

> Also, not really related to updatedb, but another performance hit
> is caused by mounting a partition with atime, usually the default
> for most filesystems.  With atime enabled, every time you read a
> file, it must update (write to) the inode.  That can be a killer
> when doing things like backups.
 
People debate whether the "noatime" option helps at all.  I have seen
benchmarks showing that early Ext3 performance absolutely stunk with
atime enabled, and newer benchmarks showing that it no longer does.
Furthermore, XFS doesn't seem to be affected by it.  And then there
are network filesystem considerations.  I can't comment on other
Linux filesystems (no first-hand experience with using "noatime" on
them).

Personally, I always turn atime off, unless it is needed for policy
or auditing purposes.
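For what it's worth, a quick way to see which of your mounts already carry "noatime" is to parse /proc/mounts.  A small sketch (field layout per the standard fstab format: device, mountpoint, fstype, options, dump, pass):

```python
# Sketch: list mount points whose options include "noatime", by
# parsing /proc/mounts-style lines.

def noatime_mounts(mounts_text):
    """Return mount points from /proc/mounts text that use noatime."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and "noatime" in fields[3].split(","):
            result.append(fields[1])
    return result

# e.g.: noatime_mounts(open("/proc/mounts").read())
```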


-- 
Bryan J. Smith   Professional, Technical Annoyance
b.j.smith at ieee.org    http://thebs413.blogspot.com
--------------------------------------------------
     Fission Power:  An Inconvenient Solution

