[wplug] RE: Mysteries of md5sum

Bob Supansic rsupansic at libcom.com
Mon Oct 3 20:34:52 EDT 2005


Please excuse the hiatus in reporting back my findings on
the problem of MD5 sums.  (My original email was submitted 
7/30/05.)  The problem seemed to be some undocumented 
behavior of isoinfo and md5sum, apparent differences in 
Windows-based vs Linux-based CD burning, and most of all, 
the need to properly pad a CD burned under Linux.

1.  First, some distinctions about device files.  The base
device for a CD drive (/dev/hdc, /dev/hdd, etc.) I will refer
to as the "raw" device.  A link to a raw device (/dev/cdrom) 
I will refer to as a link device.  Finally, a SCSI-emulation
of an IDE device (/dev/scd0) I will refer to as a SCSI device.

2.  isoinfo cannot be used on raw or link devices that are
SCSI-emulated: in such cases, it must be used on the SCSI
device only (/dev/scd0 in the above examples).  (It can be
used on both raw and link IDE devices that are not
SCSI-emulated.)  isoinfo is used by scripts such as Steven
Litt's RawRead to feed the block size and block count to dd.
RawRead-type scripts must therefore follow these constraints.
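
For illustration, here is a minimal Python sketch of the kind
of parsing a RawRead-type script does.  It assumes that
"isoinfo -d" prints lines of the form "Logical block size is:
2048" and "Volume size is: N".  The sample text and the name
parse_isoinfo are made up for this example and are not taken
from RawRead itself.

```python
import re


def parse_isoinfo(text):
    """Return (block_size, block_count) from `isoinfo -d` style output."""
    size = re.search(r"^Logical block size is:\s*(\d+)", text, re.MULTILINE)
    count = re.search(r"^Volume size is:\s*(\d+)", text, re.MULTILINE)
    if size is None or count is None:
        raise ValueError("block size or volume size not found")
    return int(size.group(1)), int(count.group(1))


# Made-up sample output, not captured from a real disc.
SAMPLE = """CD-ROM is in ISO 9660 format
Volume id: EXAMPLE
Logical block size is: 2048
Volume size is: 1000
"""

block_size, block_count = parse_isoinfo(SAMPLE)
# These two numbers would become bs= and count= for dd.
print(block_size, block_count)
```

On a SCSI-emulated drive the text would have to come from the
SCSI device (/dev/scd0), per the constraint above.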

3.  md5sum may or may not work on device files; sometimes it
produces only an input/output error message after reading
the CD.  To generate an MD5 sum on an unmounted CD, the
contents should first be piped to md5sum via dd or a
RawRead-type script.
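
The same idea (read a fixed number of blocks, then hash only
those) can be sketched in Python.  This is an illustration
under assumptions, not a replacement for dd: the function
name md5_of_blocks is hypothetical, and the block size and
count are expected to come from isoinfo.

```python
import hashlib


def md5_of_blocks(path, block_size, block_count):
    """MD5 of the first block_count blocks of path.

    Roughly what `dd bs=... count=... | md5sum` computes.
    """
    digest = hashlib.md5()
    with open(path, "rb") as source:
        for _ in range(block_count):
            block = source.read(block_size)
            if not block:
                break  # the source ended before block_count blocks
            digest.update(block)
    return digest.hexdigest()
```

A call might look like md5_of_blocks("/dev/scd0", 2048, 1000),
where the device name and the numbers are placeholders.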

4.  md5sum produces no error message if input data is
missing.  Instead, it generates the following (seemingly
legitimate) value:
d41d8cd98f00b204e9800998ecf8427e
(This was not a good idea.)
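
That value is simply the MD5 sum of zero bytes of input, so
a script can test for it.  A minimal Python sketch (the name
looks_like_empty_input is made up for this example):

```python
import hashlib

# MD5 of zero bytes of input.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def looks_like_empty_input(md5_hex):
    """True if md5_hex is the digest md5sum reports for no data at all."""
    return md5_hex.strip().lower() == EMPTY_MD5


print(EMPTY_MD5)  # d41d8cd98f00b204e9800998ecf8427e
```

A check like this after every read would catch the case
where nothing reached md5sum.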

5.  There is apparently a "read ahead bug" in the SCSI
modules of the Linux kernel. (It is universally referred to
as a "bug".  However, given the length of time it has been
around and the seeming lack of interest in fixing it, it may
well be a "feature".)

6.  The read ahead bug produces an I/O error on a CD not
properly padded with null bytes appended to the end of its
recorded data.  Steven Litt says that since a CD drive reads
data at a rate of about 63K per second, a 2-second read
padding would require 126K null bytes at the end of the CD.
Since a sector is 2K, that comes to 63 sectors, so the option
padsize=63s must be included in any invocation of cdrecord,
along with the -pad option.
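
The arithmetic can be written out as a small Python sketch.
The 63K per second rate and the 2-second margin are the
figures quoted above; the constant and function names are
only illustrative, and nothing here actually runs cdrecord.

```python
READ_RATE_KB_PER_SEC = 63  # rate quoted above (attributed to Steven Litt)
PAD_SECONDS = 2            # read-ahead margin to cover
SECTOR_KB = 2              # size of one CD data sector


def pad_sectors(rate_kb=READ_RATE_KB_PER_SEC,
                seconds=PAD_SECONDS,
                sector_kb=SECTOR_KB):
    """Number of whole sectors needed to hold rate_kb * seconds of padding."""
    pad_kb = rate_kb * seconds
    return -(-pad_kb // sector_kb)  # integer division, rounded up


sectors = pad_sectors()
# The two options to add to a cdrecord command line.
pad_options = ["-pad", f"padsize={sectors}s"]
print(pad_options)  # ['-pad', 'padsize=63s']
```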

7.  The failure to include the pad options in cdrecord was
at the bottom of the problems I encountered. (The
undocumented behaviors described above just complicated the
problem and delayed its resolution.)  If you are using
cdrecord as part of a backup system, be sure to include the
pad options.  Otherwise, attempts to read the last file on
the CD will produce an I/O error.

8.  Padding a CD copy does not seem to alter its MD5 sum.
That is, a padded CD copy should still agree with the MD5
sum of its original.

9.  A CD on which copy errors are encountered can be copied
by first mounting the CD.  Then use mkisofs to create a disk
image of the mounted file system.  Finally, burn the disk
image onto a blank CD.  At worst, the defective file will
not be copied, but will not prevent the rest of the CD from
being copied.

I mention this procedure because there is a pad option in
mkisofs.  Unlike the pad option in cdrecord, it is not
required, and it is no substitute for it: a "padded" mkisofs
piped into an "unpadded" cdrecord will not reliably produce
a good copy.  It may, however, be a good idea to include it.
This will alter a CD's MD5 sum as compared to the original.

10.  The issue of MD5 compatibility between Linux and
Windows systems is less clear, since I have not had a chance
to fully test disks created on one system and then copied on
the other.  Properly padding a Linux-burned CD can produce
an MD5 equivalent of a CD burned on a Windows system.  But
it may not work the other way; a Linux-burned CD that was
copied using Nero on a Windows machine may not show the same
MD5 sum.  I have to check this more.  If true, it would mean
that there are significant differences in the behavior of CD
burning software on the two systems.

11.  After all of that, I am sorry to admit that I am still
encountering a transient but bothersome variation in the MD5
sums reported on my CD drives.  That is, sometimes I will
get a false reading, which disappears after several
attempts.  Could it be a difference in brands of drives?


