[wplug-internet] Backup script

Vance Kochenderfer vkochend at nyx.net
Sat Dec 19 00:31:36 EST 2009


The TL;DR version for readers with short attention spans:
Duplicity by itself doesn't handle a lot of the tasks necessary
to ensure valid and consistent backups.  If we're going to need
a script to do this anyway, why not use the one I already wrote?

Michael Semcheski <mhsemcheski at gmail.com> wrote:
> I appreciate the effort you put into this, so here is my rebuttal...

And I appreciate yours.  I'd be delighted if more people chimed
in with their thoughts.

> Things that duplicity can do that your method can not:
> 1.) Restore to a point in time.  "Give me the state of the VPS as it
> was on mm/dd/yyyy."

This is true.  This was intentional, to keep the backups simple
(no fooling around with full/incremental backup sets).  There is
a very crude versioning effect because targets are used in a
round-robin fashion.  If you need a backup other than the latest
one, you can check with the target owners to see who may have it.
This is not very robust, though (nor is it intended to be).
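
To illustrate the "crude versioning": with N targets used in turn, a
backup taken k runs ago (k < N) still sits on one particular target,
and anything older has been overwritten.  The sketch below uses a
hypothetical target list and run counter.  It is not how the actual
script picks targets; it only illustrates the rotation.

```shell
#!/bin/bash
# Hypothetical target hosts; the real script's target list is not shown here.
TARGETS=(alpha.example.org beta.example.org gamma.example.org)

# Target that receives backup number $1 (runs are numbered 0, 1, 2, ...).
target_for_run() {
    local run=$1
    echo "${TARGETS[run % ${#TARGETS[@]}]}"
}

# Target still holding the backup taken $2 runs before run $1,
# or an error if that backup has already been overwritten.
target_holding() {
    local current=$1 age=$2
    if (( age < 0 || age >= ${#TARGETS[@]} || age > current )); then
        echo "backup from $age runs ago is no longer kept" >&2
        return 1
    fi
    target_for_run $(( current - age ))
}
```

With three targets, at most the previous two backups can be found on
the other two targets.  Anything older is gone, which is why this
doesn't amount to real point-in-time restore.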

> Using this method (as part of a cron job, run before duplicity) you
> don't have to shut down mysql.
> http://www.wplug.org/wiki/Simple_MediaWiki_Backup

Yes, we can do a mysqldump.  I wouldn't recommend those specific
scripts, because they don't dump all the databases.  They also
compress the dumpfile, which defeats duplicity's ability to store
only the differences between runs.  A better approach would probably be:
  mysqldump --user=root --all-databases --opt --dump-date > foo.sql
I just ran this twice.  The first time took only a few seconds,
while the second took a minute and a half.  I suspect web traffic
was holding up the table locks the second time.  We could script
things to shut down Apache during the dump.
Compression isn't necessary as the dumpfile is less than 10MB.
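
Scripting the dump might look roughly like this.  It is a sketch only:
the service name "httpd" and the dump path are assumptions for a
CentOS-style box, and it relies on mysqldump's default comment output,
which ends a successful dump with a "-- Dump completed" line.  The new
dump replaces the old one only if that line is present.

```shell
#!/bin/bash
# Sketch: dump all MySQL databases with Apache stopped, and keep the new
# dump only if mysqldump finished cleanly.  Service name "httpd" and the
# default output path are assumptions.

DUMPFILE=${DUMPFILE:-/var/backups/all-databases.sql}   # hypothetical path

# mysqldump (with its default --comments) ends a successful dump with a
# "-- Dump completed" line; a truncated dump will not have it.
dump_is_complete() {
    tail -n 1 "$1" | grep -q '^-- Dump completed'
}

dump_databases() {
    local tmp="$DUMPFILE.tmp" status
    service httpd stop || return 1
    mysqldump --user=root --all-databases --opt --dump-date > "$tmp"
    status=$?
    service httpd start
    if (( status != 0 )) || ! dump_is_complete "$tmp"; then
        echo "mysqldump failed; keeping previous $DUMPFILE" >&2
        rm -f "$tmp"
        return 1
    fi
    # Left uncompressed so duplicity/rsync can transfer only the changes.
    mv "$tmp" "$DUMPFILE"
}
```

Writing to a temporary file first means a failed dump never clobbers
the last good one.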

It also requires an extra step to restore.  Fortunately, this
should be as simple as
  mysql --user=root -q < foo.sql

There's also mailman to consider, which uses some kind of Python
DB-like files.  I honestly don't know anything about that, so my
failsafe position was to shut down mailman to copy those.
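
A small wrapper makes that failsafe approach reusable: stop a service,
run the copy, and restart the service even if the copy fails.  The
init script name "mailman" and any paths are assumptions; this is a
sketch, not what the script currently does.

```shell
#!/bin/bash
# Sketch: run a command with a service stopped, restarting the service
# even if the command fails.  Returns the command's exit status.

with_service_stopped() {
    local svc=$1 status
    shift
    service "$svc" stop || return 1
    "$@"
    status=$?
    service "$svc" start || echo "WARNING: could not restart $svc" >&2
    return "$status"
}
```

Usage might be something like (the Mailman data path is assumed):
  with_service_stopped mailman cp -a /var/lib/mailman /var/backups/
Restarting explicitly rather than from a trap keeps the function safe
to call several times within one script.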

> > 3. It uses tools that are part of the base distribution....
> > I don't know if there are any compatibility problems between
> > versions.
> 
> This is true, but it seems like in a worst case compatibility problem,
> we could always make sure to use the same version that was last used.

My concern is more with the fact that in order to restore to a
fresh new CentOS install, you need to go through enabling the
RPMForge repository in yum and install duplicity.  If we go with
duplicity, we need to have a bare-metal (or bare-VPS, I guess :)
restore procedure clearly established, tested, and documented.

> (The only thing duplicity needs on the target machine is ssh.)

Same with mine.  :)

> This is true, but by storing incremental backups you get the ability
> to go back to a point in time.

Yes, indeedy.

> And using duplicity won't increase the amount of used storage
> indefinitely, it is just a matter of tweaking the frequency of the
> full backups to get things the way we want.

True.  I don't consider this a strong disadvantage of duplicity,
and it can be mitigated with some scripting for scheduling full
and incremental backups and deleting old backup sets.
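
Much of that scheduling can be delegated to duplicity itself: the
--full-if-older-than option starts a new full backup once the last one
is older than the given interval, and remove-older-than prunes old
sets.  The source directory, target URL, and intervals below are
hypothetical values, not settled policy.

```shell
#!/bin/bash
# Sketch: full/incremental decided by --full-if-older-than, old chains
# pruned with remove-older-than.  All values below are hypothetical.

SOURCE=${SOURCE:-/var/backups}
TARGET_URL=${TARGET_URL:-scp://backup@target.example.org//srv/wplug}
FULL_EVERY=${FULL_EVERY:-1M}     # start a new full backup monthly
KEEP_FOR=${KEEP_FOR:-3M}         # delete sets older than three months

run_duplicity_backup() {
    duplicity --full-if-older-than "$FULL_EVERY" "$SOURCE" "$TARGET_URL" || return 1
    # Without --force, remove-older-than only lists what it would delete.
    duplicity remove-older-than "$KEEP_FOR" --force "$TARGET_URL"
}
```

Duplicity won't delete a set that newer incrementals still depend on,
so the amount of storage actually kept is somewhat more than KEEP_FOR.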

> > 5. Things are slightly simpler as viewed from the target machine.
> >    There is only a single image file.  Duplicity and other backup
> >    solutions generate multiple files - if the target owner wants
> >    to free up disk space, it's not clear which ones he can safely
> >    delete without ruining the backup's integrity.
> 
> If the target owner wants to free up space on either method, the only
> way to do it is to delete the entire backup set.

My concern here is that the target owner can delete the full
backup (as that's going to be the biggest file) but leave the
signature file in place.  AFAICT, duplicity will happily do an
incremental backup if the signature file exists without checking
that the corresponding full data backup also exists.  It all
appears to work fine, until you need to restore.  I'll note that
this can also be mitigated with clever scripting.
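
One form that scripting could take: before an incremental run, list
the target directory and confirm that every full signature file still
has its matching full data volume.  This assumes duplicity's usual file
naming (duplicity-full-signatures.TIMESTAMP.sigtar.* alongside
duplicity-full.TIMESTAMP.vol1.difftar.*), and the ssh/ls usage below is
hypothetical.

```shell
#!/bin/bash
# Sketch: read a target directory listing (one file name per line) on
# stdin and fail if any full signature set lacks its first data volume.

check_full_sets() {
    local files ts missing=0
    files=$(cat)
    for ts in $(printf '%s\n' "$files" |
                sed -n 's/^duplicity-full-signatures\.\([^.]*\)\.sigtar.*/\1/p'); do
        if ! printf '%s\n' "$files" | grep -q "^duplicity-full\.$ts\.vol1\.difftar"; then
            echo "full backup $ts: signatures present but vol1 missing" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example (host and path hypothetical):
  ssh backup@target.example.org ls /srv/wplug | check_full_sets || exit 1
Refusing to run the incremental turns the silent failure into an
immediate error, the same way the single-image approach does.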

With the method I was using, there is only one file.  If it is
deleted, that will throw errors right away, alerting us to the
problem.

Looking back over this, it seems to me that a non-trivial amount
of scripting is necessary to effectively use duplicity.  Upon
reflection, I feel duplicity is not a replacement for the script I
offered.  Rather, it could be used within the script as a
replacement for rsync.  I have no philosophical opposition to that
as long as there is an ironclad restore procedure.

It might not be clear from the readme, but the actual backing up
part of the script is minimal (around a dozen lines).  The rest is
all various sanity checks and doing things like connecting to the
target and stopping/starting the database.

Vance Kochenderfer        |  "Get me out of these ropes and into a
vkochend at nyx.net          |   good belt of Scotch"    -Nick Danger

