Backing up FreeBSD and other Unix systems

I have a 3-Tbyte server running FreeBSD-6.1 that handles versioned backups. I don't bother with encrypting the filenames or hashes because we control the box, and if I'm not at work, other admins might need to restore something quickly.

We have around 3.7 million files from 5 other servers backed up under two 1.5-Tbyte filesystems, /mir01 and /mir02. My setup looks like this:

  +-----mir01
  |      +-----HASH
  |      |      +-----00
  |      |      |      +-----00
  |      |      |      +-----01
                       ...
  |      |      +-----01
                ...
  |      |      +-----fe
  |      |      +-----ff
  |      +-----server1
  |      +-----server2
  +-----mir02
  |      +-----HASH
  |      +-----server3
  |      +-----server4
  |      +-----server5

The HASH directories have two levels of subdirectories named 00 through ff. That's been more than sufficient to keep directories from getting too big; I average around 25 files per directory.
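
As a rough illustration of how a digest could map onto that layout, here is a minimal sketch. It assumes the first level is named after the first two hex digits of the hash, the second level after the next two, and that the entry itself is named after the full digest; none of that is spelled out above, and hash_path is a hypothetical name.

```shell
# hash_path ROOT DIGEST: print where DIGEST would live under ROOT.
# Assumption: level one = hex digits 1-2, level two = hex digits 3-4,
# and the file is stored under its full digest.
hash_path() {
    root=$1
    digest=$2
    d1=$(printf '%s' "$digest" | cut -c1-2)
    d2=$(printf '%s' "$digest" | cut -c3-4)
    printf '%s/%s/%s/%s\n' "$root" "$d1" "$d2" "$digest"
}
```

Two levels of 00 through ff give 256 x 256 = 65,536 leaf directories per HASH tree, which is why even millions of files spread out thinly.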

I do hourly backups on other fileservers using something like this:

  # find /filesys -newer /time/stamp -depth -print > list
  # pax -x cpio -wd < list > list.pax
  # scp list.pax some.other.box:/backups
  # rm list list.pax
  # touch /time/stamp
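
One thing to watch with a sequence like this: a file modified after find has passed it but before the final touch ends up older than the new timestamp, so the next run skips it. A sketch of one way around that is to record the start time first and only promote it once the list has been built. The make_list function and the ".new" scratch file are hypothetical names, not part of the scripts described here.

```shell
# make_list FS STAMP LIST: write the names of everything under FS that is
# newer than STAMP into LIST, then advance STAMP to the time this run began.
make_list() {
    fs=$1
    stamp=$2
    list=$3
    # Mark the start of the run before scanning, so changes made during
    # the scan are newer than the stamp and get picked up next time.
    touch "$stamp.new" || return 1
    # Promote the new stamp only if find succeeded; mv keeps its mtime.
    find "$fs" -newer "$stamp" -depth -print > "$list" &&
        mv "$stamp.new" "$stamp"
}
```

The worst case with this ordering is that a file changed mid-scan is backed up twice, which is harmless compared to missing it.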

I ignore 0-length files because they always hash to the same value. The backup directories for the second fileserver look like this for 5 May 2009:

  +-----mir01
  |      +-----server2
  |      |      +-----2009
  |      |      |      +-----0505
  |      |      |      |      +-----070700
  |      |      |      |      |      +-----doc      (filesystem)
  |      |      |      |      |      +-----home
  |      |      |      |      +-----080700
  |      |      |      |      |      +-----doc
  |      |      |      |      |      +-----home
  ...
  |      |      |      |      +-----190700
  |      |      |      |      |      +-----home
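
The names in the tree suggest a year/month-day/time scheme. Assuming the last level is HHMMSS (so 070700 would be the 07:07:00 run), which is a guess from the listing rather than something stated here, a directory name for the current run could be built like this. The snapshot_dir function is a hypothetical helper.

```shell
# snapshot_dir ROOT SERVER: print ROOT/SERVER/YYYY/MMDD/HHMMSS for "now".
# Assumption: the three levels are year, month+day, and hour+minute+second.
snapshot_dir() {
    printf '%s/%s/%s\n' "$1" "$2" "$(date '+%Y/%m%d/%H%M%S')"
}
```

For example, `mkdir -p "$(snapshot_dir /mir01 server2)"` would create the directory for a run on server2. Because every level is zero-padded, a plain sorted listing is also in chronological order.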

After the backups are rsynced to the backup server, I find any regular files with only one link, compute the RMD160 hash of the contents, and make a hardlink to the appropriate filename under the HASH directory. People love to make copies of copies of files, so this really cuts down on the disk space used.
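
The linking step could look roughly like the sketch below. It is not the actual script, just one way to do what the paragraph describes, under these assumptions: pool entries are named after the full digest and filed by its first two pairs of hex digits; a file whose contents are already in the pool is replaced by a link to the pool entry; and the digest comes from FreeBSD's `rmd160 -q`, which prints only the checksum. HASHER is a hypothetical override for systems without that command, and dedup_tree is a hypothetical name.

```shell
# dedup_tree TREE POOL: link every single-link, non-empty regular file
# under TREE into the hash pool at POOL (both on the same filesystem).
# Note: file names containing newlines are not handled by this loop.
dedup_tree() {
    tree=$1
    pool=$2
    find "$tree" -type f -links 1 -size +0 -print |
    while IFS= read -r f; do
        # First field of the hasher's output is taken as the digest.
        digest=$(${HASHER:-rmd160 -q} "$f" | awk '{print $1}')
        [ -n "$digest" ] || continue
        d1=$(printf '%s' "$digest" | cut -c1-2)
        d2=$(printf '%s' "$digest" | cut -c3-4)
        dir=$pool/$d1/$d2
        mkdir -p "$dir" || continue
        if [ -e "$dir/$digest" ]; then
            # Contents already pooled: replace this copy with a link.
            ln -f "$dir/$digest" "$f"
        else
            # New contents: this file becomes the pool entry.
            ln "$f" "$dir/$digest"
        fi
    done
}
```

Hard links cannot cross filesystems, so the pool has to live on the same filesystem as the backups it serves; that fits each of /mir01 and /mir02 having its own HASH directory.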

The hardlinks make it easy to avoid restoring things that aren't what the user had in mind; if a file's been corrupted, I can tell when it happened just by looking at the inode, so I don't restore an earlier version that's also junk. I can also tell if there were duplicates anywhere on the fileserver at the time the user lost the good version; it's a lot faster for them to get a known good copy from somewhere else on the fileserver than it is to restore over the network.
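
To make the inode check concrete, here is a small sketch that lists the inode number of one file in every backup run. It assumes the server/year/monthday/time layout shown in the tree earlier; file_history is a hypothetical name, not one of the scripts in the tarball.

```shell
# file_history SERVER_ROOT RELPATH: print "inode path" for each backed-up
# copy of RELPATH, oldest first (zero-padded names sort chronologically).
# Copies sharing an inode number are the same file, hence the same contents.
file_history() {
    server_root=$1
    relpath=$2
    # Assumed layout: SERVER_ROOT/YYYY/MMDD/HHMMSS/RELPATH
    for f in "$server_root"/*/*/*/"$relpath"; do
        [ -f "$f" ] && ls -i "$f"
    done
    return 0
}
```

Something like `file_history /mir01/server2 home/someuser/report.doc` (a made-up path) prints one line per run. Where the inode number changes from one line to the next, the contents changed. The same number can then be handed to `find /mir01 -inum N` to look for other names linked to that version.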

The software is just a few scripts that find files with only one link, compute hashes, make hardlinks, and so on. The real heavy lifting is done by bkdiff and mklinks.

  Name          Last modified       Size  Description
  backups.tar   24-May-2009 21:44   79k   Incremental backup scripts