By robb allan | Wed, 05/19/2021 - 18:59
2014-07-26T19:27:22

Everyone loves rsync. It's a fantastic tool. With rsync you can perform fast, reliable duplication of files across different filesystems, networks, and OSes, because rsync only copies those files that don't exist, or have changed, on the target. IOW, you create an initial duplicate of a set of files, and then each subsequent rsync just copies or updates those files that are new or different. For existing files, it only copies the data within the file that has changed.

This means that backups are quick and easy. It's especially wonderful because it crosses filesystems and OSes, so you can rsync from a Mac to a Windows machine or a Unix box, up to a server, across an Ethernet network, all using the same tool (though rsync must be available on both sides). In my case, I initially found it useful for backing up both of my FreeNAS (ZFS file system) boxes to external drives, one to a second, directly connected ZFS drive, and the other to an HFS+ drive on a Mac on the same Ethernet network. (Yes, I use ZFS snapshots, but for full protection the entire RAIDZ should be backed up elsewhere.)

However, basic rsync usage acts to synchronize two file trees, i.e., makes them identical. This is useful for an initial backup, but not for incremental backups. One of rsync's many options, the backup option, appears to allow for incremental backups, but it's a little cumbersome since it renames files in the process, and has limited ability to relocate earlier versions, so it's really best for an in-place duplication of different file versions, not for true incremental backing up.

That requires a different option, link-dest. Unfortunately, no GUI control is included in the FreeNAS rsync web control pane to use it. However, with just a little thoughtful scripting, a single rsync command line can be constructed that provides full incremental backups, properly dated, without file renaming. This script can then be run from the cron control pane rather than the rsync control pane, and all is well with the world.

So here's the command line:

/usr/local/bin/rsync -avm --delete --exclude=/backups --link-dest=$BACKUPDIR/`/bin/date -v-1d +%Y-%m-%d`/  /mnt/tank/  $BACKUPDIR/`/bin/date +%Y-%m-%d`/

where

  • -avm archives the file tree (preserving permissions, ownership, timestamps, etc.), prints a verbose list of changes, and skips empty directories rather than creating them on the target;
  • --delete removes files and directories on the target that have moved or no longer exist on the source;
  • --link-dest compares the files to be copied against the directory listed with this option, and only copies those that have changed;
  • $BACKUPDIR is a variable specified earlier in the script (not shown);
  •  `/bin/date -v-1d +%Y-%m-%d` is a command that gets the current date, subtracts one day from it and adds that text as the directory name for link-dest (IOW, a folder created and named for the day before the script runs);
  • /mnt/tank specifies the directory with the source files to be copied (in this example, the top level filesystem of the FreeNAS box);
  • $BACKUPDIR/`/bin/date +%Y-%m-%d`/ specifies the target directory for the copy, in which the files that survive the earlier link-dest comparison will be placed; here it combines the $BACKUPDIR variable with the text resulting from the `/bin/date +%Y-%m-%d` command (IOW, the text for today's date);
  • finally, the link-dest option also instructs rsync to hard link the files in its specified directory into the target directory, so that the resulting directory will appear to contain all files from the source as though it were a full backup.
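
The -v-1d adjustment in the date command above is BSD syntax, which is what FreeNAS and the Mac provide. Below is a minimal sketch of a helper that also works where GNU date is installed instead, assuming one of the two is present. The function name yesterday and the variable names are my own for illustration, not part of the original script.

```shell
#!/bin/sh
# Print yesterday's date as YYYY-MM-DD.
# Tries the BSD syntax first (FreeNAS, macOS), then falls back to GNU date (Linux).
yesterday() {
    date -v-1d +%Y-%m-%d 2>/dev/null || date -d yesterday +%Y-%m-%d
}

TODAY=$(date +%Y-%m-%d)
YESTERDAY=$(yesterday)
echo "today: $TODAY, link-dest candidate: $YESTERDAY"
```

Note that cron runs with a minimal PATH, which is presumably why the command line above spells out /bin/date in full.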

So, this command tests the files in the FreeNAS's top-level filesystem against the link-dest directory (named for yesterday), selects those that don't match, copies them to the target into a directory named for today's date, and hard links in the unchanged files from the link-dest directory.
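
Each dated directory can look like a full backup without taking the space of one because a hard link is just a second name for the same data on disk. Here is a small sketch that shows this with plain ln, no rsync involved, in a throwaway directory made by mktemp. The directory and file names are only for illustration.

```shell
#!/bin/sh
# Demonstrate the hard-link behaviour that --link-dest relies on.
WORK=$(mktemp -d)
mkdir "$WORK/2014-07-25" "$WORK/2014-07-26"

# "Yesterday's" backup holds the real file.
echo "unchanged data" > "$WORK/2014-07-25/notes.txt"

# "Today's" backup gets a hard link instead of a second copy.
ln "$WORK/2014-07-25/notes.txt" "$WORK/2014-07-26/notes.txt"

# Both names report the same inode number, so the data is stored once.
INODE_OLD=$(ls -i "$WORK/2014-07-25/notes.txt" | awk '{print $1}')
INODE_NEW=$(ls -i "$WORK/2014-07-26/notes.txt" | awk '{print $1}')
echo "inodes: $INODE_OLD $INODE_NEW"

# Removing the older directory does not remove the data from the newer one.
rm -r "$WORK/2014-07-25"
REMAINING=$(cat "$WORK/2014-07-26/notes.txt")
echo "still readable: $REMAINING"
```

The last step is the useful part in practice: an old dated directory can be deleted on its own, and any file still referenced by a newer directory stays intact.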

All that remains for the script is to

  • set $BACKUPDIR to some location on an external disk to put the backup directories;
  • perhaps add a few printf or echo statements to record in a log file what has been done;
  • place the final script somewhere in the filesystem, say, your home directory.
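
Put together, the script might look something like the sketch below. This is not my exact script: SOURCE, LOGFILE, RSYNC, the default paths, and the function name run_backup are assumptions for illustration, and only BACKUPDIR and the rsync options come from the command line above. The date lookup tries the BSD form first and falls back to the GNU form.

```shell
#!/bin/sh
# Sketch of a complete backup script built around the rsync command above.
# SOURCE, LOGFILE, RSYNC and the default paths are illustrative assumptions.
SOURCE=${SOURCE:-/mnt/tank/}
BACKUPDIR=${BACKUPDIR:-/mnt/external/backups}
LOGFILE=${LOGFILE:-$BACKUPDIR/backup.log}
RSYNC=${RSYNC:-/usr/local/bin/rsync}

run_backup() {
    today=$(date +%Y-%m-%d)
    # BSD date syntax first (FreeNAS), GNU syntax as a fallback.
    yesterday=$(date -v-1d +%Y-%m-%d 2>/dev/null || date -d yesterday +%Y-%m-%d)

    echo "$today: backup of $SOURCE started" >> "$LOGFILE"
    "$RSYNC" -avm --delete --exclude=/backups \
        --link-dest="$BACKUPDIR/$yesterday/" \
        "$SOURCE" "$BACKUPDIR/$today/" >> "$LOGFILE" 2>&1
    status=$?
    echo "$today: rsync finished with exit status $status" >> "$LOGFILE"
    return $status
}

# Uncomment to perform the backup when the script is executed:
# run_backup
```

Keeping the work in a function with the call commented out means the file can be read, or sourced for testing, without touching any disk.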

Finally, FreeNAS provides a GUI pane to create a cron (repeating) action. Simply choose Add Cron Job, enter /bin/sh path-to-my-script as the command, and set the time of day and month when it should execute.

This simple script will produce a series of directories, named for a sequence of dates, each containing all of the files from the source file tree in their latest version as of the date for which the directory is named. Each of these directories is an incremental backup from the day before.

Note that there are some potential hiccups. One would be if there is no backup directory named for the day before. To get around this, it is necessary to add some script code to look for that directory and, if none is found, link against the most recent one instead; or, alternatively, to record the date of the last backup in a variable and set link-dest from that. I simply added a test for the presence of the folder which, if it is missing, chooses the latest one:

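# Prefer yesterday's directory; otherwise fall back to the newest entry in $BACKUPDIR.
COPY_DIRECTORY=`/bin/date -v-1d +%Y-%m-%d`
if [ -d "$BACKUPDIR/$COPY_DIRECTORY" ]; then
 LINK_DEST=$COPY_DIRECTORY
else
 LINK_DEST=`ls "$BACKUPDIR" | sort -r | head -n 1`
fi
# Then pass --link-dest="$BACKUPDIR/$LINK_DEST/" to rsync in place of the dated path.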
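
One caveat with the ls | sort -r | head fallback is that it picks whatever sorts last in $BACKUPDIR, which could be a log file or, on a second run the same day, today's own directory. A slightly more careful sketch is below, assuming the backup directories are the only entries named YYYY-MM-DD. The function name latest_backup is hypothetical.

```shell
#!/bin/sh
# Print the name of the newest YYYY-MM-DD directory in $1, ignoring the one named $2.
# Prints nothing if no such directory exists.
latest_backup() {
    dir=$1
    today=$2
    newest=""
    for path in "$dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]; do
        [ -d "$path" ] || continue          # skip plain files and an unmatched pattern
        name=${path##*/}
        [ "$name" = "$today" ] && continue  # never link against today's own directory
        newest=$name                        # globs expand in sorted order, so the last one wins
    done
    if [ -n "$newest" ]; then
        echo "$newest"
    fi
}

# Example use inside the backup script:
# LINK_DEST=$(latest_backup "$BACKUPDIR" "$(date +%Y-%m-%d)")
```

Because the names are fixed-width dates, sorted order and chronological order are the same, so no call to sort is needed.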

Another problem occurs with rsyncs across a network. Hard links cannot cross filesystems or hosts, so the link-dest directory has to live on the same filesystem as the target directory, and things appear to be simplest when both are local to the machine on which the rsync command is running. In the case of a backup to a local disk connected to a FreeNAS box, this is not an issue. But if the desired target is located elsewhere, the easy solution is to run the script on the target host, and have it pull files from the FreeNAS box. This works just fine.
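
A sketch of that pull arrangement follows, as it might look when run on the backup host. The user and host name in REMOTE, the BACKUPDIR default, and the function name pull_backup are placeholders rather than values from my setup, and it assumes rsync can reach the FreeNAS box over SSH.

```shell
#!/bin/sh
# Sketch of the "pull" arrangement: run on the backup host, read from the NAS over SSH.
# REMOTE and BACKUPDIR are placeholders, not values from the original setup.
REMOTE=${REMOTE:-backup@freenas.local:/mnt/tank/}
BACKUPDIR=${BACKUPDIR:-/Volumes/Backup/freenas}

pull_backup() {
    link_dest=$1    # name of the previous dated directory, e.g. yesterday's
    today=$(date +%Y-%m-%d)
    # Both --link-dest and the target are local paths on this host;
    # only the source is remote.
    rsync -avm --delete \
        --link-dest="$BACKUPDIR/$link_dest/" \
        "$REMOTE" "$BACKUPDIR/$today/"
}

# Example: pull_backup 2014-07-25
```

The point of the design is that the hard linking all happens on the disk that holds the backups, so the source side only has to serve files.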

---

Coda: some of the inspiration for this method came from a fine article on a similar approach, Time Machine for every Unix out there, by Michael Jakl. Some of the steps he used required more scripting and perhaps contributed to the problems several commenters had with full duplications being created. I have avoided moving or renaming the target directories to prevent that, though it's not clear such actions were the cause. In any case, Mavericks on the Mac has no problem creating hard links correctly, and of course, running the entire process on FreeNAS works just fine.