Managing Backups

Managing system backups is a task that all system administrators are familiar with, and it’s something that no one thanks you for doing unless something goes horribly wrong. Even on a single-user personal computer running Linux, some sort of backup schedule is essential, and it’s usually only after you’ve been burned once, losing a chunk of data and files, that you realize the value of a regular backup.

One of the reasons so many systems neglect backups is that many of the backup tools are crude and difficult to understand. The dump and restore commands (called ufsdump and ufsrestore in Solaris) are typical, with ten “dump levels” (0 through 9) and an intimidating configuration file required.

A shell script can solve this problem. This script backs up a set of directories (as written, everything under $HOME that belongs to the current user), either incrementally (that is, only those files that have changed since the last backup) or in full (all files). The backup is compressed on the fly to minimize space usage, and the script output can be directed to a file, a tape device, a remotely mounted NFS partition, or even a CD burner on compatible systems.

#!/bin/sh

# backup - Creates either a full or incremental backup of a set of
# defined directories on the system. By default, the output
# file is saved in /tmp with a timestamped filename, compressed.
# Otherwise, specify an output device (another disk, or a
# removable storage device).

usageQuit()
{
  # The here-document delimiter is unquoted so that $0 expands to the script name.
  cat << EOF >&2
Usage: $0 [-o output] [-i|-f] [-n]
  -o lets you specify an alternative backup file/device
  -i is an incremental or -f is a full backup, and -n prevents
     updating the timestamp if an incremental backup is done.
EOF
  exit 1
}

compress="bzip2"    # change for your favorite compression app
inclist="/tmp/backup.inclist.$(date +%d%m%y)"   # scratch file name (not written by this version)
output="/tmp/backup.$(date +%d%m%y).bz2"
tsfile="$HOME/.backup.timestamp"
btype="incremental" # default to an incremental backup
noinc=0             # and an update of the timestamp

trap "/bin/rm -f $inclist" EXIT

while getopts "o:ifn" arg; do
  case "$arg" in
    o ) output="$OPTARG" ;;
    i ) btype="incremental" ;;
    f ) btype="full" ;;
    n ) noinc=1 ;;
    ? ) usageQuit ;;
  esac
done

shift $(( OPTIND - 1 ))

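echo "Doing $btype backup, saving output to $output"

# Record the start time in the CCYYMMDDhhmm form that touch -t expects
# (%H is the 24-hour clock).
timestamp="$(date +'%Y%m%d%H%M')"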

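if [ "$btype" = "incremental" ] ; then
  if [ ! -f "$tsfile" ] ; then
    echo "Error: can't do an incremental backup: no timestamp file" >&2
    exit 1
  fi
  # Archive only the files changed since the timestamp file was last touched.
  find "$HOME" -depth -type f -newer "$tsfile" -user "${USER:-$LOGNAME}" | \
    pax -w -x tar | $compress > "$output"
  failure="$?"    # exit status of the last command in the pipeline ($compress)
else
  find "$HOME" -depth -type f -user "${USER:-$LOGNAME}" | \
    pax -w -x tar | $compress > "$output"
  failure="$?"
fi

# Apply the saved start time only if the backup succeeded and -n was not given.
if [ "$noinc" = "0" -a "$failure" = "0" ] ; then
  touch -t "$timestamp" "$tsfile"
fi
exit "$failure"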

For a full backup, find lists every file under $HOME that belongs to the current user, and the pax command does the real work, writing those files as a tar-format archive that is piped to a compression program (bzip2 by default) and then to an output file or device. An incremental backup is a bit trickier because the standard version of tar, unlike the GNU version, doesn’t include any sort of modification time test. Instead, the -newer test of find selects only the files modified since the timestamp file was last touched, and that list is fed directly to pax, which writes the tar format for increased portability.
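The file selection at the heart of the incremental backup can be tried on its own. The following is a minimal sketch, not part of the original script: the helper name list_changed_since is hypothetical, and it simply wraps the same find test the script uses.

```shell
#!/bin/sh
# list_changed_since DIR STAMPFILE
# Prints every regular file under DIR whose modification time is more
# recent than that of STAMPFILE. (Hypothetical helper name; the backup
# script runs this find test inline and adds a -user check.)
list_changed_since()
{
  find "$1" -depth -type f -newer "$2"
}

# Possible use, assuming pax and bzip2 are installed:
#   list_changed_since "$HOME" "$HOME/.backup.timestamp" | \
#     pax -w -x tar | bzip2 > /tmp/incremental.bz2
```

Because pax reads the names of the files to archive from standard input when none are given on the command line, whatever this function prints is exactly what ends up in the archive.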

Choosing when to mark the timestamp for a backup is where many backup programs go wrong: they typically record the “last backup time” when the program finishes the backup, rather than when it starts. Setting the timestamp to the time of backup completion can be a problem if any files are modified during the backup process (which can take quite a while if the backup is being fed to a tape device). Because files modified under this scenario would have a last-modified date older than the timestamp date, they would not be backed up the next night, even though the copy already written to the archive may predate the change.

However, timestamping before the backup takes place is wrong too, because if the backup fails, there’s no way to reverse the updated timestamp. Both of these problems are avoided by saving the date and time before the backup starts (in the timestamp variable), but applying the value of $timestamp to $tsfile using the -t flag to touch only after the backup has succeeded.
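The same pattern can be isolated in a small wrapper. This is a sketch under stated assumptions rather than the script's own code: the function name stamped_run and its variable names are hypothetical, and it expects the backup command to report failure through its exit status.

```shell
#!/bin/sh
# stamped_run TSFILE COMMAND [ARGS...]
# Saves the current time, runs COMMAND, and only if COMMAND succeeds
# sets the modification time of TSFILE to the saved start time.
# (Hypothetical helper; names are illustrative.)
stamped_run()
{
  sr_tsfile="$1"
  shift
  # touch -t expects [[CC]YY]MMDDhhmm[.SS]
  sr_start="$(date +%Y%m%d%H%M.%S)"

  # On failure, leave the timestamp file untouched and pass the status on.
  "$@" || return $?

  touch -t "$sr_start" "$sr_tsfile"
}

# Possible use:
#   stamped_run "$HOME/.backup.timestamp" sh -c \
#     'find "$HOME" -type f | pax -w -x tar | bzip2 > /tmp/full.bz2'
```

A file that changes while the command is running ends up with a modification time later than the saved start time, so the next incremental run picks it up again.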

This script has a number of options, all of which can be ignored to perform the default incremental backup based on the timestamp for the last incremental backup. The flags allow you to specify a different output file or device (-o output), to choose a full backup (-f), to actively choose an incremental backup (-i), or to prevent the timestamp file from being updated in the case of an incremental backup (-n).

$ backup
Doing incremental backup, saving output to /tmp/backup.140703.bz2

As you would expect, the output of a backup program isn’t very scintillating, but the size of the resulting compressed file shows that plenty of data is within:

$ ls -l /tmp/backup*
-rw-r--r-- 1 taylor wheel 61739008 Jul 14 07:31 /tmp/backup.140703.bz2
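File size alone says little about what was saved, so it can be worth listing the archive's contents. The following is a minimal sketch, assuming tar is installed and can read the tar-format stream that pax wrote; the function name list_backup and its optional second argument are hypothetical.

```shell
#!/bin/sh
# list_backup ARCHIVE [DECOMPRESS_COMMAND]
# Prints the names of the files stored in a compressed backup archive.
# DECOMPRESS_COMMAND defaults to "bzip2 -dc", matching the script's default
# compressor; pass a different command if you changed $compress.
# (Hypothetical helper; not part of the backup script.)
list_backup()
{
  lb_archive="$1"
  lb_decompress="${2:-bzip2 -dc}"

  # Left unquoted on purpose so "bzip2 -dc" splits into command and flag.
  $lb_decompress "$lb_archive" | tar -tf -
}

# Possible use:
#   list_backup /tmp/backup.140703.bz2 | head
```

On systems without tar, running pax with no options on the decompressed stream should produce a similar listing, since pax lists an archive read from standard input when neither -r nor -w is given.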