Previously, in this series, we looked at appending to and joining archives. In this tutorial we look at creating incremental backups with tar.
Incremental Backups with Tar
Creating incremental backups with tar gives us faster and smaller backups during the week. If we want to protect our data well, we may want to back up every day, or perhaps more frequently than that. A full backup each time would do the job, but it may take a long time, perhaps too long, and each backup would be large because everything is backed up no matter the state of the file.
An incremental backup backs up only the files that have changed since the last backup. For this to work, tar needs a meta file that records the state of the filesystem at the time of the last backup, so that the next run can work out what has changed. In the event of a failure we restore each backup in the correct order.
We normally start with a full backup of all the data. Let’s say this was Monday. On Tuesday we run an incremental backup, which stores all changes made since Monday’s backup. On Wednesday we run an incremental backup that backs up only the changes made since Tuesday’s backup, and so on. Normally we would start each week with a new full backup and begin the sequence again.
If we have a failure on Thursday we will need to restore the backups from Monday, Tuesday and then Wednesday.
The meta data is stored in the file we reference with the option --listed-incremental or -g. If the file does not exist, tar performs a full backup, or a level 0 backup as tar names it. This would be the Monday backup in our previous example. On Tuesday we reference that same file and, as it exists, this becomes a level 1 backup. Wednesday’s backup would be level 2, and so on. Each week you would need to move or delete the meta data file to ensure that the week starts with a full backup.
To see this work, let’s create a new directory in our home folder with a couple of files:
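The weekly cycle described above could be scripted along the following lines. This is a minimal sketch rather than a finished backup script: the function name weekly_backup, the backup directory layout and the date-based archive names are assumptions made for illustration, and it relies on GNU tar for the --listed-incremental option.

```shell
#!/bin/sh
# Sketch of a weekly cycle: full backup on Monday, incrementals otherwise.
# weekly_backup, the directory layout and the archive names are hypothetical.
weekly_backup() {
    src=$1                      # directory to back up, e.g. "data"
    dest=$2                     # directory that holds archives and the meta file
    snar="$dest/backup.snar"    # one meta file shared by the whole week

    mkdir -p "$dest"

    # date +%u prints the day of the week as a number, where 1 is Monday.
    # Removing the meta file makes the next tar run a level 0 (full) backup.
    if [ "$(date +%u)" -eq 1 ]; then
        rm -f "$snar"
    fi

    # Same meta file every day, a different archive name every day.
    tar --create --listed-incremental="$snar" \
        --file="$dest/backup-$(date +%Y-%m-%d).tar" "$src"
}
```

It would be called as weekly_backup data "$HOME/backups". Note that running it twice on the same day overwrites that day's archive, so a real script would need a finer-grained name or a guard against reruns.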
$ cd
$ mkdir data
$ echo "File1 Data" > data/file1
$ echo "File2 Data" > data/file2
With our precious data in place we will create a full backup. Tar performs a full backup when the meta file does not exist. The meta file is referenced with the -g or --listed-incremental option.
$ tar --create --listed-incremental=data.snar --verbose --verbose --file=data.tar data
tar: data: Directory is new
drwxrwxr-x ubuntu/ubuntu 0 2018-04-03 14:00 data/
-rw-rw-r-- ubuntu/ubuntu 5 2018-04-03 14:00 data/file1
-rw-rw-r-- ubuntu/ubuntu 5 2018-04-03 14:00 data/file2
It is common practice to name meta files with the extension .snar; think of this as an amalgamation of snapshot and tar. We also turn on the verbose option twice so we can see more detail about what is being backed up.
You will probably have noted that this is where the long options become cumbersome. The listed-incremental option can be shortened to -g. The complete command can be rewritten as:
$ tar -cvvg data.snar -f data.tar data
tar: data: Directory is new
drwxrwxr-x ubuntu/ubuntu 0 2018-04-03 14:00 data/
-rw-rw-r-- ubuntu/ubuntu 5 2018-04-03 14:00 data/file1
-rw-rw-r-- ubuntu/ubuntu 5 2018-04-03 14:00 data/file2
Ensure that you run only one of these commands. If you run both, the second run becomes a level 1 backup because the meta file already exists. A full backup only occurs when the meta file does not yet exist.
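If a full backup is wanted even though the meta file already exists, one option is to delete the meta file first. Another, assuming a GNU tar recent enough to support the --level option, is to pass --level=0, which asks tar to disregard the existing meta file contents. The sketch below works in a scratch directory so that the tutorial files are left alone; the archive names full.tar and full-again.tar are made up for the example.

```shell
# Work in a scratch directory so the tutorial's own files are not touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir data
echo "File1 Data" > data/file1

# First run: data.snar does not exist yet, so this is a level 0 backup.
tar --create --listed-incremental=data.snar --file=full.tar data

# A second run with the same meta file would normally be a level 1 backup.
# --level=0 (GNU tar) forces a level 0 dump despite the existing meta file.
tar --create --listed-incremental=data.snar --level=0 --file=full-again.tar data

# The second archive should again list data/ and data/file1.
tar --list --file=full-again.tar
```

Without --level=0 the second archive would hold only the directory entry, because nothing changed between the two runs.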
We will now add a new file to the directory. When we back up after this addition, only the new file has changed, so ONLY that file is backed up. This is the advantage of incremental backups: the archives are smaller and quicker to create. Make sure that we use the SAME meta file but a DIFFERENT archive.
$ echo "File3 Data" > data/file3
$ tar --create --listed-incremental=data.snar --verbose --verbose --file=data1.tar data
drwxrwxr-x ubuntu/ubuntu 0 2018-04-03 14:41 data/
-rw-rw-r-- ubuntu/ubuntu 11 2018-04-03 14:41 data/file3
We see from the verbose output that only the changed data is backed up, in this case file3. Continuing through the week we edit some more files. We will now edit file2 and create the next incremental backup. We append an extra line to file2 to simulate the edit. Note the use of >> to append; a single chevron overwrites the file.
$ echo "more data" >> data/file2
$ tar --create --listed-incremental=data.snar --verbose --verbose --file=data2.tar data
drwxrwxr-x ubuntu/ubuntu 0 2018-04-03 14:41 data/
-rw-rw-r-- ubuntu/ubuntu 15 2018-04-03 14:47 data/file2
From the verbose output we see that just file2 is backed up in this instance. Each backup uses the same meta file but its own archive. Finally we will test a file deletion and create the next incremental backup.
$ rm data/file1
$ tar --create --listed-incremental=data.snar --verbose --verbose --file=data3.tar data
drwxrwxr-x ubuntu/ubuntu 0 2018-04-03 14:55 data/
We can see that, even with the verbosity we have set, no file is shown as being backed up. The deleted file is not in the backup, but the directory listing that tar records, both in the meta file and in the archive itself, no longer contains it. We can see this to some extent if we list the archive we just created:
$ tar --list --verbose --verbose --listed-incremental=data.snar --file=data3.tar
drwxrwxr-x ubuntu/ubuntu 15 2018-04-03 14:55 data/
N file2
N file3
The N entries show that file2 and file3 are present in the directory but are not included in this archive, as they have not changed. file1 does not appear in the listing at all, which is how tar knows to delete it when this archive is restored.
We will create our own mini disaster by deleting the data directory from our home directory. Following this disaster we will start the restore process. We do not need the meta file during the restore, so we point the --listed-incremental option at /dev/null:
$ tar --extract --verbose --verbose --listed-incremental=/dev/null --file=data.tar
drwxrwxr-x ubuntu/ubuntu 15 2018-04-03 14:00 data/
-rw-rw-r-- ubuntu/ubuntu 5 2018-04-03 14:00 data/file1
-rw-rw-r-- ubuntu/ubuntu 5 2018-04-03 14:00 data/file2
$ cat data/file1 data/file2
File1 Data
File2 Data
The restoration recreates the data directory and the original contents. We don’t yet have any of the edits or the additional file that we added. It was between the first backup and the second backup that we added the third file; we will see this being added when we restore the next archive.
$ tar --extract --verbose --verbose --listed-incremental=/dev/null --file=data1.tar
drwxrwxr-x ubuntu/ubuntu 22 2018-04-03 14:41 data/
-rw-rw-r-- ubuntu/ubuntu 11 2018-04-03 14:41 data/file3
The edit of file2 occurred between the second and third backups. We will see it when restoring the next archive:
$ tar --extract --verbose --verbose --listed-incremental=/dev/null --file=data2.tar
drwxrwxr-x ubuntu/ubuntu 22 2018-04-03 14:41 data/
-rw-rw-r-- ubuntu/ubuntu 15 2018-04-03 14:47 data/file2
$ cat data/file2
File2 Data
more data
Finally we had the deletion of file1, which was recorded in the last backup:
$ tar --extract --verbose --verbose --listed-incremental=/dev/null --file=data3.tar
drwxrwxr-x ubuntu/ubuntu 15 2018-04-03 14:55 data/
tar: Deleting ‘data/file1’
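The whole restore sequence can also be run in one pass with a loop. The sketch below rebuilds the tutorial scenario in a scratch directory, so it can be tried without touching the real data, and then restores the four archives oldest first. The sleep calls are an addition for the sake of the script: they make sure each change gets a timestamp clearly later than the previous backup, which matters when the steps run back to back rather than hours apart.

```shell
# Rebuild the tutorial scenario in a scratch directory, then restore it.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir data
echo "File1 Data" > data/file1
echo "File2 Data" > data/file2
tar --create --listed-incremental=data.snar --file=data.tar data    # level 0

sleep 1    # keep the next change clearly newer than the last backup
echo "File3 Data" > data/file3
tar --create --listed-incremental=data.snar --file=data1.tar data   # level 1

sleep 1
echo "more data" >> data/file2
tar --create --listed-incremental=data.snar --file=data2.tar data   # level 2

sleep 1
rm data/file1
tar --create --listed-incremental=data.snar --file=data3.tar data   # level 3

# The disaster: lose the whole directory.
rm -rf data

# Restore every archive, oldest first. The order matters, because later
# archives overwrite edited files and remove files that were deleted.
for archive in data.tar data1.tar data2.tar data3.tar; do
    tar --extract --listed-incremental=/dev/null --file="$archive"
done

ls data
```

After the loop the data directory should hold file2, with its appended line, and file3, while file1 is gone again.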
The filesystem is now up to date with the last backup. We have looked at incremental backups in this tutorial; these require that all of the backups are restored in order. Next, we will look at differential backups, which can be implemented with clever control of the meta files. These require only the full backup and the most recent differential backup to be restored.