Verifying Backups
How do I know if my backups are working? Ideally, you should test a full restore of a system to make sure that every part of the process works. I do this, but I don't think very many people do. Even just running a test restore onto a temporary directory or partition can tell you quite a bit. But still, how do I know if what I've restored is the same as what I backed up?

One way to do this is to compare the trees. A simple way to do this is:
tar -cf - -C /dir1 . | tar -df - -C /dir2
which will compare the contents of dir1 and dir2. The biggest problem with this approach is that the older the backup you are testing is, the more likely it is that the filesystem has changed since running the backup.

Another approach is to compute hashes of the files, and check them:
find . -type f -print0 | xargs -0 sha1sum > SHA1SUMS
and back the SHA1SUMS file up with the backup. Upon restore you can use the '-c' option to the sha1sum program to test the hashes. This works a lot better, but really only verifies the integrity of the contents, not of the metadata.

There are programs, most famously Tripwire, that do a better job of managing this integrity. As far as I can tell, nobody uses them to verify backups. All of them seem to want absolute paths, and have no obvious way of verifying the integrity of a backup restored into a temporary directory.

A deeper problem, however, is that all of these scans involve computing the hashes of all of the files in the filesystem, for each backup. With good incremental or snapshot backups, this integrity scan can easily take longer than the whole backup itself.

Since I couldn't find anything that did what I wanted, I wrote my own program, Asure. It's a fairly small Python program that manages integrity snapshots of trees of files. Its update command has the useful feature that if a file's timestamps haven't changed since the last scan, the hash will not be recomputed. This does miss modifications to the file caused by underlying hardware problems, or subversions of the operating system, but tends to make the scan fast enough that it remains useful.

I do feel it is important that this scan utility be a completely separate codebase from the backup software being used. It's tempting to use the file scanning libraries from jpool, since they do largely the same thing. However, much of what I'm trying to detect here is bugs in jpool. Hopefully bugs in the Python code and bugs in the Scala code are less likely to happen in the same place.
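
To make the timestamp shortcut concrete, here is a rough sketch of the idea in Python. It is not Asure's actual code: the function names (file_sha1, scan, compare), the in-memory dictionary format, and the choice of mtime, ctime and size as the "unchanged" test are all assumptions made for illustration. Paths are stored relative to the scanned root, so a snapshot of the live tree can be checked against a restore in a temporary directory.

```python
import hashlib
import os
import stat


def file_sha1(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large files don't fill memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root, previous=None):
    """Return {relative_path: (mtime_ns, ctime_ns, size, sha1)} for the
    regular files under root.

    previous is an earlier result for the same tree (hypothetical format,
    not Asure's). If a file's timestamps and size match its previous entry,
    the old hash is reused instead of being recomputed.
    """
    previous = previous or {}
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, and other non-regular files
            rel = os.path.relpath(full, root)
            key = (st.st_mtime_ns, st.st_ctime_ns, st.st_size)
            old = previous.get(rel)
            if old is not None and old[:3] == key:
                sha1 = old[3]  # unchanged as far as we can tell: trust the old hash
            else:
                sha1 = file_sha1(full)
            snapshot[rel] = key + (sha1,)
    return snapshot


def compare(original, restored):
    """Compare two snapshots by content hash only.

    Returns sorted lists (missing, extra, changed) of relative paths.
    Timestamps are ignored here because a restore will not preserve ctime.
    """
    missing = sorted(set(original) - set(restored))
    extra = sorted(set(restored) - set(original))
    changed = sorted(
        path
        for path in set(original) & set(restored)
        if original[path][3] != restored[path][3]
    )
    return missing, extra, changed
```

With something like this, a snapshot taken at backup time with scan("/home") could later be checked with compare(snapshot, scan("/tmp/restore")); both paths are only examples. As noted above, reusing hashes this way means corruption that leaves the timestamps untouched will go unnoticed until a full rescan.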
