Why do we do backups?
A couple of years ago I awoke to the unpleasant smell of gently burning electronics. The power supply in my main mail server had decided that, as far as it was concerned, 50 volts was as good as 12. Not much was happening with the machine. I don't remember the exact times, but I knew that after a few days, my email would start bouncing. A trip to the computer store and a few hours later, I had a nice shiny new computer all ready to go, with a completely blank hard drive.
I popped a rescue disc into the machine, booted it up, set up the partitions and filesystems, connected to the network, and restored the machine's backup from the night before. The only thing I had to change was to build a new kernel to support some of the new devices on the new machine.
To me, this is the main reason I do backups. RAID would not have helped me in this instance, other than perhaps increasing the unpleasant smell a bit, with two drives to go bad. For this particular scenario, syncing the data to a remote machine (say, with rsync) would have been sufficient. However, there are other failures where a mirror isn't good enough. Let's say I discover that I overwrote a file with garbage last week. Rsync would have diligently copied the corrupt data to the mirror, leaving me with two copies of the corrupt file.
From this point of view, the ideal backup would be a complete copy of everything on my computer, made at least once a day. Each copy should go to new media, which would allow me to restore any version of any file from the past, at least to within a single day. Unfortunately, for most of us, this isn't a very practical method. I'm not really set up to buy a new hard drive every day for the rest of my life. In addition, my computer would spend a lot of its time making copies, over and over again, of the same data.
So, is there any way to get the best of both worlds? It turns out there is; several, in fact.
The most common technique is known as incremental backups (or sometimes differential backups). More about this in a post titled Incremental Backups Considered Harmful. Something even better, and what jpool uses, is known as content addressable storage. More on this once I get incremental backups out of the way.
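To give a rough feel for the idea before then, here is a toy sketch in Python. It is not how jpool stores data; the ContentStore class, the demo function, and the choice of SHA-256 are all assumptions made for illustration. Each blob is filed under the hash of its contents, so a daily "full" backup only adds what actually changed, while older versions stay retrievable.

```python
import hashlib
import tempfile
from pathlib import Path


class ContentStore:
    """Toy content-addressable store (hypothetical, not jpool's format)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        # The key is derived from the content itself (SHA-256 is an assumption).
        key = hashlib.sha256(data).hexdigest()
        path = self.root / key
        if not path.exists():  # identical content is written only once
            path.write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

    def count(self) -> int:
        # Number of distinct blobs currently held in the store.
        return sum(1 for _ in self.root.iterdir())


def demo():
    """Back up two 'days' of files; one file is overwritten with garbage."""
    with tempfile.TemporaryDirectory() as tmp:
        store = ContentStore(tmp)
        # Each backup is just a mapping from file name to content key.
        monday = {
            "notes.txt": store.put(b"good data"),
            "todo.txt": store.put(b"buy disk"),
        }
        tuesday = {
            "notes.txt": store.put(b"garbage"),
            "todo.txt": store.put(b"buy disk"),  # unchanged, so nothing new is stored
        }
        recovered = store.get(monday["notes.txt"])
        return monday, tuesday, store.count(), recovered
```

Calling demo() stores four files across two backups but keeps only three blobs, and Monday's copy of notes.txt can still be read back after Tuesday's backup recorded the garbage version.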
