As part of my LifeArchive project, I had to verify that I have sufficient methods to back up all my valuable assets so well that they will last for decades. Sadly, there currently isn’t any mass storage medium that is known to keep working for that long, and in any case you must prepare for losing an entire site to floods, fire and other disasters. This post explains how I solved the backup needs for my entire digital legacy. Be sure to read the first part: LifeArchive – store all your digital assets safely (part 1 of 2)
The cheapest way to store data currently is to use hard disks. Google, Facebook, The Internet Archive, Dropbox etc. are all known to run big data centers with a lot of machines with a lot of disks. At least Google is also known to use tapes for additional backups, but tapes are way too expensive for this kind of small-scale use.
Disks have their own problems, too. The biggest one is that they tend to break. Another is that they might silently corrupt your data, which traditional RAID systems generally can’t detect. As I’m a big fan of ZFS, my choice was to build a NAS on top of it. You can read more about this process in this blog post: Cheap NAS with ZFS in HP MicroServer N40L
As keeping all your eggs in one basket is just stupid, having a good and redundant backup solution is the key to success. In my previous post I concluded that relying solely on cloud providers to host your data isn’t wise, but they are a viable choice for backups. I’ve chosen to use CrashPlan, which is a really nice cloud-based service for doing incremental backups. Here are the pros and cons of CrashPlan:
Pros:
- Nice GUI for both backing up and restoring files
- Robust. The backups will eventually complete and the service will notify you by email if something is broken
- Supports Windows, OS X, Linux and Solaris / OpenIndiana
- Unlimited storage on some of the plans
- Does incremental backups, so you can find a lost file from the history
- Allows you to back up to both the CrashPlan cloud and your own storage if you run the client on multiple machines
- Allows you to back up to a friend’s machine (this doesn’t even cost you anything), so you can establish a backup ring with a few of your friends
Cons:
- It’s still a black-box service, which might break down when you least expect it
- The CrashPlan cloud is not very fast: the upload rate is around 1 Mbps and the download (restore) rate around 5 Mbps
- You have to fully trust and rely on the CrashPlan client to work: there’s no other way to access the archive except through the client
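To put the transfer rates mentioned in the list above into perspective, here is a small back-of-the-envelope sketch. The 1 TB archive size is purely an assumption for illustration, and the rates are treated as sustained averages in decimal megabits per second.

```python
def transfer_days(size_bytes: float, rate_mbps: float) -> float:
    """Days needed to move size_bytes over a link sustaining rate_mbps (megabits/s)."""
    bits = size_bytes * 8
    seconds = bits / (rate_mbps * 1_000_000)
    return seconds / 86_400  # seconds in a day


# Assumption: a 1 TB archive (decimal terabyte), not a figure from this post.
ARCHIVE_BYTES = 1e12

if __name__ == "__main__":
    print(f"upload at 1 Mbps:  {transfer_days(ARCHIVE_BYTES, 1):.1f} days")
    print(f"restore at 5 Mbps: {transfer_days(ARCHIVE_BYTES, 5):.1f} days")
```

Under these assumptions the initial upload takes roughly three months and a full restore a bit under three weeks, which is why a faster second restore path is worth having.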
I set up the CrashPlan client to back up to its cloud and, in addition, to Kapsi Ry’s server, where I’m running a copy of the CrashPlan client. Running your own client is easy, and it gives me a much faster way to recover the data when I need to. As the data is encrypted, I don’t need to worry that there are also a few thousand other users on the same server.
Another parallel backup solution
Even though CrashPlan feels like a really good service, I still didn’t want to rely solely on it. I could always forget to enter my new credit card number and let the data there expire, only to have a fatal accident on my NAS at the same time. That’s why I wanted to have a redundant backup method. I happened to get another used HP MicroServer at a bargain price, so I set it up similarly with three 1TB disks, which I also happened to have lying around unused from my previous NAS. Used gear, used disks, but they’re good enough to act as my secondary backup method. I will of course still receive email notifications on disk failures and broken backups, so I’m well within the safety margins I planned for.
This secondary NAS lives at another site and is connected over an OpenVPN network to the primary server in my home. It doesn’t allow any incoming connections from the outside, so it’s also quite safe. I set up a simple rsync script on my main NAS to sync all data to this secondary NAS. The rsync script uses the --delete option, so it will remove files which have been deleted from the primary NAS. Because of this, I also use a crontab entry to snapshot the backup each night. This protects me if I accidentally delete files from the main archive. I keep a week’s worth of daily snapshots and a few months of weekly snapshots.
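The sync and the snapshot rotation described above could be sketched roughly like this. This is not my actual script: the paths, the host name `backup-nas`, the dataset name, the `daily-YYYY-MM-DD` snapshot naming and the 12-week retention are all placeholders chosen for illustration, and the script only prints the commands it would run.

```python
from __future__ import annotations

from datetime import date, timedelta

# Placeholder names, not the real setup.
SOURCE = "/tank/archive/"
DEST = "backup-nas:/tank/archive/"
DATASET = "tank/archive"


def rsync_command(source: str, dest: str) -> list[str]:
    """Mirror source to dest; --delete also removes files deleted on the source."""
    return ["rsync", "-a", "--delete", source, dest]


def snapshot_name(day: date) -> str:
    """ZFS snapshot name for the nightly snapshot taken on the backup NAS."""
    return f"{DATASET}@daily-{day.isoformat()}"


def snapshots_to_keep(days: list[date], today: date,
                      daily: int = 7, weekly: int = 12) -> set[date]:
    """Keep the last `daily` days, plus Sunday snapshots for `weekly` weeks."""
    keep = set()
    for day in days:
        age = (today - day).days
        if 0 <= age < daily:
            keep.add(day)
        elif day.weekday() == 6 and 0 <= age < weekly * 7:
            keep.add(day)
    return keep


if __name__ == "__main__":
    today = date.today()
    existing = [today - timedelta(days=i) for i in range(120)]
    keep = snapshots_to_keep(existing, today)
    # Dry run: print the commands instead of executing them.
    print(" ".join(rsync_command(SOURCE, DEST)))
    print("zfs snapshot " + snapshot_name(today))
    for day in sorted(set(existing) - keep):
        print("zfs destroy " + snapshot_name(day))
```

The important design point is that the snapshots live on the receiving side, so a deletion mirrored by --delete can still be recovered from an older snapshot.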
One of the biggest advantages of this compared to CrashPlan is that the files sit directly on your filesystem. There’s no encryption and no proprietary client or tools you need to rely on, so you can safely assume that you can always get access to your files.
There’s also another option: get a few USB disks and set up a scheme where you automatically copy your entire archive to one disk. Then every once in a while, unplug that disk, put it somewhere safe and replace it with another. I might do something like this once a year.
Verifying backups and monitoring
“You don’t have backups unless you have proven you can restore from them.” - a well-known truth that many people tend to forget. Rsync backups are easy to verify: just run the entire archive through sha1sum on both sides and check that the checksums match. CrashPlan is a different beast, because you need to restore the entire archive to another place and verify it from there. It’s doable, but currently it can’t be automated.
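The checksum comparison for the rsync backup can be sketched as follows. This is a minimal illustration rather than the script I actually run: it builds a manifest of SHA-1 checksums per relative path and compares two manifests. The function names are made up for this example, and in practice each manifest would be generated on its own machine and only the small manifest shipped over for comparison.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-1 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-1 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha1_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(primary: dict[str, str], backup: dict[str, str]) -> dict[str, list[str]]:
    """Report files missing from, extra in, or different on the backup side."""
    common = primary.keys() & backup.keys()
    return {
        "missing": sorted(set(primary) - set(backup)),
        "extra": sorted(set(backup) - set(primary)),
        "changed": sorted(k for k in common if primary[k] != backup[k]),
    }
```

A clean backup gives three empty lists from `compare(manifest(primary_root), manifest(backup_root))`; anything else is worth an email.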
Monitoring is another thing. I’ve built all my scripts so that they email me if there’s a problem, so I can react to errors immediately. I’m planning to set up a Zabbix instance to keep track of everything, but I haven’t bothered yet.
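The email-on-failure idea can be sketched as a small wrapper around any backup command. This is only an illustration under assumptions: the addresses are placeholders, and it assumes a mail server accepting messages on localhost, which may not match your setup.

```python
from __future__ import annotations

import smtplib
import subprocess
from email.message import EmailMessage
from typing import Optional

# Placeholder addresses, not real ones.
ALERT_TO = "me@example.com"
ALERT_FROM = "nas@example.com"


def build_alert(job: str, returncode: int, output: str) -> Optional[EmailMessage]:
    """Return an alert email for a failed job, or None if the job succeeded."""
    if returncode == 0:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"[backup] {job} failed (exit {returncode})"
    msg["From"] = ALERT_FROM
    msg["To"] = ALERT_TO
    # Only the tail of the output, to keep the mail readable.
    msg.set_content(output[-4000:] or "(no output)")
    return msg


def run_and_alert(job: str, command: list[str]) -> int:
    """Run the command and send an email if it exits with a non-zero status."""
    result = subprocess.run(command, capture_output=True, text=True)
    alert = build_alert(job, result.returncode, result.stdout + result.stderr)
    if alert is not None:
        # Assumption: a local SMTP server is listening on localhost.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(alert)
    return result.returncode
```

A cron job would then call something like `run_and_alert("rsync", ["rsync", "-a", "--delete", src, dst])`, staying silent on success and mailing only on failure.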
Currently most of our digital assets aren’t stored safely enough that you can count on all of them being available in the future. With this setup I’m confident that I can keep my entire digital legacy safe from hardware failures, cracking and human mistakes. I admit that the overall solution isn’t simple, but it’s very doable for an IT-savvy person. The problem is that currently you can’t buy this kind of setup anywhere as a service, because you can’t be 100% sure that the service will still be around in the coming decades.
This setup will also work as a personal cloud, assuming that your internet link is fast enough. With the VPN connections, I can also let my family members connect to this archive and store their valuable data in it. This is very handy, because that way I know that I can access the digital legacy of my parents, who probably couldn’t do all this by themselves.