You don’t need to work with technology for long before you realise the importance of having a backup strategy.
The two main use cases are disaster recovery, and mitigation against accidental deletion or edit.
The first is generally more straightforward – you are simply looking to be able to restore all your data in the case of hardware failure, or catastrophic user error. The scenarios are losing or dropping your phone or laptop, hard drive failure, memory stick loss or corruption, cloud provider failure, malware, accidentally deleting an account or formatting a hard drive, and so on. Furthermore, you also need to think about robbery, and flood or fire.
And let’s be clear – with any storage media, but especially hard disks, failure is a when not an if.
The received wisdom on this is to have a 3-2-1 plan. Have 3 copies of your data, 2 on different devices, and 1 offsite. It is suggested that a further ‘offline’ copy is also taken, so that should malware or ransomware hit and all connected copies are affected, there is a copy which you can be sure is secure.
My take is what one of my lecturers told me when I was an undergraduate – If your data doesn’t exist in 3 different locations, it doesn’t exist at all! Locations here means virtual, rather than physical, although the data does need to be in at least 2 physical locations.
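The 3-2-1 rule is mechanical enough to check in a few lines. Here is a minimal sketch in Python, assuming each copy can be described by the device it lives on and whether it is offsite; the `Copy` class, the function name and the example setups are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data. All names and values here are illustrative."""
    name: str      # human-readable label for the copy
    device: str    # physical device or service holding the copy
    offsite: bool  # True if stored away from the primary location


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Check the 3-2-1 rule: 3+ copies, on 2+ different devices, 1+ offsite."""
    enough_copies = len(copies) >= 3
    enough_devices = len({c.device for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_devices and has_offsite


# Three copies on three devices, but all in the same building.
old_setup = [
    Copy("NAS", device="nas", offsite=False),
    Copy("PC mirror", device="pc", offsite=False),
    Copy("USB sticks", device="usb", offsite=False),
]

# Swapping one local copy for a cloud copy adds the offsite location.
new_setup = old_setup[:2] + [Copy("cloud", device="cloud", offsite=True)]

print(satisfies_3_2_1(old_setup))  # False
print(satisfies_3_2_1(new_setup))  # True
```

The first setup fails only on the offsite test, which is exactly the gap that a fire, flood or robbery would expose.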
I use sync.com (full disclosure – this is a referral link) to store all my documents and data (including e-mail backups).
In this way, I have at least 3 copies (which are kept in sync), spread across multiple devices, including offsite. Sync.com also offer file versioning and deleted file retrieval. Nice.
I do also have a NAS (Network Attached Storage) which has all my photos and videos, but this is also mirrored on Sync.com.
My main backup strategy used to be a combination of a NAS drive and USB memory sticks. The NAS has two hard disks set up in a RAID-1 configuration, so the data is mirrored over both disks. If either disk fails, it can be replaced and will automatically re-mirror from the other one. It relies on the likelihood of both disks failing at the same time being low, which it is. My slight hesitation is that I bought both hard disks at the same time, so they may well wear out around the same time.
I had a script which mirrored the NAS onto one of my PC hard disks, and then periodically I would back it all up to USB memory sticks which I kept in a fireproof safe. The NAS is also useful in terms of documents which are shared (like utility bills, and photos).
This was fine as far as it went. The NAS took care of 2 of the copies on its own, but all the copies of the data were in the same physical location, and it relied on me being bothered to save to the USB sticks regularly, which I wasn’t great at. It was also limited in terms of recovery from accidental deletion.
So instead I now use the cloud for backup storage.
Google Drive, OneDrive, Dropbox and other storage providers have a very robust infrastructure, with multiple geographically distributed copies. I personally wouldn’t rely solely on these, but as most of them sync a copy to your hard drive, even if (say) Microsoft goes down you haven’t lost it all. Plenty of people do rely on them, and they are a whole lot better than no backup!!!
My issue with this is that Microsoft/Google/Dropbox can read your data. For some stuff I’m not too fussed about this, and they are an excellent way of sharing photos or distributing a newsletter. But I don’t really want Dropbox having access to my bank statements, say.
Instead of these I now use Sync.com. They are a zero-knowledge cloud data storage provider, which means I am the only one who can access my data. It integrates with Windows and macOS like OneDrive, so that changes are automatically synced up to the cloud.
Their free account is pretty good – 5GB of storage, which you can extend to 10GB fairly easily by getting rewarded for various actions, like installing it on a phone. If you refer a friend, you also get an extra 1GB each time. They also provide file versioning, so you can restore an older or deleted file. My family members have free accounts, and Sync.com’s excellent sharing facilities allow me to share documents with them, and let them use ‘my’ 2TB of storage from my paid plan.
I opted for a paid plan, which is $96 a year, for 2TB of storage. This is more storage than I will need for some time (my NAS also has 2TB capacity, but I’m only using 500GB). All my local documents on Windows are automatically synced with Sync.com, which satisfies my 3-2-1. The stuff on the NAS still gets mirrored to the hard disk, but the active folders also get mirrored to my Sync.com space.
Sync.com isn’t perfect – the desktop client is a bit clunky, and there’s no API or Linux client (which is a nuisance). But in terms of value and getting zero-knowledge encryption it ticks my boxes, plus it’s great for sharing, and it’s really good to get file versioning and delete recovery.
NAS backup scripts
All the copying is managed using Task Scheduler and (e.g.) Robocopy to mirror the NAS into my Sync.com directories. Each script is a simple batch file, such as this one, which mirrors a shared folder from the NAS onto my local D: drive.
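The same kind of mirror can be sketched in Python. This is a minimal sketch and not the batch file itself: the share path, destination folder, log path and retry settings are assumptions for illustration. It only builds the Robocopy command line and interprets the exit code, using standard Robocopy switches.

```python
import subprocess


def build_mirror_command(source: str, destination: str, log_file: str) -> list[str]:
    """Build a Robocopy command line that mirrors source into destination."""
    return [
        "robocopy",
        source,
        destination,
        "/MIR",               # mirror: copy new/changed files, delete removed ones
        "/R:2",               # retry a failed copy twice...
        "/W:5",               # ...waiting 5 seconds between retries
        "/NP",                # keep per-file progress percentages out of the log
        f"/LOG+:{log_file}",  # append Robocopy's output to the log file
    ]


def mirror_succeeded(exit_code: int) -> bool:
    """Robocopy exit codes are bit flags; 8 and above mean something failed."""
    return exit_code < 8


def run_mirror(source: str, destination: str, log_file: str) -> bool:
    """Run the mirror (Windows only) and report whether it succeeded."""
    result = subprocess.run(build_mirror_command(source, destination, log_file))
    return mirror_succeeded(result.returncode)


# Hypothetical share and folder names, printed rather than executed.
print(" ".join(build_mirror_command(
    r"\\nas\photos", r"D:\Mirror\photos", r"D:\Logs\mirror-photos.log")))
```

Two things are worth knowing here. `/MIR` deletes files from the destination that no longer exist in the source, so a mirror on its own does not protect against accidental deletion. And Robocopy returns non-zero exit codes even when all is well (1 means files were copied), so a scheduled task should not treat every non-zero code as a failure.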
So far so good
The upshot is, all my documents are stored on my local hard disk, and also on Sync.com.
All my photos and videos are stored on my NAS, but mirrored to my local hard disk, and also uploaded to Sync.com.
That just leaves e-mail.
The final piece of the jigsaw is e-mail, which is primarily stored on my e-mail provider’s IMAP server, and is partially replicated on my hard disk by my e-mail client.
Rather than assume anything about my e-mail client’s (or provider’s!) retention strategy, I manually sync all my e-mail accounts to a folder, which is in turn synced with Sync.com. I don’t bother with my Microsoft or Google e-mail addresses (I figure they have probably got that covered), but the addresses at eutony.net I do back up.
This is a little more technical as it needs a tool to do the IMAP mirroring – I use imap-backup running in a Docker container (so I don’t need to fight to the death with Windows over installing Ruby, and all the scripts!!)
The Dockerfile spins up an ubuntu image with imap-backup installed:
The only piece of magic is copying config.json, which has the account configurations, and looks a bit like this, with sensitive information removed.
docker-compose.yml then mounts a local Windows directory as /imap-backup, so the script can save the data locally. As this folder is under my Sync.com folder, the data gets automatically stored and versioned in the cloud.
Lastly, we just need a Scheduled Task to run docker compose up periodically.
Of course, backups are only any use if you can restore them.
With a cloud-provider-based approach (such as Sync.com), the files are just ‘there’. Accessing them is via the web client or phone app, and restoring them on a new device is as simple as installing the desktop client and letting it sync.
Imap-backup backs up in a standard text-based format on the disk, but also supports ‘restoring’ to a new IMAP account.
The last thing to mention is the importance of logging and checking your backups. Scripts go wrong, get de-scheduled, etc. You don’t want to find this out after you’ve lost everything.
My approach is to have every backup script called by a wrapper script that handles logging what’s going on. This wrapper script is invoked by the Task Scheduler.
The sharp-eyed will notice a curl call to ntfy.sh. This is simply a way to ping the backup result to my phone, so I can scan the logs for errors, and hopefully notice if I haven’t received one for a while. I actually self-host my own ntfy instance, but I started off using ntfy.sh, and it works really well.
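The wrapper idea can be sketched in Python as well. This is a minimal sketch under stated assumptions, not the wrapper script itself: the function names, log format and topic URL are made up for illustration. It relies only on the fact that ntfy accepts a plain HTTP POST to a topic URL, with the message as the request body and an optional `Title` header.

```python
import datetime
import subprocess
import urllib.request
from typing import Callable


def send_ntfy(message: str, title: str, topic_url: str) -> None:
    """POST a notification to an ntfy topic; the message is the request body."""
    request = urllib.request.Request(
        topic_url,
        data=message.encode("utf-8"),
        headers={"Title": title},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30):
        pass


def run_backup(
    name: str,
    command: list[str],
    log_path: str,
    notify: Callable[[str, str], None],
    is_success: Callable[[int], bool] = lambda code: code == 0,
) -> bool:
    """Run one backup command, append its output to a log, and send the result."""
    started = datetime.datetime.now().isoformat(timespec="seconds")
    result = subprocess.run(command, capture_output=True, text=True)
    ok = is_success(result.returncode)
    status = "OK" if ok else f"FAILED (exit code {result.returncode})"

    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"[{started}] {name}: {status}\n")
        log.write(result.stdout)
        log.write(result.stderr)

    try:
        notify(f"{name}: {status}", f"Backup {name}")
    except OSError as error:
        # A failed notification is logged rather than hiding the backup result.
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"notification failed: {error}\n")
    return ok


# Example call (hypothetical script name and topic, left commented out):
# from functools import partial
# run_backup("photos", ["cmd", "/c", "mirror-photos.bat"], r"D:\Logs\backup.log",
#            partial(send_ntfy, topic_url="https://ntfy.sh/my-backup-topic"))
```

Passing the notifier in as a function makes it easy to point at a self-hosted ntfy instance instead, and `is_success` is there because some tools, Robocopy among them, return non-zero exit codes on success.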
But wait, there’s more…
I don’t only have a Windows box, but also a Linux box which runs all my databases.
As I mentioned last time, the config and code is all in source control, so automatically backed-up. However the database and media files associated with the websites also need backing up, which is what I will cover next time…