In my last post, I talked about the importance of backing up, and how I do it. The upshot is that I use a cloud file provider, which automatically synchronises my data, keeps a file history, and allows deleted files to be restored. There are many options here – I settled on Sync.com because it is zero-trust out of the box, is reasonable value, enables file and folder sharing, and generally seems good.
In the post before last, I outlined my website setup, which is a Linux box running a collection of Docker containers. Unfortunately Sync.com doesn’t yet have a Linux client or API, so I can’t directly use the same approach. The backup also needs to cover the MySQL database, not just files.
There is also a much stronger requirement for incremental backups – you want a backup every day, and the data probably hasn’t changed very much since the previous day’s backup. However, you may also need to go back several days or weeks to a specific backup.
This is very much like source control for code, which in fact I also use as part of the backup strategy.
In brief, I use cron to run a script which:
- mysqldumps the databases to text files,
- Uses Restic to make an incremental encrypted backup of these and the filesystem to my NAS and (via rclone) to Google Drive,
- ntfys me of the outcome.
All the configuration, static HTML, CSS and so on is stored in a BitBucket.org repository.
Initially I had some very simple backup scripts, which did the following:
- Create a dated backup folder on the NAS.
- Dump the database into the folder.
- Make a tarball of all the files.
- Also mirror the files to a ‘live’ folder on the NAS.
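Those early scripts amounted to something like the following sketch – the paths and database names here are placeholders, not my real layout:

```shell
#!/usr/bin/env bash
# Sketch of the original dated-folder backup: dump the database,
# tarball the files, and mirror to a 'live' folder on the NAS.
# All paths and names below are illustrative placeholders.

backup_dir() {                 # dated backup folder on the NAS
  echo "/mnt/nas/backups/$(date +%Y-%m-%d)"
}

run_backup() {
  local dest
  dest="$(backup_dir)"
  mkdir -p "$dest"
  mysqldump --all-databases > "$dest/db.sql"        # dump the database
  tar -czf "$dest/files.tar.gz" -C /srv/website .   # tarball of all the files
  rsync -a --delete /srv/website/ /mnt/nas/live/    # mirror to the 'live' folder
}
```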
This was ok as far as it went, but it needed manual intervention to delete the old folders from time to time, and to copy to a USB stick occasionally. It’s also pretty inefficient in terms of runtime and storage: each backup takes up around 5GB (mainly due to all the photos on photo.eutony.net). And it doesn’t provide an offsite backup, so not really ideal.
Shiny and new
In my general overhaul of everything I decided I needed a new backup approach as well. It had to satisfy the 3-2-1 requirement and be fully automated, but also be elegant and efficient.
In my research I came across Restic, which ticks all the boxes. It is an encrypted, block-based, incremental backup system. So I can back up the entire filesystem every day, but only the changes since the previous backup will be stored. Furthermore, a full history of the filesystem going back to the first backup is retrievable. Restoring a particular snapshot reproduces the entire filesystem as it was at the point of that snapshot.
In that regard, it is very much like a Git repository, just minus the branches.
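Browsing that history and pulling a snapshot back are each a single command. A sketch, where the repository location and restore target are placeholders:

```shell
# Browse the backup history and restore from it.
# Repository location and restore target are illustrative placeholders.
REPO=sftp:backup@nas:/restic-repo

list_snapshots() {        # show every snapshot in the repository
  restic -r "$REPO" snapshots
}

restore_snapshot() {      # restore a given snapshot id (or 'latest')
  restic -r "$REPO" restore "$1" --target /tmp/restore
}
```

So `restore_snapshot latest` materialises the most recent snapshot, and any older snapshot id gives you the filesystem as it was on that day.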
The output from Restic looks like this:
So you can see it processed 10,136 files across 734 directories in a 5.5 GB archive, and added just 20MB for the 4 changed files – all in 46 seconds.
This is all well and good for the filesystem, but what about the database?
Well, I use mysqldump to write a plain text file of SQL to a folder that is included in the Restic backup. Actually I’ve got 3 databases, so it’s 3 files. The plain text obviously makes the individual files bigger, but it makes it easier for Restic to chunk it up and store only the deltas, not the whole file.
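In sketch form, the dump step looks something like this – the database names and the dump folder are placeholders:

```shell
# Dump each database as plain-text SQL into a folder that the Restic
# backup covers. Database names and the folder are placeholders.
SQL_DIR=/srv/backup/sql

dump_databases() {
  mkdir -p "$SQL_DIR"
  for db in db1 db2 db3; do
    # --single-transaction gives a consistent snapshot of InnoDB tables
    # without locking them for the duration of the dump
    mysqldump --single-transaction "$db" > "$SQL_DIR/$db.sql"
  done
}
```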
So Restic will roll up my backups into a nice snapshotted repository – but where does that live?
Well, in keeping with the 3-2-1 approach, I actually use two repositories. One is hosted on my NAS (Restic plays nicely with ssh, using sftp), and the other is on Google Drive.
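Creating the two repositories is a one-off step. Assuming an rclone remote for Google Drive has already been configured, it looks something like this – the hostname, paths, and remote name are placeholders:

```shell
# One-off initialisation of the two Restic repositories.
# Hostname, paths, and the rclone remote name are placeholders.
NAS_REPO=sftp:backup@nas:/restic-repo      # Restic speaks sftp natively
GDRIVE_REPO=rclone:gdrive:restic-repo      # via a pre-configured rclone remote

init_repos() {
  restic -r "$NAS_REPO" init
  restic -r "$GDRIVE_REPO" init
}
```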
“But wait”, I hear you say, “how do you access Google Drive from a Linux command shell – and anyway, didn’t you say you didn’t trust Google not to look at your data?”. It turns out both of these are simple to address, using Rclone to access Google Drive, and Restic’s built-in file encryption.
Setting up Restic and Rclone was pretty straightforward, and the docs are good. I’ve done a single test restore, which went without a hitch. And my backup script verifies the integrity of the repository every day, and pushes the log file to my phone via ntfy.
So, in all its glory, my backup script, which is run from crontab every night, looks like this. You will of course understand that I’ve removed credentials and network information.
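Stripped right down, it has this shape – the repository locations, database names, ntfy topic, and paths below are all placeholders rather than my real values:

```shell
#!/usr/bin/env bash
# Nightly backup sketch: dump the databases, snapshot to both Restic
# repositories, verify them, and push the outcome to my phone via ntfy.
# Every location, name, and topic below is an illustrative placeholder.
set -euo pipefail

SQL_DIR=/srv/backup/sql
NAS_REPO=sftp:backup@nas:/restic-repo
GDRIVE_REPO=rclone:gdrive:restic-repo
export RESTIC_PASSWORD_FILE="$HOME/.secret/resticpw.txt"

notify() {                              # one-line status to my phone
  curl -s -d "$1" "https://ntfy.sh/my-backup-topic" > /dev/null
}

nightly() {
  mkdir -p "$SQL_DIR"
  for db in db1 db2 db3; do             # placeholder database names
    mysqldump --single-transaction "$db" > "$SQL_DIR/$db.sql"
  done

  for repo in "$NAS_REPO" "$GDRIVE_REPO"; do
    restic -r "$repo" backup /srv/website "$SQL_DIR" \
      --exclude '/srv/website/cache'    # illustrative exclude
    restic -r "$repo" check             # verify repository integrity
  done
}

# Only perform the backup when invoked with --run, so the functions
# can be sourced and inspected without side effects.
if [ "${1:-}" = "--run" ]; then
  if nightly; then
    notify "backup OK: $(date)"
  else
    notify "backup FAILED: $(date)"
  fi
fi
```

Cron then invokes it with the `--run` flag, e.g. `30 2 * * * /home/user/bin/backup.sh --run`.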
I’ve chosen to omit a few files and directories from the restic backup which don’t need to be backed up in this way, which makes the restic command look more complex than it really is.
The files are encrypted with a key stored in ~/.secret/resticpw.txt, which needs to be stored securely in multiple places, as without it you can’t access the backup!
My key looks a bit like a Bitwarden fingerprint phrase – but you’ll have to forgive me for not going into any more detail than that.
Speaking of Bitwarden, watch this space for all things password, coming soon.