Back it up – pt II (restic, rclone, and Google)

Fri May 12 2023

In my last post, I talked about the importance of backing up, and how I do it. The upshot is that I use a cloud file provider, which automatically synchronises my data, keeps a file history, and allows deleted files to be restored. There are many options here – I settled on Sync.com because it is zero-trust out of the box, is reasonable value, enables file and folder sharing, and generally seems good.

In the post before last, I outlined my website setup, which is a Linux box running a collection of Docker containers. Unfortunately Sync.com doesn’t have a Linux client or API yet, so I can’t directly use the same approach. Also, part of what needs backing up is the MySQL database.

There is also a much stronger requirement for incremental backups – you want a backup every day, and the data probably hasn’t changed very much since the previous day’s backup. However, you may also need to go back several days or weeks to a specific backup.

This is very much like source control for code, which in fact I also use as part of the backup strategy.

TL;DR

I use cron to run a script which:

Does mysqldump of the databases to text files,
Uses Restic to make an incremental encrypted backup of these and the filesystem to my NAS and (via rclone) to Google Drive.
ntfys me of the outcome.
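
A minimal sketch of the cron side might look like this. The script path and the 02:30 schedule are assumptions for illustration, not values from the actual setup.

```shell
#!/bin/sh
# Sketch only: the script path and schedule below are assumed, not from the post.
CRON_LINE='30 2 * * * /home/backup/bin/backup.sh'

# Print the existing crontab minus any older backup.sh entry, then the new entry.
build_crontab() {
    crontab -l 2>/dev/null | grep -v 'backup\.sh' || true
    printf '%s\n' "$CRON_LINE"
}

# Replace the current user's crontab with the merged result.
install_backup_cron() {
    build_crontab | crontab -
}
```

Running `install_backup_cron` once as the backup user would schedule the nightly run. Calling `build_crontab` on its own shows what would be installed without changing anything.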

All the configuration, static HTML, CSS and so on is stored in a Bitbucket.org repository.

Old school

Initially I had some very simple backup scripts, which did the following:

Create a dated backup folder on the NAS.
Dump the database into the folder.
Make a tarball of all the files.
Also mirror the files to a ‘live’ folder on the NAS.

This was ok as far as it went. It did need manual intervention to delete the old folders from time to time, and to copy to a USB stick occasionally. It’s pretty inefficient in terms of runtime and storage. Each backup takes up around 5GB (mainly due to all the photos on photo.eutony.net). It also doesn’t provide an offsite backup, so not really ideal.

Shiny and new

In my general overhaul of everything I decided I needed a new backup approach as well. It had to satisfy the 3-2-1 requirement, be fully automated, but also be elegant and efficient.

In my research I came across Restic, which ticks all the boxes. It is an encrypted, block-based incremental/differential backup system. So I can back up the entire filesystem every day, but only the changes since the previous backup will be stored. Furthermore, a full history of the filesystem going back to the first backup is retrievable. Restoring a particular snapshot will provide the entire filesystem as it was at the point of that snapshot.

In that regard, it is very much like a Git repository, just minus the branches.

The output from a Restic backup run looks like this:

using parent snapshot 6cc86ebd

Files:           0 new,     4 changed, 10136 unmodified
Dirs:            0 new,    11 changed,   723 unmodified
Added to the repository: 20.222 MiB (20.224 MiB stored)

processed 10140 files, 5.500 GiB in 0:46
snapshot 0b1c0bc4 saved

So you can see it processed 10,140 files (10,136 unmodified plus 4 changed) across 734 directories in a 5.5 GiB archive, and added about 20 MiB for the 4 changed files. And all in 46 seconds.
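
The totals can be checked directly from the two summary lines. This is just an illustrative sketch: the function name is made up, and it assumes the summary keeps the column layout shown above.

```shell
#!/bin/sh
# Sum the new/changed/unmodified counts from a restic backup summary on stdin.
# Assumes the "Files:" and "Dirs:" lines look like the ones quoted in the post.
summarise_backup() {
    awk '
        /^Files:/ { files = $2 + $4 + $6 }
        /^Dirs:/  { dirs  = $2 + $4 + $6 }
        END       { printf "%d files, %d dirs\n", files, dirs }
    '
}

summarise_backup <<'EOF'
Files:           0 new,     4 changed, 10136 unmodified
Dirs:            0 new,    11 changed,   723 unmodified
EOF
# prints: 10140 files, 734 dirs
```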

This is all good and well for the file-system, but what about the database?

Well, I use mysqldump to write a plain text file of SQL to a folder that is included in the Restic backup. Actually I’ve got 3 databases, so it’s 3 files. The plain text obviously makes the individual files bigger, but it makes it easier for Restic to chunk it up, and only store the deltas, not the whole file.
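
With several databases, the dump step can be written as a loop. The following is a sketch under assumptions: `third_db` is a placeholder name, credentials are assumed to come from an option file such as ~/.my.cnf, and `--skip-dump-date` is an addition that is not in the script further down.

```shell
#!/bin/sh
# Sketch only. DUMP_DIR matches the path used in the post's script; how
# credentials reach mysqldump (for example ~/.my.cnf) is an assumption.
DUMP_DIR=${DUMP_DIR:-/home/backup/db}

# Dump each named database to its own plain-text .sql file.
dump_databases() {
    for db in "$@"; do
        # --skip-dump-date drops the trailing timestamp comment, so an
        # unchanged database should produce the same file as the day before.
        mysqldump --opt --default-character-set=utf8 --skip-dump-date \
            -h 127.0.0.1 "$db" > "$DUMP_DIR/$db.sql" || return 1
    done
}

# Example call; "third_db" is a placeholder name.
# dump_databases eutony_net gallery3 third_db
```

The function stops at the first failed dump and returns non-zero, so a calling script can report the failure rather than silently backing up a truncated file.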

Backup Storage

So Restic will roll up my backups into a nice snapshotted repository – but where does that live?

Well, in keeping with the 3-2-1 approach, I actually use two repositories. One is hosted on my NAS (Restic plays nicely with ssh, using sftp), and the other is on Google Drive.

“But wait”, I hear you say, “how do you access Google Drive from a Linux command shell – and anyway, didn’t you say you didn’t trust Google not to look at your data?”. Turns out both of these are simple to address, using Rclone to access Google Drive, and Restic’s built-in file encryption.

Setting up Restic and Rclone was pretty straightforward, and the docs are good. I’ve done a single test restore, which went without a hitch. And my backup script verifies the integrity of the repository every day, and pushes the log file to my phone via ntfy.
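
For reference, the one-off setup and a test restore can be sketched roughly as below. The repository strings and password file are the ones that appear in the backup script; the function names and the restore target directory are hypothetical. The GoogleDrive remote itself is created beforehand with the interactive `rclone config`.

```shell
#!/bin/sh
# Sketch only. The password file path is the one used in the post's script;
# the function names and the restore target are made up for illustration.
PW_FILE=${PW_FILE:-/home/backup/.secret/resticpw.txt}

# One-off: create an empty, encrypted repository at the given location.
init_repo() {
    restic -r "$1" --password-file "$PW_FILE" init
}

# List the snapshots held in a repository.
list_snapshots() {
    restic -r "$1" --password-file "$PW_FILE" snapshots
}

# Restore a snapshot (an ID, or "latest") into a scratch directory.
test_restore() {
    restic -r "$1" --password-file "$PW_FILE" restore "$2" --target "$3"
}

# Example calls:
# init_repo    rclone:GoogleDrive:/backups/pi/restic
# test_restore rclone:GoogleDrive:/backups/pi/restic latest /tmp/restore-test
```

Restoring into a scratch directory rather than over the live files makes it safe to compare the result against the original.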

So, in all its glory, my backup script, which is run from crontab every night, looks like this. You will of course understand that I’ve removed credentials and network information.

#!/bin/bash

resticpw_file=/home/backup/.secret/resticpw.txt
log_file=/tmp/backup.txt

# Dump the MySQL databases (the redirect target must be on the same
# logical line as the ">", hence the trailing backslashes)
mysqldump --opt --create-options --add-drop-table -h 127.0.0.1 \
    --default-character-set=utf8 eutony_net \
    > /home/backup/db/eutony_net.sql
mysqldump --opt --create-options --add-drop-table -h 127.0.0.1 \
    --default-character-set=utf8 gallery3 \
    > /home/backup/db/gallery3.sql

# Output the files to the log file, for validation
echo "**DB**" > $log_file
echo "" >> $log_file
ls -l /home/backup/db >> $log_file
echo "" >> $log_file

# Restic backup to the NAS
echo "**NAS**" >> $log_file
echo "" >> $log_file

restic -r sftp://someone@192.168.xxx.xxx:/backups/pi/restic backup \
   --password-file $resticpw_file \
   /home \
   /var/www \
   --exclude ".git" \
   --exclude "logs" \
   --exclude "wordpress" \
   --exclude "!wordpress/wp-content/wp-uploads" \
   --exclude "!wordpress/wp-config.php" \
   --exclude "/home/backup/source" \
   --exclude "/home/backup/.*" >> $log_file 2>&1
echo "-------" >> $log_file

# Restic check of the NAS repo
restic -r sftp://someone@192.168.xxx.xxx:/backups/pi/restic check \
  --password-file $resticpw_file \
  --read-data-subset=10% \
  >  /tmp/backup-check.txt 2>&1
tail -n 1 /tmp/backup-check.txt >> $log_file
echo "-------" >> $log_file 2>&1

# Restic backup to Google Drive using rclone

echo "" >> $log_file
echo "**Google**" >> $log_file
echo "" >> $log_file

restic -r rclone:GoogleDrive:/backups/pi/restic backup \
   --password-file $resticpw_file \
   /home \
   /var/www \
   --exclude ".git" \
   --exclude "logs" \
   --exclude "wordpress" \
   --exclude "!wordpress/wp-content/wp-uploads" \
   --exclude "!wordpress/wp-config.php" \
   --exclude "/home/backup/source" \
   --exclude "/home/backup/.*" >> $log_file 2>&1

echo "-------" >> $log_file
# Restic check of the Google drive repo
restic -r rclone:GoogleDrive:/backups/pi/restic check \
  --password-file $resticpw_file \
  > /tmp/backup-check2.txt 2>&1
tail -n 1 /tmp/backup-check2.txt >> $log_file 2>&1
echo "-------" >> $log_file

# Send a push notification of the backup and log file via ntfy.sh
curl -H "Title: Backup Pi to NAS" \
     -H "Tags: pi,computer" \
     -T $log_file \
     https://ntfy.sh/my-secret-backup-topic > /tmp/ntfy.log 2>&1

I’ve chosen to omit a few files and directories from the restic backup which don’t need to be backed up in this way, which has made the restic command look more complex than it really is.
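
One way to tidy this up, sketched here rather than taken from the real setup, is to move the patterns into a file passed with restic’s `--exclude-file` option and wrap the backup in a function that is called once per repository. The file path and function names are hypothetical; the patterns are copied from the script above.

```shell
#!/bin/sh
# Sketch only. EXCLUDE_FILE is a hypothetical path; the patterns are the ones
# from the post's script, unchanged.
PW_FILE=${PW_FILE:-/home/backup/.secret/resticpw.txt}
EXCLUDE_FILE=${EXCLUDE_FILE:-/home/backup/restic-excludes.txt}

# Write the exclude patterns, one per line, to the given file.
write_excludes() {
    cat > "$1" <<'EOF'
.git
logs
wordpress
!wordpress/wp-content/wp-uploads
!wordpress/wp-config.php
/home/backup/source
/home/backup/.*
EOF
}

# Run the same backup against whichever repository is passed in.
backup_to() {
    restic -r "$1" backup --password-file "$PW_FILE" \
        --exclude-file "$EXCLUDE_FILE" /home /var/www
}

# Example calls:
# write_excludes "$EXCLUDE_FILE"
# backup_to sftp://someone@192.168.xxx.xxx:/backups/pi/restic
# backup_to rclone:GoogleDrive:/backups/pi/restic
```

Keeping a single list means the NAS and Google Drive backups cannot drift apart when a pattern is added or removed.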

The files are encrypted with a key stored in ~/.secret/resticpw.txt, which needs to be stored securely in multiple places, as without it you cannot access the backup!
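
As an illustration only, a key file with owner-only permissions could be created along these lines. This generates a random base64 string, which is an assumption for the sketch and not the phrase-style key described below; the function name is made up.

```shell
#!/bin/sh
# Sketch only: creates a random key file readable only by its owner.
# It refuses to overwrite an existing file, since losing the old key
# would make the existing backups unreadable.
make_key_file() {
    if [ -e "$1" ]; then
        echo "refusing to overwrite existing key file: $1" >&2
        return 1
    fi
    (
        umask 077                                # new file is created with mode 600
        head -c 32 /dev/urandom | base64 > "$1"  # 32 random bytes, base64-encoded
    )
    chmod 600 "$1"                               # make the mode explicit
}

# Example call:
# make_key_file /home/backup/.secret/resticpw.txt
```

A copy of the resulting file still has to be kept somewhere other than the machine being backed up.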

My key looks a bit like a Bitwarden fingerprint phrase – but you’ll have to forgive me for not going into any more details than this.

Speaking of Bitwarden, watch this space for all things password, coming soon.