Restic backups for my homelab

Companion to the philosophy post — that one is the why, this one is the how.

My homelab is one Ubuntu laptop running about a dozen Docker stacks — a media server, a self-hosted VPN, DNS, the usual. None of it is irreplaceable on its own. The container caches regenerate, the configs live in git. But there’s a small, awkward middle layer of state I’d hate to recreate: Jellyfin’s watch history, Traefik’s LetsEncrypt certs, the WireGuard peer database, AdGuard’s persistent-client mappings.

A few hundred megabytes total, scattered across a dozen paths, most of it sqlite. Recreating any one piece is fine. Recreating all of it after a disk failure is a tedious afternoon I’d rather not have.

I picked restic because it does what I want — encrypted, deduplicated, incremental snapshots to an off-site bucket — and after setup I don’t have to think about it. Borg is in the same league but the backend story is thinner. Rsnapshot plus rclone works but you’re managing two tools. Restic just won on S3 backend support and on the snapshot model being clearly documented.

The backend: Cloudflare R2

Two reasons: no egress fees, and the first 10 GB/month of storage is free. My whole repo is under 20 MiB after dedup and compression, so I pay nothing. (Past the free tier it’s ~$0.015/GB-month, which would still be pennies at this scale.) It’s also S3-compatible, which is what restic speaks. Backblaze B2 is the obvious alternative and is fine; I picked R2 because I was already on Cloudflare for DNS.

In the dashboard: R2 → Create bucket. Pick the EU region if you’re EU-based — Frankfurt-area data residency is free, and your snapshots end up containing personal stuff (watch histories, hostnames, peer addresses). Then mint an API token: R2 → Manage R2 API Tokens → Create, scoped to “Object Read & Write” on that one bucket. Save the access key ID and secret immediately — the secret is shown once.

Install restic

Ubuntu’s apt package lags by months. Pull the upstream binary instead:

curl -L https://github.com/restic/restic/releases/download/v0.18.1/restic_0.18.1_linux_amd64.bz2 \
    | bunzip2 > /tmp/restic
sudo install -m 755 /tmp/restic /usr/local/bin/restic
restic version
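
Since this installs a binary fetched over the network, it may be worth checking the download against the release's checksum file before unpacking it. A minimal sketch, assuming the release page also offers a `SHA256SUMS` file in the usual `sha256sum` format and that the archive was saved to disk rather than piped; `verify_sha256` is a hypothetical helper name, not part of restic.

```shell
# verify_sha256 FILE SUMS_FILE
# Succeeds only if SUMS_FILE lists FILE's basename with a matching SHA-256.
verify_sha256() {
    local file="$1" sums="$2" name expected actual
    name=$(basename "$file")
    # Expected hash: first field of the line whose second field is the file name
    # (sha256sum writes "*name" in binary mode, so accept that form too).
    expected=$(awk -v n="$name" '$2 == n || $2 == "*" n { print $1; exit }' "$sums")
    if [ -z "$expected" ]; then
        echo "no checksum listed for $name" >&2
        return 1
    fi
    actual=$(sha256sum "$file" | awk '{ print $1 }')
    [ "$expected" = "$actual" ]
}
```

Usage would look like `verify_sha256 /tmp/restic_0.18.1_linux_amd64.bz2 /tmp/SHA256SUMS && bunzip2 /tmp/restic_0.18.1_linux_amd64.bz2`. The checksum covers the compressed archive, so the check has to happen before `bunzip2`.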

Secrets

Restic needs four things: the repo URL, the R2 access key ID and secret (restic’s S3 backend reads them from the AWS_* variables), and a repository password. The password is the one that matters most: it’s what encrypts the data, and there is no recovery if you lose it. Generate it long, store it somewhere durable.

I keep all four in ~/.config/restic/env, mode 600:

RESTIC_REPOSITORY=s3:https://<account-id>.eu.r2.cloudflarestorage.com/<bucket>
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
RESTIC_PASSWORD=...

The canonical copy of all four lives in Bitwarden; a small script syncs them down to this file on bootstrap. If you don’t have a password manager in your homelab loop yet, write the restic password down on paper. Treat it the way you’d treat your laptop’s disk-encryption recovery key.
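
A loader that refuses a world-readable env file and confirms all four values are present can catch a half-finished bootstrap before restic produces a more cryptic error. This is a sketch, not the sync script mentioned above; `load_restic_env` is a hypothetical name, and it assumes GNU `stat` as shipped on Ubuntu.

```shell
# load_restic_env FILE
# Refuses a file that group/other can access, then exports its variables
# and checks that the four settings restic needs are all present.
load_restic_env() {
    local file="$1" mode var
    mode=$(stat -c '%a' "$file") || return 1   # GNU stat, as on Ubuntu
    case $mode in
        600|400) ;;
        *) echo "$file has mode $mode, expected 600" >&2; return 1 ;;
    esac
    set -a            # export everything the file assigns
    . "$file"
    set +a
    for var in RESTIC_REPOSITORY AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY RESTIC_PASSWORD; do
        if [ -z "$(printenv "$var")" ]; then
            echo "$var is missing from $file" >&2
            return 1
        fi
    done
}
```

Called as `load_restic_env ~/.config/restic/env`, it leaves the variables exported in the current shell, so a following `restic snapshots` picks them up.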

Init the repo

set -a; source ~/.config/restic/env; set +a
restic init

You’ll see created restic repository <id> at s3:.... Test with a throwaway snapshot:

echo "hi" > /tmp/hi
restic backup /tmp/hi
restic snapshots
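
If the init step ends up in a bootstrap script, it helps to make it safe to run twice. One common idiom, sketched here with a hypothetical function name, is to probe the repository with `restic cat config` and only initialise when that fails.

```shell
# ensure_repo: run `restic init` only when the repository is not readable yet.
# `restic cat config` succeeds only if the repo exists and the password opens it.
ensure_repo() {
    if restic cat config >/dev/null 2>&1; then
        echo "repository already initialised"
    else
        restic init
    fi
}
```

The probe also fails on a wrong password or a network error. In that case `restic init` is attempted and should refuse to overwrite an existing repository, so the worst outcome is an error message rather than data loss.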

[Screenshot: restic snapshots output showing nine nightly backups]

What restic snapshots looks like once you’ve been running it for a few weeks.

Decide what to back up

This is the step I got wrong on the first pass. The naive thing is restic backup /srv/homelab and call it done. Don’t:

Live sqlite databases are unsafe to copy with cp or rsync. You’ll hit half-written pages and broken WALs at restore time. The right tool is sqlite’s own .backup API, which gives you a consistent snapshot while writes continue.
Most of /srv/homelab is regenerable. Jellyfin’s cache/ is huge and rebuilds on first scan. AdGuard’s work/data/ is just stats and the query log. Backing them up wastes R2 storage and slows every run.

So I do the opposite of “snapshot everything”: a small staging script enumerates exactly the files I care about, copies them into /var/tmp/homelab-backup/<stack>/<file>, and snapshots that. For sqlite:

sqlite3 /srv/homelab/jellyfin/config/data/jellyfin.db \
    ".backup '/var/tmp/homelab-backup/jellyfin/jellyfin.db'"

For everything else, plain cp. The whole thing is around 15 MiB per snapshot. After restic’s dedup and compression, what actually lands on R2 is far less.
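
The two staging rules can be folded into a pair of helpers. This is a sketch of the idea rather than the actual staging script: `stage_path` and `stage_file` are hypothetical names, and treating the first directory under /srv/homelab as the stack name is an assumption.

```shell
STAGING=${STAGING:-/var/tmp/homelab-backup}
SRC_ROOT=${SRC_ROOT:-/srv/homelab}

# stage_path SRC: print the staging destination, <STAGING>/<stack>/<file>,
# where <stack> is the first directory under SRC_ROOT.
stage_path() {
    local rel="${1#"$SRC_ROOT"/}"
    printf '%s/%s/%s\n' "$STAGING" "${rel%%/*}" "${1##*/}"
}

# stage_file SRC: sqlite databases go through .backup, everything else cp.
stage_file() {
    local src="$1" dest
    dest=$(stage_path "$src")
    mkdir -p "${dest%/*}"
    case $src in
        *.db|*.sqlite|*.sqlite3) sqlite3 "$src" ".backup '$dest'" ;;
        *) cp -p "$src" "$dest" ;;
    esac
}
```

`stage_path /srv/homelab/jellyfin/config/data/jellyfin.db` prints `/var/tmp/homelab-backup/jellyfin/jellyfin.db`. Flattening to the base name means two files with the same name in one stack would collide, so a real script would need a different layout for those.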

[Screenshot: restic stats in raw-data mode showing 6.29x compression ratio and 84% space saving]

Nine snapshots, 113 MiB of source data, 18 MiB actually stored on R2. Restic’s compression and dedup pull their weight.
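
The two headline numbers follow from the sizes in the caption. A quick check with the rounded figures (the helper name is made up for this example):

```shell
# space_saving SOURCE_MIB STORED_MIB: print the ratio and the percentage saved.
space_saving() {
    awk -v src="$1" -v stored="$2" 'BEGIN {
        printf "ratio %.2fx, saving %.0f%%\n", src / stored, (1 - stored / src) * 100
    }'
}

space_saving 113 18   # prints: ratio 6.28x, saving 84%
```

The rounded sizes give 6.28x rather than the reported 6.29x, presumably because restic computes its ratio from exact byte counts.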

There’s a real tradeoff in this approach: I’m hand-curating an inclusion list, so new state added by future stacks won’t be picked up automatically. The alternative — a long exclusion list under restic backup /srv/homelab — drifts the other way: new caches, logs, and junk get snapshotted unintentionally. Inclusion is more setup, but easier to reason about a year later.
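
One way to soften the inclusion-list drift is a periodic check for database files that no list entry covers. A sketch under stated assumptions: the inclusion list is a plain text file with one absolute path per line (hypothetical; the real script may hard-code its paths), and only `*.db` files outside `cache/` directories count as candidate state.

```shell
# unlisted_dbs ROOT LIST: print every *.db file under ROOT that is not
# mentioned in LIST (one absolute path per line), skipping cache directories.
unlisted_dbs() {
    local root="$1" list="$2"
    find "$root" -type f -name '*.db' ! -path '*/cache/*' | sort |
        grep -vxF -f "$list" || true   # grep exits 1 when nothing is left over
}
```

Running `unlisted_dbs /srv/homelab /srv/homelab/scripts/backup-include.txt` (the list path is made up) prints nothing when every database is accounted for, and otherwise prints the files worth a decision.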

The backup script

Mine is about 80 lines; here’s the shape:

#!/usr/bin/env bash
set -euo pipefail

STAGING=/var/tmp/homelab-backup
rm -rf "$STAGING"; mkdir -p "$STAGING"

# Per-stack staging — sqlite gets .backup, everything else cp
mkdir -p "$STAGING/jellyfin"
sqlite3 /srv/homelab/jellyfin/config/data/jellyfin.db \
    ".backup '$STAGING/jellyfin/jellyfin.db'"

# ... repeat for each stack ...

# Export the variables so restic also sees them when the script is run by hand
set -a; source ~/.config/restic/env; set +a
restic backup --tag nightly "$STAGING"

restic forget --prune \
    --keep-daily 14 --keep-weekly 8 --keep-monthly 12

rm -rf "$STAGING"

--prune is what actually deletes old data from R2. Without it, forget only removes the snapshot pointer; the data lingers. Pruning is slower, but it runs once a night while you’re asleep, so it doesn’t matter.
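
One thing the script shape above leaves open: with `set -e`, a failing step exits before the final `rm -rf`, so staged copies of the databases stay in /var/tmp until the next run. An EXIT trap covers both paths. A sketch, with `run_with_staging` as a hypothetical wrapper:

```shell
# run_with_staging CMD...: create a fresh staging dir, run CMD, and remove the
# dir afterwards whether CMD succeeded or not. The body is a subshell, so the
# EXIT trap fires when the function finishes, not when the caller exits.
run_with_staging() (
    STAGING=${STAGING:-/var/tmp/homelab-backup}
    rm -rf "$STAGING"
    mkdir -p "$STAGING"
    trap 'rm -rf "$STAGING"' EXIT
    "$@"
    exit $?   # keep CMD's exit status; the trap still runs on the way out
)
```

Inside a standalone script the same effect comes from a single `trap 'rm -rf "$STAGING"' EXIT` line right after the `mkdir -p`, at which point the trailing `rm -rf` becomes redundant.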

[Screenshot: restic forget dry-run applying the 14 daily, 8 weekly, 12 monthly retention policy]

Retention policy in action. Each snapshot keeps the reasons it survived — daily, weekly, monthly. Run with --dry-run first when you’re tweaking the policy.
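
To keep the dry run and the nightly job from drifting apart, the policy can live in one variable that both invocations share. A sketch; the function and variable names are made up, the flags are the ones from the script above.

```shell
KEEP_POLICY="--keep-daily 14 --keep-weekly 8 --keep-monthly 12"

# preview_forget: show what the policy would remove without touching the repo.
preview_forget() {
    # KEEP_POLICY is deliberately unquoted so it splits into separate flags.
    restic forget --dry-run $KEEP_POLICY
}

# apply_forget: the same policy for real, followed by a prune.
apply_forget() {
    restic forget --prune $KEEP_POLICY
}
```

Tweaking the policy then means editing `KEEP_POLICY`, running `preview_forget`, and reading the keep reasons before the next nightly run applies it.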

Schedule it: systemd, not cron

systemd timer rather than cron, for one specific reason: failure notification. Cron mails to local UNIX mail, which on a headless homelab is a black hole. systemd has OnFailure=, which lets you fire any other unit when the job fails — including one that pushes a notification to your phone.

Two units. homelab-backup-restic.service:

[Unit]
Description=Nightly restic backup of homelab state
OnFailure=homelab-backup-restic-failure.service

[Service]
Type=oneshot
User=bilal
EnvironmentFile=/home/bilal/.config/restic/env
ExecStart=/srv/homelab/scripts/backup-restic.sh

And homelab-backup-restic.timer:

[Unit]
Description=Run nightly restic backup

[Timer]
OnCalendar=*-*-* 04:30:00
RandomizedDelaySec=15min
Persistent=true

[Install]
WantedBy=timers.target

Persistent=true means that if the box was powered off at 4:30, the missed run fires as soon as the timer is started again, typically at the next boot. RandomizedDelaySec adds jitter so I’m not hammering R2 at the exact same minute every night across multiple machines.

A trap I fell into early: the obvious thing is a user timer (systemctl --user enable), since the script runs as my user. User timers need loginctl enable-linger to fire when you’re not logged in, which is one more piece of hidden state. System timers run regardless of login. The unit files have to live in /etc/systemd/system/ and need root to install — but that’s a one-time setup cost, and the result doesn’t have a coupling to your login session.
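
Installing the system units is a handful of commands. A sketch, assuming the three unit files sit in the current directory under the names used in this post and that it runs as root; `install_backup_timer` is a hypothetical wrapper.

```shell
# install_backup_timer: copy the units into place and start the timer.
install_backup_timer() {
    install -m 644 \
        homelab-backup-restic.service \
        homelab-backup-restic.timer \
        homelab-backup-restic-failure.service \
        /etc/systemd/system/ || return 1
    systemctl daemon-reload
    # Enable the timer, not the service: the timer is what triggers the job.
    systemctl enable --now homelab-backup-restic.timer
    # Show when systemd thinks the next run is due.
    systemctl list-timers homelab-backup-restic.timer
}
```

`systemd-analyze calendar '*-*-* 04:30:00'` is a cheap way to confirm an OnCalendar= expression parses the way you expect before installing anything.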

Failure push

The companion OnFailure= unit pushes to ntfy:

[Service]
Type=oneshot
# ExecStart= is not run through a shell, so $(hostname) is not expanded; %H is systemd's hostname specifier
ExecStart=/usr/bin/curl -d \
    "Backup failed on %H. Check journalctl -u homelab-backup-restic" \
    https://ntfy.sh/<your-uuid-topic>

Use a UUIDv4 as the topic name. Public ntfy topics are world-readable; a UUID makes yours unguessable. It’s not authentication — but for “did the backup fail” it’s enough.
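
Generating the topic and sending a test push by hand takes two small helpers. A sketch with hypothetical function names; it reads the UUID from the Linux kernel's random-UUID file, so it assumes a Linux box.

```shell
# new_topic: print a random (version 4) UUID from the Linux kernel.
new_topic() {
    cat /proc/sys/kernel/random/uuid
}

# notify_failure TOPIC: push a one-line message to that ntfy topic.
# This runs in a real shell, so $(hostname) expands here.
notify_failure() {
    curl -fsS -d "Backup failed on $(hostname)" "https://ntfy.sh/$1"
}
```

Generate the topic once with `new_topic`, subscribe to it in the ntfy app, and run `notify_failure <that-topic>` to confirm the push arrives before relying on it.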

Restore (and the gotcha)

This is the step nobody tests until they need to.

Look around inside a snapshot:

restic snapshots                # all of them
restic ls latest                # contents of the most recent
restic ls latest /var/tmp/homelab-backup/jellyfin

Restore everything:

restic restore latest --target /tmp/restored

Restoring a single file is the bit that always trips me up. The intuition is:

# Wrong — silently restores nothing
restic restore latest \
    --include /srv/homelab/jellyfin/jellyfin.db \
    --target /tmp

--include matches against the path inside the snapshot, not where the file would go after restore. My snapshot contains /var/tmp/homelab-backup/jellyfin/jellyfin.db (the staging path), so the correct invocation is:

restic restore latest \
    --include /var/tmp/homelab-backup/jellyfin/jellyfin.db \
    --target /tmp

Write this down somewhere visible. Discovering the staging-path quirk during an actual recovery is the wrong moment.
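
A way to avoid remembering the staging path at all is to ask the snapshot where the file lives and feed that back into `--include`. A sketch with hypothetical helper names, built on `restic ls` and `restic restore` as used above:

```shell
# snapshot_paths NAME: list every path in the latest snapshot ending in /NAME.
snapshot_paths() {
    restic ls latest |
        awk -v n="/$1" 'substr($0, length($0) - length(n) + 1) == n'
}

# restore_file NAME TARGET: restore the first match underneath TARGET.
restore_file() {
    local path
    path=$(snapshot_paths "$1" | head -n 1)
    if [ -z "$path" ]; then
        echo "no $1 in latest snapshot" >&2
        return 1
    fi
    restic restore latest --include "$path" --target "$2" || return 1
    # restic recreates the full snapshot path below the target directory.
    echo "restored to $2$path"
}
```

`restore_file jellyfin.db /tmp/restored` would then report the file at /tmp/restored/var/tmp/homelab-backup/jellyfin/jellyfin.db, which is the second half of the same gotcha: the restored file sits under the target at its full staging path, not directly in the target.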

Test cadence

A backup that’s never restored is not a backup. I do two things:

Monthly — restore one random file to /tmp and diff against the live copy. Ten seconds. Catches silent backend corruption.
Quarterly — full restore drill on a throwaway path. Verifies the script restores everything in the right shape with the right perms.

If you skip the quarterly, at least keep the monthly. The case you’re protecting against is “my backups have been quietly broken for six months”, and only a restore catches it.
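
The monthly check can be scripted, with one wrinkle: a live sqlite database keeps changing after the snapshot, so a byte-for-byte diff against it will usually differ even when the backup is fine. A sketch that byte-compares ordinary files and falls back to sqlite's own integrity check for databases; `check_restored` is a hypothetical name, and treating `*.db` as "is sqlite" is an assumption.

```shell
# check_restored LIVE RESTORED: succeed if the restored copy looks healthy.
check_restored() {
    local live="$1" restored="$2"
    case $restored in
        *.db)
            # A live database keeps changing, so check internal consistency instead.
            [ "$(sqlite3 "$restored" 'PRAGMA integrity_check;')" = ok ]
            ;;
        *)
            cmp -s "$live" "$restored"
            ;;
    esac
}
```

For the backend itself, `restic check --read-data-subset=10%` re-reads a sample of the stored pack files and complements the file-level test.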

What I deliberately don’t back up

jellyfin/cache — regenerates on first scan.
adguard/work/data — query log and stats, regenerable.
Everything under */logs/ — regenerable.
The .env files — these live in Bitwarden, which is its own backup story.

The gap I haven’t closed

R2 is mutable. Any process on the box with that env file can restic forget --prune the entire repo. A defensible setup would push to a second target with object lock — Backblaze B2 supports this; R2 doesn’t yet. I’ll add it once this flow has soaked for a while. For now I’m accepting the risk and trusting that nothing else on the box gets to that env file. That’s the kind of thing a homelab post should be honest about, because nobody else’s writeup will be.