Setting up your backup service

2020-06-30 :: linux, tricks, tutorial, optimize everything

I just ran the command rm -rf ~, deleting all my personal files in the process. This was not the first time, and it was no big deal, because I back up my files with automatic rolling backups. My backup system is secure, redundant, and has low resource requirements. The backup repository is encrypted, deduplicated, compressed, and mirrored across multiple machines. You can choose to use any or none of these features while following this guide.

In this guide, I describe how to set up a secure and robust backup service yourself, which runs on Linux, macOS, and Windows via WSL 2. I provide my own scripts, config files, and workflows for maintaining, validating, and restoring the backups. This is all set up using free software, supports multiple configurations with varying degrees of security and redundancy, and scales well to more backup clients.

If you’d prefer to not set this up yourself and you run macOS or Windows, I recommend Backblaze:

https://www.backblaze.com/cloud-backup.html#af9v9g

They automatically handle everything, including most of the features I want in a backup service and some I could never implement myself, for $6/m per machine (USD).

1 Introduction

2 Install Prerequisite Software

2.1 Backup Software

2.2 Optional GUI for Client

2.3 Mirror Software

3 Initialize the Backup Repository

3.1 Setup Server Environment

3.2 Setup Client-Only Environment

3.3 Create the Encrypted Repository

3.4 Mirror the Client-Only Repository Offsite

4 Configure the Backup Client

4.1 Install Backup Script

4.2 Exclude Extraneous Files From Backup

4.3 Configure Access to the Backup Repository

4.3.1 Client-only Repository Folder

4.3.2 Backup Server via SSH

4.3.3 Least Privilege for Client SSH Key

5 Configure Mirrors

5.1 Least Privilege for Mirrors

6 Monitor and Check Backups

6.1 Check Backups are Happening

6.2 Integrity Check the Repository

6.3 Prune Expired Snapshots

6.4 Finding Large Extraneous Files in the Repository

7 Restore from Backups

1 Introduction

This guide will help you set up a backup system that automatically records hourly snapshots and compresses, deduplicates, and encrypts them, giving you a robust and secure backup system that takes up very little drive space. For example, I have four machines backed up, with 2.5TB of snapshots stored in 21GB of space, mirrored on machines in multiple locations. It would take an extraordinary event for me to lose data. I’ve successfully recovered GBs of data, usually lost through my own stupidity, and occasionally through various tools corrupting files or the whole filesystem.

I describe two main configuration options: (1) client-only, which requires only a single machine but relies on an external service for saving the backups offsite; or (2) a client/server approach that requires access to an always-on server but offers more redundancy. Within these two main configurations, I describe additional configuration measures, such as setting up offsite mirrors for the backup repository and implementing the principle of least privilege to restrict remote access while still automating backups.

At the end, you too will be able to (but probably shouldn’t) use rm -rf without fear, among other benefits.

2 Install Prerequisite Software

2.1 Backup Software

The main backup software is borg.

https://borgbackup.readthedocs.io/en/stable/index.html

borg features automatic compression, deduplication, and encryption. It also supports an on-demand backup server via SSH, useful file exclusion methods, and filtering/recreating backup archives for when you realize you backed up something that you didn’t need to and it’s taking up too much space. These features, its superb documentation, and its ease of use have made it better than every other tool I’ve tried.

Install this on the server and all clients.

For example, on Arch:

pacman -S borg

Or macOS:

brew install borgbackup

2.2 Optional GUI for Client

borg has an optional, third-party (still free software) GUI you can install called vorta.

https://vorta.borgbase.com/

If you’re uncomfortable with commandline nonsense, you can use this on the clients to configure most of what I describe below. I haven’t used it myself, so you’ll need to figure out the translation from each concept and my scripts to the equivalent in the GUI. The GUI looks pretty discoverable, though, so this shouldn’t be hard.

2.3 Mirror Software

To make redundant mirrors of your backup repository offsite, you’ll need a tool to synchronize the repository to the mirrors. I own several machines, and treat all of them as mirrors for maximum redundancy without relying on cloud services.

I recommend rclone for this, but alternatives like rsync or unison work well too.

https://rclone.org/

rclone provides rsync-like capabilities, but also performs local caching to speed up computing the delta to be transferred, and supports various cloud storage backends, in case you want to sync to ~the cloud~.

Install this on all mirrors.

Arch:

pacman -S rclone

macOS:

brew install rclone

If you’re using a client-only configuration, you can also install this on the client if you wish to synchronize the local repository to a cloud service or secondary machine. However, unless your cloud service features strong and easy to use version control, I recommend installing git instead, as there are some downsides to a client automatically synchronizing a local backup repository without version control. I discuss this in Mirror the Client-Only Repository Offsite.

3 Initialize the Backup Repository

3.1 Setup Server Environment

For the client/server model, the backup server needs:

A name or fixed IP address. I call this backup-server.tld.
An SSH daemon.
A user with SSH access, permission to execute borg, and shell access. I’ll call this user backupd.
A folder this user owns to store the backup repository. I call this folder ~/backups (meaning ~backupd/backups).

3.2 Setup Client-Only Environment

For the client-only model, you only need a folder that the client has read/write access to. I’ll call this folder ~/backups, and call the client user client-user.

3.3 Create the Encrypted Repository

Next we need to initialize the backup repository with an encryption key. The backup repository is encrypted at rest.

Run the following command.

borg init -e repokey ~/backups

You’ll be prompted for a password.

I strongly recommend storing the password in a password manager. borg can automatically read from the password manager using the environment variable BORG_PASSCOMMAND. For example, I use pass as my password manager, and set BORG_PASSCOMMAND="pass show backup-server.tld/borg", which in turn causes gpg-agent to query me or my login keychain for the master password.

You can also set the password as a string in the environment variable BORG_PASSPHRASE. For example, if your password is "password", you can set BORG_PASSPHRASE="password". You should not do this if the environment variable is stored in a plaintext file.
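To make the choice between the two variables concrete, here is a minimal sketch of a guard that a backup script could run before calling borg. The function name check_borg_secret is my own invention and not part of borg; it only inspects the two environment variables described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of borg): report which of the two password
# mechanisms described above is configured in the current environment.
check_borg_secret() {
    if [ -n "${BORG_PASSCOMMAND:-}" ]; then
        # Preferred: the password is fetched from a password manager on demand.
        echo "passcommand"
    elif [ -n "${BORG_PASSPHRASE:-}" ]; then
        # Works, but the password itself sits in the environment in plaintext.
        echo "passphrase"
    else
        echo "none"
        return 1
    fi
}
```

A script could then bail out early with something like check_borg_secret >/dev/null || exit 1, rather than letting an unattended cron run hang at a password prompt.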

There are several other initialization options which you can explore if you want to customize encryption levels, disable encryption (don’t do it!), or optimize for hardware acceleration, but I’m happy with the default.

borg init --help

3.4 Mirror the Client-Only Repository Offsite

If you do not have a backup server, we need to set up at least one mirror. We need to make sure the local backup repository is stored somewhere else in the event of a total data loss locally (e.g., a stolen laptop), or a partial data loss that affects the backup repository itself (e.g., a corrupted drive).

Bad solutions include using a file synchronization service such as Dropbox, Google Drive, or OneDrive as a mirror; or automatically synchronizing via rsync, unison, or rclone to a secondary machine. In the event of data loss, an automatic synchronization service could overwrite the remote copy with a completely empty backup repository, totally destroying your backups. Some file-sync services will allow you to restore older versions of a file, which mitigates some of this risk. This is not a good solution unless you’re really sure of the version control.

An acceptable solution is to use a version-controlled file hosting service like GitHub or GitLab to host your backup repository. You can set up a cron job to automatically commit and push the backup repository regularly, tagging each commit in the same way as the archives are tagged. Ideally, the repository should be private, but since it’s encrypted, this is not strictly required. A public repository does expose your data to more risk, since with sufficient resources, a dedicated attacker (such as a corporation or government) could break the encryption. However, such attackers probably aren’t targeting you, and if they are, you might have bigger problems.

To use my suggested method, first make ~/backups a git repo. Run the following commands.

cd ~/backups

git init

git checkout -b main

git add -A

git commit -m "Initialize repo"

Next, add the remote repository:

git remote add -m main origin git@git-repo.tld:client-user/backup-repo.git

Now add a cron job. Run crontab -e and add the following line.

@hourly /home/client-user/bin/sync-local-borg-repo.sh

Finally, install the following script in ~/bin/ for the client:

sync-local-borg-repo.sh

#!/bin/sh

cd ~/backups || exit 1
git add -A
git commit --fixup HEAD
git tag `hostname`+`date +"%Y-%m-%dT%H_%M_%S"`
git push origin main

And make it executable: chmod +x ~/bin/sync-local-borg-repo.sh.

This method will use considerable client disk space, which is split between the client and server in the client/server configuration. I recommend you regularly prune the git repo, but only do so manually after checking your backups (see Monitor and Check Backups). Setting up an automatic job to prune it risks deleting your backup repository in the event of a data loss. The commit option --fixup HEAD in line 5 makes this easy with the following commands:

env EDITOR=true git rebase --root --autosquash -i

git gc

git push -f origin main

This will squash the entire history of the repo and force push to the remote. Losing the history is not a big deal, since the backup repository is actually keeping hourly snapshots. The git history is only for preventing synchronization from losing data if an automatic push happens after a data loss.
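To see what the squash does without touching a real backup repository, the following sketch replays the workflow in a throwaway directory. The file name data and the demo identity are placeholders of mine. I also use GIT_SEQUENCE_EDITOR=true instead of EDITOR=true, since that is the variable git consults specifically for the rebase todo list; treat it as an equivalent variation rather than the method above.

```shell
#!/bin/sh
# Demonstration in a throwaway repository; nothing here touches ~/backups.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Identity used for the demo commits only (placeholder values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

echo one > data
git add -A
git commit -q -m "Initialize repo"

# Two simulated hourly syncs, each recorded as a fixup of the previous commit.
for n in two three; do
    echo "$n" >> data
    git add -A
    git commit -q --fixup HEAD
done
before=$(git rev-list --count HEAD)

# The prune step: fold every fixup commit into the root commit.
GIT_SEQUENCE_EDITOR=true git rebase -q --root --autosquash -i
after=$(git rev-list --count HEAD)

echo "commits before: $before, after: $after"
```

The working tree is unchanged by the rebase; only the number of commits shrinks.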

4 Configure the Backup Client

Each backup client needs:

A user with read access to all files included in the backup. I call this user client-user. For me, this is my username on the client machine. In some circumstances, I create a group, backupg, to give this user read access to special files.
A cron daemon of some kind.

To start the backup system, we need to add a script that automatically backs up files, and exclude any extraneous files. I take the approach of including everything by default, and then manually inspecting archives from time to time for large extraneous files and folders.

4.1 Install Backup Script

I use the following script, which I set to run every hour. Add the following cron job to client-user’s crontab by running crontab -e, and adding:

@hourly /home/client-user/bin/borg-backup.sh

Then install the following script in ~/bin/ for client-user.

borg-backup.sh

#!/bin/sh

## borg-backup.sh

## Usage:
# run `borg-backup.sh`
#
# Optional environment variable inputs:
# - TAG     By default, the tag for the archive is set using the hostname of the
#           client machine. To manually set a tag, set the environment variable
#           `TAG` prior to running, e.g., `env TAG="manual-tag+"
#           borg-backup.sh`.
# - WAIT    The wait time in seconds to obtain a write lock on the repository from
#           the server. By default, 600 seconds (10 minutes).

## Configuration

# Set to the location of the backup repository.
# Can be a remote directory, using SSH, or a local directory.
# Make sure the SSH agent and/or SSH key is readable by the backup daemon,
# and the remote location is accessible by a key in the ssh-agent or configured
# in .ssh/config.
#
# Example: REPO="backupd@backup-server.tld:backups"
# Example: REPO="~/backups"
REPO="borg-server:backups"

# Set the password or passcommand for encrypted repositories.
export BORG_PASSCOMMAND='pass show backup-server.tld/borg'

## Create auxiliary files to be part of the backup.

# Export the installed package list from the package manager, so it can be backed up.
mkdir -p /tmp/pacman-local/
echo "# Pipe to pakku -S to reinstall" > /tmp/pacman-local/pacman.lst
pacman -Qenq >> /tmp/pacman-local/pacman.lst
pacman -Qemq >> /tmp/pacman-local/pacman.lst

## Create a new backup archive.
# Add additional files to backup as needed.
borg create \
     -C lzma,9 \
     -c 60 \
     --exclude-from ~/borg-exclude \
     --exclude-if-present '.borg-ignore' \
     --lock-wait ${WAIT:-600} \
     $REPO::'{hostname}+'${TAG:-}'{now:%Y-%m-%dT%H:%M:%S}' \
     /tmp/pacman-local/ \
     /etc/sysctl.d \
     /etc/modprobe.d \
     /etc/makepkg.conf \
     /etc/pacman.conf \
     /etc/fstab \
     /etc/X11 \
     ~/

Make it executable with chmod +x ~/bin/borg-backup.sh.

There are two necessary configuration steps:

Change the REPO variable to point to your backup repository. If you’re using a client-only model, this is the path to the backup repository ~/backups. If you’re using a server, you can enter the SSH address and path, or configure the .ssh/config file as discussed later.
Change the export BORG_PASSCOMMAND to export your password manager command, or change the line to export BORG_PASSPHRASE to export the password string as described earlier. You really shouldn’t use BORG_PASSPHRASE since this stores the password in plaintext, but I suppose if your hard drive is encrypted, and the backup script is only stored on the client, it’s probably fine. Ish.

You’ll probably also want to change the list of files that are included in the snapshot. I include my list for reference, which assumes an Arch Linux machine and includes some of my customized root config files.

The script is documented with its major features, but I’ll explain the borg command in more detail.

The option -C lzma,9 enables LZMA compression level 9 (maximum compression). This slows down archive creation but decreases the archive size substantially. In my experience, my snapshots take about a minute to create and upload to the server, so I’m fine with max compression.
The option -c 60 tells borg to create a checkpoint every 60 seconds, saving a partial backup if the backup process is interrupted. This can happen if you’re running on a laptop that goes to sleep in the middle of the backup, for example. I choose 60 seconds since most of my snapshots only take that long, so any longer might indicate a real change to keep track of.
The option --exclude-from ~/borg-exclude excludes any files that match the pattern specification found in the file ~/borg-exclude. I use this file to filter common files, such as compiler generated files. I share this file in Exclude Extraneous Files From Backup.
The option --exclude-if-present '.borg-ignore' excludes the directory from the backup if there is a file named .borg-ignore in that directory. I use this for excluding directories that don’t neatly fit some pattern in borg-exclude, such as large git repos that I contribute to infrequently but don’t manage, or cache or temporary directories.
The option --lock-wait specifies how long to wait for a lock. Only one client can write to the backup repository at a time. I use 10 minutes as a default; my clients usually only take a minute or so to finish running a backup, so waiting 10 minutes should be enough for all clients to finish if there’s contention.
Line 47, $REPO::'{hostname}+' ..., tells borg where the backup repository is located (before the ::), and what the backup archive should be named. I name the archive using the hostname of the client, followed by + as a delimiter, followed optionally by some tag, followed by a timestamp. This naming scheme makes it easy to sort and filter backups when validating backups or searching for a restore point.
The remaining lines are files or directories to include in the backup archive. All files and sub-directories, recursively, are included, unless excluded by one of the above exclude options.
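The archive naming scheme can be sketched in plain shell. In the real script borg expands the {hostname} and {now:...} placeholders itself; the archive_name function below is a hypothetical stand-in of mine that only shows what the resulting names look like.

```shell
#!/bin/sh
# Build an archive name in the scheme described above:
#   <hostname>+<optional tag><timestamp>
# archive_name is a hypothetical helper for illustration only.
archive_name() {
    host=$1
    tag=${2:-}
    printf '%s+%s%s\n' "$host" "$tag" "$(date +%Y-%m-%dT%H:%M:%S)"
}

archive_name "$(uname -n)"                 # what an hourly cron run produces
archive_name "$(uname -n)" "manual-tag+"   # what TAG="manual-tag+" produces
```

Because the hostname comes first, a prefix such as machine-name+ selects every archive from one client.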

4.2 Exclude Extraneous Files From Backup

My ~/borg-exclude file is below. Install this file in ~/ on the client; it only needs read permissions for client-user.

borg-exclude

re:/\.ssh
re:/\.bash_history
.zsh_*
re:/no-backup/
re:/\.junk/
re:/\.cron/
re:workspace/aur4/.*/pkg
re:workspace/aur4/.*/src
re:compiled/
*.tar.xz
*.tar.gz
*/.emacs.d
*/.unison/fp*
*/.unison/ar*
*/.vim/bundle
*~
.*.trash
*.aux
*.log
*.out
*.toc
*.fls
*.swp
*.class
*.pyc
*.fdb_latexmk
*.o
*.out
*.xpi
*.zo
*.dep
*.vo
*.glob
*.bbl
*.safe
*.agdai
*.hi
*.tdo
re:\.mutt/cache
re:\.mutt/sent
re:workspace/.*/paper.pdf
re:workspace/.*/techrpt.pdf
re:workspace/.*/final.pdf
*/retex-cache/*
re:\.gnupg/S\..*
re:\.~lock.*\.odp#
y
re:/Pictures/.*/\._
re:/Pictures/.*/\.comments
*.DS_Store

This configuration file accepts exclude patterns, one per line. Each exclude pattern can be either a shell glob or a regexp pattern prefixed by re:. I exclude lots of generated-file patterns, certain mail folders, and files or folders that are tracked by other systems. Some depend on my workflows and naming conventions, so they might not be relevant to you.

If I want to exclude some folder that doesn’t neatly fit a pattern, I run touch path/to/folder/.borg-ignore, and borg will automatically begin ignoring it due to the --exclude-if-present option in borg-backup.sh.

Be sure to run touch ~/backups/.borg-ignore. This will prevent you from DOSing yourself if either you use a client-only configuration, or if your clients are also mirrors.
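Marker files are easy to forget about, so it can help to list where they are from time to time. A minimal sketch, assuming the marker is named .borg-ignore as above; list_ignored_dirs is a hypothetical helper name.

```shell
#!/bin/sh
# List every directory under a root that contains a .borg-ignore marker,
# i.e. the directories that --exclude-if-present '.borg-ignore' would skip.
# list_ignored_dirs is a hypothetical helper; the root defaults to $HOME.
list_ignored_dirs() {
    find "${1:-$HOME}" -type f -name .borg-ignore -exec dirname {} \; | sort
}
```

Running list_ignored_dirs with no argument scans the home directory; ~/backups should appear in the output if you followed the step above.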

4.3 Configure Access to the Backup Repository

Finally, we need to make sure the backup script has uninterrupted access to the backup repository.

4.3.1 Client-only Repository Folder

If you’re using a client-only configuration, you’re done!

4.3.2 Backup Server via SSH

If you’re running a separate server, we’ll configure SSH access. Ideally, we don’t even want to be prompted for an SSH key password to ensure backups are running uninterrupted. (Although, I do deal with this on one of my clients, because I haven’t configured the keychain to cache the SSH key while logged in.)

I recommend configuring access through the .ssh/config file, and either a keychain that caches your SSH key that you use everywhere (probably acceptable security), or a fresh passwordless SSH key that provides client-user restricted access to borg as the backupd user on backup-server.tld (better practice security).

I’ll assume you have a fresh passwordless private key called ~/.ssh/id_rsa-borg-client paired with the public key ~/.ssh/id_rsa-borg-client.pub on the client machines. You can generate a fresh passwordless key-pair with:

ssh-keygen -t rsa -b 4096 -C "borg client" -f /home/client-user/.ssh/id_rsa-borg-client -N ""

Make sure to set the permissions correctly, restricting access to the private key.

chmod 600 ~/.ssh/id_rsa-borg-client
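If you want a script to verify this before relying on the key, something like the following sketch works with nothing but ls. The helper name key_is_private is hypothetical.

```shell
#!/bin/sh
# key_is_private is a hypothetical helper: it succeeds only if the file's mode
# is exactly rw------- (600), which is what the chmod above sets.
key_is_private() {
    mode=$(ls -ld "$1" 2>/dev/null | cut -c1-10)
    [ "$mode" = "-rw-------" ]
}
```

For example: key_is_private ~/.ssh/id_rsa-borg-client || echo "fix the key permissions" >&2.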

Add the following snippet to your .ssh/config, and borg-backup.sh will automatically use the SSH key ~/.ssh/id_rsa-borg-client on the client machine when connecting as backupd to backup-server.tld.

Host borg-server

Hostname backup-server.tld

IdentityFile ~/.ssh/id_rsa-borg-client

User backupd

ForwardAgent no

4.3.3 Least Privilege for Client SSH Key

If you want to follow better practice security, you should restrict access for the id_rsa-borg-client key so it has only the permission it needs: to communicate with the borg server. Add the following line to ~/.ssh/authorized_keys for backupd on the server, replacing <id_rsa-borg-client.pub> by the contents of the public key ~/.ssh/id_rsa-borg-client.pub from the client.

command="/home/backupd/.ssh/ssh-borg-serve.sh",no-pty,no-agent-forwarding,no-port-forwarding <id_rsa-borg-client.pub>

Next, install the following file in ~/.ssh/ on the server and give it execute permissions with chmod +x ~/.ssh/ssh-borg-serve.sh.

ssh-borg-serve.sh

#!/bin/sh

set -f

case "$SSH_ORIGINAL_COMMAND" in
    "borg serve"*)
        exec $SSH_ORIGINAL_COMMAND
        ;;
#   "/usr/lib/ssh/sftp-server")
#       exec /usr/lib/ssh/sftp-server -R
#       ;;
    *)
        echo "Invalid command $SSH_ORIGINAL_COMMAND"
        exit 1
        ;;
esac

This will allow the key id_rsa-borg-client to run only a command starting with borg serve, which launches the borg server. If an attacker gets your id_rsa-borg-client key, they can launch the borg server, but without the backup repository password, they won’t be able to do anything.
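The matching logic can be tried out safely in isolation. The sketch below uses the same case pattern as ssh-borg-serve.sh but only reports the decision instead of executing anything; would_allow is a hypothetical name used for this illustration.

```shell
#!/bin/sh
# Same pattern match as ssh-borg-serve.sh, but it only prints the decision
# instead of exec'ing the command, so it is safe to experiment with.
would_allow() {
    case "$1" in
        "borg serve"*) echo "allow" ;;
        *)             echo "deny" ;;
    esac
}

would_allow "borg serve --umask=077"
would_allow "rm -rf /home/backupd/backups"
would_allow "/usr/lib/ssh/sftp-server"
```

Note that the pattern is a prefix match, so a string like "borg serve; rm -rf x" is also allowed through. Because the wrapper runs exec $SSH_ORIGINAL_COMMAND without re-parsing it as shell code, the extra words are handed to borg as literal arguments rather than run as a second command.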

The second, commented out, command would enable the client to launch a read-only SFTP server. This is useful for making all clients mirrors. However, allowing the client key to also use the SFTP server violates the principle of least privilege, and you should instead configure a separate mirror key as described in Configure Mirrors. An attacker with SFTP access would be able to download the encrypted repository, and possibly read other files on the server.

5 Configure Mirrors

Having backups stored offsite is good, but what if the server goes down, or is struck by a meteor? It’s best to have not only offsite backups, but redundant offsite backups. Thankfully, this is easy to support. Particularly, if you, like me, have too many computers: a laptop, a desktop, a media server, a VPS, and a work computer... mirrors galore!

On each mirror, we configure rclone with the server as a remote. Add the following to ~/.config/rclone/rclone.conf on the mirror.

[borg-server]

type = sftp

host = backup-server.tld

user = backupd

port =

pass =

key_file = ~/.ssh/id_rsa-borg-mirror

md5sum_command = md5sum

sha1sum_command = sha1sum

This tells rclone how to connect to the server via SFTP. Following the principle of least privilege, we’ll need a new key pair for the mirror.

ssh-keygen -t rsa -b 4096 -C "borg mirror" -f /home/client-user/.ssh/id_rsa-borg-mirror -N ""

chmod 600 ~/.ssh/id_rsa-borg-mirror

And we need to install and restrict the key on the server. Add the following line to the ~/.ssh/authorized_keys file on the server.

command="/home/backupd/.ssh/ssh-borg-mirror.sh",no-pty,no-agent-forwarding,no-port-forwarding <id_rsa-borg-mirror.pub>

Next, install the following file in ~/.ssh/ on the server and give it execute permissions with chmod +x ~/.ssh/ssh-borg-mirror.sh.

ssh-borg-mirror.sh

#!/bin/sh

set -f

case "$SSH_ORIGINAL_COMMAND" in
    "/usr/lib/ssh/sftp-server")
        exec /usr/lib/ssh/sftp-server -R
        ;;
    *)
        echo "Invalid command $SSH_ORIGINAL_COMMAND"
        exit 1
        ;;
esac

This restricts the mirror’s key so it can only be used to launch the SFTP server in read-only mode.

Finally, set up a cron job to mirror the repository. Run crontab -e on the mirror and enter:

@hourly rclone sync borg-server:backups ~/backups

rclone will perform a one-way sync from the server to the mirror every hour. rclone uses a delta transfer algorithm with caching. It’s faster than rsync, but with the same low-bandwidth transfer. It also supports more backends than rsync, so you can set up additional mirrors to cloud services like Dropbox, Google Drive, etc., if you want.

Now when a meteor strikes your server just after a burglar stole your laptop, you’ll still have your data. Set up LOTS of mirrors for extra redundancy.

5.1 Least Privilege for Mirrors

I know it seems like we already did this with the whole read-only SFTP server, but that’s not enough. Right now, an attacker compromising the mirror key can read any file that backupd has access to. That’s no good. Better security practice would be to configure the SSH daemon to chroot the mirror to the ~/backups directory, so they can only read this folder. Recall this folder is encrypted, so an attacker compromising the mirror SSH key still has to break the encryption to get anything.

Unfortunately, this requires root access on the server, reconfiguring the SSH daemon, and creating and managing multiple user and group permissions, which you may be unable or unwilling to do.

To chroot the mirror, we need a second user on the server, which I’ll call mirrord. The ssh-borg-mirror.sh script and addition to authorized_keys we added to backupd above should be thrown out, as we require a different configuration to chroot.

Next, we need a new group, mirrorg, to provide mirrord read access to the directory ~backupd/backups, owned by backupd.

groupadd mirrorg

gpasswd -a mirrord mirrorg

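Now we set the group on ~backupd/backups to mirrorg, and give the group read access. As user backupd, run the following commands. A non-root user can only assign a group it belongs to, so backupd must also be a member of mirrorg; otherwise, run these as root. The capital X adds execute permission only on directories, which the group needs in order to traverse them.

chgrp -R mirrorg ~backupd/backups

chmod -R g+rX ~backupd/backups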

We need to modify the ssh-borg-serve.sh script (owned by backupd) to maintain the group-read permission. Change the file using the following diff.

- exec $SSH_ORIGINAL_COMMAND

+ exec borg serve --umask=027

This will force the borg server to provide read permissions to mirrorg when writing to the backup repository.
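If the effect of a 027 umask is unfamiliar, this small sketch shows it on ordinary files, independent of borg. The names segment and data are placeholders of mine, and everything happens in a throwaway directory.

```shell
#!/bin/sh
# Show what a 027 umask does to newly created files and directories.
# The umask is changed inside a subshell only, so the caller's umask
# is left alone.
demo=$(mktemp -d)
(
    umask 027
    cd "$demo" || exit 1
    touch segment
    mkdir data
)
file_mode=$(ls -ld "$demo/segment" | cut -c1-10)
dir_mode=$(ls -ld "$demo/data" | cut -c1-10)
echo "file: $file_mode  dir: $dir_mode"
```

The group gets read access to new files and read plus traverse access to new directories, while everyone else gets nothing, which is what mirrord needs through mirrorg.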

Now, modify the SSH daemon to chroot the mirrord user. As root on the server, add the following to /etc/ssh/sshd_config.

Match User mirrord

ChrootDirectory /home/backupd/backups

ForceCommand internal-sftp -R

AllowTcpForwarding no

X11Forwarding no

PasswordAuthentication no

Finally, add the following line to ~/.ssh/authorized_keys for mirrord.

<id_rsa-borg-mirror.pub>

Note that we do not require any restrictions, since the SSH daemon is already restricting mirrord.

Now you have a pretty secure mirror.

6 Monitor and Check Backups

6.1 Check Backups are Happening

Backups are no good if you can’t restore from them. I have a weekly reminder to check on my backups. To check, I run borg list -P machine-name+ ~/backups on the repository machine (server, or client-only), which lists the backups for the machine with hostname "machine-name". I check to see that hourly backups are being created for each client. If they aren’t, the daemon on that client may not be working for some reason.
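Part of this check can be scripted. The sketch below assumes the archive name is the first whitespace-separated column of the listing and that no custom TAG was used, so the text after the + is a timestamp and a plain lexicographic sort is also a chronological sort. latest_archive is a hypothetical helper name.

```shell
#!/bin/sh
# latest_archive is a hypothetical helper. It reads `borg list`-style lines on
# stdin (archive name in the first column) and prints the newest archive name
# for the given host. Assumes names of the form <host>+<ISO timestamp>, with
# no custom tag in between.
latest_archive() {
    awk -v prefix="$1+" 'index($1, prefix) == 1 { print $1 }' | sort | tail -n 1
}
```

Usage would look like borg list -P machine-name+ ~/backups | latest_archive machine-name, after which you can eyeball whether the timestamp is from the last hour.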

6.2 Integrity Check the Repository

Every month or so, I run borg check ~/backups. This runs some integrity checks on the whole repository, and can take a while. I recommend running it in a screen session so you can disconnect and check back on it later. I’ve never had any integrity problems.

6.3 Prune Expired Snapshots

I don’t want to keep hourly snapshots forever. I have a policy for expiring backups, and a script for doing it. I keep hourly snapshots for the last 24 hours, daily snapshots for the last week, weekly snapshots for the last month, and monthly snapshots forever. With deduplication and my workload, this strikes a good balance between data recovery and minimizing the repository size.

Each week after checking my backups, I run the following script to prune any expired snapshots:

borg-prune.sh

#!/bin/sh

# borg-prune.sh

## Usage
# - borg-prune.sh machine-name           Perform a pruning dry-run, seeing what
#                                        would be pruned.
# - borg-prune.sh machine-name --wet     Perform a non-dry run.

REPO=$HOME/backups

DRY_RUN="-n"
if [ "$2" = "--wet" ]; then
    echo "Pruning..."
    DRY_RUN=""
fi

borg prune --list $REPO --prefix "$1+" \
     --keep-hourly 24 \
     --keep-daily 7 \
     --keep-weekly 4 \
     --keep-monthly -1 \
     --keep-yearly -1 \
     $DRY_RUN \
     -v

6.4 Finding Large Extraneous Files in the Repository

Sometimes, a large file will get backed up and make the repository unnecessarily large. A few times, I’ve accidentally backed up the entire repository in itself, DOSing my VPS by filling the drive.

borg makes it sort of easy to find these mistakes.

On the repository machine, run borg info -P machine-name+ ~/backups to get a printout of the size of each archive for machine-name. When one of the archives is suddenly much larger than its neighbors, that’s usually a good target. Copy that archive name; I’ll call it $archive_name.

Next, we mount the archive to see what files are too large. Run the following commands on the repository machine.

mkdir -p /tmp/borg

borg mount ~/backups::$archive_name /tmp/borg

Now we can explore the mounted archive to find large files. I run the following command, which I alias as ducks in my shell.

du -sch * .* | sort -rh | head

This will print out a list of the 10 largest files or folders in the current directory. You might need to exclude the .* pattern if there are no hidden files.
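A variant of the same idea, as a sketch: it reports sizes in kilobytes so that a plain numeric sort orders them exactly, and it uses find to pick up hidden entries without also matching . and ... The name largest_entries is hypothetical, and -mindepth/-maxdepth are assumed to be supported by your find (they are in the GNU and BSD versions).

```shell
#!/bin/sh
# largest_entries is a hypothetical helper: print the N largest entries
# (files or directories, hidden ones included) directly inside a directory,
# largest first, with sizes in kilobytes.
largest_entries() {
    dir=${1:-.}
    count=${2:-10}
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + |
        sort -rn |
        head -n "$count"
}
```

For example, largest_entries /tmp/borg/home/client-user 10 lists the ten biggest items in the mounted home directory; repeat on the largest directory to drill down.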

I then follow the large directories until I find a likely looking file; call it /path/to/large-unnecessary-file.

Once we find a file, we want to exclude it from further backups and remove it from existing backups. I add it to the borg-exclude patterns or add a .borg-ignore file as appropriate. Then, I run the following loop to recreate and filter all archives. This loop is in fish syntax; you’ll need to figure out loops in your shell on your own, because I’ve never figured out how to write a shell loop properly.

I’ve never had any problems, but you should back up your repository before running borg recreate. Use rclone to put it anywhere else, at least temporarily.

for archive in (borg list --lock-wait 600 -P machine-name+ ~/backups | cut -f 1 -d ' ')

    yes YES | borg recreate --lock-wait 600 -C lzma,9 -s --exclude "/path/to/large-unnecessary-file" ~/backups::$archive

end

This is considered experimental, so it requires that you confirm each recreation by typing "YES". I just pipe yes YES because I like to live on the edge, and have mirrors of this repository if I break something.

borg recreate can take multiple --exclude flags if you find multiple files you want removed. It will also recompress the archive, so you can specify new and different compression options with -C, if you want to change the compression algorithm.

Now the file should be excluded from all existing archives.
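For readers who don’t use fish, the same loop could be written in POSIX sh roughly as follows. This is a sketch under the same assumptions as the fish version: the borg options are copied from the loop above, recreate_without is a hypothetical wrapper name, and the repository defaults to ~/backups.

```shell
#!/bin/sh
# POSIX sh rendering of the fish loop above. recreate_without is a
# hypothetical wrapper name; the borg options are the ones used above.
recreate_without() {
    prefix=$1                  # e.g. "machine-name+"
    unwanted=$2                # e.g. "/path/to/large-unnecessary-file"
    repo=${3:-$HOME/backups}

    # The archive name is the first space-separated field of each line.
    borg list --lock-wait 600 -P "$prefix" "$repo" | cut -f 1 -d ' ' |
    while read -r archive; do
        # borg recreate asks for confirmation; `yes YES` answers it.
        yes YES | borg recreate --lock-wait 600 -C lzma,9 -s \
            --exclude "$unwanted" "${repo}::${archive}"
    done
}
```

It would be called as recreate_without machine-name+ /path/to/large-unnecessary-file. The same warning applies: back up the repository first.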

7 Restore from Backups

In the likely event that you need to restore from backups, run borg list -P machine-name+ ~/backups to list the archives available for machine-name. This will give you a list of archive names on the left, with some metadata on the right. Copy and paste the name for the archive you want to restore from; I’ll call this $archive_name.

Next, we mount that archive. Run the following commands, which will create a temporary mount point and mount the archive.

mkdir -p /tmp/borg

borg mount ~/backups::$archive_name /tmp/borg

You can now see all your backed-up files in /tmp/borg.

Next, from the client, copy over your files:

rsync -avz --progress backupd@backup-server.tld:/tmp/borg/ /
