Operation Details

This section describes in detail the operations that Backrest can be configured to perform on your behalf.

Overview

Backrest executes commands by forking the restic binary. Each Backrest version is validated against a specific version of restic. On startup, Backrest searches for a versioned restic binary in its data directory (typically ~/.local/share/backrest), followed by /bin/. The restic binary must be named restic-{VERSION}. You can override the restic command by setting the BACKREST_RESTIC_COMMAND environment variable when starting Backrest. Otherwise, if no binary is found, Backrest downloads and installs a recent version of restic from restic's GitHub releases. The download is verified by checking the sha256sum of the downloaded binary against the sha256sum provided by restic and signed with the restic maintainers' GPG key.
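The lookup order above can be sketched as follows. This is an illustrative sketch, not Backrest's actual implementation: the function names find_restic and sha256_matches and their parameters are hypothetical, and the GPG signature check on the checksum file is left out.

```python
import hashlib
import os
from pathlib import Path
from typing import Mapping, Optional


def find_restic(version: str, data_dir: Path, env: Mapping[str, str],
                system_dir: Path = Path("/bin")) -> Optional[Path]:
    """Return the restic binary to use, or None if one would need to be downloaded."""
    # An explicit override takes precedence over any search.
    override = env.get("BACKREST_RESTIC_COMMAND")
    if override:
        return Path(override)
    # Otherwise look for restic-{VERSION} in the data directory, then in /bin/.
    name = f"restic-{version}"
    for directory in (data_dir, system_dir):
        candidate = directory / name
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return candidate
    return None


def sha256_matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with an expected hex digest."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return digest == expected_hex.strip().lower()
```

A None result corresponds to the download path described above, after which a check like sha256_matches would be applied to the downloaded file.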

When running restic commands, Backrest injects the environment variables configured in the repo into the environment of the restic process and appends the flags configured in the repo to the command line arguments of the restic process. Logs are collected for each command. In the case of an error, Backrest captures roughly 500 bytes of output and displays it directly in the error message (if the output is longer than 500 bytes, the first and last 250 bytes are shown). Logs of the command are typically also available by clicking [View Logs] next to an operation. These logs are truncated to 32KB (if the log is longer than 32KB, the first and last 16KB are shown).
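The "first and last half" truncation described above can be illustrated with a small sketch. The function name truncate_middle and the marker text are assumptions made for illustration; Backrest's real marker and exact byte counts may differ.

```python
def truncate_middle(data: bytes, limit: int) -> bytes:
    """Keep the first and last limit // 2 bytes when data is longer than limit."""
    if len(data) <= limit:
        return data
    half = limit // 2
    if half == 0:
        # Degenerate limit: nothing sensible to keep from both ends.
        return data[:limit]
    # Hypothetical marker showing where output was dropped.
    marker = b"\n...[truncated]...\n"
    return data[:half] + marker + data[-half:]
```

With limit=500 this keeps 250 bytes from each end, matching the error message behaviour; with limit=32 * 1024 it matches the 16KB head and tail kept for stored logs.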

Scheduling Operations

Operations run on configurable schedules, and all operations support the same set of scheduling policies.

Available scheduling policies are:

  • Disabled - the operation is disabled and will not run.
  • Cron - a cron expression specifying when to run the operation. This lets you schedule tasks at specific times, down to the minute, e.g. 0 0 * * * to run daily at midnight. See Cron Syntax for more information.
  • Interval Days - the interval in days at which the operation should be run. This is useful for running tasks at a regular interval when you don't care about the exact run time. Tasks will run no more frequently than the specified interval.
  • Interval Hours - the interval in hours at which the operation should be run. This is useful for running tasks at a regular interval when you don't care about the exact run time. Tasks will run no more frequently than the specified interval.

In addition to the scheduling policy, a schedule also specifies the clock against which the policy is evaluated:

  • Local - the schedule is evaluated against the current wall-clock time in the local timezone.
  • UTC - the schedule is evaluated against the current wall-clock time in UTC. (Only behaves differently from Local when using cron expressions).
  • Last Run Time - the schedule is evaluated relative to the last time the task ran. This can be useful for ensuring tasks aren't skipped if the schedule is missed for some reason (e.g. on a laptop that is frequently asleep). It is also important to use this clock for tasks that run very infrequently (e.g. check and prune health operations) to ensure they aren't skipped.

Backup operations are scheduled under plan settings in the UI. Good practice is to use the "Local" clock if running hourly or more frequently. Use the "Last Run Time" clock if running daily or less frequently.

Prune and Check operations are scheduled under repo settings in the UI. Good practice is to run these operations infrequently (e.g. every 30 days if using "Interval Days", or on the 3rd of each month if using "Cron" scheduling). Because health operations are infrequent, it is recommended to use the "Last Run Time" clock to ensure they are not skipped.
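To see why the "Last Run Time" clock helps infrequent tasks, consider a minimal sketch of an interval policy evaluated relative to the last run. This is not Backrest's scheduler: the function name next_run_from_last and the "run immediately if never run or overdue" behaviour are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_run_from_last(last_run: Optional[datetime], interval: timedelta,
                       now: datetime) -> datetime:
    """Next run time for an interval policy evaluated against the last run."""
    if last_run is None:
        # Assumption: a task that has never run is due immediately.
        return now
    due = last_run + interval
    # If the due time passed while the machine was off or asleep,
    # the task is overdue and runs now rather than being skipped.
    return max(due, now)
```

For example, a 30 day interval with a last run on 1 January is next due on 31 January. If the machine only wakes on 15 February, the task is due immediately instead of waiting for another slot on the wall clock.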

Types of Operations

Backup

Restic docs on backup

Backups are scheduled under plan settings in the UI. A backup operation creates a snapshot of your data and sends it to a repository. The snapshot is created using the restic backup command.

As the backup runs, Backrest displays the progress of the backup operation in the UI. The progress information is parsed from the JSON output of the restic backup command.
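As a rough illustration of that parsing, the sketch below reads one line of restic backup --json output at a time. The field names (message_type, percent_done, snapshot_id, files_new and so on) follow restic's documented JSON output as far as I am aware and should be checked against the restic version in use. The function name parse_backup_line is hypothetical and this is not Backrest's parser.

```python
import json
from typing import Optional, Tuple

# Summary fields copied into the result when present.
SUMMARY_FIELDS = (
    "snapshot_id", "files_new", "files_changed", "files_unmodified",
    "total_files_processed", "data_added", "total_bytes_processed",
)


def parse_backup_line(line: str) -> Optional[Tuple[str, object]]:
    """Return ("status", fraction_done), ("summary", fields), or None."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None  # Not a JSON line, e.g. a plain-text warning.
    if not isinstance(event, dict):
        return None
    kind = event.get("message_type")
    if kind == "status":
        return ("status", event.get("percent_done", 0.0))
    if kind == "summary":
        return ("summary", {k: event[k] for k in SUMMARY_FIELDS if k in event})
    return None
```

Status events would drive a progress bar, while the single summary event at the end carries the snapshot ID and file counts.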

The backup flow is as follows:

  • Hook trigger: CONDITION_SNAPSHOT_START. Any hooks configured for this event will run.
    • If any hook exits with a non-zero status, the hook's failure policy will be applied (e.g. cancelling or failing the backup operation).
  • The restic backup command is run. The newly created snapshot is tagged with the ID of the plan creating it e.g. plan:{PLAN_ID}.
  • On backup completion
    • The summary event is parsed from the backup and is stored in the operation's metadata. This includes: files added, files changed, files unmodified, total files processed, bytes added, bytes processed, and most importantly the snapshot ID.
    • If an error occurred: hook trigger CONDITION_SNAPSHOT_ERROR. Any hooks configured for this event will run.
    • If successful: hook trigger CONDITION_SNAPSHOT_SUCCESS. Any hooks configured for this event will run.
    • Finally, CONDITION_SNAPSHOT_END is triggered whether the backup succeeded or failed. Any hooks configured for this event will run.
  • If a retention policy is set (i.e. not None), a forget operation is triggered for the backup plan.
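The ordering of hook triggers in the flow above can be sketched as plain control flow. Everything here is hypothetical scaffolding: trigger, backup and forget are caller-supplied callables, a False return from trigger stands in for a hook failure policy that cancels the backup, and running forget only after a successful backup is an assumption of this sketch.

```python
from typing import Callable


def run_backup_flow(trigger: Callable[[str], bool],
                    backup: Callable[[], str],
                    forget: Callable[[], None],
                    has_retention_policy: bool) -> str:
    """Run the hook and backup sequence; return "cancelled", "error" or "success"."""
    if not trigger("CONDITION_SNAPSHOT_START"):
        # Assumption: a failing start hook cancels before restic runs.
        return "cancelled"
    try:
        try:
            backup()  # Would run restic backup and return the snapshot ID.
        except Exception:
            trigger("CONDITION_SNAPSHOT_ERROR")
            return "error"
        trigger("CONDITION_SNAPSHOT_SUCCESS")
    finally:
        # END fires after SUCCESS or ERROR, whichever occurred.
        trigger("CONDITION_SNAPSHOT_END")
    if has_retention_policy:
        forget()
    return "success"
```

The try/finally structure is what guarantees that the END condition fires on both the success and the error path.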

Created backups are tagged with some Backrest specific metadata:

  • plan:{PLAN_ID} - the ID of the plan that created the snapshot. This is used to group snapshots by plan in the UI.
  • created-by:{INSTANCE_ID} - the unique ID of the Backrest instance that created the snapshot. This is used to group snapshots by instance in the UI. Notably, this is not necessarily the same as the hostname tracked by restic in the snapshot.

Forget

Restic docs on forget

Forget operations are scheduled by the "forget policy" configured in plan settings in the UI. Forget operations run after backups. A forget operation marks old snapshots for deletion but does not remove data from storage until a prune runs. The forget operation is run using the restic forget --tag plan:{PLAN_ID} command.

Retention policies are mapped to forget arguments:

  • By Count maps to --keep-last {COUNT}
  • By Time Period maps to the --keep-{hourly,daily,weekly,monthly,yearly} {COUNT} flags
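The mapping from a retention policy to restic forget arguments might look like the following sketch. The --tag and --keep-* flags are real restic flags, but the function name forget_args and the shape of its parameters are hypothetical and do not reflect Backrest's configuration schema.

```python
from typing import Dict, List, Optional

PERIODS = ("hourly", "daily", "weekly", "monthly", "yearly")


def forget_args(plan_id: str, keep_last: Optional[int] = None,
                keep_periods: Optional[Dict[str, int]] = None) -> List[str]:
    """Build restic forget arguments for one plan's snapshots."""
    # Only snapshots tagged with this plan are considered.
    args = ["forget", "--tag", f"plan:{plan_id}"]
    if keep_last is not None:
        # "By Count" policy.
        args += ["--keep-last", str(keep_last)]
    for period in PERIODS:
        # "By Time Period" policy: one flag per configured period.
        count = (keep_periods or {}).get(period)
        if count:
            args += [f"--keep-{period}", str(count)]
    return args
```

For instance, a hypothetical plan "docs" keeping 7 daily and 4 weekly snapshots yields forget --tag plan:docs --keep-daily 7 --keep-weekly 4.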

Prune

Restic docs on prune

Prune operations are scheduled under repo settings in the UI. A prune operation removes data from storage that is no longer referenced by any snapshot. The prune operation is run using the restic prune command. Prune operations apply to the entire repo and will show up under the _system_ plan in the Backrest UI.

Prunes are run in compliance with a prune policy, which configures a schedule and a max unused percent. The max unused percent is the percentage of data that may remain unreferenced after a prune operation. The prune operation will repack or delete unreferenced data until the repo falls under this limit. If the repo is already under the limit, the prune may complete immediately.

Prune operations are costly and may read a significant portion of your repo. Prune costs are mitigated by running them infrequently (e.g. monthly or every 30 days), and by using a higher max unused percent value (e.g. 5% or 10%). A higher max unused percent value will result in more data being retained in the repo, but will reduce the need to repack partially unreferenced data.
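As a hedged illustration of the max unused percent setting, the sketch below builds a prune command line and checks whether a repo is already under the limit. restic prune does accept a --max-unused flag with a percentage, but whether Backrest passes the setting in exactly this form is an assumption, and prune_args and under_limit are hypothetical helper names.

```python
from typing import List


def prune_args(max_unused_percent: float) -> List[str]:
    """Build restic prune arguments for a max unused percent setting."""
    return ["prune", "--max-unused", f"{max_unused_percent:g}%"]


def under_limit(unused_bytes: int, total_bytes: int,
                max_unused_percent: float) -> bool:
    """True if unreferenced data is already within the allowed share of the repo."""
    return unused_bytes * 100 <= total_bytes * max_unused_percent
```

With a 10% limit, a 1000 GB repo holding 50 GB of unreferenced data is already under the limit, so a prune has little or nothing to repack, while the same repo with 200 GB unreferenced would require repacking or deleting data.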

Check

Restic docs on check

Check operations are scheduled under repo settings in the UI. A check operation verifies the integrity of the repository. The check operation is run using the restic check command. Check operations apply to the entire repo and will show up under the _system_ plan in the Backrest UI.

Checks are configured by a schedule determining when they run, and a single argument, read data%, which determines the percentage of the repository that should be read during a check operation. Irrespective of read data%, the structure of the repo will always be verified in its entirety. Reading data back verifies the hashes of pack files on disk and may detect unreliable storage (e.g. an HDD running without parity). It typically does not provide much value for a reliable cloud storage provider and can be set to a low percentage or disabled.

A value of 100% for read data% will read/download every pack file in your repository. This can be very slow and, if your provider bills for egress bandwidth, can be expensive. It is recommended to set this to 0% or a low value (e.g. 10%) for most use cases.
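To make the cost trade-off concrete, here is a small sketch that maps a read data% value to restic check arguments and estimates how much data would be read. restic check accepts --read-data-subset with a percentage, but the exact mapping Backrest uses is an assumption, and check_args and estimated_read_bytes are hypothetical names.

```python
from typing import List


def check_args(read_data_percent: float) -> List[str]:
    """Build restic check arguments for a read data% setting."""
    if not 0 <= read_data_percent <= 100:
        raise ValueError("read data% must be between 0 and 100")
    args = ["check"]
    if read_data_percent > 0:
        # At 0% only the repository structure is verified.
        args.append(f"--read-data-subset={read_data_percent:g}%")
    return args


def estimated_read_bytes(repo_bytes: int, read_data_percent: float) -> int:
    """Rough amount of pack data read (and possibly billed as egress)."""
    return int(repo_bytes * read_data_percent / 100)
```

For a 2 TB repository, a 10% setting reads roughly 200 GB per check, while 100% reads the full 2 TB, which is where egress charges can become significant.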