Automatic indi-allsky backup on external drive (Fritzbox NAS) – Phase 2: Files

After the database, configuration and migrations of indi-allsky were already being backed up automatically to an external drive, the next step was to tackle the actual files: the night images from which timelapses, keograms and star trails are later created.

The basis for this is phase 1 of the backup, which I have described here:

Automatic indi-allsky backup on external drive (Fritzbox NAS) – Phase 1

Phase 2 is much more demanding. While database backups are comparatively small and deterministic, with image data we are quickly talking about several gigabytes per night – and about data that is not all equally valuable.

Why not simply “back up everything”?

An all-sky system produces images every night – regardless of whether the sky is clear, partially cloudy or completely unusable. If you backed up all the files blindly, even a large external hard disk would fill up very quickly.

That’s why it was clear from the start: the backup must be quality-based.
Not every night is equally valuable – and with indi-allsky this can be derived very well from the database.

And this is how I proceeded…

Quality criteria directly from the indi-allsky database

indi-allsky stores per image, among other things

  • the number of stars detected (stars)
  • whether moon mode was active (moonmode)
  • whether the image was taken at night

I calculate this for each night:

  • the average number of stars
  • the average moon-mode percentage

Only nights that meet these minimum criteria are backed up at all. Bad nights automatically fall through the cracks.

To do this, I first looked at how many stars were detected per image on particularly clear nights, to get a feeling for what is “gigantic” and what is “quite okay”, and used that to set the threshold values.
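If you want to get the same feeling for your own data, a quick look at the per-night averages is enough. A minimal sketch, using the same table and columns as the backup script below (the DB path is the indi-allsky default used throughout this post):

sqlite3 -readonly -column -header /var/lib/indi-allsky/indi-allsky.sqlite "
  SELECT dayDate,
         COUNT(*)                     AS images,
         ROUND(AVG(stars),1)          AS avg_stars,
         ROUND(100.0*AVG(moonmode),1) AS moon_pct
  FROM image
  WHERE night=1 AND exclude=0
  GROUP BY dayDate
  ORDER BY dayDate DESC
  LIMIT 14;"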

Only the original photos and timelapses are backed up; keograms and star trails are skipped (both are commented out in the script!), as are thumbnails (these can easily be regenerated in indi-allsky using a script).

“Great nights” vs. rolling nights

Looking at the real data, a clear separation emerged very quickly:

  • Great nights: Ø ≥ 1000 stars → keep permanently
  • Okay (rolling) nights: good, but not perfect conditions → may be deleted later

This classification is permanently stored in the backup for each night and is the basis for later retention.
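In the backup, this classification ends up as a small manifest file in each night directory; the retention step later only needs the class line. A sketch of what such a manifest can look like (the exact field set beyond class is up to you):

# /mnt/backup_allsky/backup_allsky/nights/2025-12-25/manifest.txt
night=2025-12-25
class=FOREVER        # or ROLLING
avg_stars=1509.3
moon_pct=0.0
img_count=1854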

Hard check – is the number of files correct?

A backup is only a backup if it is complete.
That is why there is a hard verification step after every night of backup:

Number of images according to the database = number of files in the backup

If these figures do not match exactly, the backup is deemed to have failed.
There is no such thing as “almost complete”.

This prevents:

  • incomplete nights being silently accepted
  • retention being based on faulty backups
  • data loss only becoming apparent weeks later
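Stripped down to the core, the check is nothing more than two counts, compared without any tolerance. A minimal sketch for a single night (paths and schema as in the full script further down):

NIGHT="2025-12-25"
DB="/var/lib/indi-allsky/indi-allsky.sqlite"
NIGHT_DST="/mnt/backup_allsky/backup_allsky/nights/$NIGHT"

# number of night images the database knows about
DB_COUNT="$(sqlite3 -readonly -noheader "$DB" \
  "SELECT COUNT(*) FROM image WHERE night=1 AND exclude=0 AND dayDate='$NIGHT';")"

# number of files that actually arrived in the backup (manifest excluded)
FS_COUNT="$(find "$NIGHT_DST" -type f ! -name 'manifest.txt' | wc -l)"

if [[ "$DB_COUNT" -ne "$FS_COUNT" ]]; then
  echo "Backup of $NIGHT is INCOMPLETE: DB=$DB_COUNT, files=$FS_COUNT"
fi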

Dynamic retention instead of fixed limits

As database backups are also stored on the same drive, retention is deliberately dynamic:

  • deletion starts above 80 % disk usage
  • or when less than 20 GB is free

This means that the script works regardless of whether someone uses a small SSD or a large external HDD.
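The decision itself is just a df check. A small sketch that also includes the free-space criterion (the full script below only uses the percentage threshold):

BACKUP_MOUNT="/mnt/backup_allsky"

USED_PCT="$(df -P "$BACKUP_MOUNT"    | awk 'NR==2 {gsub("%","",$5); print $5}')"
FREE_GB="$(df -P -BG "$BACKUP_MOUNT" | awk 'NR==2 {gsub("G","",$4); print $4}')"

if (( USED_PCT >= 80 )) || (( FREE_GB < 20 )); then
  echo "Retention required: ${USED_PCT}% used, ${FREE_GB} GB free"
fi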

Only the following are deleted:

  • verified rolling nights
  • never the last (still running) night
  • never nights classified as “great” (FOREVER)

Every deletion is documented by e-mail – including the night, the average number of stars, the moon-mode percentage and the freed storage space.

Automatic confirmation e-mail

After each backup (manual or via cronjob), an e-mail is sent with statistics – including the size of the backup, the best starry night to date (measured by the average number of stars), free storage space and whether anything has been deleted:

indi-allsky batch backup completed

Host: allsky
Time: Sun Dec 28 00:01:30 CET 2025

Nights backed up:
2025-12-24 | 1902 images | 2915 MB images | Ø stars 1170.4
2025-12-25 | 1854 images | 3038 MB images | Ø stars 1509.3
2025-12-26 | 1677 images | 2783 MB images | Ø stars 1483.3

Timelapse total:
501 MB

Greatest night so far:
2025-12-25 | Ø stars 1509.3

Backup HDD occupancy:
Total: 232G
Used: 11G (5%)
Free: 221G

The complete backup script (phase 2)

The following script can be used productively and can be adopted 1:1.
A dry run is possible via --dry-run.

#!/bin/bash
set -euo pipefail

# =========================================================
# Configuration
# =========================================================
DB="/var/lib/indi-allsky/indi-allsky.sqlite"
SRC_BASE="/var/www/html/allsky/images"

BACKUP_MOUNT="/mnt/backup_allsky"
BACKUP_BASE="$BACKUP_MOUNT/backup_allsky"
NIGHTS_DIR="$BACKUP_BASE/nights"
TL_DIR="$BACKUP_BASE/timelapse"

LOG="/var/log/backup_allsky_phase2_batch.log"
MAIL_TO="mail@domain.tld"          # customize
HOST="$(hostname)"
LOCKFILE="/var/lock/backup_allsky_phase2.lock"

START_DATE="2025-12-24" # optional, leave empty for all: START_DATE=""

# criteria (adjust if necessary)
MIN_AVG_STARS_PRIMARY=700
MIN_AVG_STARS_SECONDARY=500
MIN_MOON_PCT_SECONDARY=50
GREAT_NIGHT_MIN_AVG_STARS=1000

# Retention (adjust if necessary)
RETENTION_MAX_USED_PCT=80

DRY_RUN=0
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=1

RSYNC_OPTS=(-a)
[[ "$DRY_RUN" -eq 1 ]] && RSYNC_OPTS=(-a --dry-run)

exec >>"$LOG" 2>&1

# =========================================================
# Error reporting (so no more "start -> end" without info)
# =========================================================
fail() {
  local rc=$?
  {
    echo "indi-allsky Batch-Backup FAILED"
    echo "Host: $HOST"
    echo "Time: $(date)"
    echo "Exit code: $rc"
    echo "Line: ${BASH_LINENO[0]}"
    echo "Command: ${BASH_COMMAND}"
    echo
    echo "Last log lines:"
    tail -n 80 "$LOG" || true
  } | mail -s "indi-allsky Batch-Backup ERROR ($HOST)" "$MAIL_TO"
  exit "$rc"
}
trap fail ERR

# =========================================================
# Helper
# =========================================================
disk_used_pct() { df -P "$BACKUP_MOUNT" | awk 'NR==2 {gsub("%","",$5); print $5}'; }
disk_human() { df -h "$BACKUP_MOUNT" | awk 'NR==2 {print $2, $3, $4, $5}'; }

echo "=== Phase 2 batch start: $(date) ==="

# =========================================================
# Locking (no parallel run)
# =========================================================
exec 9>"$LOCKFILE" || exit 1
if ! flock -n 9; then
  echo "Lock active, terminate: $(date)"
  exit 0
fi

# =========================================================
# Ensure backup mount
# =========================================================
mountpoint -q "$BACKUP_MOUNT" || mount "$BACKUP_MOUNT"
mountpoint -q "$BACKUP_MOUNT" || { echo "ERROR: $BACKUP_MOUNT not mounted"; exit 1; }

mkdir -p "$NIGHTS_DIR" "$TL_DIR"

# =========================================================
# Determine active night camera (from DB)
# =========================================================
ACTIVE_CCD="$(
  sqlite3 -readonly -noheader "$DB" \
  "SELECT substr(filename,1,instr(filename,'/')-1)
   FROM image
   WHERE night=1 AND exclude=0
   ORDER BY dayDate DESC, filename DESC
   LIMIT 1;"
)"
[[ -z "$ACTIVE_CCD" ]] && { echo "ERROR: No active night camera found."; exit 1; }

SRC_CCD="$SRC_BASE/$ACTIVE_CCD"

# =========================================================
# Determine nights
# =========================================================
DATE_FILTER=""
[[ -n "${START_DATE:-}" ]] && DATE_FILTER="AND dayDate>='$START_DATE'"

mapfile -t NIGHTS < <(
  sqlite3 -readonly -noheader "$DB" "
    SELECT DISTINCT dayDate
    FROM image
    WHERE night=1 AND exclude=0 $DATE_FILTER
    ORDER BY dayDate;
  "
)

LAST_NIGHT="${NIGHTS[-1]:-}"

MAIL_LINES=""
BEST_NIGHT=""
BEST_STARS="0"
TL_TOTAL_MB=0

# =========================================================
# Processing per night
# =========================================================
for NIGHT in "${NIGHTS[@]}"; do
  # Separate stats cleanly with | and read in robustly
  IFS='|' read -r AVG_STARS MOON_PCT IMG_COUNT <<<"$(
    sqlite3 -readonly -noheader -separator '|' "$DB" "
      SELECT
        COALESCE(ROUND(AVG(stars),1),0),
        COALESCE(ROUND(100.0*AVG(moonmode),1),0),
        COUNT(*)
      FROM image
      WHERE night=1 AND exclude=0 AND dayDate='$NIGHT'
        AND filename LIKE '$ACTIVE_CCD/%';
    "
  )"

  AVG_STARS="${AVG_STARS:-0}"
  MOON_PCT="${MOON_PCT:-0}"
  IMG_COUNT="${IMG_COUNT:-0}"

  # Backup decision (bc is guaranteed to get numbers)
  SECURE_NIGHT=0
  if (( $(echo "$AVG_STARS >= $MIN_AVG_STARS_PRIMARY" | bc -l) )); then
    SECURE_NIGHT=1
  elif (( $(echo "$AVG_STARS >= $MIN_AVG_STARS_SECONDARY && $MOON_PCT >= $MIN_MOON_PCT_SECONDARY" | bc -l) )); then
    SECURE_NIGHT=1
  fi

  if [[ "$SECURE_NIGHT" -eq 0 ]]; then
    continue
  fi

  # Destination
  NIGHT_DST="$NIGHTS_DIR/$NIGHT"
  mkdir -p "$NIGHT_DST"

  # Files from DB (only active camera)
  mapfile -t FILES < <(
    sqlite3 -readonly -noheader "$DB" "
      SELECT filename
      FROM image
      WHERE night=1 AND exclude=0 AND dayDate='$NIGHT'
        AND filename LIKE '$ACTIVE_CCD/%';
    "
  )

  # Copy
  for FILE in "${FILES[@]}"; do
    SRC="$SRC_BASE/$FILE"
    DST="$NIGHT_DST/$FILE"
    mkdir -p "$(dirname "$DST")"
    rsync "${RSYNC_OPTS[@]}" "$SRC" "$DST"
  done

  # Size images (only exposures, no thumbnails)
  IMG_MB="$(du -sm "$NIGHT_DST/$ACTIVE_CCD/exposures" 2>/dev/null | awk '{print $1+0}')"
  IMG_MB="${IMG_MB:-0}"

  # Coolest night
  if (( $(echo "$AVG_STARS > $BEST_STARS" | bc -l) )); then
    BEST_STARS="$AVG_STARS"
    BEST_NIGHT="$NIGHT"
  fi

  # Manifest (for retention protection)
  CLASS="ROLLING"
  if (( $(echo "$AVG_STARS >= $GREAT_NIGHT_MIN_AVG_STARS" | bc -l) )); then
    CLASS="FOREVER"
  fi

  cat >"$NIGHT_DST/manifest.txt" </dev/null | awk '{print $1+0}')"
TL_TOTAL_MB="${TL_TOTAL_MB:-0}"

# =========================================================
# Retention (only if >80% used)
# - never deletes FOREVER
# - never deletes the last night
# =========================================================
USED_PCT="$(disk_used_pct)"
if (( USED_PCT >= RETENTION_MAX_USED_PCT )); then
  while (( USED_PCT >= RETENTION_MAX_USED_PCT )); do
    CANDIDATE="$(
      ls -1 "$NIGHTS_DIR" 2>/dev/null | sort | while read -r N; do
        [[ -z "$N" ]] && continue
        [[ "$N" == "$LAST_NIGHT" ]] && continue
        [[ -f "$NIGHTS_DIR/$N/manifest.txt" ]] && grep -q "^class=FOREVER$" "$NIGHTS_DIR/$N/manifest.txt" && continue
        echo "$N"
        break
      done
    )"

    [[ -z "$CANDIDATE" ]] && break

    SIZE_H="$(du -sh "$NIGHTS_DIR/$CANDIDATE" | awk '{print $1}')"

    if [[ "$DRY_RUN" -eq 1 ]]; then
      echo "[DRY-RUN] Retention would delete: $CANDIDATE ($SIZE_H)"
    else
      rm -rf "$NIGHTS_DIR/$CANDIDATE"
      {
        echo "indi-allsky Retention: Night deleted"
        echo
        echo "Host: $HOST"
        echo "Time: $(date)"
        echo
        echo "Deleted: $CANDIDATE"
        echo "Released: $SIZE_H"
        echo
        echo "Disk (Total Used Free Usage%):"
        disk_human
      } | mail -s "indi-allsky Retention: night $CANDIDATE deleted ($HOST)" "$MAIL_TO"
    fi

    USED_PCT="$(disk_used_pct)"
  done
fi

# =========================================================
# Closing mail
# =========================================================
read -r SIZE USED AVAIL PERC <<<"$(df -h "$BACKUP_MOUNT" | awk 'NR==2 {print $2,$3,$4,$5}')"

mail -s "indi-allsky batch backup completed" "$MAIL_TO" <<EOF
indi-allsky batch backup completed

Host: $HOST
Time: $(date)

Nights backed up:
${MAIL_LINES:-}

Timelapse total:
$TL_TOTAL_MB MB

Greatest night so far:
${BEST_NIGHT:-} | Ø stars ${BEST_STARS:-0}

Backup HDD occupancy:
Total: $SIZE
Used: $USED ($PERC)
Free: $AVAIL

Note:
Thumbnails are not saved.
EOF

echo "=== Phase 2 batch end: $(date) ==="
exit 0
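A first test run, assuming the script is installed under the path that the cron job below also uses:

# dry run: rsync only simulates, retention only logs what it would delete
sudo /usr/local/bin/backup_allsky_phase2.sh --dry-run

# real run
sudo /usr/local/bin/backup_allsky_phase2.sh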

Running?

If you are impatient, you can check whether the script is currently running with ps -ef | grep backup_allsky | grep -v grep.
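Alternatively, you can probe the lock file the script holds while it runs (a sketch; run it as root, since the lock lives under /var/lock):

# exits silently if the lock is free, prints a message if the backup is running
flock -n /var/lock/backup_allsky_phase2.lock true || echo "backup_allsky_phase2 is currently running"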

Cronjob

I run the script once a day at 11:30 in the morning:

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
30 11 * * * root /usr/local/bin/backup_allsky_phase2.sh >> /var/log/backup_allsky_phase2_cron.log 2>&1

Log rotation in /etc/logrotate.d/backup_allsky

/var/log/backup_allsky_phase2*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate
}

Conclusion

Phase 2 of the indi-allsky backup is deliberately more complex than simply copying all files via rsync.
But it is:

  • data-based
  • self-verifying
  • storage-efficient
  • and maintainable in the long term

For me, this is the crucial difference between “somehow backed up” and a backup that you can trust in an emergency.

Enjoyed this post?

You can support allsky-rodgau.de with a small coffee on BuyMeACoffee.
