Finicky USB cameras like the ZWO ASI678MC, cheaply made cables, temperature fluctuations: Raspberry Pi-based all-sky cameras are technically demanding. Even otherwise stable indi-allsky setups can end up in a state where no new images are generated, without the process itself crashing.
This article shows a robust, tried-and-tested watchdog solution that works independently of indi-allsky internals:
An external watchdog monitors the actual image production (JPG files) and automatically restarts indi-allsky as soon as the data flow stalls, optionally sending an e-mail notification.
Why an external watchdog makes sense for indi-allsky
indi-allsky itself usually only recognizes camera hangs indirectly. In many cases, the process continues to run while individual threads block. Classic mechanisms such as a systemd restart policy (which reacts to a process that exits, not to one that keeps running with blocked threads) or internal status messages therefore do not work reliably.
An external watchdog has decisive advantages:
- it evaluates real output behavior (existing JPGs)
- it is independent of INDI states, USB events or threads
- it also works with “half-dead” processes
- it is transparent and easy to customize
Basic idea of the watchdog
The principle is deliberately simple:
- Search for the most recent JPG
- Check how old it is
- If no new image has been created for X minutes:
  - send a mail
  - restart indi-allsky
- Prevent double triggers and mail spam
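The core decision can be isolated into a small function. This is a minimal sketch, not part of indi-allsky; the function name is_stale is hypothetical. It takes two Unix timestamps and the threshold in minutes, prints the age in whole minutes and signals "stale" through its exit status.

```bash
#!/bin/bash
# Hypothetical helper illustrating the basic idea of the watchdog.
# is_stale NOW_EPOCH LAST_IMAGE_EPOCH MAX_AGE_MINUTES
# Prints the image age in whole minutes; returns 0 if the threshold is reached.
is_stale() {
  local now=$1 last=$2 max_age=$3
  local age_min=$(( (now - last) / 60 ))
  echo "$age_min"
  [ "$age_min" -ge "$max_age" ]
}

# Example: the newest image is 420 s (7 min) old, the threshold is 5 min
if is_stale 1700000420 1700000000 5 >/dev/null; then
  echo "stale: would mail and restart"
else
  echo "ok: image production running"
fi
```

Integer division rounds down, so an image counts as stale once it is at least MAX_AGE_MINUTES * 60 seconds old, which is the same comparison the full script makes.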
Prerequisites
- indi-allsky runs as a systemd user service
- Images are saved as JPG
- Sending mail with the mail command already works
- Raspberry Pi OS / Linux with systemd
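The prerequisites can be checked quickly before installing anything. The following is a hypothetical pre-flight sketch, not part of indi-allsky; it assumes the default image path used later in the watchdog script and the unit name indi-allsky.service.

```bash
#!/bin/bash
# Hypothetical pre-flight check for the watchdog prerequisites (a sketch).
# IMAGE_DIR is assumed to match the value used in the watchdog script.
IMAGE_DIR="${IMAGE_DIR:-/var/www/html/allsky/images}"

# check DESCRIPTION COMMAND...: run COMMAND quietly and print OK or MISSING
check() {
  local desc=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK      $desc"
  else
    echo "MISSING $desc"
    return 1
  fi
}

missing=0
check "mail command"               command -v mail || missing=1
check "systemctl"                  command -v systemctl || missing=1
check "image directory $IMAGE_DIR" test -d "$IMAGE_DIR" || missing=1
check "indi-allsky user unit"      systemctl --user cat indi-allsky.service || missing=1

if [ "$missing" -eq 0 ]; then
  echo "All prerequisites found."
else
  echo "Some prerequisites are missing; fix them before installing the watchdog."
fi
```

Whether mail actually delivers still has to be tested separately, for example by sending yourself a test message.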
Create watchdog script
The script contains the complete logic.
nano ~/indi-allsky-watchdog.sh
Content: replace mail@domain.tld with your own address and adjust IMAGE_DIR if your images are stored elsewhere.
#!/bin/bash
# ==================================================
# Configuration
# ==================================================
IMAGE_DIR="/var/www/html/allsky/images"
MAX_AGE_MINUTES=5
MAIL_TO="mail@domain.tld"
LOCK_FILE="/tmp/indi-allsky-watchdog.lock"
STATE_FILE="/tmp/indi-allsky-watchdog.last"
# ==================================================
# Lock to prevent parallel execution
# ==================================================
exec 9>"$LOCK_FILE" || exit 1
flock -n 9 || exit 0
# ==================================================
# Determine timestamp of the latest JPG
# ==================================================
LAST_IMAGE_TIME=$(find "$IMAGE_DIR" \
    -type f \
    -name "*.jpg" \
    -printf '%T@\n' 2>/dev/null \
    | sort -n \
    | tail -1)
# Exit if no images are found
[ -z "$LAST_IMAGE_TIME" ] && exit 0
# ==================================================
# Calculate age of the last image
# ==================================================
NOW=$(date +%s)
LAST=${LAST_IMAGE_TIME%.*}
AGE_MIN=$(( (NOW - LAST) / 60 ))
# ==================================================
# Check: threshold exceeded?
# ==================================================
if [ "$AGE_MIN" -lt "$MAX_AGE_MINUTES" ]; then
    exit 0
fi
# ==================================================
# Rate limiting: only one notification and restart per stall event
# (an event is identified by the timestamp of the last JPG)
# ==================================================
if [ -f "$STATE_FILE" ]; then
    LAST_STATE=$(cat "$STATE_FILE")
    if [ "$LAST_STATE" = "$LAST" ]; then
        exit 0
    fi
fi
echo "$LAST" > "$STATE_FILE"
# ==================================================
# Notification + restart
# ==================================================
printf '%s\n' \
    "indi-allsky watchdog triggered" \
    "" \
    "No new JPG has been created for ${AGE_MIN} minutes." \
    "Host: $(hostname)" \
    "Time: $(date)" \
    "The service will be restarted." \
    | mail -s "indi-allsky watchdog: restart on $(hostname)" "$MAIL_TO"
systemctl --user restart indi-allsky
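The rate limiting in the watchdog script above can be tried in isolation. This sketch extracts the state-file logic into a hypothetical function should_trigger and runs it against a temporary file instead of /tmp/indi-allsky-watchdog.last.

```bash
#!/bin/bash
# Sketch of the state-file deduplication: a stall "event" is identified by the
# timestamp of the last JPG, so the action fires only once until a newer image appears.
# should_trigger STATE_FILE LAST_IMAGE_EPOCH -> returns 0 if this event is new
should_trigger() {
  local state_file=$1 last=$2
  if [ -f "$state_file" ] && [ "$(cat "$state_file")" = "$last" ]; then
    return 1                      # same stall as last time: already handled
  fi
  echo "$last" > "$state_file"    # remember this event
  return 0
}

tmp_dir=$(mktemp -d)
state="$tmp_dir/state"            # starts out without a state file
should_trigger "$state" 1700000000 && echo "first check: trigger"
should_trigger "$state" 1700000000 || echo "second check: suppressed"
should_trigger "$state" 1700000600 && echo "new stall: trigger"
rm -rf "$tmp_dir"
```

A consequence of this design: because the early exit comes before both the mail and the restart, the script restarts indi-allsky only once per stall. If the restart does not bring image production back, no further restart happens until a new JPG appears.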
Make script executable
chmod +x ~/indi-allsky-watchdog.sh
Manual function test
~/indi-allsky-watchdog.sh
Expected behavior:
- when image production is running: no action
- when image production is stopped: mail + restart
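To check the "newest JPG" detection without stopping the camera, the same find pipeline can be pointed at a throwaway directory with backdated files. This is a sketch that assumes GNU find (for -printf) and GNU touch (for -d); the directory and file names are made up.

```bash
#!/bin/bash
# Sketch: exercise the newest-JPG detection from the watchdog script against
# a temporary directory instead of the real image folder.
TEST_DIR=$(mktemp -d)
mkdir -p "$TEST_DIR/20250101"                          # hypothetical sub-folder
touch -d '20 minutes ago' "$TEST_DIR/20250101/old.jpg"
touch -d '10 minutes ago' "$TEST_DIR/20250101/newer.jpg"

# Same pipeline and age calculation as in the watchdog script
LAST_IMAGE_TIME=$(find "$TEST_DIR" -type f -name "*.jpg" -printf '%T@\n' 2>/dev/null \
    | sort -n | tail -1)
LAST=${LAST_IMAGE_TIME%.*}
AGE_MIN=$(( ($(date +%s) - LAST) / 60 ))
echo "newest test image is ${AGE_MIN} minutes old"   # about 10 with the files above

rm -rf "$TEST_DIR"
```

The reported age should correspond to newer.jpg, not old.jpg, because sort -n followed by tail -1 keeps the largest modification time.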
Create systemd service for the watchdog
nano ~/.config/systemd/user/indi-allsky-watchdog.service
[Unit]
Description=Watchdog checks JPG activity of indi-allsky

[Service]
Type=oneshot
# %h expands to the home directory of the user running this unit
ExecStart=%h/indi-allsky-watchdog.sh
Create systemd timer
nano ~/.config/systemd/user/indi-allsky-watchdog.timer
[Unit]
Description=Timer for indi-allsky watchdog

[Timer]
OnBootSec=2min
OnUnitActiveSec=2min
AccuracySec=30s

[Install]
WantedBy=default.target
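The threshold and the timer interval together determine how quickly a stall is noticed. The following arithmetic is a rough sketch that assumes each run fires at most AccuracySec after its scheduled time and ignores the few seconds the script itself takes.

```bash
#!/bin/bash
# Rough worst-case reaction time, using the values from the script and timer above.
MAX_AGE_MINUTES=5        # threshold in the watchdog script
TIMER_INTERVAL_SEC=120   # OnUnitActiveSec=2min
ACCURACY_SEC=30          # AccuracySec=30s

# The age check triggers once the last JPG is at least MAX_AGE_MINUTES*60 s old.
# In the worst case that moment falls just after a run, so the next run comes
# up to one interval plus the timer slack later.
WORST_CASE_SEC=$(( MAX_AGE_MINUTES * 60 + TIMER_INTERVAL_SEC + ACCURACY_SEC ))
echo "worst case: ${WORST_CASE_SEC} s after the last image"
```

With these values the restart happens at most about 450 seconds (7.5 minutes) after the last image was written. Shortening OnUnitActiveSec reduces the delay at the cost of more frequent find runs over the image directory.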
Activate watchdog
systemctl --user daemon-reload
systemctl --user enable --now indi-allsky-watchdog.timer
Check:
systemctl --user list-timers | grep indi-allsky
What this watchdog does reliably
- recognizes real standstills in image production
- restarts indi-allsky automatically
- sends a maximum of one mail per event
- works independently of USB status or threads
- is completely transparent and maintainable
Appendix: Extension for deliberately unplugged cameras (USB) – 2025-12-25
After publishing the original watchdog setup, a special case emerged in real-world operation that should be taken into account:
the camera may be deliberately unplugged, for example for testing, maintenance, or hardware experiments.
In such situations, indi-allsky may continue running even though no images are being produced.
The basic watchdog logic correctly detects that no new JPG files are being written – however, it cannot distinguish
between a real failure and an intentional manual intervention.
Why process checks or /dev/video0 are not sufficient
In modern indi-allsky setups (libcamera, ZWO cameras, mixed USB/CSI environments), the following applies:
- indi-allsky can keep running even if the camera is physically disconnected
- a simple service or process check is therefore not reliable
- /dev/video0 is often not present or not stable across setups
What matters is therefore not only the software state, but the physical presence of the camera.
Solution: USB detection via vendor/product ID (optional)
USB cameras expose a unique identifier that can be queried directly:
lsusb
Example from a production setup:
Bus 003 Device 002: ID 03c3:678b ZWO ASI678MC
This vendor/product ID can be explicitly checked in the watchdog script.
If the camera is not present on the USB bus, the watchdog assumes a deliberate manual intervention and
suppresses an automatic restart.
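If you prefer not to copy the ID by hand, it can be read out of the lsusb output. This sketch uses the sample line quoted above; on a live system you would pipe lsusb in instead. Matching on "ZWO" is an assumption: the description text comes from the system's USB ID database and may differ between installations.

```bash
#!/bin/bash
# Sketch: extract the vendor:product ID from an lsusb line with awk.
sample='Bus 003 Device 002: ID 03c3:678b ZWO ASI678MC'

# In the default lsusb format, field 6 is the vendor:product pair.
CAMERA_USB_ID=$(echo "$sample" | awk '/ZWO/ {print $6; exit}')
echo "CAMERA_USB_ID=\"$CAMERA_USB_ID\""
```

The printed line can be pasted directly into the configuration section of the watchdog script.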
Where this extension is inserted in the script
The USB check is inserted in the existing watchdog script directly after the lock section,
that is, before evaluating the timestamp of the last JPG file:
# ==================================================
# Lock against parallel execution
# ==================================================
exec 9>"$LOCK_FILE" || exit 1
flock -n 9 || exit 0
# Insert USB detection here
Example USB detection snippet
CAMERA_USB_ID="03c3:678b"
if ! lsusb | grep -qi "$CAMERA_USB_ID"; then
    # Camera is physically not present – no restart
    exit 0
fi
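The USB check from the snippet above can be wrapped in a function that reads lsusb-style text on standard input, so both cases can be tried without unplugging anything. This is a sketch; camera_present is a hypothetical name, and the second sample line is a typical root hub entry.

```bash
#!/bin/bash
# Sketch: USB presence check that reads lsusb-style text on stdin.
# camera_present ID -> returns 0 if a line contains "ID <vendor:product>"
camera_present() {
  # -q: quiet, -i: ignore case, -F: fixed string (no regex)
  grep -qiF "ID $1"
}

CAMERA_USB_ID="03c3:678b"
plugged='Bus 003 Device 002: ID 03c3:678b ZWO ASI678MC'
unplugged='Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub'

printf '%s\n' "$plugged"   | camera_present "$CAMERA_USB_ID" && echo "camera present: normal watchdog check"
printf '%s\n' "$unplugged" | camera_present "$CAMERA_USB_ID" || echo "camera missing: skip restart"

# In the watchdog itself, the input would be the live bus listing:
#   lsusb | camera_present "$CAMERA_USB_ID" || exit 0
```

Prefixing the pattern with "ID " ties the match to the ID column of the lsusb output rather than to an arbitrary substring elsewhere on the line.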
Effect of this extension
With this optional extension, the watchdog cleanly distinguishes between:
- a real failure condition (camera present, but no new images)
- a deliberate manual action (camera physically unplugged)
Unnecessary restarts are avoided, while the original watchdog logic remains unchanged.
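The three cases can be summarized as a small decision function. This is an illustrative sketch with a hypothetical name; it leaves out the lock and the state-file rate limiting and only shows the order in which the extended script evaluates the conditions.

```bash
#!/bin/bash
# Sketch of the decision order after the USB extension.
# watchdog_decision CAMERA_PRESENT(yes|no) AGE_MIN MAX_AGE_MINUTES
watchdog_decision() {
  local present=$1 age_min=$2 max_age=$3
  if [ "$present" != "yes" ]; then
    echo "skip"      # camera unplugged on purpose: no mail, no restart
  elif [ "$age_min" -lt "$max_age" ]; then
    echo "ok"        # images are still arriving
  else
    echo "restart"   # camera present but no new images: real failure
  fi
}

watchdog_decision no  30 5   # -> skip
watchdog_decision yes  2 5   # -> ok
watchdog_decision yes 12 5   # -> restart
```

Because the USB check comes first, an unplugged camera suppresses the restart regardless of how old the last image is.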