systemd Restart=on-failure misses clean exits: use Restart=always for always-on services with an internal idle exit
Restart=on-failure restarts a service only after an unclean exit: a non-zero exit code, termination by a signal (other than SIGHUP, SIGINT, SIGTERM, or SIGPIPE), an operation timeout, or a watchdog timeout. If a service has an internal idle timeout that exits cleanly (exit 0), even though that exit does not reflect any user intent to stop the service, on-failure will not restart it.
Concrete case: brainstorm.service runs the Superpowers server.cjs, which has a hardcoded IDLE_TIMEOUT_MS = 30 * 60 * 1000 (30 minutes idle → clean shutdown). Restart=on-failure treats the idle exit as success, so the service stays down until it is manually restarted or a subsequent session triggers the brainstorming-gate hook.
Fix: Restart=always with RestartSec=5. systemd then restarts the service after a 5-second delay following any exit of the process, clean or failed; only an explicit systemctl stop leaves it down. This is the correct policy for always-on persistent services.
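A minimal sketch of applying this fix as a drop-in override instead of editing the unit file directly. The helper name write_restart_dropin, the file name restart.conf, and the default /etc/systemd/system base directory are assumptions for illustration. This note does not say whether brainstorm.service is a system or a user unit; a user unit would typically live under ~/.config/systemd/user instead.

```shell
# Hypothetical helper: write a drop-in that sets the restart policy for a unit.
# $1 = unit name, $2 = base directory (assumed default: /etc/systemd/system).
write_restart_dropin() {
  unit="$1"
  dir="${2:-/etc/systemd/system}/${unit}.d"
  mkdir -p "$dir"
  # Settings in a drop-in override the same settings in the main unit file.
  cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=always
RestartSec=5
EOF
}
```

After writing the drop-in (for example, write_restart_dropin brainstorm.service), run systemctl daemon-reload and restart the unit, adding --user if it is a user unit. systemctl cat brainstorm.service should then list the override alongside the main unit file.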
Restart= choice rule:
- Restart=on-failure → use when clean exit = “user chose to stop” (e.g., user-launched build servers, one-shot daemons)
- Restart=always → use for always-on session-dependent services where any exit means “broken user experience” (brainstorm, MCP servers, watchdogs)
- Restart=no (default) → one-shot tasks, installers
Diagnostic: if a systemd-managed service that should be always-on goes down and journalctl -u <svc> shows a clean exit (code 0) with “shutdown”, “idle timeout”, or “stopping” messages → the restart policy is wrong. Change it to Restart=always.
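The check above can be roughly automated from the unit's properties. The sketch below reads KEY=VALUE lines as printed by systemctl show and succeeds when the unit is inactive after a clean exit under a policy that does not restart on clean exits. The function name clean_exit_not_restarted is hypothetical, and this is only a heuristic: a service that handles SIGTERM and exits 0 after a manual systemctl stop looks the same, so the journal messages still need to confirm the idle exit.

```shell
# Hypothetical helper: reads `systemctl show` KEY=VALUE lines on stdin.
# Returns 0 when the unit is down after a clean exit and its Restart=
# policy would not bring it back; returns non-zero otherwise.
clean_exit_not_restarted() {
  restart="" result="" status="" state=""
  while IFS='=' read -r key value; do
    case "$key" in
      Restart)        restart="$value" ;;
      Result)         result="$value" ;;
      ExecMainStatus) status="$value" ;;
      ActiveState)    state="$value" ;;
    esac
  done
  # These two policies do restart after a clean exit, so they are not the problem.
  case "$restart" in
    always|on-success) return 1 ;;
  esac
  [ "$state" = "inactive" ] && [ "$result" = "success" ] && [ "$status" = "0" ]
}
```

Example usage, assuming a system unit: systemctl show brainstorm.service -p Restart -p Result -p ExecMainStatus -p ActiveState | clean_exit_not_restarted && echo "clean exit was not restarted". If it reports a match, confirm with journalctl -u brainstorm.service before changing the policy.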
Applied to brainstorm.service 14-Apr-2026. Documented in architecture.md.
Related
- 2026-04-04-oracle-001-self-architecture-analysis
- docker
- cache
- salesforce
- systemd-restart-on-failure-misses-clean-idle-timeout-exits
- systemd-restarton-failure-with-clean-return-in-daemon-silent