systemd Restart=on-failure misses clean exits — use Restart=always for always-on services with internal idle-exit

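Restart=on-failure restarts a service only after an unclean exit: a non-zero exit code, termination by a signal other than SIGHUP, SIGINT, SIGTERM, or SIGPIPE, a timeout, or a watchdog expiry. If a service has an internal idle timeout and exits cleanly (exit code 0), systemd records the exit as success and on-failure does not restart it, even though the exit was not a “user intent to stop” decision.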

Concrete case: brainstorm.service runs the Superpowers server.cjs, which hardcodes IDLE_TIMEOUT_MS = 30 * 60 * 1000 (30 minutes idle, then a clean shutdown). Restart=on-failure treats the idle exit as success, so the service stays down until it is restarted manually or a subsequent session triggers the brainstorming-gate hook.

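Fix: Restart=always with RestartSec=5. systemd then restarts the service after any exit (clean or failed), waiting 5 seconds before each restart attempt. An explicit systemctl stop is still honored and does not trigger a restart. This is the correct policy for always-on persistent services.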
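
A minimal sketch of applying the fix as a drop-in override, so the original unit file stays untouched. The helper name write_restart_dropin and the file name restart.conf are hypothetical choices for this sketch, and the default root /etc/systemd/system assumes a system unit; a user unit would live under ~/.config/systemd/user instead.

```shell
# Write a drop-in that overrides only the restart policy of a unit.
# $1 = unit name (e.g. brainstorm.service)
# $2 = drop-in root directory (assumed default: /etc/systemd/system)
write_restart_dropin() {
    unit="$1"
    root="${2:-/etc/systemd/system}"
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    # restart.conf is an arbitrary name; any *.conf file in the .d directory is read.
    cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=always
RestartSec=5
EOF
}
```

After running write_restart_dropin brainstorm.service (as root for a system unit), systemctl daemon-reload followed by systemctl restart brainstorm.service should make the new policy take effect.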

Restart= choice rule:

  • Restart=on-failure → use when clean exit = “user chose to stop” (e.g., user-launched build servers, one-shot daemons)
  • Restart=always → use for always-on session-dependent services where any exit means “broken user experience” (brainstorm, MCP servers, watchdogs)
  • Restart=no (default) → one-shot tasks, installers

Diagnostic: if a systemd-managed service goes down, journalctl -u <svc> shows a clean exit (code 0) alongside “shutdown”, “idle timeout”, or “stopping” messages, and nobody stopped it deliberately → the restart policy is wrong for an always-on service. Change it to Restart=always.
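
The diagnostic above can be roughly sketched as a check on systemctl show output. The function name check_restart_policy is hypothetical. It reads KEY=VALUE lines on stdin and flags a unit that is inactive after a successful exit with status 0 while its policy does not restart on clean exits. It cannot tell an idle exit from a deliberate stop where the process exited 0 on SIGTERM, so the journal messages remain the deciding evidence.

```shell
# Reads `systemctl show` KEY=VALUE lines on stdin and reports whether the
# unit exited cleanly without a restart policy that covers clean exits.
check_restart_policy() {
    restart=""
    result=""
    status=""
    state=""
    while IFS='=' read -r key value; do
        case "$key" in
            Restart) restart="$value" ;;
            Result) result="$value" ;;
            ExecMainStatus) status="$value" ;;
            ActiveState) state="$value" ;;
        esac
    done
    # Restart=always and Restart=on-success both restart after a clean exit.
    case "$restart" in
        always|on-success) echo "ok"; return 0 ;;
    esac
    if [ "$state" = "inactive" ] && [ "$result" = "success" ] && [ "$status" = "0" ]; then
        echo "clean exit not restarted (Restart=$restart)"
        return 1
    fi
    echo "ok"
}
```

Possible usage: systemctl show -p Restart -p Result -p ExecMainStatus -p ActiveState brainstorm.service | check_restart_policy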

Applied to brainstorm.service 14-Apr-2026. Documented in architecture.md.