Replaced single-axis alert gates in /root/.claude/crons/vps-health-wat…

Decision

Replaced the single-axis alert gates in /root/.claude/crons/vps-health-watchdog.sh and claude-session-runaway-guard.sh with multi-axis AND-conjunctions that match the actual pre-hang shape. Steal path: steal>=25 AND load1m>=6 AND user_cpu>=20. Load path: load1m>=8 AND user_cpu>=20. Session-runaway path: age>=6h AND cpu_hrs>=1, with RSS>=4GB kept as an independent trigger. The Top-CPU panel now reads from top -b -n 2 -d 1 instead of ps, which removes the division-by-zero artifacts seen for processes with et=00:00.
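A minimal sketch of the three gates as shell predicates, to make the conjunctions concrete. This is not the code in the two cron scripts: the function names, argument order, and integer-only inputs are assumptions for illustration, and RSS is taken in MB with 4GB treated as 4096 MB.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and argument order are illustrative only.
# All arguments must be integers. A fractional load1m would have to be
# truncated by the caller first (for example with "${load%.*}").

# Steal path: fires only when all three axes are over threshold.
steal_gate() {  # args: steal load1m user_cpu
  [ "$1" -ge 25 ] && [ "$2" -ge 6 ] && [ "$3" -ge 20 ]
}

# Load path: high load alone is not enough, user CPU must be busy too.
load_gate() {  # args: load1m user_cpu
  [ "$1" -ge 8 ] && [ "$2" -ge 20 ]
}

# Session runaway: (old AND CPU-hungry) OR large RSS on its own.
session_gate() {  # args: age_hours cpu_hours rss_mb
  { [ "$1" -ge 6 ] && [ "$2" -ge 1 ]; } || [ "$3" -ge 4096 ]
}
```

Each function returns exit status 0 when the gate fires, so a caller can write `if steal_gate "$steal" "$load" "$user"; then ...; fi` and send the alert only inside that branch.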

Rationale

AJ mandate: Telegram receives only “about to break” signals, not noise. Validation: all 44 of the 44 historical noise alerts from the last 24h would have been silent under the new gates, while the 22-Apr pre-hang shape (steal=91, load=359, user=47, session 24h-wall/7h-CPU/7.1GB-RSS) still fires on multiple paths. The premortem identified two residual risks, both accepted. First, a disk-I/O saturation hang (low user_cpu) is silenced; this is mitigated by the independent MEM path, the Hostinger panel, and Prometheus/Loki/Grafana. Second, a strict AND fires mid-break rather than pre-break; this is mitigated by repositioning Telegram as the LOUD-ALARM channel of last resort, while early warning lives in the always-on dashboards.
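The kind of replay behind this validation could look like the sketch below. It is an illustration, not the procedure actually used: the `replay_gates` name and the whitespace-separated "steal load1m user_cpu" input format are assumptions, and only the system-level steal and load paths are covered.

```shell
#!/usr/bin/env bash
# Hypothetical replay helper: reads one "steal load1m user_cpu" sample per
# line on stdin and prints which paths would fire ("-" means silent).
replay_gates() {
  awk '{
    fired = ""
    if ($1 >= 25 && $2 >= 6 && $3 >= 20) fired = fired " steal"
    if ($2 >= 8 && $3 >= 20)             fired = fired " load"
    print (fired == "" ? "-" : substr(fired, 2))
  }'
}

# The 22-Apr pre-hang shape quoted above.
printf '91 359 47\n' | replay_gates   # prints "steal load"
```

Feeding the historical noise samples through the same function and counting the lines that print anything other than "-" gives the number of alerts that would still have reached Telegram.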

Alternatives Rejected

Outcome

Pending