[RECONCILED FROM st-fallback-journal.md 23-Apr-2026 12:30 UTC] Purge a…

Decision

[RECONCILED FROM st-fallback-journal.md 23-Apr-2026 12:30 UTC] Purge all 22 orphan certs from acme.json (not just the 13 expired); remove mytlschallenge resolver block entirely; add traefik.docker.network labels to 4 containers (oracle-hermes, oracle-mirofish, openspace-mcp, runwal-bkc); change huggingface-mcp cert resolver mytlschallenge→letsencrypt; install nightly orphan-detect cron.
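
The nightly orphan-detect step could be sketched roughly as follows. This is a minimal illustration, not the cron job that was installed. It assumes acme.json follows the usual Traefik v2 layout (resolver name, then a "Certificates" list whose entries carry "domain.main" and "domain.sans") and that router rules are available as plain strings, for example from Traefik's API if it is enabled. The function names and sample hostnames are hypothetical.

```python
import re

# Matches Host(...) and HostSNI(...) matchers; HostRegexp is deliberately ignored.
HOST_CALL = re.compile(r"Host(?:SNI)?\(([^)]*)\)")
BACKTICKED = re.compile(r"`([^`]+)`")


def routed_hosts(rules):
    """Collect every hostname named in Host(...) / HostSNI(...) matchers."""
    hosts = set()
    for rule in rules:
        for args in HOST_CALL.findall(rule):
            hosts.update(h.lower() for h in BACKTICKED.findall(args))
    return hosts


def covers(name, host):
    """True if certificate name `name` is valid for `host` (one-label wildcards only)."""
    if name.startswith("*."):
        return host.endswith(name[1:]) and host.count(".") == name.count(".")
    return name == host


def find_orphans(acme, rules):
    """Return (resolver, main_domain) pairs for certs that no router rule references."""
    hosts = routed_hosts(rules)
    orphans = []
    for resolver, data in acme.items():
        # "Certificates" may be missing or null for a resolver that never issued.
        for cert in (data or {}).get("Certificates") or []:
            domain = cert.get("domain", {})
            names = {domain.get("main", "")} | set(domain.get("sans") or [])
            names = {n.lower() for n in names if n}
            if names and not any(covers(n, h) for n in names for h in hosts):
                orphans.append((resolver, domain.get("main", "")))
    return orphans


if __name__ == "__main__":
    # Hypothetical sample data; a real run would json.load() acme.json instead.
    sample_acme = {
        "letsencrypt": {
            "Certificates": [
                {"domain": {"main": "live.example.test"}},
                {"domain": {"main": "old.example.test"}},
            ]
        }
    }
    sample_rules = ["Host(`live.example.test`)"]
    for resolver, main in find_orphans(sample_acme, sample_rules):
        print(f"orphan cert: {main} (resolver {resolver})")
```

A cron wrapper would only need to load the real file and rule list, then alert when the result is non-empty. Routers that match through HostRegexp are not covered by this sketch, so their certs would be reported as orphans and need manual review.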

Rationale

Recorded inline during the ST MCP TLS outage; replayed now per the Cognitive Resilience Protocol. Orphan certs trigger an ACME retry storm when they expire. Preserving them buys nothing (no router uses them) and only delays the inevitable cleanup, which is why all 22 are purged rather than just the 13 already expired. This follows 13 Engineering Laws #3 (Zero Tech Debt) and #10 (Full Capability); the self-maintenance cron satisfies Tier 1 Directive #7. Primary root cause: 13 v2-mcp orphan certs expired Apr 20-23, leaving a continuous ACME retry queue that held 129 MB of failed-challenge state. Evidence: Traefik memory dropped from 151 MB to 22 MB (85% reclaim) after the purge. This cause lay outside the user’s initial three-hypothesis frame and was found by correlating the memory delta with the user’s “last 3-4 hrs” report.
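
As a quick check on the memory figures quoted above, the reclaim can be recomputed from the two readings. The variable names are illustrative only; the 151 and 22 values are the ones recorded in this entry.

```python
# Traefik memory readings (MB) before and after the purge, as recorded above.
before_mb, after_mb = 151, 22

reclaimed_mb = before_mb - after_mb
reclaimed_pct = 100 * reclaimed_mb / before_mb

print(f"{reclaimed_mb} MB reclaimed ({reclaimed_pct:.0f}%)")  # 129 MB reclaimed (85%)
```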

Alternatives Rejected

Outcome

Pending