What

The vault underwent a major bloat cleanup on 2026-04-17 (2512 → 376 live notes, −85%). You need to know the details before your next weekly vault-hygiene-sweep (Sun 01:30 IST) so you don’t misread the archived notes as broken links that need restoring.

What changed

  1. 2137 notes archived under vault/_archive/cleanup-2026-04-17-conversation-flush-bulk/ (9.1 MB). Mostly conversation-flush output with consulted=0 — 100% of that agent’s corpus. Included massive duplicate clusters: 88× stop-hook variants, 30× react-native-web variants, 29× drizzle-fk variants.
  2. transcript-capture.py Stop/PreCompact hook DISABLED at both /root/.claude/hooks/ and /home/claude/.claude/hooks/, via a DISABLED = True constant at the top of each copy. The conversation-flush.py script it spawned had no pre-submit dedup and produced 2052 write-only notes with a 0% consultation rate.
  3. New institutional-only discipline captured at aj-workspace:/memory/feedback_vault_institutional_only.md with a 7-row ✅/❌ matrix + 4-test filter (cross-agent benefit / non-stack-specific / search-first dedup / would-user-want-this).
  4. session-close skill updated — Phase 3.5 Quality Gate rewritten with 4-test filter + ≤3 submissions-per-session hard ceiling; Phase 2 now runs a Vault Bloat Audit every session-close (duplicate-cluster check, conversation-flush regression guard, unconsulted-ratio alert).
  5. Evolution backlog updated (aj-workspace:memory/evolution-backlog.md 17-Apr-2026 entries) with three items: conversation-flush structural dedup fix, Vault Submission Discipline rule, superpowers:subagent-driven-development scale guidance.
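
A minimal sketch of how the 4-test filter and the ≤3 submissions-per-session ceiling from items 3 and 4 could fit together, using the ≥3 similar-title threshold given under Action Required. Candidate, SubmissionGate, and the search callable are hypothetical names for illustration; search only stands in for vault_search, whose real signature is not shown in this handoff.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

SIMILAR_TITLE_LIMIT = 3  # skip when search returns >= 3 similar titles
SESSION_CAP = 3          # hard ceiling on submissions per session


@dataclass
class Candidate:
    """A learning being considered for the vault (hypothetical shape)."""
    title: str
    cross_agent_benefit: bool  # test (a): useful to agents in other divisions
    non_stack_specific: bool   # test (b): still valuable without the library name
    user_would_add: bool       # test (d): False when unsure


class SubmissionGate:
    def __init__(self, search: Callable[[str], Sequence[str]]):
        # `search` is an assumed stand-in for vault_search: it takes a title
        # and returns the similar titles already in the vault.
        self._search = search
        self.submitted = 0

    def decide(self, c: Candidate) -> str:
        """Return "submit" or a "local-memory: ..." reason string."""
        if self.submitted >= SESSION_CAP:
            return "local-memory: session cap reached"
        if not (c.cross_agent_benefit and c.non_stack_specific and c.user_would_add):
            return "local-memory: failed institutional-only test"
        # Test (c): search first, skip on >= 3 similar-title matches.
        if len(self._search(c.title)) >= SIMILAR_TITLE_LIMIT:
            return "local-memory: near-duplicate"
        self.submitted += 1
        return "submit"
```

Any "local-memory" result means the learning goes to error-playbook / session-learnings / feedback rather than the vault. The counter only advances on an actual submission, so rejected candidates do not use up the session cap.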

Git trace

  • /opt/second-brain repo: commits e206c4e (over-specific removal) → 7330819 (session dedup) → f697525 (mass archive 2137 files), followed by straggler commits for subsequent conversation-flush re-writes and 2 more dupes.

Rationale

User directive 2026-04-17: “vault is institutional intelligence, not an activity log … pattern recognition or pattern-related decisions or certain kinds of best practices … I don’t want redundancies and unnecessary garbage.” Quality over size. Conversation-flush was the bloat engine; 82% of vault was its output, 100% unconsulted.

Why

Your weekly vault-hygiene-sweep checks for broken wikilinks, missing Related sections, and orphan notes. Without this handoff, it could (a) flag the 2137 archived files as “broken links” from remaining wiki backlinks and attempt auto-restore — re-bloating the vault; (b) flag the disabled transcript-capture.py hook as a regression and try to re-enable it; (c) apply its own submission gate without incorporating the new institutional-only discipline rule, producing the same bloat pattern over time.
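
To make risk (a) concrete, here is a rough sketch of an archive-aware link check. It assumes notes are .md files and that wikilinks resolve by file stem; neither is confirmed in this handoff, and note_slugs / classify_link are hypothetical helper names.

```python
from pathlib import Path
from typing import Optional, Set


def note_slugs(root: Path, skip: Optional[Path] = None) -> Set[str]:
    """Collect note slugs (file stems) under root, ignoring anything below skip.

    Assumes notes are .md files; adjust the glob if the vault differs.
    """
    return {
        p.stem
        for p in root.rglob("*.md")
        if skip is None or skip not in p.parents
    }


def classify_link(slug: str, live: Set[str], archived: Set[str]) -> str:
    """Classify a wikilink target.

    "ok"       -> target exists in the live tree
    "archived" -> target was archived on purpose: leave or scrub, never restore
    "broken"   -> target is missing everywhere, a genuine broken link
    """
    if slug in live:
        return "ok"
    if slug in archived:
        return "archived"
    return "broken"
```

A sweep could build live with note_slugs(vault, skip=vault / "_archive") and archived with note_slugs(vault / "_archive" / "cleanup-2026-04-17-conversation-flush-bulk"), then treat only "broken" results as candidates for repair.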

You also own architecture/orchestration quality-gating — this cleanup + discipline rule is architectural hygiene your lane should formally own going forward.

Action Required

On next weekly vault-hygiene-sweep (Sun 01:30 IST) and thereafter:

  1. TREAT _archive/cleanup-2026-04-17-conversation-flush-bulk/ AS INTENTIONAL ARCHIVE. Do not auto-restore any file from it. Do not try to “fix broken backlinks” by pulling files back into the live tree. If a wiki entity page has backlinks pointing into the archive, either leave them (archive reference is OK) or scrub the backlink (see the 8-file scrub pattern in commit e206c4e). Never restore.

  2. DO NOT RE-ENABLE transcript-capture.py. The DISABLED = True flag at the top of both copies is intentional. Re-enable criteria documented in the hook’s docstring + aj-workspace evolution-backlog 17-Apr-2026: (a) script must call vault_search before every submission and skip on ≥3 similar-title matches, (b) hard cap ≤3 submissions per session, (c) extraction prompt must apply the institutional-only gate from feedback_vault_institutional_only.md. All three required before re-enable.

  3. ADOPT THE INSTITUTIONAL-ONLY GATE FOR YOUR OWN vault_submit_* CALLS. The 4-test filter in feedback_vault_institutional_only.md: (a) Cross-agent benefit: would an agent in another division retrieve this and behave differently? (b) Non-stack-specific: strip framework/library name, is the pattern still valuable? (c) Search-first: did vault_search return <3 similar-title matches? (d) Would the user add this? If unsure → don’t submit. Fail any test → route the learning to an agent’s local memory (error-playbook / session-learnings / feedback), not vault.

  4. (Optional, architectural work in your lane): Rewrite /home/claude/.claude/scripts/conversation-flush.py with the three preconditions in item 2. Once rewritten and verified, Claude Code can flip DISABLED = False. Include a test harness that submits 10 known patterns and verifies dedup rejects 7+ near-dupes before allowing production use.

  5. Update your weekly sweep to include the duplicate-cluster check that session-close Phase 2 now runs (slug-prefix grouping, flag clusters ≥4 copies). Reference: /home/claude/.claude/skills/session-close/references/phase-2-pristine-state.md § Vault Bloat Audit.
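
The duplicate-cluster check in item 5 could look roughly like the sketch below. How session-close actually derives a slug prefix is not stated here, so taking the first two hyphen-separated tokens is an assumption, as are the function names; the ≥4 threshold comes from item 5.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

CLUSTER_THRESHOLD = 4  # flag clusters with >= 4 copies (from item 5)


def slug_prefix(slug: str, tokens: int = 2) -> str:
    """First `tokens` hyphen-separated parts of a slug (assumed grouping key)."""
    return "-".join(slug.split("-")[:tokens])


def find_duplicate_clusters(
    slugs: Iterable[str],
    tokens: int = 2,
    threshold: int = CLUSTER_THRESHOLD,
) -> Dict[str, List[str]]:
    """Group slugs by prefix and return only groups at or above the threshold."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for slug in slugs:
        groups[slug_prefix(slug, tokens)].append(slug)
    return {
        prefix: sorted(members)
        for prefix, members in groups.items()
        if len(members) >= threshold
    }
```

Flagged clusters are candidates for review, not automatic deletion; a longer prefix (tokens=3) may be needed to catch families such as the react-native-web variants.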

Acknowledge when received.