Adversarial parallel multi-agent audit before any “complete” claim

Adversarial parallel multi-agent audit before any “complete” claim

Pattern

When a project crosses into “all features shipped / ready for deploy” territory, dispatch 5 orthogonal agents in parallel BEFORE accepting the completion claim. Single-agent self-review consistently misses entire classes of defect that the 5-axis sweep catches in one round-trip.

Agents to dispatch (parallel)

  1. critic — adversarial review of architectural decisions, unstated assumptions, “this is fine” blindness, what would a Primavera/MS-Project/incumbent user say is missing
  2. edge-case-hunter — mechanical path enumeration: empty/single/many, boundaries, NaN, concurrent access, cross-boundary I/O failures, type coercion at JSON boundaries
  3. pr-review-toolkit:code-reviewer — standards compliance: security, type safety, error handling, tests, naming, style, performance, accessibility
  4. pr-review-toolkit:silent-failure-hunter — error swallowing, empty catch {}, falling back to defaults instead of failing fast, RLS silent-zero-rows mode
  5. Bash sweep — dependency CVE audit (pnpm audit --audit-level high), secret scan, TODO/FIXME residue, drift detection (live counts vs documented), build artifacts vs source

Each agent gets a self-contained prompt with file paths, the explicit bar (“world’s-greatest / zero-compromise / zero-tech-debt”), and a P0/P1/P2 classification request. Results aggregate with high overlap on real defects and low overlap on noise — natural quality signal.

Evidence (R-Plan Wave 6, 2026-05-02)

After 5 build waves shipped + my own setup-curator gate PASS, I claimed pre-pilot ready. AJ asked for a quality audit. The 5-agent dispatch found 4 P0 + 11 P1 + 13 medium findings — none surfaced by single-agent self-review:

  • CVE upgrade (drizzle-orm 0.36 SQL-injection, GHSA-gpj5-g38j-94v9) — caught by Bash dep audit, missed by code-reviewer
  • RLS PII gap (4 tables had no row-level security; any auth user could SELECT * FROM "user" and read every email/role) — caught by critic
  • Silent failure CRITICAL (wrap() used console.error — every unexpected Server Action error invisible in production) — caught by silent-failure-hunter
  • Stale syncing queue rows never recovered after tab crash mid-drain → silent data loss — caught by edge-case-hunter
  • Photo Blob > 1MB queued offline permanently fails forever (no compress before presign) — caught by edge-case-hunter
  • Service-worker cache leak between users on shared tablet — caught by code-reviewer
  • Open-redirect via callbackUrl — caught by critic
  • progress_entries_insert admin bypass violating ADR — caught by critic
  • 9 sites of .includes("duplicate") substring-match for unique-violation — caught by code-reviewer + silent-failure-hunter (overlap = strong signal)

When to apply

  • Before any “this is complete” / “ready for pilot” / “ready to merge” assertion on substantive work (>1000 lines or multi-feature)
  • Before any tagged release
  • When AJ asks for “world’s-greatest / zero-compromise / zero-tech-debt confirmation”
  • After a substantive refactor or wave-close

When to skip

  • Trivial changes (typo fixes, comment updates)
  • Single-file edits with <100 lines diff
  • Pure exploration sessions with no shipping intent

Cost

Each agent costs 5-10k tokens of model output. Bash sweep is free. Total ~50k tokens for the 5-axis sweep. Compared to the cost of shipping a P0 to pilot users and rebuilding trust — trivial.

Failure mode if skipped

The audit catches what self-review misses BY DESIGN — the agents have orthogonal evaluation lenses. Skipping = guaranteed P0 in production within first month of pilot. Memory for AJ’s “world’s greatest” bar is non-negotiable.

  • Pristine Sweep (workspace memory pristine-sweep-protocol.md) — single-axis cleanliness check; complementary, not substitute
  • Setup-curator 8-point gate — file-hygiene only; does not find correctness/security defects
  • Single-pass code-review agent — useful for incremental PRs but insufficient for wave-close

Routing recommendation

Codify as either:

  • New Tier 1 directive: “Before any ‘this is complete’ assertion on substantive work, dispatch 5-agent parallel adversarial audit. Skipping requires explicit AJ override.”
  • New skill pre-deploy-audit — bundles dispatch + dedup + remediation + commit pattern.