New bench-model CLI on AJ’s VPS — cross-model latency/cost/quality compare across Claude sonnet/op

What

New bench-model CLI on AJ’s VPS — cross-model latency/cost/quality compare across Claude sonnet/opus/haiku (Codex/Gemini scaffolded). Useful for autoresearch model-routing decisions: “is Opus worth 5x cost over Sonnet for THIS prompt class?” Has —judge mode (LLM-as-judge), —dry-run, JSON/markdown output. Pricing table hardcoded — refresh when vendors ship new pricing.

Why

RD/autoresearch loops often hardcode model choice without per-task evidence. bench-model gives empirical evidence for routing decisions. Particularly useful when evaluating new prompt classes or after vendor price changes.

Action Required

Consider integrating bench-model into autoresearch model-selection step when evaluating a new task class. No urgent action — augmentation only.