autoresearch-validation-set-prevents-score-noise
Without a fixed validation subset, score changes across autoresearch cycles can reflect item sampling variation rather than prompt improvement. Designate 3-5 fixed items that appear in every cycle so that scores can be compared apples-to-apples. For the remaining items, use coverage-first rotation: prefer untested items, and repeat an item only after every item has been covered.
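One way the selection rule could look in code is sketched below. This is a minimal illustration, not the autoresearch implementation: the function name `select_cycle_items`, its parameters, and the idea of tracking coverage in a `tested` set are all assumptions made for the example.

```python
import random


def select_cycle_items(all_items, fixed_items, tested, per_cycle, rng=None):
    """Pick the items to score in one cycle (hypothetical helper).

    all_items   -- every available test item id
    fixed_items -- the 3-5 ids that are scored in every cycle
    tested      -- set of rotating ids already used this round; updated in place
    per_cycle   -- total items per cycle, fixed items included
    """
    if rng is None:
        rng = random.Random()
    if per_cycle < len(fixed_items):
        raise ValueError("per_cycle must be at least the number of fixed items")

    # Everything outside the fixed validation set takes part in rotation.
    rotating_pool = [i for i in all_items if i not in fixed_items]
    n_rotating = min(per_cycle - len(fixed_items), len(rotating_pool))

    # Coverage first: draw only from items that have not been tested yet.
    untested = [i for i in rotating_pool if i not in tested]
    rng.shuffle(untested)
    picked = untested[:n_rotating]

    if len(picked) < n_rotating:
        # Every rotating item has now been covered, so a new round starts
        # and repeats become allowed.
        tested.clear()
        refill = [i for i in rotating_pool if i not in picked]
        rng.shuffle(refill)
        picked += refill[: n_rotating - len(picked)]

    tested.update(picked)
    return list(fixed_items), picked
```

The caller keeps one `tested` set across cycles and compares cycle-to-cycle scores on the fixed items only; the rotating items add coverage but are not used for the apples-to-apples comparison.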
Related
- clawteam-openclaw-multi-agent-swarm-evaluation
- autoresearch-v2-5-0-upgrade-8-gaps-absorbed
- claude-code-to-nova-20260329-141644 (archived)
- enterprise-capability-expansion-5-pillars-from-digital-employee-analysis
- 2026-04-04-oracle-001-self-architecture-analysis
- autoresearch-plateau-breaker-after-5-stale-runs
- item-level-failure-detection-separates-prompt-from-test-item
- autoresearch-item-level-failure-vs-bad-prompt
- autoresearch-fixed-validation-set-required
- autoresearch-plateau-breaker-5-stale-threshold