autoresearch-fixed-validation-set-required

Without a fixed validation subset of 3-5 items that appears in every cycle, score changes across cycles are not like-for-like comparisons, because each cycle is scored on a different sample. For the remaining slots, coverage-first sampling (prefer untested items; repeat an item only after every item has been covered) keeps sampling bias from masking real regressions. autoresearch v2.5.0 mandates this in its Sample Management section.
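
The sampling rule can be sketched in Python as follows. This is a minimal illustration, not autoresearch's actual implementation: the function name `select_cycle_sample`, its parameters, and the choice to start a new coverage round once every non-fixed item has been tested are all assumptions made for the example.

```python
import random


def select_cycle_sample(all_items, fixed_subset, tested, sample_size, rng=None):
    """Return (sample, updated_tested) for one evaluation cycle.

    Hypothetical helper: names and behavior are illustrative assumptions.
    - fixed_subset: the 3-5 items that must appear in every cycle.
    - tested: non-fixed items already used in the current coverage round.
    """
    rng = rng or random.Random()
    fixed = list(fixed_subset)
    fixed_set = set(fixed)
    if sample_size < len(fixed):
        raise ValueError("sample_size must be at least the size of the fixed subset")

    # Items eligible for the rotating (non-fixed) slots.
    pool = [item for item in all_items if item not in fixed_set]
    slots = min(sample_size - len(fixed), len(pool))

    tested = set(tested) & set(pool)
    untested = [item for item in pool if item not in tested]

    if len(untested) < slots:
        # Not enough untested items: take all of them, which completes
        # coverage, then start a new round and fill the rest with repeats.
        picked = list(untested)
        tested = set(picked)
        refill = [item for item in pool if item not in tested]
        picked += rng.sample(refill, slots - len(picked))
    else:
        # Coverage-first: draw only from items not yet tested this round.
        picked = rng.sample(untested, slots)

    tested |= set(picked)
    return fixed + picked, tested


# Usage: 10 items, 3 fixed, 5 per cycle, so 2 rotating slots per cycle.
items = list(range(10))
fixed = [0, 1, 2]
tested = set()
rng = random.Random(0)
samples = []
for _ in range(4):
    sample, tested = select_cycle_sample(items, fixed, tested, 5, rng)
    samples.append(sample)
```

Because the fixed items lead every sample, their scores can be compared directly from cycle to cycle, while the rotating slots work through the remaining items before any of them repeats.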