autoresearch-llm-judge-context-blind-eval

When the same agent both generates and evaluates outputs, it grades charitably because it knows the generative intent. Eval isolation (stripping all prompt context before presenting the output to the judge) removes this bias. The judge must see ONLY the raw output plus the criterion text, never the prompt that produced it. This is now enforced in the LLM-as-Judge section of autoresearch as of v2.5.0.
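A minimal sketch of the isolation step, assuming a hypothetical generation record shape and helper names (`isolate`, `build_judge_prompt` are illustrative, not the actual autoresearch API): the generative prompt is discarded before the judge prompt is rendered, so only output and criterion can reach the judge model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeTask:
    """The only fields the judge is allowed to see."""
    output: str
    criterion: str


def isolate(record: dict) -> JudgeTask:
    """Strip all generative context before judging.

    Any prompt, system message, or conversation history in the
    record is deliberately dropped so the judge cannot infer
    the generative intent behind the output.
    """
    return JudgeTask(output=record["output"], criterion=record["criterion"])


def build_judge_prompt(task: JudgeTask) -> str:
    """Render the context-blind prompt sent to the judge model."""
    return (
        f"Criterion: {task.criterion}\n"
        f"Output to evaluate:\n{task.output}\n"
        "Score 1-5 against the criterion only."
    )


record = {
    "prompt": "Summarize the paper in two sentences.",  # never shown to judge
    "output": "The paper proposes a retrieval-augmented pipeline.",
    "criterion": "Factual accuracy",
}
judge_prompt = build_judge_prompt(isolate(record))
assert "Summarize the paper" not in judge_prompt  # prompt context removed
```

Making `isolate` the only constructor path for `JudgeTask` is what turns the rule into an enforcement point rather than a convention: code that wants to judge an output simply has no field through which to pass the original prompt.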