llm-as-judge-eval-isolation-prevents-charitable-grading

When the same agent both generates and judges outputs, it grades charitably because it knows the prompt's intent and credits the output for what was meant rather than what was written. Fix: strip all prompt context before presenting outputs to the judge, so the judge sees ONLY the raw output plus the criterion text. This technique, called eval isolation, is critical for unbiased LLM-as-Judge scoring in autoresearch loops.
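A minimal sketch of eval isolation, assuming a hypothetical `build_judge_prompt` helper (no real LLM API is called here). The point is structural: the function's signature accepts only the raw output and the criterion, so the generation prompt cannot leak into the judge's context by construction.

```python
def build_judge_prompt(output: str, criterion: str) -> str:
    """Build an isolated judge prompt from the raw output and criterion only.

    Deliberately excludes the generation prompt, system prompt, and any
    intent metadata, so the judge cannot grade charitably based on what
    the output was *supposed* to be.
    """
    return (
        "You are a strict evaluator. Judge ONLY the text below against "
        "the criterion. Do not infer or excuse based on presumed intent.\n\n"
        f"Criterion: {criterion}\n\n"
        f"Text to evaluate:\n{output}\n\n"
        "Answer PASS or FAIL with a one-sentence justification."
    )

# Usage: the generation context exists in the outer loop but is never
# passed to the judge -- only the artifact and the criterion are.
generation_prompt = "Write a haiku about autumn leaves."  # judge never sees this
model_output = "Leaves drift on cold wind"
judge_prompt = build_judge_prompt(model_output, "Is this a valid 5-7-5 haiku?")
assert generation_prompt not in judge_prompt  # isolation holds by construction
```

Because isolation is enforced by the function signature rather than by discipline at each call site, a reviewer can verify it once instead of auditing every judging call.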