context-engineering-operational-thresholds

Context Engineering Operational Thresholds

Evidence-backed numbers from Factory Research (36,611 production messages), RULER benchmark, and BrowseComp analysis. Integrated into Claude Code PreCompact/PostCompact hooks, MEMORY.md Tool Intelligence directives, and protocols.md Evaluation Rigor Protocol on 09-Apr-2026.

Context Capacity

Effective context = 60-70% of advertised window. 200K model degrades at 120-140K tokens.
Lost-in-middle: 10-40% accuracy drop when info sits in context middle vs beginning/end.
Compaction trigger: 70-80% utilization, not 90%+. Above 85%, summarizing model itself degrades.

Tool & Agent Budgets

Tool count ceiling: 10-20 per agent context. Overlap-induced selection errors compound beyond this.
Sub-agent return budget: 1,000-2,000 tokens max regardless of exploration breadth.
Tool output offload threshold: ~2,000 tokens → auto-offload to file.
Supervisor hard cap: 3-5 workers per supervisor tier.

Compression Quality

Artifact trail integrity: 2.2-2.5/5.0 across ALL compression methods — weakest dimension universally. Fix: separate verbatim artifact index.
Observation masking = zero overhead, matches LLM summarization quality. Observations = 83.9% of agent tokens.
Tokens-per-task is the correct optimization target, not tokens-per-request.

Agent Performance Budget (BrowseComp)

Token budget explains 80% of performance variance, tool count ~10%, model choice ~5%.
Rule: increase budget before swapping models.

Evaluation

Justification-before-score: +15-25% reliability.
Pairwise double-pass with position swap eliminates position bias.
Minor eval prompt phrasing changes cause 10-20% score swings.

2026-04-04-oracle-001-self-architecture-analysis
docker
clawteam-openclaw-multi-agent-swarm-evaluation
claude-code-to-nova-20260404-052908 (archived)
enterprise-capability-expansion-5-pillars-from-digital-employee-analysis
context-engineering-upgrade-3-tier-integration-from-agent-sk
compaction-artifact-preservation-universally-scores-22-out-o
observations-are-83-percent-of-tokens-mask-stale-outputs-not
1m-context-era-token-budget-recalibration-5-10x-from-200k

Second Brain

Explorer

context-engineering-operational-thresholds

context-engineering-operational-thresholds

Context Engineering Operational Thresholds

Context Capacity

Tool & Agent Budgets

Compression Quality

Agent Performance Budget (BrowseComp)

Evaluation

Graph View

Table of Contents

Backlinks

Second Brain

Explorer

context-engineering-operational-thresholds

context-engineering-operational-thresholds

Context Engineering Operational Thresholds

Context Capacity

Tool & Agent Budgets

Compression Quality

Agent Performance Budget (BrowseComp)

Evaluation

Related

Graph View

Table of Contents

Backlinks