18.05 Sub-Agent Context Flow
To appreciate what Sub-Agents offer, we must understand exactly how tokens flow between the user, the Main Agent, and the Sub-Agent. The primary superpower of a Sub-Agent is Context Isolation: its intermediate work never enters the Main Agent's context window.
The Context Window Problem
Every turn in a conversation with an LLM consumes tokens. As you prompt the agent, and as the agent executes tools and receives their observations, the context window grows.
- Turn 1: 10k tokens
- Turn 2: 30k tokens
- Turn 5: 100k tokens
As this window approaches the model's limit (e.g., 200k tokens), three things happen:
1. Cost Explodes: You pay for every input token on every new generation.
2. Latency Increases: The time to first token drastically increases.
3. Context Pollution: The LLM gets confused by old, irrelevant data ("Context Rot"), leading to hallucinations.
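To make the cost point concrete, here is a minimal sketch of how input tokens add up when the whole context is resent on every turn. The sizes for turns 1, 2, and 5 come from the list above; the sizes for turns 3 and 4 are invented for illustration, and the sketch ignores provider-side optimizations such as prompt caching.

```python
def cumulative_input_tokens(context_sizes):
    """Total input tokens processed when each turn resends the full context."""
    return sum(context_sizes)


# Context size at each turn. Turns 1, 2, and 5 match the text;
# turns 3 and 4 are hypothetical values filled in for this example.
context_per_turn = [10_000, 30_000, 50_000, 75_000, 100_000]

total = cumulative_input_tokens(context_per_turn)
print(total)  # 265000
```

Under these assumptions the context at turn 5 holds only 100k tokens, but the conversation has already processed 265k input tokens in total, because earlier tokens are counted again on every later turn.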
Without Sub-Agents, the only solution to this problem is to manually "compact" the context or completely clear the history and start fresh, destroying your conversational momentum.
The Sub-Agent Solution
Sub-Agents solve this problem by preventing intermediate tokens from ever reaching the Main Agent's context window.
When the Main Agent identifies a task that will require high-token exploration (like reading deeply into a codebase or doing extensive web searches), it spawns a Sub-Agent.
The Flow of Tokens
- Initialization: The Main Agent writes a small, focused prompt (e.g., 500 tokens) explaining the sub-task.
- Fresh Context: The Sub-Agent wakes up. It does not see the Main Agent's history. Its entire starting context is just that 500-token prompt.
- The Loop: The Sub-Agent executes tools, reads massive files, makes mistakes, and retries. Its internal context window might swell to 80k tokens.
- Resolution: The Sub-Agent finishes the task. It writes a condensed final response (e.g., 1k tokens).
- Death & Return: The Sub-Agent is terminated. The 80k tokens of noisy intermediate work vanish permanently. Only the 1k-token final summary is passed back to the Main Agent.
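The five steps above can be sketched as simple token bookkeeping. This is an illustrative simulation, not how any particular agent framework is implemented: `Context`, `run_sub_agent`, and the individual tool-output sizes are hypothetical names and numbers, and the 5k starting size for the Main Agent is taken from the sequence diagram in this section. It follows the text's simplified accounting, in which only the returned summary is added to the Main Agent's context.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Tracks the token count of one agent's context window."""
    tokens: int = 0

    def add(self, n: int) -> None:
        self.tokens += n


def run_sub_agent(prompt_tokens, tool_output_tokens, summary_tokens):
    """Simulate a Sub-Agent run and return (peak_context, summary_tokens)."""
    ctx = Context(tokens=prompt_tokens)  # Fresh Context: only the prompt
    for n in tool_output_tokens:         # The Loop: file reads, searches, retries
        ctx.add(n)
    peak = ctx.tokens
    # Death & Return: ctx goes out of scope here, so the intermediate
    # tokens are discarded and only the summary size is handed back.
    return peak, summary_tokens


main = Context(tokens=5_000)
peak, summary = run_sub_agent(
    prompt_tokens=500,
    tool_output_tokens=[20_000, 35_000, 24_500],  # hypothetical tool results
    summary_tokens=1_000,
)
main.add(summary)  # only the summary reaches the Main Agent

print(peak)         # 80000
print(main.tokens)  # 6000
```

The Sub-Agent's context peaks at 80k tokens, while the Main Agent's context grows by just the 1k-token summary.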
sequenceDiagram
participant User
participant Main Agent as Main Agent (Lean Context)
participant Sub-Agent as Sub-Agent (Noisy Context)
User->>Main Agent: Implement a complex feature
Main Agent->>Main Agent: Context: 5k tokens
Note over Main Agent: Realizes it needs to deeply explore authentication.
Main Agent->>Sub-Agent: Spawns Sub-Agent with 500-token Prompt
note right of Sub-Agent: Context Starts: 500 tokens
loop Isolated Exploration
Sub-Agent->>Sub-Agent: Reads files, searches web. Context grows to 80k tokens.
end
Sub-Agent-->>Main Agent: Returns 1k-token focused summary.
Note over Main Agent: Context: 6k tokens
Why This is Elegant
With Sub-Agents, the Main Agent's conversation thread stays lean. You can work with a coding agent like Claude Code over long sessions with far less need to manually clear the chat history, because the noisy, heavy lifting is delegated to disposable background workers.
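As a rough back-of-the-envelope comparison, the sketch below estimates how many exploration tasks fit in the Main Agent's window with and without delegation. It reuses the example figures from this section (200k limit, 5k starting context, 80k of exploration, 1k summary) and assumes, purely for illustration, that every task is the same size; `tasks_that_fit` is a hypothetical helper.

```python
def tasks_that_fit(limit, start, growth_per_task):
    """Number of whole tasks the Main Agent can absorb before hitting the limit."""
    return (limit - start) // growth_per_task


LIMIT = 200_000  # example model limit from the text
START = 5_000    # Main Agent context before any exploration

# Exploration done inline: all 80k tokens land in the Main Agent's context.
inline = tasks_that_fit(LIMIT, START, 80_000)

# Exploration delegated: only the 1k-token summary lands there.
delegated = tasks_that_fit(LIMIT, START, 1_000)

print(inline)     # 2
print(delegated)  # 195
```

Real tasks vary in size and the Main Agent's own turns also consume tokens, so these numbers are only indicative of the gap between the two approaches.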