Learn Claude Code: 12 Sessions to Build an AI Agent Harness from Scratch
Learn Claude Code is a teaching repository that reverse-engineers how Claude Code (Anthropic’s AI coding agent) works internally. It distills that harness engineering into 12 progressive Python sessions, from a bare while loop to a full multi-agent autonomous team. It also ships a Next.js web app for interactive learning.
Core philosophy: “The model IS the agent. Not a framework. Not a prompt chain. The intelligence comes from the model; the code is just the vehicle.”
0 What Is a Harness?
A harness is the code that wraps around an LLM, giving it tools, context, memory, and execution environment. Claude Code itself is a harness. This repo teaches you to build one from zero.
Harness = Tools + Context Management + Memory + Permissions + Execution Environment
Agent = LLM + Harness
1 The 12 Sessions — Progressive Complexity
The sessions are grouped into 5 layers, each building on the previous:
Layer 1: Tools & Execution [s01, s02] — The core loop
Layer 2: Planning [s03, s04, s05, s07] — Structure & knowledge
Layer 3: Memory [s06] — Infinite sessions
Layer 4: Concurrency [s08] — Background execution
Layer 5: Collaboration [s09-s12] — Multi-agent teams
Every session is a self-contained Python file (~200-500 lines) that you can run directly:
export ANTHROPIC_API_KEY=sk-ant-xxx
python agents/s01_agent_loop.py
s01 — The Agent Loop
“One loop & Bash is all you need.”
The minimal viable agent — a while loop that calls the LLM, checks for tool calls, executes them, and feeds results back:
while stop_reason == "tool_use":
    response = client.messages.create(model=MODEL, messages=messages, tools=tools)
    for block in response.content:
        if block.type == "tool_use":
            result = execute_tool(block.name, block.input)
            messages.append({"role": "user", "content": [{"type": "tool_result", ...}]})
    stop_reason = response.stop_reason
One tool: bash (shell commands, 120s timeout). That’s the entire kernel. Everything else is built on top.
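A minimal sketch of what that single bash handler could look like, assuming `subprocess` with the 120-second timeout mentioned above (the function name and return format are illustrative, not necessarily the repo's exact code):

```python
import subprocess

def handle_bash(command: str, timeout: int = 120) -> str:
    """Run a shell command; return combined stdout+stderr as a string the model can read."""
    try:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        output = proc.stdout + proc.stderr
        return output if output else f"(exit code {proc.returncode}, no output)"
    except subprocess.TimeoutExpired:
        # Errors go back to the model as plain text, so it can react and retry.
        return f"Error: command timed out after {timeout}s"
```

Note that errors are returned as strings rather than raised: the agent loop feeds everything back as a `tool_result`, and the model decides what to do next.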
s02 — Tool Dispatch
“Adding a tool means adding one handler.”
The loop stays exactly the same. New tools are added to a dispatch map:
TOOL_HANDLERS = {
    "bash": handle_bash,
    "read_file": handle_read_file,
    "write_file": handle_write_file,
    "edit_file": handle_edit_file,
}
output = TOOL_HANDLERS[tool_name](**kwargs)
Key insight: scalable capabilities don’t require changing the loop.
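A runnable sketch of this dispatch pattern, with two file handlers filled in (handler names follow the map above; the error-string convention is an assumption consistent with s01):

```python
def handle_read_file(path: str) -> str:
    """Tool handler: return the file's contents, or an error string the model can see."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except OSError as e:
        return f"Error: {e}"

def handle_write_file(path: str, content: str) -> str:
    """Tool handler: write content to path and confirm."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return f"Wrote {len(content)} chars to {path}"

TOOL_HANDLERS = {
    "read_file": handle_read_file,
    "write_file": handle_write_file,
}

def dispatch(tool_name: str, tool_input: dict) -> str:
    """Route a tool_use block to its handler; unknown tools become error text, not crashes."""
    handler = TOOL_HANDLERS.get(tool_name)
    if handler is None:
        return f"Error: unknown tool {tool_name}"
    return handler(**tool_input)
```

Adding a tool really is one entry in the map plus one function; the loop never changes.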
s03 — TodoWrite (Planning)
“An agent without a plan drifts.”
Adds a TodoManager with structured task tracking:
[ ] pending → [>] in_progress → [x] completed
Constraint: only one task can be in_progress at a time. If the agent hasn’t updated todos for 3+ rounds, the harness injects a nag reminder into the conversation. This forces sequential, visible planning.
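Both constraints can be sketched in a few lines. This is an illustrative reconstruction, assuming a `TodoManager` class with these method names (the repo's actual API may differ):

```python
class TodoManager:
    """Sketch of s03: at most one task in_progress, plus a staleness
    counter the harness uses to inject a reminder into the conversation."""
    NAG_AFTER = 3  # rounds without a todo update before nagging

    def __init__(self):
        self.todos = {}  # task name -> status
        self.rounds_since_update = 0

    def set_status(self, name: str, status: str):
        if status == "in_progress" and any(
            s == "in_progress" for n, s in self.todos.items() if n != name
        ):
            raise ValueError("only one task may be in_progress at a time")
        self.todos[name] = status
        self.rounds_since_update = 0

    def tick_round(self):
        """Called once per loop iteration; returns a nag message or None."""
        self.rounds_since_update += 1
        if self.rounds_since_update >= self.NAG_AFTER:
            return "Reminder: update your todo list before continuing."
        return None
```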
s04 — Subagents
“Break big tasks down; each subtask gets a clean context.”
The task tool spawns a fresh agent with messages = [] (empty history). The subagent runs independently, then returns only a text summary. Parent context stays clean.
Parent: [full conversation history + task tool]
↓ spawn
Child: [empty messages + prompt only, NO task tool (prevents recursion)]
↓ returns summary
Parent: [receives short summary, context uncontaminated]
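The spawn step above is small enough to sketch directly. Here `run_agent` stands in for the s01 loop, passed as a parameter so the sketch stays self-contained (the function names are illustrative):

```python
def handle_task(prompt: str, parent_tools: list, run_agent) -> str:
    """Spawn a subagent with a clean context; return only its text summary.

    run_agent(messages, tools) is assumed to be the s01-style loop,
    returning the agent's final text reply."""
    # Strip the task tool so a child cannot spawn grandchildren (no recursion).
    child_tools = [t for t in parent_tools if t["name"] != "task"]
    # Empty history: the child sees only its own prompt.
    child_messages = [{"role": "user", "content": prompt}]
    summary = run_agent(child_messages, child_tools)
    return summary  # the parent appends this as a tool_result; its history stays clean
```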
s05 — Skill Loading
“Load knowledge when you need it, not upfront.”
Two-layer injection:
- Layer 1 (always in system prompt): skill metadata — name + description (~100 tokens each)
- Layer 2 (on-demand): full SKILL.md body injected via tool_result when load_skill(name) is called
The agent sees the skill catalog cheaply, then loads the full reference only when needed. This prevents system prompt bloat.
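A sketch of both layers, assuming the /skills/ layout described below in section 4 (one SKILL.md per skill directory, with a `description:` line in its frontmatter); function names are illustrative:

```python
import pathlib

def skill_catalog(skills_dir: str) -> str:
    """Layer 1: cheap metadata for the system prompt (name + description only)."""
    lines = []
    for md in sorted(pathlib.Path(skills_dir).glob("*/SKILL.md")):
        desc = next(
            (l.split(":", 1)[1].strip()
             for l in md.read_text(encoding="utf-8").splitlines()
             if l.startswith("description:")),
            "",
        )
        lines.append(f"- {md.parent.name}: {desc}")
    return "\n".join(lines)

def load_skill(skills_dir: str, name: str) -> str:
    """Layer 2: the full SKILL.md body, returned as a tool_result only on demand."""
    path = pathlib.Path(skills_dir) / name / "SKILL.md"
    if not path.exists():
        return f"Error: no skill named {name}"
    return path.read_text(encoding="utf-8")
```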
s06 — Context Compaction
“Context will fill up; you need a way to make room.”
Three-layer compression strategy:
| Layer | Trigger | Action |
|---|---|---|
| Micro | Every turn | Replace old tool results with [Previous: used {tool_name}] |
| Auto | Token threshold (~50k) | Archive transcript to .transcripts/, LLM summarizes, replace all with summary |
| Manual | Agent calls compact tool | Same as auto, but agent-initiated |
This enables infinite sessions — the agent can work for hours without running out of context.
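The micro layer is the simplest to show. This sketch assumes the Anthropic messages format (tool results arrive as `tool_result` blocks in user messages) and uses a generic stub text; `keep_last` is an illustrative knob, not necessarily the repo's:

```python
def micro_compact(messages: list, keep_last: int = 2) -> list:
    """Replace stale tool_result payloads with a short stub, keeping the
    most recent keep_last tool-result messages verbatim."""
    result_positions = [
        i for i, m in enumerate(messages)
        if isinstance(m.get("content"), list)
        and any(b.get("type") == "tool_result" for b in m["content"])
    ]
    stale = set(result_positions[:-keep_last]) if keep_last else set(result_positions)
    out = []
    for i, m in enumerate(messages):
        if i in stale:
            blocks = [
                {**b, "content": "[Previous: tool result compacted]"}
                if b.get("type") == "tool_result" else b
                for b in m["content"]
            ]
            out.append({**m, "content": blocks})
        else:
            out.append(m)
    return out
```

Running this before every LLM call keeps old, bulky tool output from accumulating while preserving the conversation's shape.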
s07 — Task System
“Break big goals into small tasks, order them, persist to disk.”
File-based task board: each task is a JSON file in .tasks/task_{id}.json:
{
  "id": "task_1",
  "subject": "Implement auth",
  "status": "in_progress",
  "blockedBy": [],
  "blocks": ["task_3"],
  "owner": "lead"
}
Dependency resolution: completing task_1 automatically removes it from task_3.blockedBy. Tasks survive context compression because they live on disk, not in conversation history.
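The dependency-resolution step can be sketched directly against that JSON layout (the function name is illustrative; the field names follow the example above):

```python
import json
import pathlib

def complete_task(tasks_dir: str, task_id: str) -> None:
    """Mark a task completed and unblock its dependents: completing task A
    removes A from the blockedBy list of every task A blocks."""
    root = pathlib.Path(tasks_dir)
    path = root / f"{task_id}.json"
    task = json.loads(path.read_text())
    task["status"] = "completed"
    path.write_text(json.dumps(task, indent=2))
    for dep_id in task.get("blocks", []):
        dep_path = root / f"{dep_id}.json"
        dep = json.loads(dep_path.read_text())
        dep["blockedBy"] = [b for b in dep["blockedBy"] if b != task_id]
        dep_path.write_text(json.dumps(dep, indent=2))
```

Because every mutation is a file write, the board survives compaction, restarts, and agent handoffs.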
s08 — Background Tasks
“Run slow operations in the background; the agent keeps thinking.”
BackgroundManager spawns daemon threads for long-running shell commands. Returns immediately with a task_id. Notifications are drained and injected before each LLM call:
Agent: background_run("npm test") → "task_bg_1 started"
Agent: [continues working on other things]
...
[Before next LLM call, harness injects]:
<background-results>
bg_1 completed: "All 42 tests passed"
</background-results>
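A self-contained sketch of such a manager, assuming daemon threads and a thread-safe queue (class and method names are illustrative):

```python
import queue
import subprocess
import threading

class BackgroundManager:
    """Run shell commands on daemon threads; collect finished output in a
    queue and drain it into a notification block before each LLM call."""

    def __init__(self):
        self.results = queue.Queue()
        self.counter = 0

    def run(self, command: str) -> str:
        """Start a command in the background; return immediately with a task id."""
        self.counter += 1
        task_id = f"bg_{self.counter}"
        def worker():
            proc = subprocess.run(command, shell=True, capture_output=True, text=True)
            self.results.put((task_id, proc.stdout + proc.stderr))
        threading.Thread(target=worker, daemon=True).start()
        return f"{task_id} started"

    def drain(self) -> str:
        """Return finished-task output wrapped for injection, or '' if none yet."""
        done = []
        while not self.results.empty():
            tid, out = self.results.get()
            done.append(f"{tid} completed: {out.strip()}")
        if not done:
            return ""
        return "<background-results>\n" + "\n".join(done) + "\n</background-results>"
```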
s09 — Agent Teams
“When the task is too big for one, delegate to teammates.”
File-based message bus with JSONL inboxes per teammate. The lead agent spawns persistent named agents running in daemon threads:
Lead Agent
├── spawn_teammate("frontend-dev", "Build React components")
├── spawn_teammate("backend-dev", "Build API endpoints")
└── broadcast("Use TypeScript everywhere")
↓ ↓
.team/inbox_frontend-dev.jsonl .team/inbox_backend-dev.jsonl
Each teammate runs its own agent loop, reads its inbox, does its work, and can message back. The lead gets 9 tools (spawn, send_message, read_inbox, broadcast, etc.).
s10 — Team Protocols
“Teammates need shared communication rules.”
Two FSM protocols built on the same request-id correlation pattern:
Shutdown Protocol:
Lead → shutdown_request(request_id) → Teammate
Lead ← shutdown_response(request_id, approved: true) ← Teammate
Plan Approval Protocol:
Teammate → plan_approval(request_id, plan) → Lead
Teammate ← plan_approval_response(request_id, approved: true) ← Lead
Both use the same pattern: send a message with request_id, wait for response with matching request_id. One pattern, all team negotiation.
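The correlation mechanics fit in two small functions over the JSONL inboxes from s09 (function names and the `_response` type suffix are illustrative assumptions):

```python
import json
import pathlib
import uuid

def send_request(inbox: str, msg_type: str, payload: dict) -> str:
    """Append a correlated message to a teammate's JSONL inbox; return its request_id."""
    request_id = str(uuid.uuid4())
    line = json.dumps({"request_id": request_id, "type": msg_type, "payload": payload})
    with open(inbox, "a") as f:
        f.write(line + "\n")
    return request_id

def find_response(inbox: str, request_id: str):
    """Scan our own inbox for a response matching request_id; None if not yet there."""
    path = pathlib.Path(inbox)
    if not path.exists():
        return None
    for line in path.read_text().splitlines():
        msg = json.loads(line)
        if msg.get("request_id") == request_id and msg["type"].endswith("_response"):
            return msg
    return None
```

Shutdown and plan approval differ only in `msg_type` and payload; the correlation code is shared.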
s11 — Autonomous Agents
“Teammates scan the board and claim tasks themselves.”
Teammates don’t wait for assignments — they poll the task board and self-assign:
def scan_unclaimed_tasks():
    # Find .tasks/task_*.json where status=pending and owner=None
    return unclaimed

def claim_task(task_id, agent_name):
    # Atomic claim with lock (prevents race conditions)
    task.owner = agent_name
    task.status = "in_progress"
IDLE cycle: teammates poll every 5s for up to 60s looking for unclaimed work. Identity re-injection ensures the agent remembers who it is after context compression.
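The claim step can be made concrete. This sketch uses a process-local `threading.Lock`, which covers the repo's daemon-thread teammates; a file lock would be needed if agents ran in separate processes:

```python
import json
import pathlib
import threading

_claim_lock = threading.Lock()  # serializes claims within one process

def claim_task(tasks_dir: str, task_id: str, agent_name: str) -> bool:
    """Atomically claim a pending, unowned task; False if someone got there first."""
    path = pathlib.Path(tasks_dir) / f"{task_id}.json"
    with _claim_lock:
        task = json.loads(path.read_text())
        if task.get("status") != "pending" or task.get("owner"):
            return False  # already claimed or not claimable
        task["owner"] = agent_name
        task["status"] = "in_progress"
        path.write_text(json.dumps(task, indent=2))
    return True
```

The check-then-write happens entirely under the lock, so two teammates polling the same board cannot both claim a task.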
s12 — Worktree Isolation
“Each agent works in its own directory, with no interference.”
WorktreeManager creates git worktrees (isolated directory copies of the repo), one per task:
.worktrees/
├── index.json # {name, path, branch, task_id, status}
├── auth-refactor/ # git worktree for task_1
├── api-endpoints/ # git worktree for task_2
└── test-suite/ # git worktree for task_3
Tasks bind to worktrees: task_1 → worktree auth-refactor. Multiple teammates can execute in parallel without file conflicts. EventBus provides append-only JSONL lifecycle events for visibility.
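A sketch of the creation step, shelling out to `git worktree add` and recording the binding in index.json (the function name is illustrative; the index fields follow the layout above):

```python
import json
import pathlib
import subprocess

def create_worktree(repo: str, name: str, task_id: str) -> str:
    """Create an isolated git worktree on a new branch for one task,
    and append its record to .worktrees/index.json."""
    wt_root = pathlib.Path(repo) / ".worktrees"
    wt_root.mkdir(exist_ok=True)
    wt_path = wt_root / name
    # New branch <name>, checked out into its own directory.
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "-b", name, str(wt_path)],
        check=True, capture_output=True,
    )
    index_file = wt_root / "index.json"
    index = json.loads(index_file.read_text()) if index_file.exists() else []
    index.append({"name": name, "path": str(wt_path), "branch": name,
                  "task_id": task_id, "status": "active"})
    index_file.write_text(json.dumps(index, indent=2))
    return str(wt_path)
```

Each worktree is a real checkout on its own branch, so parallel teammates never touch the same working files.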
2 The Capstone: s_full.py
s_full.py (~1200 lines) combines mechanisms from s01-s11 into one runnable agent. It’s NOT a teaching session — it’s the reference implementation showing how everything integrates:
Before each LLM call:
1. micro_compact() — s06 Layer 1
2. drain background queue — s08
3. check inbox — s09
4. inject notifications — system message
Tool dispatch: 25+ tools across all mechanisms
REPL commands: /compact, /tasks, /team, /inbox
3 The Web Platform
The repo includes a Next.js interactive learning app (/web/) with:
| Page | Purpose |
|---|---|
| Home | Hero + learning path overview with 5 color-coded layers |
| Timeline (/timeline) | Visual progression through all 12 sessions |
| Session Viewer (/s01…/s12) | Source code with syntax highlighting + annotations |
| Diff View (/s02/diff) | Side-by-side diff between adjacent sessions |
| Compare (/compare) | Compare any two sessions |
| Layers (/layers) | Architecture layers breakdown |
Tech stack: Next.js 16 + React 19 + Tailwind CSS v4 + Framer Motion. Supports three languages (English, Chinese, Japanese) via [locale] routing.
The scripts/extract-content.ts pre-build step extracts Python source code into structured JSON for the web viewer — keeping the web app in sync with the agent code.
4 Skills System
On-demand knowledge modules in /skills/:
| Skill | Purpose |
|---|---|
| agent-builder | How to design agents for any domain (with reference code) |
| code-review | Code review checklists and criteria |
| pdf | PDF processing workflows |
| mcp-builder | Building MCP servers for agent capabilities |
Each skill has a SKILL.md with YAML frontmatter (name, description, tags) and reference files. The agent loads them via load_skill("agent-builder") when needed (s05 pattern).
5 Architecture Mental Model
The 12 sessions map to a layered architecture:
┌─────────────────────────────────────────────┐
│ Layer 5: Collaboration │
│ [s09 Teams] [s10 Protocols] [s11 Autonomy] │
│ [s12 Worktree Isolation] │
├─────────────────────────────────────────────┤
│ Layer 4: Concurrency │
│ [s08 Background Tasks] │
├─────────────────────────────────────────────┤
│ Layer 3: Memory │
│ [s06 Three-Layer Compaction] │
├─────────────────────────────────────────────┤
│ Layer 2: Planning & Coordination │
│ [s03 TodoWrite] [s04 Subagents] │
│ [s05 Skills] [s07 Task System] │
├─────────────────────────────────────────────┤
│ Layer 1: Tools & Execution │
│ [s01 Agent Loop] [s02 Tool Dispatch] │
└─────────────────────────────────────────────┘
Each layer is independent — you can use s01-s06 without ever touching teams. But when you need multi-agent, s09-s12 build cleanly on top.
6 Key Patterns
Pattern 1 — The Universal Agent Loop (every session):
while stop_reason == "tool_use":
    response = LLM(messages, tools)
    for tool_call in response:
        result = dispatch[tool_call.name](**tool_call.input)
        messages.append(tool_result(result))
Pattern 2 — File-Based State (s07, s09, s11, s12): Tasks, inboxes, worktree index — all persisted as JSON/JSONL files. Survives context compression, process restarts, and agent replacement.
Pattern 3 — Request-ID Correlation (s10):
Every cross-agent handshake uses {request_id, type, payload}. Response matches request_id. One pattern for shutdown, plan approval, and any future protocol.
Pattern 4 — Identity Re-injection (s11):
After context compression, the agent might forget who it is. make_identity_block(name, role) re-injects identity as the first message. Agent stays coherent across compressions.
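A sketch of this re-injection, assuming the post-compaction message list starts fresh with the identity block followed by the s06 summary (the exact wording is illustrative):

```python
def make_identity_block(name: str, role: str) -> dict:
    """Build the identity message re-injected after every compaction."""
    text = (f"You are {name}, a {role} on this team. "
            "Continue your assigned work; check the task board if unsure.")
    return {"role": "user", "content": text}

def recompose_after_compaction(summary: str, name: str, role: str) -> list:
    """Rebuild the message list post-compaction: identity first, then the summary."""
    return [
        make_identity_block(name, role),
        {"role": "user", "content": f"[Conversation summary]\n{summary}"},
    ]
```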
Pattern 5 — Layered Injection (s05, s06): Cheap metadata always present → expensive content loaded on demand. Applies to both skills (s05) and context management (s06).
7 Running It
git clone https://github.com/anthropics/learn-claude-code
cd learn-claude-code
pip install -r requirements.txt # anthropic>=0.25, python-dotenv
# Set API key
cp .env.example .env
# Edit .env: ANTHROPIC_API_KEY=sk-ant-xxx, MODEL_ID=claude-sonnet-4-6
# Run any session
python agents/s01_agent_loop.py # Minimal agent
python agents/s09_agent_teams.py # Multi-agent team
python agents/s_full.py # Everything combined
# Web learning platform
cd web && npm install && npm run dev
# Open http://localhost:3000
Supports multiple providers via ANTHROPIC_BASE_URL:
- Anthropic Claude (default)
- MiniMax-M2.5 (80.2% SWE-bench)
- GLM-5 by Zhipu (77.8%)
- Kimi-k2.5 by Moonshot (76.8%)
- DeepSeek-chat (73.0%)
8 Takeaway
This repo proves that an AI coding agent is not magic — it’s a well-designed harness. The 12 sessions show that:
- The core is trivial: a while loop + tool dispatch (~50 lines)
- Planning gives direction: TodoWrite + subagents prevent drift
- Memory enables persistence: compaction + file-based tasks survive any context limit
- Teams enable scale: persistent agents + message bus + protocols
- Autonomy emerges: task board polling + self-claim = agents that find their own work
The model provides intelligence. The harness provides opportunity. Build great harnesses.