pi-dynamo-provider: Wiring Pi Agents into Dynamo’s Observability Stack
pi-dynamo-provider is a small (~650 LOC TypeScript) Pi extension that registers a dynamo model provider — letting Pi’s agentic CLI run against an NVIDIA Dynamo OpenAI-compatible endpoint, while teeing every LLM request and tool call into Dynamo’s agent trace sink.
Core idea: make a Pi agent run visible — and benchmarkable — inside Dynamo’s tracing infrastructure, without patching either side’s core.
Harness boundary stays clean:
    Pi side                    Dynamo side
    ─────────────              ──────────────
    ExtensionAPI only          Public agent-trace surface
    No pi-mono patches         No Pi-specific shims
         └──── pi-dynamo-provider ────┘
              (this repo, ~650 lines)
0 Why This Bridge Exists
Pi is an agentic coding CLI that talks to LLMs and runs tools. Dynamo is an inference serving system with a built-in agent-trace sink, a Mooncake-replay benchmark harness, and Perfetto visualization. They were built independently, and the two halves of an agent task — the reasoning trace (LLM calls) and the action trace (tool executions) — live on opposite sides of the wire.
Without a bridge, a Dynamo trace shows isolated LLM requests with no idea they belong to the same agent task. With the bridge, one Dynamo trace renders both LLM spans and Pi tool spans on the same trajectory — joinable, replayable, profilable.
1 What’s in the Repo
Three source files in src/:
| File | Responsibility |
|---|---|
| index.ts | Extension entrypoint. Calls readDynamoConfig, discovers models via /v1/models, registers the dynamo provider with Pi, wires the tool-event relay |
| dynamo-provider.ts | Builds nvext.agent_context from DYN_AGENT_* and PI_SUBAGENT_* env vars. Wraps streamSimple to inject metadata on every chat-completions request. Adds x-request-id if absent |
| tool-relay.ts | ZMQ PUSH publisher for Pi tool events. Connects to a Dynamo-bound PULL endpoint. Wire format: [topic, seq_be_u64, msgpack(AgentTraceRecord)] |
Plus three helper scripts:
- scripts/install-dynamo.sh: clones Dynamo, builds Python bindings, sets up a uv venv
- scripts/launch-agg-agent.sh: boots Dynamo frontend + one SGLang worker with agent tracing and ZMQ tool ingest enabled
- scripts/integration-smoke.sh: out-of-band end-to-end check; boots Dynamo + mocker, sends one real chat completion, asserts nvext.agent_context round-trips into the trace JSONL
That’s it. The public API is a re-export of dynamo-provider and tool-relay from index.ts — no other surface.
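As a rough illustration of the model-discovery step in index.ts, the sketch below maps an OpenAI-compatible /v1/models response onto the dynamo/<model> references used with pi --model later in this post. The helper name toModelRefs is hypothetical, and how the real extension hands these to Pi's ExtensionAPI is not shown here.

```typescript
// Only the fields this sketch needs from an OpenAI-compatible GET /v1/models response.
interface ModelList {
  data: Array<{ id: string }>;
}

// Hypothetical helper (not the repo's API): prefix each served model ID
// with the provider name, mirroring the `dynamo/<model>` form used by `pi --model`.
function toModelRefs(response: ModelList, provider: string = "dynamo"): string[] {
  return response.data.map((m) => `${provider}/${m.id}`);
}

const sampleModels: ModelList = { data: [{ id: "zai-org/GLM-4.7-Flash" }] };
console.log(toModelRefs(sampleModels)); // [ 'dynamo/zai-org/GLM-4.7-Flash' ]
```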
2 The Two Channels
pi-dynamo-provider runs two independent data paths that converge in Dynamo’s trace sink:
                 HTTP /v1/chat/completions
                 + nvext.agent_context (in request body)
    ┌──────┐ ─────────────────────────────► ┌────────────────────┐
    │  Pi  │ ── pi-dynamo-provider ───────► │ Dynamo frontend    │
    │      │                                │ + worker           │
    │      │ ── tool_start/end/error ─────► │ (vLLM / SGLang)    │
    └──────┘ ─────────────────────────────► └─────────┬──────────┘
                 ZMQ PUSH (msgpack AgentTrace)        │
                 → Dynamo PULL ingest                 ▼
                                              agent trace JSONL
                                              (LLM + tool spans,
                                              joined by trajectory_id)
                                                      │
                                                      ▼
                                              Perfetto · Mooncake replay
                                              · benchmark harness
Channel A — chat completions over HTTP. Standard OpenAI-compatible POST, but the body carries an extra nvext.agent_context object that Dynamo records on request_end.
Channel B — tool events over ZMQ PUSH. Pi emits tool_start / tool_end / tool_error events to a Dynamo-bound PULL endpoint. The wire frame is three parts: a topic string, a big-endian 64-bit sequence number, and a msgpacked AgentTraceRecord.
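The three-part frame can be sketched without any ZMQ or msgpack dependency. The helper names encodeFrame and decodeSeq and the topic string are hypothetical; the payload is passed in as already-encoded msgpack bytes, and the sketch assumes sequence numbers stay below 2^53 so a plain number is enough.

```typescript
type Frame = [Uint8Array, Uint8Array, Uint8Array];

// Hypothetical helper (not the repo's API): build [topic, seq_be_u64, payload].
function encodeFrame(topic: string, seq: number, payload: Uint8Array): Frame {
  if (!Number.isSafeInteger(seq) || seq < 0) {
    throw new RangeError("seq must be a non-negative safe integer");
  }
  const seqPart = new Uint8Array(8);
  const view = new DataView(seqPart.buffer);
  // DataView writes big-endian by default; split the u64 into two u32 halves.
  view.setUint32(0, Math.floor(seq / 0x100000000));
  view.setUint32(4, seq % 0x100000000);
  return [new TextEncoder().encode(topic), seqPart, payload];
}

// Inverse of the sequence encoding, e.g. for spotting gaps left by dropped events.
function decodeSeq(seqPart: Uint8Array): number {
  const view = new DataView(seqPart.buffer, seqPart.byteOffset, seqPart.byteLength);
  return view.getUint32(0) * 0x100000000 + view.getUint32(4);
}

// 0x80 is an empty msgpack map, standing in for a real AgentTraceRecord.
const demoFrame = encodeFrame("demo-topic", 258, new Uint8Array([0x80]));
console.log(Array.from(demoFrame[1])); // bytes 0,0,0,0,0,0,1,2
```

A monotonically increasing sequence number lets the receiving side detect gaps, which fits a transport that is allowed to drop events.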
Why ZMQ and not HTTP? Dynamo owns the bind side; multiple Pi processes, subagents, or tool workers can all connect as PUSH producers without competing for the local endpoint. Tool tracing is best-effort, not durable — the publisher’s bounded queue drops events when full. This is correct: trace data must never back-pressure Pi.
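The drop-when-full behaviour can be illustrated with a small in-memory queue. This is a sketch under assumptions, not the repo's publisher: the class name BoundedQueue is hypothetical, and dropping the newest event (rather than evicting the oldest) is a choice made here for simplicity.

```typescript
// Hypothetical bounded buffer: push never blocks and never throws.
class BoundedQueue<T> {
  private items: T[] = [];
  private capacity: number;
  dropped = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Returns false when the event was dropped because the queue is full.
  push(item: T): boolean {
    if (this.items.length >= this.capacity) {
      this.dropped += 1;
      return false;
    }
    this.items.push(item);
    return true;
  }

  // Called by a sender loop to take the oldest queued event, if any.
  shift(): T | undefined {
    return this.items.shift();
  }
}

const demoQueue = new BoundedQueue<string>(2);
const accepted = ["tool_start", "tool_end", "tool_error"].map((e) => demoQueue.push(e));
console.log(accepted, demoQueue.dropped); // [ true, true, false ] 1
```

The caller gets a boolean back instead of an exception or a promise, so the agent loop never waits on the trace path.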
3 The agent_context Schema
Every chat-completions request gets a payload like:
    {
      "nvext": {
        "agent_context": {
          "session_type_id": "pi_coding_agent",
          "session_id": "pi-demo-001",
          "trajectory_id": "<pi-session-id>",
          "parent_trajectory_id": "<optional-parent>",
          "phase": "reasoning"
        }
      }
    }
These field names are not arbitrary — they match ATIF, the schema Dynamo’s converter and benchmark stack join on. The phase: "reasoning" value tags the LLM call as an agent reasoning step (versus synthesis, grading, etc.); adding new phase values requires Dynamo-side coordination.
Existing nvext fields are preserved, and x-request-id is added only when the caller didn’t already set one.
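A minimal sketch of that merge behaviour, assuming the context is read from the DYN_AGENT_* variables named in this post. The helper names withAgentContext and withRequestId are hypothetical, and the real wrapper around streamSimple may be structured differently.

```typescript
type Env = Record<string, string | undefined>;
type Json = Record<string, unknown>;

// Hypothetical helper: attach agent_context while keeping any other
// nvext fields the caller already set.
function withAgentContext(body: Json, env: Env): Json {
  const existing = (body.nvext ?? {}) as Json;
  const agentContext: Json = {
    session_type_id: env.DYN_AGENT_SESSION_TYPE_ID,
    session_id: env.DYN_AGENT_SESSION_ID,
    trajectory_id: env.DYN_AGENT_TRAJECTORY_ID,
    // Left undefined when unset, so JSON.stringify omits the key.
    parent_trajectory_id: env.DYN_AGENT_PARENT_TRAJECTORY_ID,
    phase: "reasoning",
  };
  return { ...body, nvext: { ...existing, agent_context: agentContext } };
}

// Hypothetical helper: add x-request-id only when no header with that
// name (in any casing) is already present.
function withRequestId(
  headers: Record<string, string>,
  newId: () => string,
): Record<string, string> {
  const hasId = Object.keys(headers).some((k) => k.toLowerCase() === "x-request-id");
  return hasId ? headers : { ...headers, "x-request-id": newId() };
}

const demoBody = withAgentContext(
  { model: "demo", nvext: { custom_flag: true } },
  { DYN_AGENT_SESSION_TYPE_ID: "pi_coding_agent", DYN_AGENT_SESSION_ID: "pi-demo-001" },
);
console.log(JSON.stringify(demoBody.nvext));
```

Both helpers return new objects rather than mutating their inputs, which keeps the caller's request untouched if injection is skipped.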
4 Subagent Trajectory Linking
The repo’s most opinionated bit of logic lives in dynamo-provider.ts’s subagent bridge. The problem it solves:
When a tool like pi-subagents spawns a child Pi process, the child inherits the parent’s process.env — including DYN_AGENT_TRAJECTORY_ID. Without intervention, parent and child emit identical trajectory_id values, and the parent/child distinction collapses in the trace.
The bridge detects PI_SUBAGENT_CHILD=1 and rewrites the agent context:
    parent process:
        trajectory_id        = root-traj
        parent_trajectory_id = (unset)

    pi-subagents spawns child with env:
        DYN_AGENT_TRAJECTORY_ID=root-traj   (inherited verbatim)
        PI_SUBAGENT_CHILD=1
        PI_SUBAGENT_RUN_ID=run-1
        PI_SUBAGENT_CHILD_AGENT=researcher
        PI_SUBAGENT_CHILD_INDEX=2

    child after applySubagentBridge:
        trajectory_id        = run-1:researcher:2
        parent_trajectory_id = root-traj
Three rules make this robust:
- Manual override always wins. Setting DYN_AGENT_PARENT_TRAJECTORY_ID explicitly disables the bridge.
- One-way knowledge flow. pi-dynamo-provider knows about PI_SUBAGENT_* env vars; pi-subagents has no idea this bridge exists. Don’t propose pi-subagents changes for problems solvable here.
- process.env gets mutated. Any subagents the child itself spawns inherit the corrected parent → child chain. Nested chains stay attributable instead of collapsing back to the root.
When PI_SUBAGENT_CHILD isn’t set, this code path is inert.
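The rewrite can be sketched as a pure function over an env-like object. The function name applySubagentBridge comes from the example above, but this body is a reconstruction under assumptions: in particular, it writes only DYN_AGENT_TRAJECTORY_ID back into the environment so that a grandchild's inherited value becomes its parent, and it stays inert when any PI_SUBAGENT_* field is missing. Which variables the real code mutates is not confirmed here.

```typescript
type Env = Record<string, string | undefined>;

interface TrajectoryIds {
  trajectory_id: string | undefined;
  parent_trajectory_id: string | undefined;
}

// Sketch of the bridge logic; the real implementation may differ.
function applySubagentBridge(env: Env): TrajectoryIds {
  const inherited = env.DYN_AGENT_TRAJECTORY_ID;
  const manualParent = env.DYN_AGENT_PARENT_TRAJECTORY_ID;
  const run = env.PI_SUBAGENT_RUN_ID;
  const agent = env.PI_SUBAGENT_CHILD_AGENT;
  const index = env.PI_SUBAGENT_CHILD_INDEX;

  // Inert unless this is a subagent child with complete bookkeeping
  // and no manually set parent trajectory.
  if (env.PI_SUBAGENT_CHILD !== "1" || manualParent || !run || !agent || !index) {
    return { trajectory_id: inherited, parent_trajectory_id: manualParent };
  }

  const childId = `${run}:${agent}:${index}`;
  // Assumption: write the child's own ID back so a grandchild inherits it as its parent.
  env.DYN_AGENT_TRAJECTORY_ID = childId;
  return { trajectory_id: childId, parent_trajectory_id: inherited };
}

const childEnv: Env = {
  DYN_AGENT_TRAJECTORY_ID: "root-traj",
  PI_SUBAGENT_CHILD: "1",
  PI_SUBAGENT_RUN_ID: "run-1",
  PI_SUBAGENT_CHILD_AGENT: "researcher",
  PI_SUBAGENT_CHILD_INDEX: "2",
};
console.log(applySubagentBridge(childEnv));
// { trajectory_id: 'run-1:researcher:2', parent_trajectory_id: 'root-traj' }
```

Calling this sketch twice on the same environment would make the child its own parent, so it is meant to run once per process, passing process.env in real use.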
5 Env-Var Naming Contract
The repo enforces a deliberately narrow env-var namespace policy:
| Prefix | Direction | Examples |
|---|---|---|
| DYNAMO_* | client config (we read) | DYNAMO_BASE_URL, DYNAMO_API_KEY |
| DYN_AGENT_* | dynamo agent context (we read + emit) | DYN_AGENT_SESSION_ID, DYN_AGENT_TRAJECTORY_ID, DYN_AGENT_TOOL_EVENTS_ZMQ_ENDPOINT |
| PI_SUBAGENT_* | pi-subagents bookkeeping (we read only) | PI_SUBAGENT_CHILD, PI_SUBAGENT_RUN_ID, PI_SUBAGENT_CHILD_AGENT, PI_SUBAGENT_CHILD_INDEX |
| OPENAI_BASE_URL | OpenAI-compatibility fallback | only consulted when DYNAMO_BASE_URL is unset |
No new prefixes. New variables must justify which existing namespace they belong in.
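The fallback order in the table can be sketched as a small resolver. The function name resolveClientConfig and the returned shape are hypothetical (the repo's readDynamoConfig may look different), and treating an empty string as unset is an assumption made here.

```typescript
type Env = Record<string, string | undefined>;

interface ClientConfigSketch {
  baseUrl: string | undefined;
  apiKey: string | undefined;
}

// Hypothetical resolver: DYNAMO_BASE_URL wins; OPENAI_BASE_URL is only a fallback.
function resolveClientConfig(env: Env): ClientConfigSketch {
  return {
    // `||` (not `??`) so an empty string also falls through to the fallback.
    baseUrl: env.DYNAMO_BASE_URL || env.OPENAI_BASE_URL || undefined,
    apiKey: env.DYNAMO_API_KEY || undefined,
  };
}

console.log(
  resolveClientConfig({
    DYNAMO_BASE_URL: "http://127.0.0.1:18083/v1",
    OPENAI_BASE_URL: "http://ignored.invalid/v1",
    DYNAMO_API_KEY: "dummy",
  }),
);
// { baseUrl: 'http://127.0.0.1:18083/v1', apiKey: 'dummy' }
```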
6 Architecture Invariants
The CLAUDE.md spells out four invariants that the codebase defends:
- One-way knowledge flow. Bridge knowledge lives here, not in pi-subagents.
- No pi-mono core patches. Everything goes through Pi’s public ExtensionAPI. If you want a Pi core change, find a different angle here first.
- Dynamo owns the ZMQ bind side. We’re a PUSH connect-side producer. Don’t try to bind.
- Trace data is best-effort. No retry loops, no persistent queues, no back-pressure on Pi. Bounded-queue drops are correct.
These constraints are what keep the extension small and the integration robust across upstream churn on both sides.
7 Verifying the Effect
Four layers, from cheap to thorough:
Layer 1 — Build & unit smoke
npm install
npm run check # tsc strict + exactOptionalPropertyTypes + noUncheckedIndexedAccess
npm test # vitest
npm run build
./scripts/integration-smoke.sh # boots Dynamo + mocker, asserts agent_context round-trip
The integration smoke covers two cases: top-level agent_context and the pi-subagents bridge. It asserts on the trace envelope only; the mocker’s generated text is intentionally garbage.
Layer 2 — Real end-to-end trace inspection
./scripts/install-dynamo.sh
./scripts/launch-agg-agent.sh --gpu 0 # serves zai-org/GLM-4.7-Flash by default
# In another shell, use the env block the launcher prints:
export DYNAMO_BASE_URL=http://127.0.0.1:18083/v1
export DYNAMO_API_KEY=dummy
export DYN_AGENT_SESSION_TYPE_ID=pi_coding_agent
export DYN_AGENT_SESSION_ID=verify-001
export DYN_AGENT_TOOL_EVENTS_ZMQ_ENDPOINT=tcp://127.0.0.1:20390
pi --model dynamo/zai-org/GLM-4.7-Flash \
-p "Run the tests in this folder, fix the smallest bug, and rerun the tests."
After it finishes, inspect the trace JSONL:
| Check | How | Pass criterion |
|---|---|---|
| LLM requests carry agent_context | jq -r 'select(.event_type=="request_end")\|.agent_context' trace.jsonl | Every record has session_id=verify-001, consistent trajectory_id |
| Tool events captured | jq -r '.event_type' trace.jsonl \| sort \| uniq -c | Both request_end and tool_start/tool_end present |
| Tool events share trajectory | join trajectory_id across both event types | Identical IDs |
| x-request-id preserved | grep x_request_id | Every request_end has one |
| Causal ordering | tool_start timestamps fall after the triggering LLM request_end | No tool-precedes-call inversions |
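The trajectory-join check can also be scripted. The sketch below is a rough starting point, not part of the repo: the function name summarizeTrace is hypothetical, and it assumes every record (including tool events) exposes event_type and agent_context.trajectory_id at the top level, which should be confirmed against a real trace before relying on it.

```typescript
// Assumed record shape; only these fields are read.
interface TraceRecord {
  event_type?: string;
  agent_context?: { trajectory_id?: string };
}

// Hypothetical helper: collect trajectory IDs per event family and report
// tool trajectories that never appear on an LLM request.
function summarizeTrace(jsonl: string) {
  const llm = new Set<string>();
  const tool = new Set<string>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const rec = JSON.parse(line) as TraceRecord;
    const id = rec.agent_context?.trajectory_id;
    if (id === undefined) continue;
    if (rec.event_type === "request_end") {
      llm.add(id);
    } else if (rec.event_type !== undefined && rec.event_type.startsWith("tool_")) {
      tool.add(id);
    }
  }
  const orphanTools = Array.from(tool).filter((id) => !llm.has(id));
  return { llm: Array.from(llm), tool: Array.from(tool), orphanTools };
}

const demoTrace = [
  '{"event_type":"request_end","agent_context":{"trajectory_id":"t1"}}',
  '{"event_type":"tool_start","agent_context":{"trajectory_id":"t1"}}',
  '{"event_type":"tool_end","agent_context":{"trajectory_id":"t1"}}',
].join("\n");
console.log(summarizeTrace(demoTrace).orphanTools); // []
```

An empty orphanTools list corresponds to the "Identical IDs" pass criterion in the table.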
Then render in Perfetto:
cd $DYNAMO_DIR && source .venv/bin/activate
python benchmarks/agent_trace/convert_to_perfetto.py \
trace.jsonl --include-markers --separate-stage-tracks \
--output trace.perfetto.json
# Drop into https://ui.perfetto.dev/
Expected: dynamo.llm spans and dynamo.agent.tool spans nested on the same trajectory lane.
Layer 3 — Negative paths (the extension stays inert when it should)
# Tool relay endpoint unset → events drop silently, no errors
unset DYN_AGENT_TOOL_EVENTS_ZMQ_ENDPOINT
pi --model dynamo/<model> -p "..."
# Dynamo unreachable → streamSimple fails cleanly, doesn't hang Pi
DYNAMO_BASE_URL=http://127.0.0.1:1 pi --model dynamo/default -p "ok"
Subagent bridge isolated test:
PI_SUBAGENT_CHILD=1 \
PI_SUBAGENT_RUN_ID=run-x \
PI_SUBAGENT_CHILD_AGENT=researcher \
PI_SUBAGENT_CHILD_INDEX=2 \
DYN_AGENT_TRAJECTORY_ID=root-traj \
pi --model dynamo/<model> -p "ok"
Expect trajectory_id == "run-x:researcher:2" and parent_trajectory_id == "root-traj" in the trace.
Layer 4 — Overhead (optional)
Same prompt + seed, 30 runs each, compare:
- Tool relay off (unset DYN_AGENT_TOOL_EVENTS_ZMQ_ENDPOINT)
- Tool relay on
Measure Pi wall time and Dynamo-side TTFT / TPOT. ZMQ PUSH is non-blocking with a bounded queue; the expected gap is < 1%. If it’s larger, suspect msgpack encoding or other synchronous work on the hot path.
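The comparison itself is simple arithmetic. The sketch below computes the relative gap between mean wall times for the two configurations; the function names are hypothetical, the sample numbers are made up for illustration, and a careful comparison would also look at medians or run-to-run variance.

```typescript
function mean(xs: number[]): number {
  if (xs.length === 0) throw new Error("need at least one sample");
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Relative overhead of relay-on versus relay-off: 0.01 means 1% slower.
function relativeOverhead(relayOff: number[], relayOn: number[]): number {
  const base = mean(relayOff);
  return (mean(relayOn) - base) / base;
}

// Illustrative wall times in seconds, not measurements.
const offRuns = [100, 102, 98];
const onRuns = [101, 102, 100];
const gap = relativeOverhead(offRuns, onRuns);
console.log(`${(gap * 100).toFixed(2)}%`, gap < 0.01 ? "within budget" : "investigate");
```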
8 Downstream: What the Trace Enables
Once the trace exists, it unlocks Dynamo’s analysis surface — none of which lives in pi-dynamo-provider itself:
- Perfetto visualization: benchmarks/agent_trace/convert_to_perfetto.py for timeline debugging
- Mooncake replay: cargo run -p dynamo-bench --bin agent_trace_to_mooncake converts the trace to Mooncake-style JSONL with hash_ids for KV cache reuse simulation
- python -m dynamo.replay: replays the synthesized trace through Dynamo’s mocker (offline) or live mock runtime (online) to benchmark scheduler/router/cache behavior under different worker counts, router modes, and arrival speedup ratios
The replay path is the one I’d reach for to do parameter sweeps without burning real GPU hours — but it depends entirely on pi-dynamo-provider being the trace producer upstream.
9 Takeaway
pi-dynamo-provider is a textbook case of a good integration layer:
- Tiny surface (~650 LOC, three files, two re-exports).
- No upstream patches. Everything is expressible through Pi’s ExtensionAPI and Dynamo’s public trace sink. Neither side knows about the other.
- Two channels, one trace. HTTP body annotations carry reasoning context; ZMQ PUSH carries action events. Both land in the same JSONL.
- Best-effort, never blocking. Trace failures degrade silently. Pi is never held hostage by observability.
- Schema discipline. ATIF field names are immovable because downstream tools join on them. The repo’s CLAUDE.md is explicit about what to leave alone.
The lesson generalizes: when bridging two evolving systems, the integration layer’s job isn’t to add features — it’s to speak both dialects fluently while staying invisible to each. That’s what makes pi-dynamo-provider a 650-line repo instead of a 6,500-line one.