
Pipeline

Four Claude agents: three in parallel, then a merge

Four specialized agents

Causalist generates a graph by calling Claude Opus 4.7 in four roles: three run in parallel, and a fourth merges their results. Each agent has a dedicated system prompt that constrains its output to strict JSON.
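Because every agent is held to strict JSON, the caller can reject any reply that does not parse. Here is a minimal TypeScript sketch. It assumes the three first-stage agents each return a bare top-level JSON array. Oracle returns a graph object and would need its own check. `parseAgentJson` is a hypothetical helper name, not part of Causalist's code.

```typescript
// Hypothetical helper: parse an agent's raw text reply as strict JSON.
// Assumes a first-stage agent is prompted to return a bare JSON array
// (no prose, no markdown fences); anything else counts as a failed call.
function parseAgentJson(raw: string, agent: string): unknown[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error(`${agent}: reply is not valid JSON`);
  }
  if (!Array.isArray(parsed)) {
    throw new Error(`${agent}: expected a top-level JSON array`);
  }
  return parsed;
}
```

A reply such as `Sure! Here is the JSON: []` fails at `JSON.parse`, which is the intent. The system prompt asks for JSON only, so any surrounding prose is treated as an error rather than silently stripped.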

1. Structure agent

Walks the repo tree and classifies every file and module into one of seven layers: infra, data, logic, api, ui, test, or config. Produces CausalNode[].
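The seven-layer classification can be checked at runtime before nodes reach Oracle. This is a sketch under assumed field names. Only `layer` and its seven values come from the description above. `id` is assumed because the Semantic agent keys summaries by node id, and `path` and `isCausalNode` are hypothetical.

```typescript
// The seven layers named above, as a string-literal union.
const LAYERS = ["infra", "data", "logic", "api", "ui", "test", "config"] as const;
type Layer = (typeof LAYERS)[number];

// Hypothetical CausalNode shape: `id` and `path` are illustrative assumptions.
interface CausalNode {
  id: string;
  path: string;
  layer: Layer;
}

// Runtime guard for one element of the Structure agent's parsed output.
function isCausalNode(value: unknown): value is CausalNode {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.path === "string" &&
    typeof v.layer === "string" &&
    (LAYERS as readonly string[]).includes(v.layer)
  );
}
```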

2. Dependency agent

Reads sampled file contents and extracts typed edges: imports, calls, reads, writes, extends. Produces CausalEdge[].
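The five edge kinds can be validated the same way. In this sketch, `from`/`to` as node-id fields and the `isCausalEdge`/`splitEdges` helpers are hypothetical. Malformed items are kept aside rather than thrown away, so they can be logged.

```typescript
// The five edge kinds listed above.
const EDGE_KINDS = ["imports", "calls", "reads", "writes", "extends"] as const;
type EdgeKind = (typeof EDGE_KINDS)[number];

// Hypothetical CausalEdge shape: `from` and `to` are assumed to be node ids.
interface CausalEdge {
  from: string;
  to: string;
  kind: EdgeKind;
}

// Runtime guard for one element of the Dependency agent's parsed output.
function isCausalEdge(value: unknown): value is CausalEdge {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.from === "string" &&
    typeof v.to === "string" &&
    typeof v.kind === "string" &&
    (EDGE_KINDS as readonly string[]).includes(v.kind)
  );
}

// Separate well-formed edges from malformed items so the caller can log the latter.
function splitEdges(items: unknown[]): { edges: CausalEdge[]; rejected: unknown[] } {
  const edges: CausalEdge[] = [];
  const rejected: unknown[] = [];
  for (const item of items) {
    if (isCausalEdge(item)) edges.push(item);
    else rejected.push(item);
  }
  return { edges, rejected };
}
```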

3. Semantic agent

Writes a one-sentence plain-English summary per node. Rules: starts with a verb, no "this file" preamble, under 140 characters. Produces {id, summary}[].
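Two of the three summary rules are mechanical and can be linted in code. This sketch checks the length limit and the "this file" preamble. It does not check "starts with a verb", because that would need a part-of-speech tagger. `summaryViolations` is a hypothetical name. Also banning "this module" is an assumption, since the docs name only "this file".

```typescript
interface Summary {
  id: string;
  summary: string;
}

const MAX_SUMMARY_CHARS = 140;
// "this file" comes from the rules above; "this module" is an assumed extension.
const BANNED_PREAMBLE = /^this (file|module)\b/i;

// Returns a list of rule violations; an empty list means the summary passes.
// The "starts with a verb" rule is NOT checked here.
function summaryViolations(s: Summary): string[] {
  const text = s.summary.trim();
  const problems: string[] = [];
  if (text.length === 0) problems.push("empty summary");
  if (text.length >= MAX_SUMMARY_CHARS) {
    problems.push(`${text.length} chars (must be under ${MAX_SUMMARY_CHARS})`);
  }
  if (BANNED_PREAMBLE.test(text)) problems.push('starts with a "this file" preamble');
  return problems;
}
```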

4. Oracle agent

Receives the other three agents' outputs, merges them into a final CausalGraph, drops orphan edges, and runs the structural verification pass. It also answers "what-if?" follow-up questions, using extended thinking when multi-hop tracing is required.
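Oracle is itself a Claude call, but its orphan-edge rule is deterministic enough to express in code. That code could serve as a post-check on Oracle's output or as a test fixture. This minimal sketch uses assumed shapes. `mergeGraph`, the `from`/`to` field names, and attaching summaries as an optional `summary` field are hypothetical, not Causalist's actual schema.

```typescript
type Layer = "infra" | "data" | "logic" | "api" | "ui" | "test" | "config";
type EdgeKind = "imports" | "calls" | "reads" | "writes" | "extends";

// Hypothetical shapes; field names are illustrative assumptions.
interface CausalNode { id: string; layer: Layer; summary?: string }
interface CausalEdge { from: string; to: string; kind: EdgeKind }
interface CausalGraph { nodes: CausalNode[]; edges: CausalEdge[] }

// Merge the three first-stage outputs and drop edges whose endpoints
// were not classified as nodes, logging each dropped edge.
function mergeGraph(
  nodes: CausalNode[],
  edges: CausalEdge[],
  summaries: { id: string; summary: string }[],
  log: (msg: string) => void = console.warn,
): CausalGraph {
  const summaryById = new Map(summaries.map((s) => [s.id, s.summary] as const));
  const merged = nodes.map((n) => ({ ...n, summary: summaryById.get(n.id) }));
  const ids = new Set(merged.map((n) => n.id));
  const kept: CausalEdge[] = [];
  for (const e of edges) {
    if (ids.has(e.from) && ids.has(e.to)) {
      kept.push(e);
    } else {
      log(`dropping orphan edge ${e.from} -[${e.kind}]-> ${e.to}`);
    }
  }
  return { nodes: merged, edges: kept };
}
```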

Parallel fan-out, serial merge

Structure, Dependency, and Semantic run concurrently in a Promise.all: each depends only on the raw repository (its tree and sampled contents), not on the others' outputs. Oracle synthesizes after all three return:

$$\text{Graph} \;=\; \text{Oracle}\big(\text{Struct}(\text{tree}),\; \text{Dep}(\text{tree}),\; \text{Sem}(\text{tree})\big)$$

On medium repos, running the first stage three-way parallel makes it up to ~3× faster in wall-clock time than running the three agents one after another. The stage takes as long as its slowest agent rather than the sum of all three. Oracle's merge stays serial, so the end-to-end gain is smaller. The cost is no shared context: each agent works blind to the others' outputs. For most codebases this is a good trade. If the outputs disagree (e.g. Dependency emits an edge whose endpoints Structure didn't classify), Oracle drops the edge and logs it.
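The fan-out/merge shape maps directly onto `Promise.all` followed by a single awaited call. Here is a minimal TypeScript sketch. `analyze`, `Agents`, and `Tree` are hypothetical names, and each agent function stands in for a Claude request.

```typescript
// Hypothetical input type: the raw repository tree handed to every agent.
type Tree = { paths: string[] };

// Hypothetical agent signatures; in practice each would wrap a Claude call.
interface Agents<N, E, S, G> {
  structure: (tree: Tree) => Promise<N>;
  dependency: (tree: Tree) => Promise<E>;
  semantic: (tree: Tree) => Promise<S>;
  oracle: (nodes: N, edges: E, summaries: S) => Promise<G>;
}

async function analyze<N, E, S, G>(tree: Tree, agents: Agents<N, E, S, G>): Promise<G> {
  // Fan out: all three first-stage requests start before any is awaited,
  // and none sees another's output.
  const [nodes, edges, summaries] = await Promise.all([
    agents.structure(tree),
    agents.dependency(tree),
    agents.semantic(tree),
  ]);
  // Serial merge: Oracle starts only after all three have resolved.
  return agents.oracle(nodes, edges, summaries);
}
```

All three first-stage promises are created before any is awaited, so the stage's wall-clock time tracks the slowest agent rather than the sum. If any one of them rejects, `Promise.all` rejects and Oracle never runs.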

Cost budgeting

| Agent | Input shape | Typical tokens |
| --- | --- | --- |
| Structure | file tree JSON (paths + sizes) | ~5k in, ~8k out |
| Dependency | curated file contents, capped at 40KB | ~15k in, ~8k out |
| Semantic | paths + exports, not full bodies | ~8k in, ~6k out |
| Oracle | merged outputs + schema | ~25k in, ~16k out |
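Summing the table gives a per-analyze token budget that can be priced directly. In this sketch, per-million-token prices are left as parameters because they are not stated here. `BUDGET`, `totals`, and `estimateUsd` are hypothetical names.

```typescript
interface AgentBudget { agent: string; inTokens: number; outTokens: number }

// "Typical tokens" column from the table above.
const BUDGET: AgentBudget[] = [
  { agent: "Structure", inTokens: 5000, outTokens: 8000 },
  { agent: "Dependency", inTokens: 15000, outTokens: 8000 },
  { agent: "Semantic", inTokens: 8000, outTokens: 6000 },
  { agent: "Oracle", inTokens: 25000, outTokens: 16000 },
];

// Total input and output tokens across all agents.
function totals(b: AgentBudget[]): { inTokens: number; outTokens: number } {
  return b.reduce(
    (acc, a) => ({ inTokens: acc.inTokens + a.inTokens, outTokens: acc.outTokens + a.outTokens }),
    { inTokens: 0, outTokens: 0 },
  );
}

// Prices are USD per million tokens and are placeholders: pass in the
// current list prices for the model actually in use.
function estimateUsd(b: AgentBudget[], inPerMTok: number, outPerMTok: number): number {
  const t = totals(b);
  return (t.inTokens * inPerMTok + t.outTokens * outPerMTok) / 1_000_000;
}
```

The four rows sum to about 53k input and 38k output tokens per first analyze. Multiplying by current per-token prices gives a quick sanity check on any quoted dollar figure.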

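With prompt caching, the first analyze writes the shared repo context (tree + sampled contents) to the cache, at a small premium over normal input. Subsequent "what-if?" queries against the same graph read it back at 10% of the base input-token price, as long as the cache entry is still live. Anthropic's default cache lifetime is five minutes, refreshed on each hit. Net: the first analyze of a medium repo costs around $0.30 in Opus tokens, and follow-up Ask turns cost pennies.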
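The caching discount can be made concrete per call. This sketch covers input-side cost only. It assumes the 10% read rate stated above and a 1.25× write premium, which is Anthropic's documented rate for five-minute cache writes at the time of writing. Treat that figure as an assumption and check current pricing. `inputCostUsd` and `CacheRates` are hypothetical names.

```typescript
// Multipliers relative to the base input price. The 0.1 read rate is the
// "10%" above; the 1.25 write rate is an assumption to verify against current pricing.
interface CacheRates { read: number; write: number }
const DEFAULT_RATES: CacheRates = { read: 0.1, write: 1.25 };

// Input-side cost of one request; output tokens are billed normally and not modeled.
function inputCostUsd(
  cachedTokens: number,  // shared repo context: tree + sampled contents
  freshTokens: number,   // the new question plus anything outside the cache
  baseInPerMTok: number, // base input price, USD per million tokens
  cacheHit: boolean,     // false on the call that writes the cache
  rates: CacheRates = DEFAULT_RATES,
): number {
  const cachedRate = cacheHit ? rates.read : rates.write;
  return ((cachedTokens * cachedRate + freshTokens) * baseInPerMTok) / 1_000_000;
}
```

On a cache hit, each cached token is billed as a tenth of an input token. That is why follow-up Ask turns stay cheap even though they carry the whole repo context.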