Graph retrieval

Personalized PageRank with HippoRAG node-specificity

The problem

Given a query — "where does user authentication happen?" — retrieve the smallest subgraph that is sufficient for Claude to answer. Flat vector retrieval misses multi-hop: the file that validates a JWT may not contain the word "authentication" anywhere; it's reached via auth-middleware.ts → jwt-verifier.ts → crypto-utils.ts.

HippoRAG (NeurIPS 2024) reports a ~20% improvement over flat retrieval on multi-hop QA by using Personalized PageRank over an entity-linked graph. Causalist borrows this.

Personalized PageRank

Starting from a query, we identify seed nodes via entity mentions and semantic matching. We then compute the stationary distribution of a random walk that restarts with probability α at the seeds:

r \;=\; (1 - \alpha)\, M r \;+\; \alpha s

where

  • r ∈ ℝ^|V| is the retrieved relevance score for every node,
  • M is the column-stochastic transition matrix of the graph (edge weights from confidence × inverse degree of the source),
  • s is the seed distribution (one-hot on query-relevant nodes, or a soft distribution from semantic similarity),
  • α ∈ [0, 1] is the restart probability, typically 0.15.

We solve this iteratively via power iteration until the L1 difference between successive iterates, ‖r⁽ᵏ⁺¹⁾ − r⁽ᵏ⁾‖₁, falls below 10⁻⁵. Typical convergence is 30–50 iterations.
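The update can be sketched in a few lines of TypeScript. Everything here is illustrative — the `Edge` shape, the function name, and the adjacency-list input are assumptions, not Causalist's actual interfaces:

```typescript
// Sketch of the PPR solve via power iteration over a weighted
// adjacency list. All names are illustrative, not Causalist's API.
type Edge = { to: number; weight: number };

function personalizedPageRank(
  adj: Edge[][],      // adj[u] = weighted out-edges of node u
  seeds: number[],    // seed distribution s (should sum to 1)
  alpha = 0.15,       // restart probability α
  tol = 1e-5,         // L1 convergence threshold
  maxIter = 100,
): number[] {
  // Column-stochastic transition: normalize each node's out-edge weights.
  const out = adj.map((edges) => {
    const total = edges.reduce((acc, e) => acc + e.weight, 0);
    return edges.map((e) => ({ to: e.to, p: total > 0 ? e.weight / total : 0 }));
  });

  let r = seeds.slice();
  for (let iter = 0; iter < maxIter; iter++) {
    const next = seeds.map((s) => alpha * s);   // restart mass α·s
    for (let u = 0; u < adj.length; u++) {
      for (const { to, p } of out[u]) {
        next[to] += (1 - alpha) * r[u] * p;     // walk mass (1 − α)·M·r
      }
    }
    // Note: dangling nodes (no out-edges) leak mass in this sketch.
    const delta = next.reduce((acc, v, i) => acc + Math.abs(v - r[i]), 0);
    r = next;
    if (delta < tol) break;                     // L1 convergence check
  }
  return r;
}
```

With the restart term, scores concentrate near the seeds and decay along outgoing edges, which is the behavior the retrieval relies on.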

Node specificity (the HippoRAG trick)

A naive PPR over a codebase drowns the retrieval in hub nodes — main.ts, index.ts, types.ts — that connect to everything. HippoRAG applies a node-specificity weight to the seed distribution:

s_i \;\propto\; \log\!\left(\frac{N}{\text{degree}(i)}\right)

where N = |V|. Hubs get small seed weight; specific, narrow nodes get large seed weight. The effect is dramatic: retrieval becomes about the distinctive files for a query, not the central ones.
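As a sketch, the reweighting can be applied to the seed vector before the PPR solve. The function and argument names are hypothetical:

```typescript
// Sketch of HippoRAG-style node specificity: scale each seed by
// log(N / degree(i)), then renormalize. Names are hypothetical.
function specificityWeightedSeeds(rawSeeds: number[], degrees: number[]): number[] {
  const n = degrees.length; // N = |V|
  const weighted = rawSeeds.map(
    // Hubs (high degree) get a small factor; narrow nodes a large one.
    (s, i) => s * Math.log(n / Math.max(degrees[i], 1)),
  );
  const total = weighted.reduce((a, b) => a + b, 0);
  // Renormalize so the seed vector remains a probability distribution.
  return total > 0 ? weighted.map((w) => w / total) : rawSeeds;
}
```

Note the degenerate case: a node connected to everything gets log(N/N) = 0 and drops out of the seeds entirely, which is exactly the hub-suppression the weighting is for.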

Subgraph extraction

Once r is computed, we extract the top-k highest-scoring nodes (typically k = 20) and all edges between them. The resulting subgraph is what Claude sees as context.
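A minimal sketch of the extraction step, assuming the graph's edges are available as a flat list (the `WeightedEdge` shape is an assumption):

```typescript
// Sketch: take the k highest-PPR nodes and their induced edges.
type WeightedEdge = { from: number; to: number; weight: number };

function extractSubgraph(
  scores: number[],        // PPR score per node id
  edges: WeightedEdge[],   // full edge list of the graph
  k = 20,
): { nodes: number[]; edges: WeightedEdge[] } {
  // Top-k node ids by descending PPR score.
  const nodes = scores
    .map((score, id) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.id);
  const keep = new Set(nodes);
  // Induced subgraph: keep only edges with both endpoints retained.
  return {
    nodes,
    edges: edges.filter((e) => keep.has(e.from) && keep.has(e.to)),
  };
}
```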

Topological ordering

Causal Graphs Meet Thoughts shows that putting cause nodes before effect nodes in the prompt, so chain-of-thought aligns with graph traversal, improves reasoning accuracy on causal queries. We sort the retrieved subgraph topologically before serialization:

\text{prompt\_order}(n) \;=\; \text{topo\_rank}(n)

with ties broken by descending PPR score. Cycles (which shouldn't exist but do during indexing) are broken by removing the lowest-confidence edge in the cycle.
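The ordering plus cycle-breaking can be sketched as a priority-driven Kahn's algorithm. The data shapes are assumptions, and this sketch breaks a cycle by removing the lowest-confidence edge among the still-unplaced nodes — a simplification of finding the edge inside the detected cycle itself:

```typescript
// Sketch: topological order (causes first), ties broken by descending
// PPR score, cycles broken by dropping a low-confidence edge.
type CausalEdge = { from: number; to: number; confidence: number };

function topoOrder(
  nodes: number[],
  edges: CausalEdge[],
  pprScore: Map<number, number>, // PPR score per node id
): number[] {
  const live = edges.slice(); // working copy; cycle-breaking mutates it
  const remaining = new Set(nodes);
  const order: number[] = [];
  while (remaining.size > 0) {
    // In-degrees over the still-unplaced nodes.
    const inDeg = new Map<number, number>();
    remaining.forEach((n) => inDeg.set(n, 0));
    for (const e of live) {
      if (remaining.has(e.from) && remaining.has(e.to)) {
        inDeg.set(e.to, (inDeg.get(e.to) ?? 0) + 1);
      }
    }
    const ready = Array.from(remaining).filter((n) => inDeg.get(n) === 0);
    if (ready.length === 0) {
      // Stuck on a cycle: drop the lowest-confidence edge among the
      // remaining nodes (simplification of "lowest in the cycle").
      let worst = -1;
      for (let i = 0; i < live.length; i++) {
        const e = live[i];
        if (!remaining.has(e.from) || !remaining.has(e.to)) continue;
        if (worst < 0 || e.confidence < live[worst].confidence) worst = i;
      }
      live.splice(worst, 1);
      continue;
    }
    // Emit causes before effects; ties by descending PPR score.
    ready.sort((a, b) => (pprScore.get(b) ?? 0) - (pprScore.get(a) ?? 0));
    order.push(ready[0]);
    remaining.delete(ready[0]);
  }
  return order;
}
```

Recomputing in-degrees every round is quadratic, which is irrelevant at k = 20 nodes; the loop always terminates because each pass removes either a node or an edge.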