
Codegraph Enrichments

Beta feature

Codegraph enrichment is a beta capability — disabled by default. The graph extraction and structural signals are still being calibrated across languages. Resolution recall varies by language, and signal semantics may change between releases. Opt in with CODEGRAPH_ENABLED=true.

While git enrichments answer "how has this code behaved over time?", codegraph enrichment answers "how is this code connected right now?". tea-rags extracts your project's call graph and import graph into a per-project DuckDB database, then attaches structural graph signals — fan-in, fan-out, instability, PageRank, transitive impact — to every indexed chunk. Your agent can rank by architectural importance and blast radius, not just relevance and history.

What It Is

Codegraph is a trajectory enrichment family (internal key codegraph.symbols). At index time, per-language tree-sitter walkers extract symbols, imports, and call sites; per-language resolvers turn those into graph edges stored in DuckDB (one .duckdb file per indexed project under <dataDir>/codegraph/). Two graphs are built:

  • Import graph (file-to-file) — which files import which, used for file-level coupling signals.
  • Call graph (symbol-to-symbol) — which functions/methods call which, used for symbol-level signals and cycle detection.
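The two graphs work at different granularities. The sketch below holds extracted facts in memory before they would be written to storage. It uses a hypothetical representation: ImportEdge, CallEdge, the site field, and both builder functions are illustrative names, not tea-rags' DuckDB schema. Import edges collapse to distinct file pairs. Call edges keep one entry per call site, which is what method-level fan-in counts.

```python
from collections import defaultdict
from typing import NamedTuple


class ImportEdge(NamedTuple):
    """File-to-file edge: src_file imports dst_file (hypothetical type)."""
    src_file: str
    dst_file: str


class CallEdge(NamedTuple):
    """Symbol-to-symbol edge, recorded once per call site (hypothetical type)."""
    caller: str  # symbolId, e.g. "Cart#total"
    callee: str  # symbolId, e.g. "Money.add"
    site: str    # call-site location, e.g. "src/cart.ts:10"


def build_import_graph(edges):
    """Adjacency sets: each file maps to the distinct files it imports."""
    graph = defaultdict(set)
    for e in edges:
        graph[e.src_file].add(e.dst_file)
    return dict(graph)


def build_call_graph(edges):
    """Adjacency lists: each caller maps to (callee, site) pairs, one per call site."""
    graph = defaultdict(list)
    for e in edges:
        graph[e.caller].append((e.callee, e.site))
    return dict(graph)


imports = [
    ImportEdge("src/cart.ts", "src/money.ts"),
    ImportEdge("src/cart.ts", "src/money.ts"),  # a repeated import collapses to one edge
    ImportEdge("src/checkout.ts", "src/cart.ts"),
]
calls = [
    CallEdge("Cart#total", "Money.add", "src/cart.ts:10"),
    CallEdge("Cart#total", "Money.add", "src/cart.ts:12"),
    CallEdge("Checkout#run", "Cart#total", "src/checkout.ts:5"),
]
import_graph = build_import_graph(imports)
call_graph = build_call_graph(calls)
```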

For the theory behind these metrics (Henry & Kafura fan-in/fan-out, Martin instability, PageRank centrality and bug-proneness), see Code Quality Metrics.

Enabling Codegraph

Codegraph is disabled by default (beta). Opt in with CODEGRAPH_ENABLED:

claude mcp add tea-rags -s user -e CODEGRAPH_ENABLED=true \
  -- node /path/to/tea-rags/build/index.js

While disabled (the default), the entire family is dropped — no graph extraction, no graph signals on payloads, and the codegraph MCP tools (get_callers, get_callees, find_cycles) are not registered. Re-index after enabling so payloads carry the new signals.

Supported Languages

Graph extraction runs for 8 languages across 13 extensions:

| Language | Extensions |
|---|---|
| TypeScript | .ts, .tsx |
| JavaScript | .js, .jsx, .mjs, .cjs |
| Python | .py |
| Ruby | .rb |
| Go | .go |
| Java | .java |
| Rust | .rs |
| Bash | .sh, .bash |

Files in other languages are still indexed and embedded by tea-rags — they just carry no codegraph signals.
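The language table amounts to a lookup keyed on file extension. The minimal sketch below assumes that dispatch depends on the extension alone. CODEGRAPH_LANGUAGES and codegraph_language are hypothetical names, and tea-rags may use other cues as well. A None result marks a file that is still indexed but carries no codegraph signals.

```python
from pathlib import Path
from typing import Optional

# Transcribed from the "Supported Languages" table; names are illustrative only.
CODEGRAPH_LANGUAGES = {
    ".ts": "TypeScript", ".tsx": "TypeScript",
    ".js": "JavaScript", ".jsx": "JavaScript", ".mjs": "JavaScript", ".cjs": "JavaScript",
    ".py": "Python",
    ".rb": "Ruby",
    ".go": "Go",
    ".java": "Java",
    ".rs": "Rust",
    ".sh": "Bash", ".bash": "Bash",
}


def codegraph_language(path: str) -> Optional[str]:
    """Language used for graph extraction, or None when the file gets no graph signals."""
    return CODEGRAPH_LANGUAGES.get(Path(path).suffix)
```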

What You Get

Codegraph computes signals at two scopes:

File-scope signals (import graph)

| Signal | What it tells you |
|---|---|
| codegraph.file.fanIn | Number of files that import this file (afferent coupling) |
| codegraph.file.fanOut | Number of files this file imports (efferent coupling) |
| codegraph.file.instability | Martin instability, fanOut / (fanIn + fanOut), range 0–1 |
| codegraph.file.connectionCount | Total file-graph edges, fanIn + fanOut; shows how much evidence backs the instability value |
| codegraph.file.isHub | true when fanIn exceeds the collection's 95th-percentile fan-in (heavily depended upon) |
| codegraph.file.isLeaf | true when fanOut is 0 and fanIn > 0 (a pure dependency that imports nothing) |
| codegraph.file.transitiveImpact | Distinct files that transitively import this file (reverse BFS, capped at depth 5); the real blast radius |
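The file-scope definitions can be computed directly from a list of import edges. This is a minimal sketch, not tea-rags' implementation. It assumes edges arrive as (importer, imported) pairs, and it computes the p95 threshold by linear interpolation, a convention tea-rags may not share. The function names are hypothetical. The transitive count walks importers in reverse, breadth-first, and stops after five hops.

```python
import math
from collections import deque
from typing import Dict, Iterable, Set, Tuple


def _percentile(values, q):
    """Percentile by linear interpolation between closest ranks (assumed convention)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def _transitive_importers(start: str, importers: Dict[str, Set[str]], max_depth: int) -> int:
    """Reverse BFS over import edges, stopping after max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for parent in importers[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return len(seen) - 1  # the file itself is not part of its own blast radius


def file_signals(import_edges: Iterable[Tuple[str, str]], max_depth: int = 5) -> Dict[str, dict]:
    """File-scope signals from (importer, imported) pairs (hypothetical helper)."""
    imports: Dict[str, Set[str]] = {}    # file -> files it imports
    importers: Dict[str, Set[str]] = {}  # file -> files that import it
    for src, dst in import_edges:
        for f in (src, dst):
            imports.setdefault(f, set())
            importers.setdefault(f, set())
        imports[src].add(dst)
        importers[dst].add(src)

    fan_in = {f: len(importers[f]) for f in imports}
    p95 = _percentile(list(fan_in.values()), 0.95) if fan_in else 0.0

    signals = {}
    for f in sorted(imports):
        fi, fo = fan_in[f], len(imports[f])
        total = fi + fo
        signals[f] = {
            "fanIn": fi,
            "fanOut": fo,
            "instability": fo / total if total else 0.0,
            "connectionCount": total,
            "isHub": fi > p95,
            "isLeaf": fo == 0 and fi > 0,
            "transitiveImpact": _transitive_importers(f, importers, max_depth),
        }
    return signals


signals = file_signals([("a.ts", "c.ts"), ("b.ts", "c.ts"), ("c.ts", "d.ts")])
```

Here c.ts is the hub: two files import it. d.ts is a leaf, yet its transitive impact covers all three other files, because a change to it ripples through c.ts.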

Symbol-scope signals (call graph)

| Signal | What it tells you |
|---|---|
| codegraph.chunk.fanIn | Distinct call sites invoking this symbol (method-level fan-in) |
| codegraph.chunk.fanOut | Outgoing calls from this symbol (method-level fan-out) |
| codegraph.chunk.pageRank | PageRank over the call graph (damping 0.85, normalized 0–1); recursive importance |
Why two fan-ins?

codegraph.file.fanIn and codegraph.chunk.fanIn measure different graphs — file imports vs. method call sites — so they are not interchangeable. A file with low import fan-in can still contain a method everyone calls. Standard alpha-blending between file and chunk does not apply to codegraph signals for this reason.
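Symbol-scope signals come from call sites rather than imports. The minimal sketch below rests on several assumptions that the text does not settle:

- Fan-out counts call sites, not distinct callees.
- PageRank runs over distinct caller-to-callee edges and ignores call-site multiplicity.
- Symbols that call nothing spread their rank evenly.
- "Normalized 0–1" is implemented by dividing by the maximum.

chunk_fan and page_rank are hypothetical names, and A to D stand in for real symbolIds.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

CallSite = Tuple[str, str, str]  # (caller symbolId, callee symbolId, call-site location)


def chunk_fan(call_sites: Iterable[CallSite]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Method-level fan-in (distinct call sites reaching a symbol) and fan-out (calls made inside it)."""
    incoming = defaultdict(set)
    outgoing = defaultdict(set)
    for caller, callee, site in call_sites:
        incoming[callee].add(site)
        outgoing[caller].add((callee, site))
    return ({s: len(v) for s, v in incoming.items()},
            {s: len(v) for s, v in outgoing.items()})


def page_rank(edges: Iterable[Tuple[str, str]], damping: float = 0.85,
              max_iter: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Power-iteration PageRank over distinct caller->callee edges, scaled so the top symbol is 1.0."""
    out: Dict[str, set] = defaultdict(set)
    nodes = set()
    for caller, callee in edges:
        out[caller].add(callee)
        nodes.update((caller, callee))
    if not nodes:
        return {}
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Symbols that call nothing spread their rank evenly over all symbols.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    top = max(rank.values())
    return {v: r / top for v, r in rank.items()}


calls = [("A", "C", "a.ts:1"), ("A", "C", "a.ts:2"), ("B", "C", "b.ts:3"), ("C", "D", "c.ts:4")]
fan_in, fan_out = chunk_fan(calls)
ranks = page_rank((caller, callee) for caller, callee, _ in calls)
```

In this example C has a method-level fan-in of 3 (three call sites) but only two distinct callers. D is called once, yet it ranks highest because the rank that flows into C passes on to D.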

MCP Tools

When codegraph is enabled, three graph-query tools become available (they read the pre-computed DuckDB graph directly — no embedding, sub-millisecond):

| Tool | Returns |
|---|---|
| get_callers | Symbols that invoke the given symbolId (who depends on this) |
| get_callees | Symbols invoked by the given symbolId (what this depends on) |
| find_cycles | Strongly connected components of two or more nodes in the import graph (scope: "file") or the call graph (scope: "method") |

These pair naturally with find_symbol, which resolves a name to a symbolId using the same Class#method (instance) / Class.method (static) convention the codegraph tools consume.
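The symbolId convention and the caller/callee direction can be illustrated over a plain edge list. This is a sketch only. symbol_id, callers_of and callees_of are hypothetical helpers that mirror what get_callers and get_callees return; they are not the MCP tools' API. The convention for free functions is not covered here.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]  # (caller symbolId, callee symbolId)


def symbol_id(owner: str, member: str, is_static: bool) -> str:
    """Format a symbolId: Class#method for instance members, Class.method for static ones."""
    return f"{owner}.{member}" if is_static else f"{owner}#{member}"


def callers_of(edges: Iterable[Edge], target: str) -> List[str]:
    """Who depends on target: distinct symbols with an edge into it (like get_callers)."""
    return sorted({caller for caller, callee in edges if callee == target})


def callees_of(edges: Iterable[Edge], source: str) -> List[str]:
    """What source depends on: distinct symbols it calls (like get_callees)."""
    return sorted({callee for caller, callee in edges if caller == source})


edges = [
    ("Checkout#run", "Cart#total"),
    ("Invoice#render", "Cart#total"),
    ("Cart#total", "Money.add"),
]
```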

Use Cases

  • What would break if I change this function? Show me its callers
  • Find the architectural hubs in this codebase
  • Are there any circular imports between modules?
  • Show me entry-point files nothing else imports from
  • What does this service depend on transitively?
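Answering "are there circular imports?" comes down to finding strongly connected components. The sketch below uses Tarjan's algorithm, one standard way to find them; tea-rags may use a different SCC algorithm. It assumes edges arrive as (source, target) pairs, so the same function works for import edges (scope "file") or call edges (scope "method"). It recurses once per node, so very deep graphs would need an iterative variant.

```python
from typing import Dict, Iterable, List, Set, Tuple


def find_cycles(edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Strongly connected components with two or more members, via Tarjan's algorithm."""
    graph: Dict[str, Set[str]] = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())

    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    cycles: List[List[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)  # discovery order
        stack.append(v)
        on_stack.add(v)
        for w in sorted(graph[v]):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v roots a component: pop its members off the stack
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            if len(component) >= 2:  # singletons (including self-loops) are not reported
                cycles.append(sorted(component))

    for v in sorted(graph):
        if v not in index:
            visit(v)
    return cycles


edges = [("a.ts", "b.ts"), ("b.ts", "c.ts"), ("c.ts", "a.ts"),
         ("c.ts", "d.ts"), ("d.ts", "e.ts"), ("e.ts", "d.ts"), ("f.ts", "a.ts")]
cycles = find_cycles(edges)
```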

Reranking Presets

Codegraph signals power composite presets that blend the structural graph with git history. These presets are only available when codegraph is enabled (they declare a requires dependency and are silently dropped otherwise):

| Preset | Requires | Use case |
|---|---|---|
| blastRadius | codegraph + git | Rank by how much a change ripples out (fan-in + transitive impact + churn) |
| architecturalHub | codegraph + git | Find the load-bearing files everything depends on |
| dangerous | codegraph + git | High blast radius and high bug-fix rate; change with care |
| entryPoint | codegraph | Leaf/entry files, natural starting points for onboarding |
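The requires gating described above can be pictured as a subset test against the enabled enrichment families. This is a minimal sketch. PRESET_REQUIRES and available_presets are hypothetical names, and the requirement sets are transcribed from the table.

```python
from typing import Dict, FrozenSet, List, Set

# Requirement sets transcribed from the preset table; names are illustrative only.
PRESET_REQUIRES: Dict[str, FrozenSet[str]] = {
    "blastRadius": frozenset({"codegraph", "git"}),
    "architecturalHub": frozenset({"codegraph", "git"}),
    "dangerous": frozenset({"codegraph", "git"}),
    "entryPoint": frozenset({"codegraph"}),
}


def available_presets(enabled: Set[str]) -> List[str]:
    """Keep presets whose requirements are all enabled; silently drop the rest."""
    return [name for name, needs in PRESET_REQUIRES.items() if needs <= enabled]
```

With only git enabled, none of these presets is registered. With codegraph alone, only entryPoint is registered.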

Enabling codegraph also upgrades the shared presets (hotspots, techDebt, codeReview, ownership, securityAudit) to composite versions that factor structural coupling into their scoring.

Scoring Weights Reference

Weight keys available for custom reranking (rerank: { "custom": { ... } }) when codegraph is enabled:

| Key | Signal | Scope |
|---|---|---|
| fanIn | Normalized count of files importing this file | file |
| fanOut | Normalized count of files this file imports | file |
| fanOutPerLine | Efferent coupling per line of code | file |
| instability | Martin instability (already 0–1) | file |
| isHub | 1 when the file is a hub (fanIn > p95) | file |
| isLeaf | 1 when the file is a leaf | file |
| transitiveImpact | Normalized count of transitive importers | file |
| chunkFanIn | Normalized method-level fan-in | symbol |
| chunkFanOut | Normalized method-level fan-out | symbol |
| pageRank | Normalized PageRank (recursive importance) | symbol |
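This section does not say how tea-rags combines custom weights. The sketch below assumes a weighted sum. Keys the table describes as already 0–1 are used as-is, and the others are min-max scaled across the result set. Chunks are shown as hypothetical flat dicts keyed by the weight names, plus an assumed lines field for deriving fanOutPerLine. feature, min_max and rerank are illustrative names.

```python
from typing import Dict, List

# Keys the table describes as already 0-1 (or 0/1 flags); the rest are min-max scaled here.
UNIT_KEYS = {"instability", "isHub", "isLeaf", "pageRank"}


def feature(chunk: Dict[str, float], key: str) -> float:
    """Raw value for a weight key; fanOutPerLine is derived from fanOut and an assumed 'lines' field."""
    if key == "fanOutPerLine":
        return chunk["fanOut"] / max(chunk["lines"], 1)
    return float(chunk[key])  # booleans become 0.0 / 1.0


def min_max(values: List[float]) -> List[float]:
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rerank(chunks: List[Dict[str, float]], weights: Dict[str, float]) -> List[int]:
    """Chunk indices ordered by a weighted sum of per-key scores, highest first."""
    columns = {}
    for key in weights:
        raw = [feature(c, key) for c in chunks]
        columns[key] = raw if key in UNIT_KEYS else min_max(raw)
    scores = [sum(w * columns[k][i] for k, w in weights.items()) for i in range(len(chunks))]
    return sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)


# Mirrors the shape rerank: { "custom": { ... } } with illustrative weights.
request = {"rerank": {"custom": {"fanIn": 0.4, "transitiveImpact": 0.4, "pageRank": 0.2}}}
chunks = [
    {"fanIn": 10, "fanOut": 2, "lines": 100, "transitiveImpact": 40, "pageRank": 0.9},
    {"fanIn": 1, "fanOut": 8, "lines": 50, "transitiveImpact": 2, "pageRank": 0.1},
    {"fanIn": 5, "fanOut": 1, "lines": 10, "transitiveImpact": 20, "pageRank": 0.5},
]
order = rerank(chunks, request["rerank"]["custom"])
```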

Configuration

| Variable | Default | Description |
|---|---|---|
| CODEGRAPH_ENABLED | false | Master switch for the codegraph trajectory family (beta). true enables extraction, signals, and tools. |
| CODEGRAPH_DB_PATH | data directory | Overrides the graph-DB root directory. Per-project files live at <rootDir>/codegraph/<collection>.duckdb. |
| CODEGRAPH_DB_MEMORY_LIMIT | "2GB" | Per-project DuckDB RAM ceiling before spilling to a temp directory (prevents OOM on large repos). |
| CODEGRAPH_DB_THREADS | 2 | DuckDB worker threads per project. The bottleneck is the writer lock, not parallel scanning, so more threads raise memory use without speeding up indexing. |
| CODEGRAPH_EXCLUDE_TESTS | true | Excludes test files from the graph. Tests are still indexed in Qdrant; only graph extraction is skipped. Set false to include tests in fan-in/fan-out, PageRank, and cycle detection. |
| CODEGRAPH_CUSTOM_EXCLUDE | (empty) | Comma-separated, .gitignore-style patterns added to the exclusion filter, e.g. vendor/**,generated/**,*.pb.go. |
| CODEGRAPH_AMBIGUOUS_RESOLVE_MODE | "strict" | How to resolve short-name calls that match multiple candidates. strict drops the edge unless exactly one candidate matches; first picks the first candidate (higher recall, more noise). |
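The difference between the two CODEGRAPH_AMBIGUOUS_RESOLVE_MODE values can be shown on a short-name lookup. This is a minimal sketch under stated assumptions. Candidates are the known symbolIds whose last segment (after # or .) equals the called name. "First" means first in the given order, which may not match tea-rags' ordering. candidates_for and resolve_call are hypothetical names.

```python
import re
from typing import List, Optional, Sequence


def candidates_for(short_name: str, symbol_ids: Sequence[str]) -> List[str]:
    """Symbols whose final segment (after '#' or '.') equals the called short name."""
    return [s for s in symbol_ids if re.split(r"[#.]", s)[-1] == short_name]


def resolve_call(short_name: str, symbol_ids: Sequence[str], mode: str = "strict") -> Optional[str]:
    """Pick the callee for a short-name call, or None to drop the edge."""
    matches = candidates_for(short_name, symbol_ids)
    if mode == "strict":
        return matches[0] if len(matches) == 1 else None  # ambiguous or unknown: no edge
    if mode == "first":
        return matches[0] if matches else None  # higher recall, may pick the wrong target
    raise ValueError(f"unknown resolve mode: {mode!r}")


symbols = ["Cart#total", "Invoice#total", "Money.add"]
```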

Next Steps