How to Think with TeaRAGs
TeaRAGs is not a search engine with extra metadata. It's a system that changes how coding agents should reason about which code to trust. This page explains the shift.
Why This Page Exists
Most developers and agents treat code retrieval as a similarity problem: "find code that looks like my query." TeaRAGs adds a second dimension — code evolution — and this requires a different mental model. Without it, you'll use TeaRAGs as a fancy grep and miss the point entirely.
Traditional RAG Mindset
In standard code RAG, the retrieval loop is:
- Embed the query
- Find the most similar chunks
- Inject them into context
- Generate code
The optimization target is relevance: how closely does the retrieved code match the query? The implicit assumption is that similar code is useful code.
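This loop can be sketched in a few lines. The `embed()` function below is a toy character-frequency stand-in for a real embedding model, and the chunk store is just a list; nothing here is TeaRAGs-specific, which is exactly the point:

```python
# A minimal sketch of the traditional RAG loop. embed() is a toy
# character-frequency vector; real systems use an embedding model.
import math

def embed(text: str) -> list[float]:
    # Count letter frequencies as a crude stand-in for an embedding.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]  # the only ranking criterion is similarity
```

The loop has no notion of history: two chunks with identical text rank identically, no matter how they got that way.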
This works until it doesn't. The first search hit might be:
- A prototype someone abandoned
- A pattern that was reverted three times
- Code written by an intern, rewritten by a senior, then rewritten again
- A function that technically does what you want but breaks every sprint
Similarity tells you nothing about any of this.
Trajectory-Aware Mindset
TeaRAGs shifts the optimization target from "find similar code" to "find code that improves the agent's decision quality."
Every retrieved chunk carries 19 git-derived signals — churn, stability, authorship, bug-fix rates, code age, task references. These signals encode how code has evolved, not just what it looks like right now.
The thinking patterns change:
| Traditional RAG | Trajectory-Aware RAG |
|---|---|
| "Find code that looks like X" | "Find code that looks like X and has survived production" |
| "Copy the first match" | "Copy the match with the lowest bug-fix rate" |
| "Any example will do" | "Find the domain owner's implementation" |
| "This code is relevant" | "This code is relevant and stable — or relevant and volatile (which is a signal too)" |
| "Ignore code history" | "High churn = treat as anti-pattern, not template" |
The key insight: volatile code is not noise to be filtered out — it's a signal. A function with 12 commits and a 60% bug-fix rate tells the agent something important: don't copy it, and don't extend it without understanding why it keeps breaking.
How Agents Should Reason
An agent using TeaRAGs effectively follows three steps:
1. Explore Context
Search semantically to understand the landscape. Use rerank: "relevance" first — find what exists.
2. Evaluate Signals
Before using any result, check the trajectory signals:
- Low churn + old age → stable pattern, safe to copy
- High churn + high bug-fix rate → anti-pattern, study but don't replicate
- Single dominant author → domain expert's style, match it
- Multiple task IDs → code evolved through many requirements, understand them before modifying
- Recent + high commit count → active development area, coordinate with the team
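These heuristics can be written down as a rough classifier. The field names (`commitCount`, `bugFixRate`, `ageMonths`, `dominantAuthorPct`) and thresholds below are illustrative assumptions for the sketch, not the actual TeaRAGs result schema:

```python
# Illustrative mapping from trajectory signals to a recommendation.
# Field names and thresholds are assumptions, not the TeaRAGs schema.
def assess_chunk(signals: dict) -> str:
    churn = signals.get("commitCount", 0)
    bug_fix_rate = signals.get("bugFixRate", 0.0)
    age_months = signals.get("ageMonths", 0)
    if churn <= 3 and age_months >= 12:
        return "stable pattern: safe to copy"
    if churn >= 10 and bug_fix_rate >= 0.5:
        return "anti-pattern: study, don't replicate"
    if signals.get("dominantAuthorPct", 0) >= 80:
        return "domain expert's style: match it"
    if age_months <= 1 and churn >= 5:
        return "active development: coordinate with the team"
    return "unknown quality: verify before reuse"
```

The exact cutoffs matter less than the habit: every retrieved chunk passes through an evaluation step before it influences generated code.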
3. Select for Decision Quality
Choose context that leads to better decisions, not just more context. Three chunks of battle-tested code are more valuable than twenty chunks of relevant-but-unknown-quality code.
Decision Hierarchy
When an agent receives search results, signals should be evaluated in this order:
1. Similarity — is this code relevant to my task?
2. Trajectory — is this code stable, owned, and low-risk?
3. Impact — if I base my code on this, what's the blast radius?
Similarity is the entry filter. Trajectory signals determine trust. Impact determines caution.
A result that scores high on similarity but low on stability is a warning, not a template. A result that scores moderately on similarity but high on stability and clear ownership is often the better choice.
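One way to picture the hierarchy: similarity gates entry, trajectory sets trust, impact applies a discount. A minimal sketch, assuming hypothetical field names and weights:

```python
# Sketch of the decision hierarchy as a gated score. All field names
# and weights are illustrative assumptions.
def decision_score(result: dict) -> float:
    # 1. Similarity is the entry filter: below threshold, discard.
    if result["similarity"] < 0.5:
        return 0.0
    # 2. Trajectory determines trust: stability and ownership raise it.
    trust = (1.0 - result.get("bugFixRate", 0.0)) \
        * result.get("dominantAuthorPct", 50) / 100
    # 3. Impact determines caution: a wide blast radius discounts the score.
    caution = 1.0 / (1.0 + result.get("dependentFiles", 0) / 10)
    return result["similarity"] * trust * caution
```

Under this scoring, a highly similar but bug-prone result loses to a moderately similar, well-owned, stable one, which is exactly the trade-off described above.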
Git Data Depth
TeaRAGs collects git history at two different granularities, each with its own time window:
| Level | Variable | Default | What It Controls |
|---|---|---|---|
| File-level | GIT_LOG_MAX_AGE_MONTHS | 12 months | git log analysis: commit counts, authors, task IDs per file |
| Chunk-level | GIT_CHUNK_MAX_AGE_MONTHS | 6 months | git blame analysis: per-function churn, volatility, bug-fix rates |
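In practice the two windows are just environment variables. A minimal configuration sketch using the defaults (how TeaRAGs loads them, e.g. from a `.env` file, is deployment-specific):

```shell
# The two analysis windows from the table above, set to their defaults.
export GIT_LOG_MAX_AGE_MONTHS=12    # file-level: git log window
export GIT_CHUNK_MAX_AGE_MONTHS=6   # chunk-level: git blame window
```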
The defaults are deliberately generous. Research on code churn and defect prediction consistently shows that shorter windows (2–6 months) capture the most actionable signals, while longer windows add historical context at the cost of noise from resolved issues.
- Nagappan & Ball (2005) demonstrated that relative code churn measures — especially when normalized by time — are strong predictors of defect density. Their "weeks of churn / file count" metric highlights the importance of temporal extent in churn analysis. (IEEE ICSE 2005)
- Adam Tornhill ("Your Code as a Crime Scene", "Software Design X-Rays") recommends 2–3 month windows as a practical heuristic for hotspot and temporal coupling analysis — old data from resolved issues can interfere with current analysis. (adamtornhill.com, Pragmatic Bookshelf)
- GitClear (2024–2025) analyzed 211M changed lines and found that code revised within two weeks of commit is a reliable quality signal — supporting the idea that shorter, focused windows detect instability better than full-history analysis. (GitClear Research)
Why File-Level Metrics Matter for Tech Debt
The 12-month file-level window (GIT_LOG_MAX_AGE_MONTHS) serves a different purpose than chunk-level churn. While chunk-level signals tell you which functions are unstable, file-level signals reveal structural tech debt — the kind that accumulates silently and surfaces as friction during development.
TeaRAGs computes these file-level metrics from git log:
| Metric | What It Reveals |
|---|---|
| commitCount | How often the file changes — high values indicate a coordination bottleneck |
| relativeChurn | (linesAdded + linesDeleted) / currentLines — how much of the file has been rewritten |
| changeDensity | commits / months — average change frequency over the analysis window |
| churnVolatility | stddev(days between commits) — erratic patterns suggest reactive patching |
| bugFixRate | % of commits with fix/bug/hotfix keywords — a direct proxy for defect density |
| contributorCount | Number of unique authors — high counts + high churn = coordination cost |
| dominantAuthorPct | How concentrated ownership is — low % on a high-churn file = no one owns the debt |
| taskIds | Ticket references from commits — traces debt back to business decisions |
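The formulas in the table can be computed directly from parsed commit records. The sketch below assumes a hypothetical record shape (day offset within the window, added/deleted line counts, message, author), such as one parsed from `git log --numstat`:

```python
# Computing the file-level metrics from parsed commit records. The record
# shape is a hypothetical parse of `git log --numstat`; the keyword list
# and window length mirror the table above.
import re
import statistics

FIX_RE = re.compile(r"\b(fix|bug|hotfix)\b", re.IGNORECASE)

def file_metrics(commits: list[dict], current_lines: int,
                 window_months: int = 12) -> dict:
    days = sorted(c["day"] for c in commits)   # day offsets in the window
    gaps = [b - a for a, b in zip(days, days[1:])]
    added = sum(c["added"] for c in commits)
    deleted = sum(c["deleted"] for c in commits)
    return {
        "commitCount": len(commits),
        "relativeChurn": (added + deleted) / current_lines,
        "changeDensity": len(commits) / window_months,
        "churnVolatility": statistics.pstdev(gaps) if len(gaps) > 1 else 0.0,
        "bugFixRate": sum(bool(FIX_RE.search(c["msg"]))
                          for c in commits) / len(commits),
        "contributorCount": len({c["author"] for c in commits}),
    }
```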
The research behind this approach:
- Tornhill's hotspot model identifies tech debt as the product of complexity and change frequency — a complex file that rarely changes is low priority, but a complex file that changes weekly is the most expensive debt in your system. (CodeScene: Technical Debt)
- CodeScene's Code Health metric (1–10 scale) combines behavioral signals (churn, coupling) with structural ones (complexity) to prioritize which debt to pay first — proving that file-level git metrics are sufficient for actionable debt ranking even without parsing the code. (CodeScene: Code Health)
- GitClear's tech debt model tracks "recurrently active" files (modified 2+ times per month across multiple months) as the strongest file-level predictor of future defects — more reliable than raw churn alone. (GitClear: Measuring Tech Debt)
- Gartner (2024) recognized behavioral code analysis (churn + complexity product) as one of five tool categories for measuring and monitoring tech debt. (Gartner Report via CodeScene)
An agent using rerank: "techDebt" leverages these file-level signals to surface files where debt is highest. Combined with metaOnly: true, it can build a tech debt report without reading a single line of code — purely from evolution signals.
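A metadata-only report can then be as simple as ranking files by an evolution-based score. The scoring formula below is an illustrative hotspot-style stand-in, not the actual techDebt preset:

```python
# Illustrative metadata-only tech debt report: rank files purely from
# evolution signals, never reading file contents. The scoring formula
# is a stand-in for whatever the techDebt preset actually computes.
def debt_report(files: list[dict], top: int = 5) -> list[tuple[str, float]]:
    def score(f: dict) -> float:
        # Hotspot-style product: change frequency times defect signal,
        # boosted when ownership is diffuse.
        ownership_gap = 1.0 - f.get("dominantAuthorPct", 100) / 100
        return (f["changeDensity"]
                * (1.0 + f["bugFixRate"])
                * (1.0 + ownership_gap))
    ranked = sorted(files, key=score, reverse=True)
    return [(f["path"], round(score(f), 3)) for f in ranked[:top]]
```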
Adjusting Thresholds
The defaults (6 months chunk / 12 months file) work well for most codebases. But agents can reason about whether to adjust:
- Young codebase (<1 year) — defaults cover the entire history, no adjustment needed
- Mature codebase (5+ years) — defaults already filter out ancient history; if signals feel noisy, reduce GIT_CHUNK_MAX_AGE_MONTHS to 3
- High-velocity team (daily deploys) — consider reducing the chunk window to 3 months for sharper hotspot detection
- Legacy migration — increase GIT_LOG_MAX_AGE_MONTHS to 24+ to capture the full migration arc
The reranker normalizes all signals relative to the analysis window. Changing the window doesn't break presets — it shifts what "old" and "high churn" mean within that window.
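To see why presets survive a window change, consider min-max scaling (an assumption; the actual normalization may differ): each signal is measured against the window's own range, so "high churn" always means high relative to that window.

```python
# Minimal sketch of window-relative normalization. The actual TeaRAGs
# reranker may normalize differently; the point is that each signal is
# scaled against its own window, so preset weights stay meaningful.
def normalize(values: list[float]) -> list[float]:
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]
```

Raw churn counts shrink with a shorter window, but each file's relative position, which is what the presets weight, is preserved.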
The Common Mistake
The most common mistake is using TeaRAGs as plain semantic search — ignoring the enrichment signals and treating results as a flat ranked list.
If you're not using rerank presets, not reading bugFixRate or commitCount in results, and not distinguishing stable from volatile code, you're paying the cost of trajectory enrichment without getting the benefit. You'd get the same results from any vector search tool.
TeaRAGs becomes valuable when the agent reasons about the signals, not when it merely retrieves more context.
Key Takeaway
Traditional RAG asks: "What code looks like what I need?"
TeaRAGs asks: "What code looks like what I need, has proven itself in production, is owned by someone who knows the domain, and won't introduce the same bugs that have already been fixed three times?"
The difference is not in retrieval quality. It's in decision quality.