TeaRAGs

Trajectory Enrichment-Aware RAG system for Coding Agents 🦖🍵

A high-performance code RAG system exposed as an MCP server. Built for large monorepos and enterprise codebases (millions of LOC). Combines semantic retrieval with git-derived development history signals — authorship, churn, volatility, bug-fix rates — to rerank results beyond pure similarity.

How It Works

User → Agent calls TeaRAGs tools → TeaRAGs queries Qdrant + enriches results → Agent makes decisions

Why Trajectory Enrichment Awareness?

Trajectory Enrichment-Aware RAG is a new philosophy of code retrieval. Not an incremental improvement — a fundamental shift in what search results mean.

Standard code search finds code that looks like your query. It has no opinion on whether that code is good. TeaRAGs introduces a principle: every piece of retrieved code must carry its own history — who wrote it, how often it changed, how many times it was fixed, how stable it is, and why it exists. This transforms retrieval from pattern matching into evidence-based decision-making.

The result: 19 git-derived scoring signals per chunk, composable into reranking presets like hotspots, ownership, techDebt, and securityAudit. Code that is stable, well-owned, and battle-tested rises to the top. Code that is risky, volatile, and bug-prone gets flagged.
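One way to picture a composable reranking preset is as a set of weights over normalized signals, blended with the semantic similarity score. The sketch below is illustrative only and is not TeaRAGs' actual implementation: the signal names (`churn`, `bugFixRate`), the weights, the `stableFirst` preset, and the `rerank` function are all assumptions made for this example.

```typescript
// Illustrative sketch only. Signal names, weights, and the blending formula
// are assumptions for this example, not TeaRAGs' real scoring model.

interface Chunk {
  id: string;
  similarity: number;              // semantic similarity, assumed in [0, 1]
  signals: Record<string, number>; // git-derived signals, assumed normalized to [0, 1]
}

// A preset maps a signal name to a weight. Negative weights penalize a signal.
type Preset = Record<string, number>;

// Hypothetical weights for a "hotspots"-style preset: surface volatile code.
const hotspots: Preset = { churn: 0.5, bugFixRate: 0.5 };

// Hypothetical inverse preset: prefer calm, rarely fixed code.
const stableFirst: Preset = { churn: -0.5, bugFixRate: -0.5 };

function rerank(chunks: Chunk[], preset: Preset, signalWeight = 0.5): Chunk[] {
  const score = (c: Chunk): number => {
    let signalScore = 0;
    for (const name of Object.keys(preset)) {
      // Signals missing from a chunk contribute nothing.
      signalScore += (preset[name] ?? 0) * (c.signals[name] ?? 0);
    }
    // Blend pure similarity with the weighted signal score.
    return (1 - signalWeight) * c.similarity + signalWeight * signalScore;
  };
  // Copy before sorting so the caller's array is left untouched.
  return chunks.slice().sort((a, b) => score(b) - score(a));
}

const results: Chunk[] = [
  { id: "parser.ts#parse", similarity: 0.9, signals: { churn: 0.1, bugFixRate: 0.1 } },
  { id: "legacy.ts#parse", similarity: 0.8, signals: { churn: 0.9, bugFixRate: 0.8 } },
];

console.log(rerank(results, hotspots).map((c) => c.id));
console.log(rerank(results, stableFirst).map((c) => c.id));
```

The same candidate set yields a different ordering under each preset, which is the point: similarity decides what is relevant, while the preset decides what matters for the task at hand.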

This enables agentic data-driven engineering: a paradigm where AI agents make code generation decisions backed by empirical evidence, not pattern-matching intuition.

Why TeaRAGs?

55% fewer tool calls. 97% fewer fresh input tokens. 27.5% lower cost. And that's just from replacing grep with semantic search, before trajectory enrichment even kicks in.

With trajectory enrichment awareness, the agent goes further: it knows which code is stable and which is buggy, who owns each domain, and which functions have a 0–20% bug-fix rate versus 50%+, and every chunk is linked to its Jira/GitHub tickets. All at function-level granularity, not just per file.
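To make "bug-fix rate" and ticket linking concrete, here is a minimal sketch of how such signals could be derived from the commits that touched a single function. The keyword heuristic, the ticket patterns, and the names `bugFixRate` and `ticketIds` are assumptions for illustration. How TeaRAGs actually classifies commits is not described here.

```typescript
// Illustrative sketch only. The fix-keyword heuristic and ticket patterns are
// assumptions, not TeaRAGs' actual classification rules.

interface Commit {
  message: string;
}

// Treat a commit as a bug fix if its message contains a fix-like word.
const FIX_PATTERN = /\b(fix(es|ed)?|bug|hotfix)\b/i;

// Jira-style keys (e.g. PROJ-12) and GitHub-style references (e.g. #34).
const TICKET_PATTERN = /\b[A-Z][A-Z0-9]+-\d+\b|#\d+/g;

// Share of commits touching a chunk that look like bug fixes, in [0, 1].
function bugFixRate(commits: Commit[]): number {
  if (commits.length === 0) return 0;
  const fixes = commits.filter((c) => FIX_PATTERN.test(c.message)).length;
  return fixes / commits.length;
}

// Unique ticket IDs mentioned in the commit messages, in first-seen order.
function ticketIds(commits: Commit[]): string[] {
  const ids: string[] = [];
  for (const c of commits) {
    const matches = c.message.match(TICKET_PATTERN) ?? [];
    for (const id of matches) {
      if (ids.indexOf(id) === -1) ids.push(id);
    }
  }
  return ids;
}

// Commits assumed to have touched one function (e.g. collected via git blame/log).
const history: Commit[] = [
  { message: "fix: null check in parser PROJ-12" },
  { message: "feat: add retry logic (#34)" },
  { message: "Fixed race condition PROJ-12" },
  { message: "refactor: rename helper" },
];

console.log(bugFixRate(history)); // 0.5
console.log(ticketIds(history));  // [ 'PROJ-12', '#34' ]
```

Computing this per function rather than per file is what lets a stable helper sit next to a bug-prone one in the same file without sharing its score.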

Isn't that awesome? Read the full breakdown: Agent on Grep vs Semantic Search vs TeaRAGs

Key Features

🧠 Intelligence layer for coding agents — makes your AI agent smarter by giving it empirical signals about code quality, ownership, and evolution. Not just "find similar code" but "find the right code to learn from"
📊 Agentic data-driven engineering — agents make code generation decisions backed by evidence (stable templates, anti-pattern avoidance, style matching, risk assessment), not pattern-matching intuition
🧬 Git trajectory enrichment awareness — 19 git-derived signals per chunk (churn, volatility, authorship, bug-fix rate, task traceability) feed a composable reranking layer with presets like hotspots, ownership, techDebt, securityAudit
🔮 Topological trajectory enrichment awareness (planned) — symbol dependency graphs, cross-file coupling, blast radius analysis. The next dimension of code intelligence
🔍 Semantic & hybrid search — natural language queries with optional BM25 keyword matching and Reciprocal Rank Fusion
🎯 AST-aware chunking — tree-sitter parsing for functions, classes, methods across most popular languages including Ruby and Markdown
🚀 Built for scale — fast local indexing for enterprise codebases (millions of LOC), incremental reindexing, parallel pipelines
🔒 Privacy-first — works fully offline with Ollama, your code never leaves your machine
🔌 Provider agnostic — Ollama (local), OpenAI, Cohere, Voyage AI — swap without reindexing
⚙️ Highly configurable — fine-tune batch sizes, concurrency, caching. Auto-tuning benchmark included (npm run tune)
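The hybrid search feature above mentions Reciprocal Rank Fusion (RRF), a standard way to merge ranked lists from different retrievers without having to reconcile their score scales. The sketch below shows the general technique, not TeaRAGs' internal code: the function name is hypothetical, and `k = 60` is the constant conventionally used with RRF, assumed here rather than taken from the project.

```typescript
// Illustrative sketch of Reciprocal Rank Fusion (RRF). The function name and
// the default k = 60 are assumptions, not taken from TeaRAGs.

// Each ranking is a list of document IDs, best match first.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      // Each list contributes 1 / (k + rank) for every document it contains.
      scores[id] = (scores[id] ?? 0) + 1 / (k + rank);
    });
  }
  // Highest fused score first.
  return Object.keys(scores).sort((a, b) => (scores[b] ?? 0) - (scores[a] ?? 0));
}

const semanticHits = ["a", "b", "c"]; // e.g. from vector search
const keywordHits = ["c", "a", "d"];  // e.g. from BM25 keyword matching

console.log(reciprocalRankFusion([semanticHits, keywordHits]));
// [ 'a', 'c', 'b', 'd' ]
```

Documents that appear near the top of both lists ("a" and "c") outrank those found by only one retriever, which is why RRF works well for combining semantic and keyword results.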

Getting Started

What is TeaRAGs — overview and key features
Core Concepts — vectorization, semantic search, trajectory enrichment awareness, reranking
Installation — prerequisites and setup
Connect to an Agent — configure Claude Code, Roo, or Cursor
Create Your First Index — index a codebase in one command
Your First Query — search your code with natural language

Documentation

Section	Description
Introduction	What TeaRAGs is, origin story, comparison, non-goals
Core Concepts	Code vectorization, semantic search, trajectory enrichment awareness, reranking
Quickstart	Get up and running in 15 minutes — 15-Minute Guide
Usage	Indexing repositories, query modes, use cases
Configuration	Environment variables, embedding providers, performance tuning
Git Enrichments	Git-derived quality signals: churn, ownership, stability, task IDs
Agent Workflows	Mental model, search strategies, deep analysis, data-driven code generation
Architecture	System design, pipelines, data model
Knowledge Base	RAG theory, code search, software evolution, blast radius, criticism & responses
Tools Schema	MCP tools, search parameters, reranking presets
Design Decisions	RFCs documenting key architectural choices
Operations	Troubleshooting, FAQ, recovery
Extending	Adding providers, custom chunkers, development setup
Roadmap	Future plans and open questions

Acknowledgments

Huge thanks to:

Martin Halder and the qdrant-mcp-server project for the solid foundation — a clean architecture, excellent documentation, the MIT license that made this fork possible, and the research on code indexing that paved the way
qdrant/mcp-server-qdrant — the ancestor of all forks
Grandpa Docusaurus, for making beautiful, functional documentation effortless 📚
