TeaRAGs

Trajectory Enrichment-Aware RAG system for Coding Agents 🦖🍵

A high-performance code RAG system exposed as an MCP server. Built for large monorepos and enterprise codebases (millions of LOC). Combines semantic retrieval with git-derived development history signals — authorship, churn, volatility, bug-fix rates — to rerank results beyond pure similarity.

How It Works

User → Agent calls TeaRAGs tools → TeaRAGs queries Qdrant + enriches results → Agent makes decisions
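The flow above can be sketched in a few lines. This is a hypothetical illustration only: the function names and payloads below are stand-ins, not TeaRAGs' real API.

```typescript
// Hypothetical sketch of the request flow; these names are
// illustrative stand-ins, not TeaRAGs' actual API surface.
type Hit = { id: string; score: number };
type Enriched = Hit & { bugFixRate: number };

// Stand-in for the Qdrant vector query.
const vectorSearch = (query: string): Hit[] => [
  { id: "auth.ts:login", score: 0.91 },
];

// Stand-in for the git-derived signal lookup.
const gitSignals = (id: string): { bugFixRate: number } => ({
  bugFixRate: 0.15,
});

// TeaRAGs' role in one function: retrieve from Qdrant, then enrich
// each hit with trajectory signals before the agent sees it.
function search(query: string): Enriched[] {
  return vectorSearch(query).map((hit) => ({ ...hit, ...gitSignals(hit.id) }));
}
```

The agent never queries Qdrant directly; it only ever sees hits that already carry their history.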

Why Trajectory Enrichment Awareness?

Trajectory Enrichment-Aware RAG is a new philosophy of code retrieval. Not an incremental improvement — a fundamental shift in what search results mean.

Standard code search finds code that looks like your query. It has no opinion on whether that code is good. TeaRAGs introduces a principle: every piece of retrieved code must carry its own history — who wrote it, how often it changed, how many times it was fixed, how stable it is, and why it exists. This transforms retrieval from pattern matching into evidence-based decision-making.

The result: 19 git-derived scoring signals per chunk, composable into reranking presets like hotspots, ownership, techDebt, and securityAudit. Code that is stable, well-owned, and battle-tested rises to the top. Code that is risky, volatile, and bug-prone gets flagged.
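As a rough illustration of how signals can compose into reranking presets: the signal names, weights, and preset definitions below are assumptions made for this sketch, not TeaRAGs' actual schema.

```typescript
// Illustrative only: signal names, weights, and preset definitions
// here are assumptions, not TeaRAGs' actual schema.
interface ChunkSignals {
  id: string;
  similarity: number; // 0..1, from vector search
  churn: number;      // 0..1, normalized change frequency
  bugFixRate: number; // 0..1, share of commits that were fixes
  ownership: number;  // 0..1, concentration of authorship
}

type Weights = Partial<Record<keyof Omit<ChunkSignals, "id">, number>>;

const presets: Record<string, Weights> = {
  // Surface risky, frequently-fixed code.
  hotspots: { similarity: 0.4, churn: 0.3, bugFixRate: 0.3 },
  // Favor code with a clearly concentrated owner.
  ownership: { similarity: 0.6, ownership: 0.4 },
};

// A preset is just a weighted blend of signals applied on top of
// the raw similarity ranking.
function rerank(chunks: ChunkSignals[], weights: Weights): ChunkSignals[] {
  const score = (c: ChunkSignals) =>
    (Object.entries(weights) as [keyof Weights, number][]).reduce(
      (sum, [signal, w]) => sum + w * c[signal],
      0,
    );
  return [...chunks].sort((a, b) => score(b) - score(a));
}

// A volatile, bug-prone chunk outranks a stable one under "hotspots",
// while "ownership" puts the stable, well-owned chunk first.
const stable: ChunkSignals =
  { id: "a", similarity: 0.9, churn: 0.1, bugFixRate: 0.1, ownership: 0.8 };
const risky: ChunkSignals =
  { id: "b", similarity: 0.7, churn: 0.9, bugFixRate: 0.9, ownership: 0.2 };
const byHotspots = rerank([stable, risky], presets.hotspots);
```

The point of the sketch: the same retrieved set reorders differently per preset, so the agent picks the lens that matches its task.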

This enables agentic data-driven engineering — a paradigm where AI agents make code generation decisions backed by empirical evidence, not pattern matching intuition.

Why TeaRAGs?

55% fewer tool calls. 97% fewer fresh input tokens. 27.5% lower cost. And that's just semantic search over grep — before trajectory enrichment even kicks in.

With trajectory enrichment awareness, the agent goes further: it knows which code is stable and which is buggy, who owns each domain, and which functions have a 0–20% bug-fix rate vs 50%+, with every chunk linked to its JIRA/GitHub tickets. All at function-level granularity — not just per file.

Isn't that awesome? Read the full breakdown: Agent on Grep vs Semantic Search vs TeaRAGs

Key Features

  • 🧠 Intelligence layer for coding agents — makes your AI agent smarter by giving it empirical signals about code quality, ownership, and evolution. Not just "find similar code" but "find the right code to learn from"
  • 📊 Agentic data-driven engineering — agents make code generation decisions backed by evidence (stable templates, anti-pattern avoidance, style matching, risk assessment), not pattern matching intuition
  • 🧬 Git trajectory enrichment awareness — 19 git-derived signals per chunk (churn, volatility, authorship, bug-fix rate, task traceability) feed a composable reranking layer with presets like hotspots, ownership, techDebt, securityAudit
  • 🔮 Topological trajectory enrichment awareness (planned) — symbol dependency graphs, cross-file coupling, blast radius analysis. The next dimension of code intelligence
  • 🔍 Semantic & hybrid search — natural language queries with optional BM25 keyword matching and Reciprocal Rank Fusion
  • 🎯 AST-aware chunking — tree-sitter parsing for functions, classes, methods across most popular languages including Ruby and Markdown
  • 🚀 Built for scale — fast local indexing for enterprise codebases (millions of LOC), incremental reindexing, parallel pipelines
  • 🔒 Privacy-first — works fully offline with Ollama, your code never leaves your machine
  • 🔌 Provider agnostic — Ollama (local), OpenAI, Cohere, Voyage AI — swap without reindexing
  • ⚙️ Highly configurable — fine-tune batch sizes, concurrency, caching. Auto-tuning benchmark included (npm run tune)
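The hybrid search feature above merges two ranked lists — vector similarity and BM25 keyword hits — with Reciprocal Rank Fusion. A minimal sketch of the standard RRF formula (k = 60 is the conventional damping constant; the file names are made up for the example):

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per
// document; k (conventionally 60) damps the weight of top ranks.
function rrfFuse(rankings: string[][], k = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, i) => {
      const rank = i + 1;
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

// A chunk ranked well by BOTH lists beats one that tops only one list.
const semantic = ["login.ts", "auth.ts", "session.ts"]; // vector ranking
const keyword = ["token.ts", "login.ts", "auth.ts"];    // BM25 ranking
const fused = rrfFuse([semantic, keyword]);
```

RRF needs only ranks, not scores, so the vector and BM25 lists fuse cleanly even though their raw scores live on different scales.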

Getting Started

Documentation

| Section | Description |
| --- | --- |
| Introduction | What TeaRAGs is, origin story, comparison, non-goals |
| Core Concepts | Code vectorization, semantic search, trajectory enrichment awareness, reranking |
| Quickstart | Get up and running in 15 minutes — 15-Minute Guide |
| Usage | Indexing repositories, query modes, use cases |
| Configuration | Environment variables, embedding providers, performance tuning |
| Git Enrichments | Git-derived quality signals: churn, ownership, stability, task IDs |
| Agent Workflows | Mental model, search strategies, deep analysis, data-driven code generation |
| Architecture | System design, pipelines, data model |
| Knowledge Base | RAG theory, code search, software evolution, blast radius, criticism & responses |
| Tools Schema | MCP tools, search parameters, reranking presets |
| Design Decisions | RFCs documenting key architectural choices |
| Operations | Troubleshooting, FAQ, recovery |
| Extending | Adding providers, custom chunkers, development setup |
| Roadmap | Future plans and open questions |

Acknowledgments

Huge thanks to:

  • Martin Halder and the qdrant-mcp-server project for the solid foundation — a clean architecture, excellent documentation, the MIT license that made this fork possible, and the research on code indexing that paved the way
  • qdrant/mcp-server-qdrant — the ancestor of all forks
  • Grandpa Docusaurus — for making beautiful, functional documentation effortless 📚