Core Concepts

TeaRAGs transforms source code into searchable vector embeddings enriched with development history signals. Understanding these five layers is key to getting the most out of the system.

1. Code Vectorization

How source code becomes searchable. The indexing pipeline scans your project, splits code into semantic chunks using AST-aware parsers (tree-sitter), converts chunks into vector embeddings, and stores them in Qdrant. Incremental reindexing detects changes and updates only affected chunks.
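The incremental step can be illustrated with content fingerprints. This is a minimal sketch, assuming each chunk has a stable id and that a change is detected by comparing hashes of the chunk text; the function names, the id format, and the hashing approach are all hypothetical and are not taken from TeaRAGs itself.

```python
import hashlib


def chunk_fingerprints(chunks: dict[str, str]) -> dict[str, str]:
    """Map each chunk id to a SHA-256 digest of its source text."""
    return {
        chunk_id: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for chunk_id, text in chunks.items()
    }


def plan_reindex(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two fingerprint maps and list the chunks that need work."""
    return {
        # New or edited chunks must be re-embedded and upserted.
        "embed": sorted(cid for cid, digest in new.items() if old.get(cid) != digest),
        # Chunks that disappeared must be removed from the vector store.
        "delete": sorted(cid for cid in old if cid not in new),
    }


# Hypothetical chunk ids of the form "<file>::<symbol>".
before = chunk_fingerprints({
    "auth.py::login": "def login(user): ...",
    "auth.py::logout": "def logout(user): ...",
})
after = chunk_fingerprints({
    "auth.py::login": "def login(user, otp): ...",  # edited
    "auth.py::logout": "def logout(user): ...",     # unchanged
    "auth.py::refresh": "def refresh(token): ...",  # new
})
plan = plan_reindex(before, after)
```

Here only `login` and `refresh` would be re-embedded; `logout` is left untouched, which is the point of incremental reindexing.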

2. Semantic Search

The foundation: finding code by intent and meaning, not exact keywords. Ask "how does authentication work?" and get the actual implementation, even if it's called Pipeline::StageClient. Hybrid search (semantic + BM25) is also supported, combining meaning-based and keyword-based retrieval.
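One common way to combine a semantic ranking with a BM25 ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF purely for illustration; the text above does not say which fusion method TeaRAGs uses, and the result names other than Pipeline::StageClient are invented.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists; each item scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for the query "how does authentication work?"
semantic = ["Pipeline::StageClient", "AuthController", "SessionStore"]
keyword = ["AuthController", "LoginForm", "Pipeline::StageClient"]
fused = reciprocal_rank_fusion([semantic, keyword])
```

Items that appear in both lists rise to the top, so a chunk that matches by meaning and by keyword outranks one that matches on a single axis.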

3. Trajectory Enrichment Awareness

What makes TeaRAGs different from standard code RAG. Each chunk is augmented with 19 git-derived signals — churn, authorship, volatility, bug-fix rates, task traceability — at both file and chunk (function/method) granularity. This metadata enables quality-aware retrieval: find code that is not just similar, but also stable, well-owned, or risky.
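To make the idea concrete, the sketch below derives a handful of signals from the commits that touched one chunk. It is an illustration under stated assumptions, not the TeaRAGs implementation: the `Commit` type, the signal names other than `taskIds`, the "message starts with fix" heuristic, and the `ABC-123` task id pattern are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    author: str
    message: str


def git_signals(commits: list[Commit]) -> dict:
    """Derive a few illustrative signals from the commits touching one chunk."""
    total = len(commits)
    # Assumed heuristic: a commit is a bug fix if its message starts with "fix".
    fixes = sum(1 for c in commits if re.match(r"(?i)fix\b", c.message))
    per_author: dict[str, int] = {}
    for c in commits:
        per_author[c.author] = per_author.get(c.author, 0) + 1
    # Author with the most commits; ties go to the alphabetically first name.
    top = max(sorted(per_author), key=per_author.get) if per_author else None
    # Assumed task id format: uppercase project key, dash, number.
    task_ids = sorted({t for c in commits for t in re.findall(r"[A-Z]+-\d+", c.message)})
    return {
        "commitCount": total,
        "bugFixRate": fixes / total if total else 0.0,
        "dominantAuthor": top,
        "dominantAuthorShare": per_author[top] / total if total else 0.0,
        "taskIds": task_ids,
    }


history = [
    Commit("ana", "feat: add login flow AUTH-7"),
    Commit("ana", "fix: handle expired token AUTH-12"),
    Commit("raj", "refactor: extract session helper"),
    Commit("ana", "Fix null session AUTH-12"),
]
signals = git_signals(history)
```

Running the same function over the commits for a whole file and over the commits for a single function would give the two granularities described above.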

4. Reranking

How trajectory signals are used at search time. Results from vector similarity are re-scored using composable weight presets (hotspots, ownership, techDebt, securityAudit, etc.) or custom weight configurations.
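A weighted re-score might look roughly like the following. This is a sketch that assumes each signal is normalized to the range 0 to 1 and that a preset is just a named set of weights; the weight values, the signal names, and the example hits are made up for illustration and do not reflect the real presets.

```python
# Hypothetical presets: positive weights favor a signal, negative weights penalize it.
PRESETS = {
    "hotspots": {"similarity": 0.5, "churn": 0.3, "bugFixRate": 0.2},
    "stable": {"similarity": 0.6, "churn": -0.2, "bugFixRate": -0.2},
}


def rerank(hits: list[dict], weights: dict[str, float]) -> list[dict]:
    """Re-score hits as a weighted sum of their signals, best first."""
    def score(hit: dict) -> float:
        return sum(w * hit.get(signal, 0.0) for signal, w in weights.items())

    return sorted(hits, key=score, reverse=True)


# Two hypothetical hits with similar vector similarity but different histories.
hits = [
    {"id": "LegacyAuth", "similarity": 0.90, "churn": 0.9, "bugFixRate": 0.8},
    {"id": "TokenService", "similarity": 0.85, "churn": 0.1, "bugFixRate": 0.1},
]
hot_first = [h["id"] for h in rerank(hits, PRESETS["hotspots"])]
stable_first = [h["id"] for h in rerank(hits, PRESETS["stable"])]
custom = [h["id"] for h in rerank(hits, {"similarity": 1.0})]
```

The same two hits swap places depending on the preset: the hotspot view surfaces the volatile chunk, while the stable view prefers the quiet one. Passing a plain weight dictionary plays the role of a custom configuration.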

5. Agentic Data-Driven Engineering

Together, trajectory enrichment and reranking enable a new paradigm: AI agents making code decisions backed by empirical evidence rather than pattern-matching intuition. Instead of copying the first search hit, an agent can:

  • Find stable templates (rerank: "stable") — low-bug, battle-tested code
  • Avoid anti-patterns (rerank: "hotspots") — high-churn, bug-prone code
  • Match domain owner's style (rerank: "ownership") — consistent conventions
  • Understand context via taskIds — why the code exists
  • Assess risk (rerank: "techDebt") — defensive patterns for legacy code

This transforms code generation from artistic guesswork into data-driven engineering.
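An agent could encode the choices listed above as a simple lookup from goal to rerank preset. The sketch below is only an illustration: the goal names and the request shape are hypothetical, and only the preset names come from the list above.

```python
# Hypothetical goal names mapped to the rerank presets named in the list above.
RERANK_FOR_GOAL = {
    "find_template": "stable",
    "avoid_anti_patterns": "hotspots",
    "match_owner_style": "ownership",
    "assess_risk": "techDebt",
}


def build_search_request(query: str, goal: str) -> dict:
    """Build a search request, adding a rerank preset when the goal is known."""
    request = {"query": query}
    preset = RERANK_FOR_GOAL.get(goal)
    if preset is not None:
        request["rerank"] = preset
    return request


template_request = build_search_request("retry with backoff", "find_template")
plain_request = build_search_request("retry with backoff", "unknown_goal")
```

An unrecognized goal falls back to plain similarity search rather than guessing a preset.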

👉 Agentic Data-Driven Engineering — full strategies, workflows, and the transformation table.

How It All Fits Together