Core Concepts
TeaRAGs transforms source code into searchable vector embeddings enriched with development history signals. Understanding these five layers is key to getting the most out of the system.
1. Code Vectorization
How source code becomes searchable. The indexing pipeline scans your project, splits code into semantic chunks using AST-aware parsers (tree-sitter), converts chunks into vector embeddings, and stores them in Qdrant. Incremental reindexing detects changes and updates only affected chunks.
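As a rough illustration of incremental reindexing, the sketch below compares content hashes of chunks between two indexing runs and reports which chunks need to be re-embedded or removed. The source does not describe how TeaRAGs detects changes, so the hashing approach, the chunk ID format, and the names `chunk_hash` and `plan_reindex` are assumptions made for this example.

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Return a stable content hash for one chunk of source code."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(
    previous: dict[str, str], current_chunks: dict[str, str]
) -> tuple[list[str], list[str], dict[str, str]]:
    """Decide which chunks to re-embed and which to delete.

    previous:        chunk_id -> content hash from the last indexing run
    current_chunks:  chunk_id -> chunk text from the current scan
    (both shapes are hypothetical, chosen for illustration)
    """
    current = {cid: chunk_hash(text) for cid, text in current_chunks.items()}
    # New or modified chunks: the hash is missing or differs from last run.
    to_upsert = sorted(cid for cid, h in current.items() if previous.get(cid) != h)
    # Chunks that no longer exist in the project.
    to_delete = sorted(cid for cid in previous if cid not in current)
    return to_upsert, to_delete, current
```

Only the IDs in `to_upsert` would need new embeddings; unchanged chunks keep their stored vectors.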
2. Semantic Search
The foundation: finding code by intent and meaning, not exact keywords. Ask "how does authentication work?" and get the actual implementation, even if it's called Pipeline::StageClient. Hybrid search (semantic + BM25) is also supported, combining meaning-based and keyword-based retrieval.
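One common way to combine a semantic ranking with a keyword (BM25) ranking is Reciprocal Rank Fusion. The source does not state which fusion method TeaRAGs uses, so the sketch below is a generic illustration of the idea; the function name `reciprocal_rank_fusion` and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in, so items
    ranked well by both retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic_hits = ["a", "b", "c"]  # ordered by vector similarity
keyword_hits = ["b", "d", "a"]   # ordered by BM25 score
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

Here `b` and `a` appear in both lists and therefore outrank `d` and `c`, which each appear in only one.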
3. Trajectory Enrichment Awareness
What makes TeaRAGs different from standard code RAG. Each chunk is augmented with 19 git-derived signals — churn, authorship, volatility, bug-fix rates, task traceability — at both file and chunk (function/method) granularity. This metadata enables quality-aware retrieval: find code that is not just similar, but also stable, well-owned, or risky.
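To make the idea of git-derived signals concrete, here is a minimal sketch that derives a few of them from the commits touching one chunk. The signal names, the `Commit` type, and the keyword heuristic for spotting bug fixes are all hypothetical; the actual 19 signals and how TeaRAGs computes them are not specified in this section.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Commit:
    author: str
    message: str


# Assumed heuristic: a commit counts as a bug fix if its message contains
# a word starting with "fix", "bug", or "hotfix".
FIX_PATTERN = re.compile(r"\b(fix|bug|hotfix)\w*", re.IGNORECASE)


def chunk_signals(commits: list[Commit]) -> dict:
    """Summarize the history of one chunk as a small metadata payload."""
    if not commits:
        return {
            "commit_count": 0,
            "bug_fix_rate": 0.0,
            "dominant_author": None,
            "dominant_author_share": 0.0,
        }
    authors = Counter(c.author for c in commits)
    top_author, top_count = authors.most_common(1)[0]
    fixes = sum(1 for c in commits if FIX_PATTERN.search(c.message))
    return {
        "commit_count": len(commits),                          # churn proxy
        "bug_fix_rate": fixes / len(commits),                  # share of fix commits
        "dominant_author": top_author,                         # ownership proxy
        "dominant_author_share": top_count / len(commits),
    }
```

A payload like this, stored next to each vector, is what would let a search filter or re-score results by stability or ownership.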
4. Reranking
How trajectory signals are used at search time. Results from vector similarity are re-scored using composable weight presets (hotspots, ownership, techDebt, securityAudit, etc.) or custom weight configurations.
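The re-scoring step can be pictured as a weighted sum over similarity and trajectory signals. The preset names `hotspots` and `stable` come from this page, but the weights, the signal names (`churn`, `bug_fix_rate`), and the functions below are assumptions made for illustration, not TeaRAGs' actual configuration.

```python
# Hypothetical weights; signals are assumed to be normalized to [0, 1].
PRESETS = {
    "hotspots": {"similarity": 0.5, "churn": 0.3, "bug_fix_rate": 0.2},
    "stable": {"similarity": 0.6, "churn": -0.2, "bug_fix_rate": -0.2},
}


def compose(*weight_sets: dict[str, float]) -> dict[str, float]:
    """Combine several weight sets by summing the weight of each signal."""
    combined: dict[str, float] = {}
    for weights in weight_sets:
        for name, value in weights.items():
            combined[name] = combined.get(name, 0.0) + value
    return combined


def rerank(results: list[dict], weights: dict[str, float]) -> list[dict]:
    """Re-order results by a weighted sum of their signals (missing = 0)."""
    def score(result: dict) -> float:
        return sum(w * result.get(name, 0.0) for name, w in weights.items())
    return sorted(results, key=score, reverse=True)


results = [
    {"id": "a", "similarity": 0.9, "churn": 0.9, "bug_fix_rate": 0.8},
    {"id": "b", "similarity": 0.8, "churn": 0.1, "bug_fix_rate": 0.0},
]
stable_first = rerank(results, PRESETS["stable"])
hotspots_first = rerank(results, PRESETS["hotspots"])
```

With these example weights, the `stable` preset promotes the quieter chunk `b`, while `hotspots` promotes the high-churn chunk `a`, even though `a` has the higher raw similarity in both cases.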
5. Agentic Data-Driven Engineering
Trajectory enrichment + reranking together enable a new paradigm: AI agents making code decisions backed by empirical evidence, not pattern-matching intuition. Instead of copying the first search hit, an agent can:
- Find stable templates (rerank: "stable"): low-bug, battle-tested code
- Avoid anti-patterns (rerank: "hotspots"): high-churn, bug-prone code
- Match domain owner's style (rerank: "ownership"): consistent conventions
- Understand context via taskIds: why the code exists
- Assess risk (rerank: "techDebt"): defensive patterns for legacy code
This transforms code generation from artistic guesswork into data-driven engineering.
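The strategies listed above could be wired into an agent as a simple mapping from intent to rerank preset. The preset names are taken from this page; the intent labels, the request shape (`query` and `rerank` keys), and the function `build_search_request` are hypothetical and do not describe the actual TeaRAGs tool interface.

```python
# Hypothetical intent labels mapped to the presets named on this page.
INTENT_TO_PRESET = {
    "find_template": "stable",
    "avoid_anti_patterns": "hotspots",
    "match_owner_style": "ownership",
    "assess_risk": "techDebt",
}


def build_search_request(query: str, intent: str) -> dict:
    """Build a search request, adding a rerank preset when the intent is known."""
    request = {"query": query}
    preset = INTENT_TO_PRESET.get(intent)
    if preset is not None:
        request["rerank"] = preset
    # Unknown intents fall back to a plain similarity search.
    return request
```

An agent looking for a reference implementation would call `build_search_request("retry with backoff", "find_template")` and send the resulting request to the search tool.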
👉 Agentic Data-Driven Engineering — full strategies, workflows, and the transformation table.