Origin

Why TeaRAGs Exists

TeaRAGs was born out of a real engineering pain point. Working in a large enterprise monorepo — 3.5 million lines of code, years of history, dozens of contributors — using AI coding agents without semantic code search was an exercise in frustration. The agent couldn't find the right code, copied wrong patterns, and had no understanding of code quality or ownership.

I wanted to fix this — not just for myself, but for my team. The goal was twofold: make the AI agent actually useful in a large, actively growing codebase, and test a hypothesis I'd been carrying for a while — what if search results carried git-derived quality signals like churn, authorship, and bug-fix rates? Would that meaningfully improve the agent's code generation decisions?

Spoiler: it did. The trajectory enrichment layer, which attaches git-derived signals to every search result, turned out to be a significant improvement over pure similarity search. Agents started finding stable templates instead of buggy ones, matching the domain owner's style, and avoiding high-churn code.

But getting here was a journey.

The Journey

Discovering Semantic Search

I started using Claude Code in May–June 2025. The experience was, to put it bluntly, frustrating. Token limits were used up within minutes, results were poor, and context was constantly lost. I found myself writing prompts that were 60–70% defensive instructions: "Never do X", "Don't search Y", "Stop reading Z". I cancelled my subscription, disappointed in the promise of agentic coding.

Everything changed in August when RooCode shipped Codebase Indexes, its built-in semantic search feature. I renewed my Claude subscription to use it as a provider for Roo, set up Qdrant in Docker alongside Ollama for local embeddings, and started indexing our monorepo.

For the next five months, I actively used semantic search in my daily workflow. The difference was night and day. It allowed me to:

Instantly understand unfamiliar code — no more spelunking through hundreds of files
Diagnose complex bugs in minutes — find related error handling, retries, edge cases across the codebase
Generate code that follows project style without extra instructions — critical for Ruby, where convention matters enormously
Efficiently solve pattern-search tasks — "find all places where we handle X" became trivial

The Breakup

By January 2026, I was a RooCode power user. The Roo + Claude + RAG setup was my primary work tool. I'd built muscle memory around it, refined my CLAUDE.md instructions, and was shipping features faster than ever.

Then Anthropic started banning accounts that used Claude Code subscriptions through third-party harnesses (Hacker News discussion). Claude stopped working as a Roo provider. The RooCode community immediately opened issues requesting native Claude Code integration and OAuth support, but in the meantime my workflow, the one I'd spent months building, broke overnight.

Searching for Alternatives

I evaluated the options methodically:

Switch to another agent — KiloCode was the closest alternative, but compliance concerns made it a non-starter in an enterprise environment.
Replace the provider in Roo — either switch to Anthropic API (pay-as-you-go costs run roughly 10x a Claude Code Pro subscription) or use alternative models like Gemini and DeepSeek, which were more affordable but nowhere near Anthropic's quality for code generation.
Use an MCP server for semantic search — bring the RAG capability directly into Claude Code CLI.

I rejected the first two quickly. I didn't want to risk a permanent ban on my corporate Anthropic account by playing cat-and-mouse with harness detection. And paying API rates for the volume I was doing would have been prohibitively expensive.

The final option was the only viable path: set up a standalone semantic search MCP server that works natively with Claude Code.

Market Research

I surveyed the MCP landscape for codebase indexing solutions. Most were research-grade experiments, not ready for enterprise scale (1M+ LOC):

Solution	Description	Verdict
claude-context	Most popular at the time. Fully cloud-based, no incremental indexing	Cloud dependency, doesn't scale for enterprise
grepai	Go-based, file-system watchers, multi-provider embeddings, GPU support, beginner-friendly	No AST chunking, embedded storage only — not enterprise-ready
quad-rag-code	Python, GPU acceleration, auto-watcher incremental reindex	Research-grade, Python-only, no embedding provider choice
rag-code-mcp	Go, Qdrant + Ollama	No incremental indexing, no Ruby support
mhalder/qdrant-mcp-server	Node.js, incremental indexing, Qdrant + Ollama, claimed Ruby support	Best match for my requirements

Industry leaders — Cursor, RooCode, KiloCode, Sourcegraph, GitHub — all had built-in semantic search. But standalone MCP solutions were sparse and immature. The gap was obvious: nobody had built a production-grade, local-first semantic search MCP that could handle enterprise codebases.

Testing mhalder's Solution

I spent about a week testing mhalder/qdrant-mcp-server with various parameter combinations against our enterprise monorepo. Full indexing of the 3.5M+ LOC codebase took 4 to 10 hours depending on configuration. Even incremental reindexing on roughly 100 commits (about 1.5 days of team output) couldn't go faster than 40 minutes.

What was good:

Clean, well-tested codebase with solid architecture
MIT license — freedom to fork and modify
Qdrant + Ollama foundation matched my local-first requirement
The incremental indexing concept was already there

What needed work:

Performance was the main blocker — and the bottleneck was architectural, not hardware
Ruby AST support was nominal — the parser was listed but didn't actually work with tree-sitter for Ruby
No git metadata enrichment at all — pure similarity search, blind to code history
Poor DX — every parameter required deep ML knowledge, no intuitive defaults for developers
Lacking ADX (Agentic Developer eXperience) — Claude frequently made errors calling the tools due to unclear parameter schemas
Missing MCP Tools API parameters for a complete search workflow

The conclusion was clear: MCP semantic search tools at that point were enthusiast experiments, not enterprise-ready solutions.

The Fork Decision

My observation was simple: "The problem isn't hardware — it's architecture."

Rather than contributing upstream (the bureaucracy of upstream contribution plus stakeholder alignment is not the path when motivation is high and the vision diverges significantly), I decided to fork and build what I needed.

There's also a deeper reason. In my career, I've repeatedly encountered engineers who prefer simplicity over complexity, and for good reason. But in this case, complexity was unavoidable. Achieving enterprise-scale performance required fundamental architectural changes: parallel request patterns to Ollama, backpressure-aware pipelines, sharded snapshot storage, and consistent hashing for change detection. I wasn't going to spend time convincing stakeholders that the entire core needed to be rewritten with "complex patterns."
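The source names consistent hashing as one of these changes but doesn't publish how TeaRAGs implements it. The sketch below shows only the general technique: a minimal consistent-hash ring that assigns file paths to snapshot shards. The class name `HashRing`, the method `shardFor`, the SHA-256-based ring positions, the 64 virtual nodes per shard, and the shard names are all illustrative assumptions, not TeaRAGs code.

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch, not TeaRAGs code: class name, hash choice, and
// virtual-node count are assumptions.

// Map a string to an unsigned 32-bit position on the ring
// (the first 4 bytes of its SHA-256 digest).
function ringPosition(key: string): number {
  return createHash("sha256").update(key).digest().readUInt32BE(0);
}

class HashRing {
  private readonly vnodes: number;
  private ring: { pos: number; shard: string }[] = [];

  constructor(shards: string[], vnodes = 64) {
    this.vnodes = vnodes;
    for (const shard of shards) this.addShard(shard);
  }

  // Each shard owns `vnodes` points on the ring, which evens out the load.
  addShard(shard: string): void {
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.push({ pos: ringPosition(`${shard}#${i}`), shard });
    }
    this.ring.sort((a, b) => a.pos - b.pos);
  }

  // A key belongs to the first virtual node at or after its position,
  // wrapping around to the start of the ring.
  shardFor(key: string): string {
    if (this.ring.length === 0) throw new Error("ring has no shards");
    const h = ringPosition(key);
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].pos < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].shard;
  }
}
```

For sharded storage, one property matters most. Adding a shard moves only the keys that now fall on the new shard's arcs, roughly 1/N of them. A plain `hash % N` assignment would relocate most keys and force most shards to be rewritten.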

My view on simplicity: the decision must be balanced. In my experience, system complexity often emerges from an excess of simplicity. It's easier to understand caching when it's encapsulated in one complex abstraction than when it's scattered in small pieces across every component, isn't it? Localized complexity with clear boundaries beats distributed simplicity with hidden coupling.

Core values for the fork:

Local-first — Ollama + Qdrant as the foundation, cloud optional
DX focus — intuitive for developers, not ML engineers
ADX focus (Agentic Developer eXperience) — intuitive for coding agents, not just humans. Inspired by Steve Yegge's Desire Paths philosophy: when an agent hallucinates a command that doesn't exist, implement it — the hallucination is a feature request written in the language of attempted use
Performance — enterprise-scale indexing in reasonable time on average MacBooks
eGPU support — leverage external GPUs on the local network for embedding acceleration
The name must be typeable with one left hand — inspired by asdf runtime manager. This consumed roughly 90% of the naming budget. The result: TeaRAGs — a T-Rex who drinks tea while doing RAG. Because if you're going to index 3.5 million lines of code, you might as well be civilized about it. 🦖🍵

What Was Achieved

Key achievements in the fork:

Modular architecture — clear separation of concerns across indexing, chunking, enrichment, and search
Improved MCP Tools Schema — better agent interaction, dramatically fewer Claude errors when calling tools
Full Ruby AST parsing — proper tree-sitter chunking for Ruby, not just line-based splitting
Markdown AST chunking — documentation becomes semantically searchable alongside code
Git trajectory enrichment — 19 signals per chunk with no significant indexing performance loss
Git-aware filtering and reranking — find fresh code, stable code, hotspots, ownership patterns
The name — typeable with one left hand. Worth it.
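The fork's reranking layer ships 9 presets plus custom weights, but the source doesn't spell out the scoring formula. One plausible shape is a weighted sum of the vector similarity and normalized git signals. Treat it as an assumption rather than TeaRAGs' actual method. The `Hit` and `Weights` types, the three signal names, and the `STABLE` preset's weights are hypothetical.

```typescript
// Illustrative sketch, not TeaRAGs code: types, signal names, and weights are hypothetical.
interface Hit {
  id: string;
  similarity: number; // vector similarity from the search, assumed to lie in [0, 1]
  signals: {
    churn: number;      // how often the chunk changed recently, normalized to [0, 1]
    bugFixRate: number; // share of the chunk's commits that look like bug fixes, in [0, 1]
    ownership: number;  // share of the chunk written by its dominant author, in [0, 1]
  };
}

interface Weights {
  similarity: number;
  churn: number;
  bugFixRate: number;
  ownership: number;
}

// Hypothetical preset: prefer stable, rarely fixed code with a clear owner.
const STABLE: Weights = { similarity: 1.0, churn: -0.3, bugFixRate: -0.4, ownership: 0.1 };

function score(hit: Hit, w: Weights): number {
  return (
    w.similarity * hit.similarity +
    w.churn * hit.signals.churn +
    w.bugFixRate * hit.signals.bugFixRate +
    w.ownership * hit.signals.ownership
  );
}

// Returns a new array ordered by descending score; the input keeps its order.
function rerank(hits: Hit[], w: Weights): Hit[] {
  return [...hits].sort((a, b) => score(b, w) - score(a, w));
}

const hits: Hit[] = [
  { id: "buggy", similarity: 0.82, signals: { churn: 0.9, bugFixRate: 0.7, ownership: 0.2 } },
  { id: "stable", similarity: 0.78, signals: { churn: 0.1, bugFixRate: 0.0, ownership: 0.8 } },
];
const top = rerank(hits, STABLE)[0].id; // "stable" (scores ≈ 0.83 vs 0.29)
```

On similarity alone, the frequently fixed chunk ranks first. The preset flips the order, which is the "stable templates instead of buggy ones" behavior described at the top of this page.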

Performance: From 4 Hours to 12 Minutes

The most critical achievement was indexing performance. For a 3.5M+ LOC enterprise codebase:

Setup	Full Index Time	Notes
mhalder/qdrant-mcp-server	4–10 hours	Various parameter combinations
RooCode (eGPU AMD 7800M)	25–30 min	40–50% GPU utilization
RooCode (MacBook M3 Pro)	1.5–2 hours	Severe system lag during indexing
TeaRAGs (eGPU AMD 7800M)	12–14 min	~55 chunks/sec, 85–90% GPU utilization
TeaRAGs (MacBook M3 Pro)	~17–20 min	~60% of eGPU throughput
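The source credits these gains to architecture rather than hardware. It names parallel request patterns to Ollama and a batch pipeline that is "parallel with backpressure", but doesn't publish that code. The sketch below shows only the general pattern: split chunks into batches and let a fixed number of runners pull them. At most `limit` embedding requests are then in flight, and new work starts only as fast as the backend finishes old work. The helper names `toBatches` and `mapWithConcurrency` are assumptions, as are any batch size and limit you pass in. A real worker would send each batch to the embedding backend.

```typescript
// Illustrative sketch, not TeaRAGs code: helper names, batch size, and limit are assumptions.

// Split items into consecutive batches of at most `size` elements.
function toBatches<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Apply `worker` to every item with at most `limit` calls in flight.
// Each runner takes a new item only after its previous call resolves,
// so the backend never has more than `limit` outstanding requests.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++; // safe: this read-and-increment runs synchronously
      results[i] = await worker(items[i]);
    }
  }
  const count = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: count }, () => runner()));
  return results;
}
```

A fixed pool of pulling runners is a simple way to keep a GPU fed without flooding it. Raising `limit` keeps more requests queued on the device. Lowering it bounds memory and latency.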

Incremental reindexing for ~100 commits: 30–40 seconds on the MacBook, compared to 40+ minutes with mhalder/qdrant-mcp-server.

The full index wait happens only once. After that, incremental reindexing keeps the index fresh in seconds. The difference between waiting 4 hours and waiting 12 minutes is the difference between "I'll index tonight and check tomorrow" and "let me index while I grab coffee."
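At its core, incremental reindexing has to work out which files changed since the last run, then re-chunk and re-embed only those. The sketch below shows that idea under some assumptions. It uses an in-memory map from file path to SHA-256 content hash, whereas TeaRAGs keeps a sharded snapshot. The names `snapshotOf`, `diffSnapshots`, and `ChangeSet` are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch, not TeaRAGs code: names are hypothetical and the
// snapshot is an in-memory map rather than sharded storage.
type Snapshot = Map<string, string>; // file path -> SHA-256 of its content

interface ChangeSet {
  added: string[];
  modified: string[];
  deleted: string[];
}

function snapshotOf(files: Map<string, string>): Snapshot {
  const snapshot: Snapshot = new Map();
  for (const [path, content] of files) {
    snapshot.set(path, createHash("sha256").update(content).digest("hex"));
  }
  return snapshot;
}

// Compare the previous run's snapshot with the current one.
function diffSnapshots(prev: Snapshot, curr: Snapshot): ChangeSet {
  const changes: ChangeSet = { added: [], modified: [], deleted: [] };
  for (const [path, hash] of curr) {
    const before = prev.get(path);
    if (before === undefined) changes.added.push(path);
    else if (before !== hash) changes.modified.push(path);
  }
  for (const path of prev.keys()) {
    if (!curr.has(path)) changes.deleted.push(path);
  }
  return changes;
}
```

Only `added` and `modified` files go back through chunking and embedding. The `deleted` paths identify index entries to remove. The cost then scales with the size of the change rather than the size of the repository.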

What Changed (Detailed)

Feature	Original	TeaRAGs
Snapshot storage	Single JSON file	Sharded storage (v3)
Change detection	Sequential	Parallel (N workers)
Hash distribution	—	Consistent hashing
Merkle tree	Single level	Two-level (shard + meta)
Delete operations	Filter scan	Payload index (1000x faster)
Batch pipeline	Sequential	Parallel with backpressure
Checkpointing	—	Resume from interruption
Git metadata	—	19 signals per chunk
Reranking layer	—	9 presets + custom weights
Chunk-level churn	—	Per-function/method granularity
Task ID extraction	—	JIRA, GitHub, Azure DevOps, GitLab
Ruby/Markdown AST	—	Full tree-sitter support
Concurrency control	Fixed	Configurable via env
Performance benchmarks	—	Auto-tuning included (`npm run tune`)
Cache compatibility	—	Auto-migration between versions
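The table lists a two-level Merkle tree (shard + meta) without further detail. The sketch below shows how such a structure can short-circuit change detection, assuming each shard maps file paths to content hashes. Each shard's sorted entries hash into a shard root, and the sorted shard roots hash into a meta root. These are compared against the roots saved from the previous run. If the meta roots match, nothing changed; otherwise only shards whose roots differ need a file-by-file diff. The function names and the string encoding are illustrative assumptions.

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch, not TeaRAGs code: names and the string encoding are assumptions.
type FileHashes = Map<string, string>;          // file path -> content hash
type ShardedSnapshot = Map<string, FileHashes>; // shard id -> that shard's files

const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

// Order entries by key with plain code-unit comparison (locale-independent).
const byKey = (a: [string, string], b: [string, string]): number =>
  a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0;

// Level 1: a shard root over the shard's entries, sorted so insertion order doesn't matter.
function shardRoot(files: FileHashes): string {
  return sha256([...files].sort(byKey).map(([path, hash]) => `${path}\0${hash}`).join("\n"));
}

// Level 2: the meta root over all shard roots, sorted by shard id.
function metaRoot(roots: Map<string, string>): string {
  return sha256([...roots].sort(byKey).map(([id, root]) => `${id}\0${root}`).join("\n"));
}

function rootsOf(snapshot: ShardedSnapshot): Map<string, string> {
  const roots = new Map<string, string>();
  for (const [id, files] of snapshot) roots.set(id, shardRoot(files));
  return roots;
}

// Returns the ids of shards that need a file-level diff against `prevRoots`,
// the shard roots saved at the end of the previous indexing run.
function changedShards(prevRoots: Map<string, string>, current: ShardedSnapshot): string[] {
  const currRoots = rootsOf(current);
  if (metaRoot(prevRoots) === metaRoot(currRoots)) return []; // fast path: nothing changed
  const ids = new Set([...prevRoots.keys(), ...currRoots.keys()]);
  return [...ids].filter((id) => prevRoots.get(id) !== currRoots.get(id)).sort();
}
```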

Acknowledgments

Huge thanks to Martin Halder and the qdrant-mcp-server project for the solid foundation — a clean architecture, excellent documentation, and the MIT license that made this fork possible. And to the ancestor of all forks — qdrant/mcp-server-qdrant.
