Git Enrichments
tea-rags enriches every indexed code chunk with 19 git-derived quality signals — churn, stability, authorship, bug-fix rates, code age — at function-level granularity. These signals power filtering and reranking, so your AI agent finds not just relevant code, but code that is stable, well-owned, and battle-tested.
Git enrichment runs concurrently with embedding and does not increase indexing time.
Enabling Git Enrichment
Set the environment variable when configuring your MCP server:
claude mcp add tea-rags -s user -e CODE_ENABLE_GIT_METADATA=true \
-- node /path/to/tea-rags-mcp/build/index.js
What You Get
tea-rags computes metrics at two levels:
- File-level — shared by all chunks of a file (commitCount, relativeChurn, bugFixRate, authors, etc.)
- Chunk-level — per-function granularity (chunkCommitCount, chunkChurnRatio, chunkBugFixRate, etc.)
For detailed metric definitions, formulas, and research context, see Code Churn: Theory & Research.
Metrics at a Glance
| Metric | Level | What it tells you |
|---|---|---|
| commitCount | File | How often this file changes |
| relativeChurn | File | Churn normalized by file size (stronger defect signal) |
| recencyWeightedFreq | File | Recent activity burst (exponential decay) |
| changeDensity | File | Commits per month |
| churnVolatility | File | Regularity of changes (stddev of commit gaps) |
| bugFixRate | File | Percentage of bug-fix commits (detection details) |
| contributorCount | File | Number of unique authors |
| dominantAuthor | File | Author with most commits |
| dominantAuthorPct | File | Ownership concentration (0-100) |
| ageDays | File | Days since last modification |
| taskIds | File | Extracted ticket IDs (JIRA, GitHub, etc.) |
| chunkCommitCount | Chunk | Commits touching this specific function/block |
| chunkChurnRatio | Chunk | This chunk's share of file churn (0-1) |
| chunkContributorCount | Chunk | Authors who touched this chunk |
| chunkBugFixRate | Chunk | Bug-fix rate for this chunk specifically |
| chunkAgeDays | Chunk | Days since this chunk was last modified |
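To make a few of the file-level definitions concrete, here is a minimal sketch of how they could be derived from a list of commits. The `CommitInfo` shape, the function name, the use of days as the gap unit, and the population standard deviation are all assumptions made for illustration; the actual tea-rags implementation may differ.

```typescript
// Hypothetical commit record; tea-rags' internal types may differ.
interface CommitInfo {
  author: string;
  timestampMs: number; // commit time, Unix epoch milliseconds
}

interface FileSignals {
  commitCount: number;
  contributorCount: number;
  dominantAuthor: string | null;
  dominantAuthorPct: number; // 0-100
  churnVolatility: number; // stddev of gaps between commits, in days (assumed unit)
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function computeFileSignals(commits: CommitInfo[]): FileSignals {
  // Count commits per author.
  const perAuthor: Record<string, number> = {};
  for (const c of commits) {
    perAuthor[c.author] = (perAuthor[c.author] ?? 0) + 1;
  }

  // Pick the author with the most commits.
  let dominantAuthor: string | null = null;
  let dominantCount = 0;
  for (const author of Object.keys(perAuthor)) {
    if (perAuthor[author] > dominantCount) {
      dominantAuthor = author;
      dominantCount = perAuthor[author];
    }
  }

  // Gaps between consecutive commits, oldest to newest, in days.
  const times = commits.map((c) => c.timestampMs).sort((a, b) => a - b);
  const gaps: number[] = [];
  for (let i = 1; i < times.length; i++) {
    gaps.push((times[i] - times[i - 1]) / MS_PER_DAY);
  }

  // Population standard deviation of the gaps (0 when there are no gaps).
  const mean = gaps.length ? gaps.reduce((s, g) => s + g, 0) / gaps.length : 0;
  const variance = gaps.length
    ? gaps.reduce((s, g) => s + (g - mean) ** 2, 0) / gaps.length
    : 0;

  return {
    commitCount: commits.length,
    contributorCount: Object.keys(perAuthor).length,
    dominantAuthor,
    dominantAuthorPct: commits.length
      ? Math.round((dominantCount / commits.length) * 100)
      : 0,
    churnVolatility: Math.sqrt(variance),
  };
}
```

A file with evenly spaced commits gets a volatility near zero, while a file edited in irregular bursts gets a high value.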
Bug-Fix Commit Detection
bugFixRate and chunkBugFixRate rely on heuristic classification of commits as bug fixes. The detection works as follows:
Pattern: Each commit message is tested against the regex:
/\b(fix|bug|hotfix|patch|resolve[sd]?|defect)\b/i
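This matches whole words only: the word boundaries (\b) prevent false positives such as "prefix" or "bugle", but they also mean that inflected forms such as "fixed", "fixes", or "bugs" do not match on their own. The match is case-insensitive and is applied to the full commit message (subject and body), not just the subject line.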
Merge commit filtering: Commits whose subject line starts with Merge (e.g., Merge branch 'fix/auth', Merge pull request #42) are excluded from bug-fix detection. The rationale: a merge commit referencing a fix branch is not itself a fix — the actual fix commit within the branch is already counted separately. Without this filter, every merged fix branch would be double-counted.
What matches:
| Commit message | Detected? | Why |
|---|---|---|
| fix: resolve crash on login | Yes | "fix" in subject |
| hotfix: emergency patch for payments | Yes | "hotfix" in subject |
| Resolved issue with timeout | Yes | "Resolved" matches resolve[sd]? |
| Bug in date parsing | Yes | "Bug" matches |
| chore: update deps | No | No bug-fix keywords |
| Merge branch 'fix/auth' | No | Merge commit, skipped |
| Merge pull request #42 from user/fix-auth | No | Merge commit, skipped |
| chore: update auth\nfix: also resolve login bug | Yes | "fix" found on 2nd line (full body is checked) |
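The detection rules above can be sketched as a small predicate. The regex is the one quoted in this section; the helper name `isBugFixCommit` is hypothetical and not taken from the tea-rags source.

```typescript
// Pattern quoted in the text above.
const BUG_FIX_PATTERN = /\b(fix|bug|hotfix|patch|resolve[sd]?|defect)\b/i;

// Hypothetical helper name, used here only for illustration.
function isBugFixCommit(message: string): boolean {
  // The subject line is the first line of the commit message.
  const subject = message.split("\n")[0];
  // Merge commits are skipped regardless of what they mention.
  if (/^Merge/.test(subject)) return false;
  // The regex has no `g` flag, so test() keeps no state between calls.
  return BUG_FIX_PATTERN.test(message);
}

// Rows from the table above:
isBugFixCommit("fix: resolve crash on login"); // true
isBugFixCommit("Merge branch 'fix/auth'"); // false (merge commit)
isBugFixCommit("chore: update auth\nfix: also resolve login bug"); // true (body is checked)
// Word boundaries at work:
isBugFixCommit("add prefix to bugle handler"); // false
isBugFixCommit("fixed typo"); // false ("fixed" is not "fix")
```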
Formula:
bugFixRate = round((bugFixCommits / totalCommits) * 100)
Where bugFixCommits is the count of non-merge commits matching the pattern. The result is an integer percentage (0-100).
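As a worked example of the formula, the sketch below computes the rate from two counts. Returning 0 when there are no commits is an assumption; the text does not say how that case is handled.

```typescript
function bugFixRate(bugFixCommits: number, totalCommits: number): number {
  // Assumption: a file with no commits in the window gets a rate of 0.
  if (totalCommits <= 0) return 0;
  return Math.round((bugFixCommits / totalCommits) * 100);
}

bugFixRate(3, 8); // 38 (37.5 rounds up)
bugFixRate(1, 3); // 33
bugFixRate(0, 5); // 0
```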
Chunk-level: chunkBugFixRate uses the same detection logic, but only counts commits whose diff hunks overlap the chunk's line range.
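The overlap rule can be illustrated with a simple interval check. This is a simplified sketch: the `LineRange` and `CommitHunks` shapes are hypothetical, and it assumes hunk line numbers are already expressed in the current file's coordinates, which a real implementation has to arrange because line numbers shift across history.

```typescript
// Inclusive, 1-based line range (hypothetical shape).
interface LineRange {
  start: number;
  end: number;
}

// One commit's effect on a single file (hypothetical shape).
interface CommitHunks {
  isBugFix: boolean; // result of the message-based detection
  hunks: LineRange[]; // line ranges the commit touched in this file
}

// Two inclusive ranges overlap when each starts before the other ends.
function overlaps(a: LineRange, b: LineRange): boolean {
  return a.start <= b.end && b.start <= a.end;
}

function chunkBugFixRate(chunk: LineRange, commits: CommitHunks[]): number {
  // Keep only commits with at least one hunk inside the chunk's range.
  const touching = commits.filter((c) =>
    c.hunks.some((h) => overlaps(h, chunk)),
  );
  if (touching.length === 0) return 0; // assumption: no commits means rate 0
  const fixes = touching.filter((c) => c.isBugFix).length;
  return Math.round((fixes / touching.length) * 100);
}
```

Because only overlapping commits are counted, a function can have a high chunkBugFixRate even when the file-level bugFixRate is low, and vice versa.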
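The pattern is intentionally broad: it catches conventional commits (fix: ...), free-form messages (fixed the bug, which matches on "bug"), and ticket-driven messages (resolve TD-123 defect). Word-boundary matching rules out substring false positives, although general keywords such as "patch" or "resolve" can still flag commits that are not bug fixes.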
Use Cases
Show me files with high churn rate
Find code with a single dominant author
What code changed in the last week?
Find hot functions that change frequently
Show me legacy code with high bug-fix rates
For detailed scenarios — hotspot detection, knowledge silo analysis, tech debt assessment, incident-driven search, security audit, and more — see Git Enrichment Use Cases.
Reranking Presets
All presets automatically prefer chunk-level data when available (e.g., chunkCommitCount over commitCount for churn signals).
| Preset | Signals | Use case |
|---|---|---|
| hotspots | chunkChurn + chunkRelativeChurn + burstActivity + bugFix + volatility | Bug-prone areas at function granularity |
| techDebt | age + churn + bugFix + volatility | Legacy assessment with fix-rate indicator |
| codeReview | recency + burstActivity + density + chunkChurn | Recent changes with activity intensity |
| stable | low churn | Reliable implementations |
| ownership | ownership + knowledgeSilo | Knowledge transfer, bus factor analysis |
| refactoring | chunkChurn + relativeChurnNorm + chunkSize + volatility + bugFix + age | Refactor candidates at chunk level |
| securityAudit | age + ownership + bugFix + pathRisk + volatility | Old critical code in sensitive paths |
| impactAnalysis | similarity + imports | Dependency analysis |
| onboarding | documentation + stability | Entry points for new team members |
Scoring Weights Reference
Available weight keys for custom reranking:
| Key | Signal | Source |
|---|---|---|
| similarity | Embedding similarity score | Vector search |
| recency | Inverse of ageDays (prefers chunk-level) | git |
| stability | Inverse of commitCount (prefers chunk-level) | git |
| churn | Direct commitCount (prefers chunk-level) | git |
| age | Direct ageDays (prefers chunk-level) | git |
| ownership | Author concentration via dominantAuthorPct | git |
| chunkSize | Lines of code in chunk | chunk metadata |
| documentation | Is documentation file | chunk metadata |
| imports | Import/dependency count | file metadata |
| bugFix | bugFixRate (prefers chunk-level) | git |
| volatility | churnVolatility (stddev of commit gaps) | git |
| density | changeDensity (commits/month) | git |
| chunkChurn | chunkCommitCount | git chunk-level |
| relativeChurnNorm | relativeChurn normalized (churn relative to file size) | git |
| burstActivity | recencyWeightedFreq: recent burst of changes | git |
| pathRisk | Security-sensitive path pattern match (0 or 1) | file metadata |
| knowledgeSilo | Single-contributor flag (1 / 0.5 / 0) | git |
| chunkRelativeChurn | chunkChurnRatio: chunk's share of file churn | git chunk-level |
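One way to picture how weight keys combine is a weighted sum over normalized signals, with chunk-level values preferred when present. The sketch below is an assumption-heavy illustration, not the tea-rags scoring code: the `GitPayload` shape, the normalization caps (50 commits, 365 days), and the "inverse" signals computed as one minus the normalized value are all invented for the example.

```typescript
// Hypothetical subset of a chunk's git payload.
interface GitPayload {
  commitCount?: number;
  chunkCommitCount?: number;
  ageDays?: number;
  chunkAgeDays?: number;
  bugFixRate?: number; // 0-100
  chunkBugFixRate?: number; // 0-100
}

// Scale a raw value into 0-1; the cap is an illustrative assumption.
function normalize(value: number, cap: number): number {
  return Math.min(Math.max(value / cap, 0), 1);
}

function signalsFor(similarity: number, p: GitPayload): Record<string, number> {
  // Prefer chunk-level data, fall back to file-level, then to 0.
  const commits = p.chunkCommitCount ?? p.commitCount ?? 0;
  const age = p.chunkAgeDays ?? p.ageDays ?? 0;
  const bugFix = p.chunkBugFixRate ?? p.bugFixRate ?? 0;

  const churn = normalize(commits, 50);
  const ageNorm = normalize(age, 365);
  return {
    similarity,
    churn,
    stability: 1 - churn, // assumed form of "inverse of commitCount"
    age: ageNorm,
    recency: 1 - ageNorm, // assumed form of "inverse of ageDays"
    bugFix: bugFix / 100,
  };
}

// Weighted sum; keys without a matching signal contribute nothing.
function score(
  weights: Record<string, number>,
  signals: Record<string, number>,
): number {
  let total = 0;
  for (const key of Object.keys(weights)) {
    total += weights[key] * (signals[key] ?? 0);
  }
  return total;
}

// Favor relevant code that rarely changes:
const signals = signalsFor(0.9, { commitCount: 40, chunkCommitCount: 10 });
const stableScore = score({ similarity: 0.5, stability: 0.5 }, signals);
```

In this example the chunk-level count (10) is used instead of the file-level count (40), so a quiet function inside a busy file still scores as stable.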
Environment Variables
Git enrichment configuration
| Variable | Default | Description |
|---|---|---|
| CODE_ENABLE_GIT_METADATA | "false" | Enable git enrichment during indexing |
| GIT_LOG_MAX_AGE_MONTHS | 12 | Time window for file-level git analysis (months). 0 = no age limit (safety depth still applies). |
| GIT_LOG_TIMEOUT_MS | 30000 | Timeout for isomorphic-git; falls back to native CLI on expiry |
| GIT_LOG_SAFETY_DEPTH | 10000 | Max commits for isomorphic-git depth and CLI --max-count |
| GIT_CHUNK_ENABLED | "true" | Enable chunk-level churn analysis |
| GIT_CHUNK_MAX_AGE_MONTHS | 6 | Time window for chunk-level churn analysis (months). 0 = no age limit. |
| GIT_CHUNK_CONCURRENCY | 10 | Parallel commit processing for chunk churn |
| GIT_CHUNK_MAX_FILE_LINES | 10000 | Skip files larger than this for chunk analysis |
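The defaults in the table can be read as follows. This is a sketch of how such variables might be parsed, not the project's actual loader: `GitEnrichmentConfig`, `readGitConfig`, and `ageCutoff` are hypothetical names, and falling back to the default on unparsable input is an assumption.

```typescript
// Hypothetical config shape mirroring the table above.
interface GitEnrichmentConfig {
  enabled: boolean;
  logMaxAgeMonths: number;
  logTimeoutMs: number;
  logSafetyDepth: number;
  chunkEnabled: boolean;
  chunkMaxAgeMonths: number;
  chunkConcurrency: number;
  chunkMaxFileLines: number;
}

type Env = Record<string, string | undefined>;

function readInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = parseInt(raw, 10);
  return isNaN(parsed) ? fallback : parsed; // assumption: bad input uses the default
}

function readBool(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name];
  return raw === undefined ? fallback : raw.toLowerCase() === "true";
}

function readGitConfig(env: Env): GitEnrichmentConfig {
  return {
    enabled: readBool(env, "CODE_ENABLE_GIT_METADATA", false),
    logMaxAgeMonths: readInt(env, "GIT_LOG_MAX_AGE_MONTHS", 12),
    logTimeoutMs: readInt(env, "GIT_LOG_TIMEOUT_MS", 30000),
    logSafetyDepth: readInt(env, "GIT_LOG_SAFETY_DEPTH", 10000),
    chunkEnabled: readBool(env, "GIT_CHUNK_ENABLED", true),
    chunkMaxAgeMonths: readInt(env, "GIT_CHUNK_MAX_AGE_MONTHS", 6),
    chunkConcurrency: readInt(env, "GIT_CHUNK_CONCURRENCY", 10),
    chunkMaxFileLines: readInt(env, "GIT_CHUNK_MAX_FILE_LINES", 10000),
  };
}

// 0 months means "no age limit", represented here as null.
function ageCutoff(months: number, now: Date): Date | null {
  if (months <= 0) return null;
  const cutoff = new Date(now.getTime());
  cutoff.setMonth(cutoff.getMonth() - months);
  return cutoff;
}
```

In a Node.js process this would be called as `readGitConfig(process.env)`.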
Next Steps
- Filters — filter syntax, git churn filters, filterable fields reference
- Code Churn: Theory & Research — metric formulas, research basis, and academic references
- Git Enrichment Pipeline — architecture, design decisions, and performance characteristics
- Search Strategies — how agents use reranking presets for different tasks
- Configuration Variables — full list of all configuration options