
Git Enrichments

tea-rags enriches every indexed code chunk with 19 git-derived quality signals — churn, stability, authorship, bug-fix rates, code age — at function-level granularity. These signals power filtering and reranking, so your AI agent finds not just relevant code, but code that is stable, well-owned, and battle-tested.

Tip: Git enrichment runs concurrently with embedding and does not increase indexing time.

Enabling Git Enrichment

Set the environment variable when configuring your MCP server:

claude mcp add tea-rags -s user \
-e CODE_ENABLE_GIT_METADATA=true \
-- node /path/to/tea-rags-mcp/build/index.js

What You Get

tea-rags computes metrics at two levels:

  1. File-level — shared by all chunks of a file (commitCount, relativeChurn, bugFixRate, authors, etc.)
  2. Chunk-level — per-function granularity (chunkCommitCount, chunkChurnRatio, chunkBugFixRate, etc.)

For detailed metric definitions, formulas, and research context, see Code Churn: Theory & Research.

Metrics at a Glance

| Metric | Level | What it tells you |
| --- | --- | --- |
| commitCount | File | How often this file changes |
| relativeChurn | File | Churn normalized by file size (stronger defect signal) |
| recencyWeightedFreq | File | Recent activity burst (exponential decay) |
| changeDensity | File | Commits per month |
| churnVolatility | File | Regularity of changes (stddev of commit gaps) |
| bugFixRate | File | Percentage of bug-fix commits (see Bug-Fix Commit Detection) |
| contributorCount | File | Number of unique authors |
| dominantAuthor | File | Author with most commits |
| dominantAuthorPct | File | Ownership concentration (0-100) |
| ageDays | File | Days since last modification |
| taskIds | File | Extracted ticket IDs (JIRA, GitHub, etc.) |
| chunkCommitCount | Chunk | Commits touching this specific function/block |
| chunkChurnRatio | Chunk | This chunk's share of file churn (0-1) |
| chunkContributorCount | Chunk | Authors who touched this chunk |
| chunkBugFixRate | Chunk | Bug-fix rate for this chunk specifically |
| chunkAgeDays | Chunk | Days since this chunk was last modified |
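To make two of the file-level definitions above concrete, here is a minimal sketch of churnVolatility and dominantAuthorPct computed from a list of commits. The `Commit` shape, the function names, the use of days as the gap unit, and the choice of population standard deviation are all assumptions for illustration; the formulas tea-rags actually uses are documented in Code Churn: Theory & Research.

```typescript
// Hypothetical commit shape; tea-rags' internal types may differ.
interface Commit {
  author: string;
  timestampMs: number; // commit time in milliseconds since the epoch
}

const DAY_MS = 24 * 60 * 60 * 1000;

// churnVolatility: standard deviation of the gaps (in days) between
// consecutive commits. Assumption: population stddev, 0 for fewer than two gaps.
function churnVolatility(commits: Commit[]): number {
  const times = commits.map((c) => c.timestampMs).sort((a, b) => a - b);
  const gaps: number[] = [];
  for (let i = 1; i < times.length; i++) {
    gaps.push((times[i] - times[i - 1]) / DAY_MS);
  }
  if (gaps.length < 2) return 0;
  const mean = gaps.reduce((sum, g) => sum + g, 0) / gaps.length;
  const variance =
    gaps.reduce((sum, g) => sum + (g - mean) * (g - mean), 0) / gaps.length;
  return Math.sqrt(variance);
}

// dominantAuthor / dominantAuthorPct: the author with the most commits and
// their share of all commits as an integer percentage (0-100).
// On a tie, the author encountered first wins (an arbitrary choice here).
function dominantAuthor(
  commits: Commit[]
): { author: string; pct: number } | null {
  if (commits.length === 0) return null;
  const counts: Record<string, number> = {};
  for (const c of commits) {
    counts[c.author] = (counts[c.author] ?? 0) + 1;
  }
  let best = "";
  let bestCount = 0;
  for (const author of Object.keys(counts)) {
    if (counts[author] > bestCount) {
      best = author;
      bestCount = counts[author];
    }
  }
  return { author: best, pct: Math.round((bestCount / commits.length) * 100) };
}
```

Under these assumptions, a file committed to at perfectly regular intervals has a volatility of 0, and a higher value means more irregular bursts of change.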

Bug-Fix Commit Detection

bugFixRate and chunkBugFixRate rely on heuristic classification of commits as bug fixes. The detection works as follows:

Pattern: Each commit message is tested against the regex:

/\b(fix|bug|hotfix|patch|resolve[sd]?|defect)\b/i

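Word boundaries (\b) restrict matches to whole words, which prevents false positives such as "prefix" or "bugle". The same boundaries mean that inflected forms not listed in the pattern, such as "fixed" or "fixes", do not match on their own. The match is case-insensitive and checks the full commit message, not just the subject line.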

Merge commit filtering: Commits whose subject line starts with Merge (e.g., Merge branch 'fix/auth', Merge pull request #42) are excluded from bug-fix detection. The rationale: a merge commit referencing a fix branch is not itself a fix — the actual fix commit within the branch is already counted separately. Without this filter, every merged fix branch would be double-counted.

What matches:

| Commit message | Detected? | Why |
| --- | --- | --- |
| fix: resolve crash on login | Yes | "fix" in subject |
| hotfix: emergency patch for payments | Yes | "hotfix" in subject |
| Resolved issue with timeout | Yes | "Resolved" matches resolve[sd]? |
| Bug in date parsing | Yes | "Bug" matches |
| chore: update deps | No | No bug-fix keywords |
| Merge branch 'fix/auth' | No | Merge commit, skipped |
| Merge pull request #42 from user/fix-auth | No | Merge commit, skipped |
| chore: update auth\nfix: also resolve login bug | Yes | "fix" found on 2nd line (full body is checked) |
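The detection rules above can be sketched in a few lines. This is an illustrative reading of the description, not tea-rags' actual source: the helper name `isBugFixCommit` is hypothetical, and the sketch assumes the subject is the first line of the commit message.

```typescript
// The pattern quoted in this section.
const BUG_FIX_PATTERN = /\b(fix|bug|hotfix|patch|resolve[sd]?|defect)\b/i;

// Hypothetical helper: returns true when a commit message is classified
// as a bug fix under the rules described above.
function isBugFixCommit(message: string): boolean {
  // Assumption: the subject line is everything before the first newline.
  const subject = message.split("\n", 1)[0];
  // Merge commits are excluded before the pattern is tested.
  if (/^Merge/.test(subject)) return false;
  // The pattern is tested against the full message, not just the subject.
  return BUG_FIX_PATTERN.test(message);
}

console.log(isBugFixCommit("fix: resolve crash on login")); // true
console.log(isBugFixCommit("Merge branch 'fix/auth'")); // false
console.log(isBugFixCommit("add prefix handling")); // false
```

Because the pattern has no global flag, calling `test` repeatedly on the same regex object is safe and carries no state between calls.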

Formula:

bugFixRate = round((bugFixCommits / totalCommits) * 100)

Where bugFixCommits is the count of non-merge commits matching the pattern. The result is an integer percentage (0-100).
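As a worked instance of the formula, the sketch below computes the rate from two counts. The guard for zero commits is an assumption; the source does not say what tea-rags reports when a file has no commits in the analysis window.

```typescript
// bugFixRate = round((bugFixCommits / totalCommits) * 100)
function bugFixRate(bugFixCommits: number, totalCommits: number): number {
  // Assumption: report 0 instead of NaN when there is no history.
  if (totalCommits <= 0) return 0;
  return Math.round((bugFixCommits / totalCommits) * 100);
}

console.log(bugFixRate(3, 8)); // 38 (37.5 rounds up)
console.log(bugFixRate(1, 3)); // 33
```

Rounding to an integer means small differences between files disappear: 1 fix in 3 commits and 33 fixes in 100 commits both report 33.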

Chunk-level: chunkBugFixRate uses the same detection logic, but only counts commits whose diff hunks overlap the chunk's line range.
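The chunk-level restriction can be pictured as an interval-overlap test between each commit's diff hunks and the chunk's line range. The types and names below are hypothetical, and the sketch assumes hunk line numbers are already expressed in the same coordinates as the chunk (it ignores how lines shift across history) and that the denominator is the number of commits touching the chunk.

```typescript
// Hypothetical shapes; line numbers are 1-based and inclusive.
interface LineRange {
  start: number;
  end: number;
}

interface CommitHunks {
  isBugFix: boolean; // result of the message-based detection
  hunks: LineRange[]; // line ranges changed by this commit
}

// Two inclusive ranges overlap when each starts before the other ends.
function overlaps(a: LineRange, b: LineRange): boolean {
  return a.start <= b.end && b.start <= a.end;
}

// Bug-fix rate restricted to commits with at least one hunk in the chunk.
function chunkBugFixRate(commits: CommitHunks[], chunk: LineRange): number {
  const touching = commits.filter((c) =>
    c.hunks.some((h) => overlaps(h, chunk))
  );
  if (touching.length === 0) return 0; // assumption: no history means 0
  const fixes = touching.filter((c) => c.isBugFix).length;
  return Math.round((fixes / touching.length) * 100);
}
```

With this reading, a bug fix elsewhere in the file raises the file-level bugFixRate but leaves an untouched function's chunkBugFixRate unchanged.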

Info: The pattern is intentionally broad. It catches conventional commits (fix: ...), free-form messages (fixed the bug, which matches on "bug"), and ticket-driven messages (resolve TD-123 defect). Word-boundary matching keeps false positives low.

Use Cases

Show me files with high churn rate

Find code with a single dominant author

What code changed in the last week?

Find hot functions that change frequently

Show me legacy code with high bug-fix rates

For detailed scenarios — hotspot detection, knowledge silo analysis, tech debt assessment, incident-driven search, security audit, and more — see Git Enrichment Use Cases.

Reranking Presets

All presets automatically prefer chunk-level data when available (e.g., chunkCommitCount over commitCount for churn signals).

| Preset | Signals | Use case |
| --- | --- | --- |
| hotspots | chunkChurn + chunkRelativeChurn + burstActivity + bugFix + volatility | Bug-prone areas at function granularity |
| techDebt | age + churn + bugFix + volatility | Legacy assessment with fix-rate indicator |
| codeReview | recency + burstActivity + density + chunkChurn | Recent changes with activity intensity |
| stable | low churn | Reliable implementations |
| ownership | ownership + knowledgeSilo | Knowledge transfer, bus factor analysis |
| refactoring | chunkChurn + relativeChurnNorm + chunkSize + volatility + bugFix + age | Refactor candidates at chunk level |
| securityAudit | age + ownership + bugFix + pathRisk + volatility | Old critical code in sensitive paths |
| impactAnalysis | similarity + imports | Dependency analysis |
| onboarding | documentation + stability | Entry points for new team members |

Scoring Weights Reference

Available weight keys for custom reranking:

| Key | Signal | Source |
| --- | --- | --- |
| similarity | Embedding similarity score | Vector search |
| recency | Inverse of ageDays (prefers chunk-level) | git |
| stability | Inverse of commitCount (prefers chunk-level) | git |
| churn | Direct commitCount (prefers chunk-level) | git |
| age | Direct ageDays (prefers chunk-level) | git |
| ownership | Author concentration via dominantAuthorPct | git |
| chunkSize | Lines of code in chunk | chunk metadata |
| documentation | Is documentation file | chunk metadata |
| imports | Import/dependency count | file metadata |
| bugFix | bugFixRate (prefers chunk-level) | git |
| volatility | churnVolatility (stddev of commit gaps) | git |
| density | changeDensity (commits/month) | git |
| chunkChurn | chunkCommitCount | git chunk-level |
| relativeChurnNorm | relativeChurn normalized (churn relative to file size) | git |
| burstActivity | recencyWeightedFreq (recent burst of changes) | git |
| pathRisk | Security-sensitive path pattern match (0 or 1) | file metadata |
| knowledgeSilo | Single-contributor flag (1 / 0.5 / 0) | git |
| chunkRelativeChurn | chunkChurnRatio (chunk's share of file churn) | git chunk-level |
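One way to picture how custom weights could act on these keys is a weighted sum over normalized signals. The sketch below is an assumption about the general technique, not a description of tea-rags' scoring code: it assumes every signal is already normalized to the range 0 to 1, covers only a subset of the keys, and uses hypothetical names (`Signals`, `Weights`, `rerankScore`, `rerank`).

```typescript
// A subset of the weight keys from the table above.
type WeightKey =
  | "similarity"
  | "recency"
  | "stability"
  | "churn"
  | "bugFix"
  | "ownership";

// Assumption: each signal value is already normalized to 0..1.
type Signals = Partial<Record<WeightKey, number>>;
type Weights = Partial<Record<WeightKey, number>>;

// Weighted sum over the keys present in `weights`; missing signals count as 0.
function rerankScore(signals: Signals, weights: Weights): number {
  let score = 0;
  for (const key of Object.keys(weights) as WeightKey[]) {
    score += (weights[key] ?? 0) * (signals[key] ?? 0);
  }
  return score;
}

// Returns a new array sorted by descending score; the input is not mutated.
function rerank<T extends { signals: Signals }>(
  results: T[],
  weights: Weights
): T[] {
  return results
    .slice()
    .sort(
      (a, b) => rerankScore(b.signals, weights) - rerankScore(a.signals, weights)
    );
}
```

Under this model, weights of `{ similarity: 0.6, stability: 0.4 }` would let a slightly less similar but much more stable chunk outrank a closer match that changes often.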

Environment Variables

Git enrichment configuration
| Variable | Default | Description |
| --- | --- | --- |
| CODE_ENABLE_GIT_METADATA | "false" | Enable git enrichment during indexing |
| GIT_LOG_MAX_AGE_MONTHS | 12 | Time window for file-level git analysis (months). 0 = no age limit (safety depth still applies). |
| GIT_LOG_TIMEOUT_MS | 30000 | Timeout for isomorphic-git; falls back to native CLI on expiry |
| GIT_LOG_SAFETY_DEPTH | 10000 | Max commits for isomorphic-git depth and CLI --max-count |
| GIT_CHUNK_ENABLED | "true" | Enable chunk-level churn analysis |
| GIT_CHUNK_MAX_AGE_MONTHS | 6 | Time window for chunk-level churn analysis (months). 0 = no age limit. |
| GIT_CHUNK_CONCURRENCY | 10 | Parallel commit processing for chunk churn |
| GIT_CHUNK_MAX_FILE_LINES | 10000 | Skip files larger than this for chunk analysis |
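For readers wiring these variables into their own tooling, the sketch below resolves them against the defaults in the table. The `GitEnrichmentConfig` shape, the helper names, and the parsing rules (only the string "true" enables a flag; non-numeric values fall back to the default) are assumptions, not tea-rags' actual parsing logic.

```typescript
// Hypothetical config shape mirroring the table above.
interface GitEnrichmentConfig {
  enabled: boolean;
  logMaxAgeMonths: number;
  logTimeoutMs: number;
  logSafetyDepth: number;
  chunkEnabled: boolean;
  chunkMaxAgeMonths: number;
  chunkConcurrency: number;
  chunkMaxFileLines: number;
}

type Env = Record<string, string | undefined>;

// Assumption: only the string "true" (any case) enables a flag.
function readBool(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name];
  return raw === undefined ? fallback : raw.toLowerCase() === "true";
}

// Assumption: unset, empty, or non-numeric values fall back to the default.
function readNumber(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return isFinite(n) ? n : fallback;
}

function loadGitEnrichmentConfig(env: Env): GitEnrichmentConfig {
  return {
    enabled: readBool(env, "CODE_ENABLE_GIT_METADATA", false),
    logMaxAgeMonths: readNumber(env, "GIT_LOG_MAX_AGE_MONTHS", 12),
    logTimeoutMs: readNumber(env, "GIT_LOG_TIMEOUT_MS", 30000),
    logSafetyDepth: readNumber(env, "GIT_LOG_SAFETY_DEPTH", 10000),
    chunkEnabled: readBool(env, "GIT_CHUNK_ENABLED", true),
    chunkMaxAgeMonths: readNumber(env, "GIT_CHUNK_MAX_AGE_MONTHS", 6),
    chunkConcurrency: readNumber(env, "GIT_CHUNK_CONCURRENCY", 10),
    chunkMaxFileLines: readNumber(env, "GIT_CHUNK_MAX_FILE_LINES", 10000),
  };
}
```

In Node.js the function would typically be called with `process.env`. Taking the environment as a parameter keeps the sketch easy to test, and a value of "0" for either age variable is preserved as 0, which the table defines as "no age limit".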

Next Steps