Git Enrichments

tea-rags enriches every indexed code chunk with 20 git-derived quality signals — churn, stability, authorship, bug-fix rates, code age — at function-level granularity. These signals power filtering and reranking, so your AI agent finds not just relevant code, but code that is stable, well-owned, and battle-tested.

Note: Git enrichment runs concurrently with embedding and does not increase indexing time.

Enabling Git Enrichment

Set the environment variable when configuring your MCP server:

claude mcp add tea-rags -s user -- node /path/to/tea-rags/build/index.js \
-e CODE_ENABLE_GIT_METADATA=true

What You Get

tea-rags computes metrics at two levels:

  1. File-level — shared by all chunks of a file (commitCount, relativeChurn, bugFixRate, authors, etc.)
  2. Chunk-level — per-function granularity (chunkCommitCount, chunkChurnRatio, chunkBugFixRate, etc.)

For detailed metric definitions, formulas, and research context, see Code Churn: Theory & Research.

Metrics at a Glance

| Metric | Level | What it tells you |
|---|---|---|
| commitCount | File | How often this file changes |
| relativeChurn | File | Churn normalized by file size (stronger defect signal) |
| recencyWeightedFreq | File | Recent activity burst (exponential decay) |
| changeDensity | File | Commits per month |
| churnVolatility | File | Regularity of changes (stddev of commit gaps) |
| bugFixRate | File | Percentage of bug-fix commits (see Bug-Fix Commit Detection) |
| contributorCount | File | Number of unique authors |
| dominantAuthor | File | Author with most commits |
| dominantAuthorPct | File | Ownership concentration (0-100) |
| ageDays | File | Days since last modification |
| taskIds | File | Extracted ticket IDs (JIRA, GitHub, etc.) |
| chunkCommitCount | Chunk | Commits touching this specific function/block |
| chunkChurnRatio | Chunk | This chunk's share of file churn (0-1) |
| chunkContributorCount | Chunk | Authors who touched this chunk |
| chunkBugFixRate | Chunk | Bug-fix rate for this chunk specifically |
| chunkAgeDays | Chunk | Days since this chunk was last modified |
| chunkTaskIds | Chunk | Ticket IDs from commits touching this chunk |
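To make a few of these definitions concrete, here is a minimal TypeScript sketch that derives commitCount, contributorCount, dominantAuthor, dominantAuthorPct, and churnVolatility from a list of commits. The `CommitInfo` and `FileSignals` shapes, the `summarizeFileHistory` name, the day unit, the rounding, and the choice of population standard deviation are all assumptions made for illustration; tea-rags may compute these differently.

```typescript
// Hypothetical input shape: one entry per commit that touched the file.
interface CommitInfo {
  author: string;
  timestampMs: number;
}

// Hypothetical output shape covering a subset of the file-level metrics.
interface FileSignals {
  commitCount: number;
  contributorCount: number;
  dominantAuthor: string | null;
  dominantAuthorPct: number; // 0-100
  churnVolatility: number; // stddev of commit gaps, in days (assumed unit)
}

const DAY_MS = 24 * 60 * 60 * 1000;

function summarizeFileHistory(commits: CommitInfo[]): FileSignals {
  // Count commits per author to find the dominant one.
  const perAuthor = new Map<string, number>();
  for (const c of commits) {
    perAuthor.set(c.author, (perAuthor.get(c.author) ?? 0) + 1);
  }
  let dominantAuthor: string | null = null;
  let dominantCount = 0;
  for (const [author, count] of perAuthor) {
    if (count > dominantCount) {
      dominantAuthor = author;
      dominantCount = count;
    }
  }

  // Gaps between consecutive commits, in days.
  const times = commits.map((c) => c.timestampMs).sort((a, b) => a - b);
  const gaps: number[] = [];
  for (let i = 1; i < times.length; i++) {
    gaps.push((times[i] - times[i - 1]) / DAY_MS);
  }

  // Population standard deviation (assumption; a sample stddev is equally plausible).
  const n = gaps.length;
  const mean = n > 0 ? gaps.reduce((s, g) => s + g, 0) / n : 0;
  const variance = n > 0 ? gaps.reduce((s, g) => s + (g - mean) ** 2, 0) / n : 0;

  return {
    commitCount: commits.length,
    contributorCount: perAuthor.size,
    dominantAuthor,
    dominantAuthorPct:
      commits.length > 0 ? Math.round((dominantCount / commits.length) * 100) : 0,
    churnVolatility: Math.sqrt(variance),
  };
}
```

A low churnVolatility means commits arrive at regular intervals; a high value means long quiet periods interrupted by bursts.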

Bug-Fix Commit Detection

bugFixRate and chunkBugFixRate rely on a multi-layered heuristic classification of commits as bug fixes. The detection uses two independent mechanisms that work together:

Layer 0: Merge Branch Resolution

Before analyzing individual commit messages, tea-rags identifies gitflow fix branches by inspecting merge commits and traversing the parent graph.

When a merge commit matches one of these patterns:

Merge branch 'fix/...'
Merge branch 'hotfix/...'
Merge branch 'bugfix/...'
Merge pull request #N from user/fix-...
Merge pull request #N from user/hotfix/...
Merge pull request #N from user/bugfix-...

All child commits reachable from the branch tip (second parent) are marked as bug-fix commits via BFS traversal. This is critical for gitflow workflows where child commits often don't contain "fix" in their message — e.g., refactor: extract validation inside a fix/TD-123-crash branch is correctly classified as a bug fix.
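The branch resolution described above can be sketched as a breadth-first walk over the commit graph. This is an illustrative sketch only: the `CommitNode` shape, the function names, and the two regular expressions (which approximate the merge patterns listed above) are assumptions. The rule that the walk stops at commits already reachable from the first (mainline) parent is also an assumption, added so that history predating the branch is not marked.

```typescript
// Hypothetical commit shape: for a merge, parents[0] is the mainline parent
// and parents[1] is the tip of the merged branch.
interface CommitNode {
  sha: string;
  message: string;
  parents: string[];
}

// Approximations of the merge patterns listed above (assumed, not tea-rags's own).
const FIX_MERGE_PATTERNS: RegExp[] = [
  /^Merge branch '(?:fix|hotfix|bugfix)\//i,
  /^Merge pull request #\d+ from [^/\s]+\/(?:fix|hotfix|bugfix)[-/]/i,
];

// All commits reachable from `start`, including `start` itself.
function collectAncestors(start: string, graph: Map<string, CommitNode>): Set<string> {
  const seen = new Set<string>([start]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const sha = queue.shift() as string;
    for (const parent of graph.get(sha)?.parents ?? []) {
      if (!seen.has(parent)) {
        seen.add(parent);
        queue.push(parent);
      }
    }
  }
  return seen;
}

// Returns the SHAs of commits that belong to merged fix branches.
function markFixBranchCommits(graph: Map<string, CommitNode>): Set<string> {
  const fixCommits = new Set<string>();
  for (const commit of graph.values()) {
    const mainlineParent = commit.parents[0];
    const branchTip = commit.parents[1];
    if (mainlineParent === undefined || branchTip === undefined) continue; // not a merge
    if (!FIX_MERGE_PATTERNS.some((p) => p.test(commit.message))) continue;

    // Commits already on the mainline are not part of the fix branch.
    const mainline = collectAncestors(mainlineParent, graph);

    // BFS from the branch tip (second parent), stopping at mainline history.
    const queue: string[] = [branchTip];
    while (queue.length > 0) {
      const sha = queue.shift() as string;
      if (mainline.has(sha) || fixCommits.has(sha)) continue;
      fixCommits.add(sha);
      queue.push(...(graph.get(sha)?.parents ?? []));
    }
  }
  return fixCommits;
}
```

With this approach, a commit such as refactor: extract validation ends up in the returned set purely because of where it sits in the graph, regardless of its message.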

Layer 1: Commit Message Classification

Each commit is tested through a six-rule pipeline applied in order. The first matching rule wins.

Rule 1 — Skip merge commits:

/^Merge\b/i → return false

Merge commits are not classified by message — their branches are already resolved in Layer 0.

Rule 2 — Exclude cosmetic/infrastructure fixes (false positive filter):

Checked against the full commit body:

/\bfix(?:e[sd])?\s+(?:typo|lint|linter|format|formatting|style|whitespace|indentation|imports?|tests?|specs?|flaky|rubocop|eslint|prettier|ci|pipeline|migration|review|code\s*review|conflicts?)\b/i

/\btext\s+fix(?:es)?\b/i

These are not real bug fixes — they are maintenance commits that happen to contain the word "fix".

Rule 3 — Conventional commit prefix (subject line only):

/^(?:hot)?fix(?:\([^)]+\))?!?:/i

Matches: fix: ..., fix(auth): ..., hotfix: ..., fix(scope)!: ...

Rule 4 — Explicit tag (subject line only):

/^\[(?:Fix|Bug|Hotfix|Bugfix)\]/i

Matches: [Fix] null pointer, [Bug] race condition, [HOTFIX] production crash

Rule 5 — Ticket + Fix verb (subject line only):

/^\[?[A-Z]+-\d+\]?\s+(?:fix|fixed|fixes)\b/i

Matches: [TD-123] Fix crash on login, PROJ-456 fixed timeout, [ABC-789] fixes edge case

Rule 6 — GitHub/GitLab closing keywords (full body):

/\b(?:fix|fixe[sd]|resolve[sd]?|close[sd]?)\s+#\d+/i

Matches: fixes #123, resolves #456, closes #789, Resolved #42

Default: If no rule matches → not a bug fix.
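The six rules can be combined into a single classifier. The sketch below uses the regular expressions shown above; the constant names and the `isBugFixMessage` function are hypothetical, and the cosmetic pattern is assembled from a word list purely for readability. It covers Layer 1 only, so commits marked through Layer 0 branch resolution are not handled here.

```typescript
// Rule 2 targets, joined into the alternation shown above.
const COSMETIC_TARGETS = [
  "typo", "lint", "linter", "format", "formatting", "style", "whitespace",
  "indentation", "imports?", "tests?", "specs?", "flaky", "rubocop", "eslint",
  "prettier", "ci", "pipeline", "migration", "review", "code\\s*review", "conflicts?",
];

const MERGE_COMMIT = /^Merge\b/i;
const COSMETIC_FIX = new RegExp(
  `\\bfix(?:e[sd])?\\s+(?:${COSMETIC_TARGETS.join("|")})\\b`,
  "i",
);
const TEXT_FIX = /\btext\s+fix(?:es)?\b/i;
const CONVENTIONAL_PREFIX = /^(?:hot)?fix(?:\([^)]+\))?!?:/i;
const EXPLICIT_TAG = /^\[(?:Fix|Bug|Hotfix|Bugfix)\]/i;
const TICKET_FIX_VERB = /^\[?[A-Z]+-\d+\]?\s+(?:fix|fixed|fixes)\b/i;
const CLOSING_KEYWORD = /\b(?:fix|fixe[sd]|resolve[sd]?|close[sd]?)\s+#\d+/i;

// Classifies a full commit message (subject plus body). First matching rule wins.
function isBugFixMessage(message: string): boolean {
  const newline = message.indexOf("\n");
  const subject = newline === -1 ? message : message.slice(0, newline);

  if (MERGE_COMMIT.test(subject)) return false; // Rule 1: merges are handled in Layer 0
  if (COSMETIC_FIX.test(message) || TEXT_FIX.test(message)) return false; // Rule 2
  if (CONVENTIONAL_PREFIX.test(subject)) return true; // Rule 3
  if (EXPLICIT_TAG.test(subject)) return true; // Rule 4
  if (TICKET_FIX_VERB.test(subject)) return true; // Rule 5
  if (CLOSING_KEYWORD.test(message)) return true; // Rule 6
  return false; // Default: not a bug fix
}
```

Because Rule 2 runs before the positive rules, a message such as fix typo in readme is rejected early, while fix: crash on null input passes Rule 2 (the colon prevents a match) and is accepted by Rule 3.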

Classification Examples

| Commit message | Detected? | Rule |
|---|---|---|
| fix: crash on null input | Yes | Rule 3: conventional prefix |
| fix(auth): token expiration | Yes | Rule 3: conventional prefix |
| hotfix: urgent payment bug | Yes | Rule 3: conventional prefix |
| [Fix] null pointer in handler | Yes | Rule 4: explicit tag |
| [Bug] race condition | Yes | Rule 4: explicit tag |
| [TD-123] Fix crash on login | Yes | Rule 5: ticket + fix verb |
| (body contains fixes #123) | Yes | Rule 6: closing keyword |
| fix typo in readme | No | Rule 2: cosmetic exclusion |
| fix lint errors | No | Rule 2: cosmetic exclusion |
| fix tests | No | Rule 2: cosmetic exclusion |
| fix code review comments | No | Rule 2: cosmetic exclusion |
| text fixes | No | Rule 2: cosmetic exclusion |
| chore: update deps | No | No rule matched |
| Merge branch 'fix/auth' | No | Rule 1: merge (but children are marked via Layer 0) |
| child commit inside fix/auth branch | Yes | Layer 0: merge branch resolution |

Formula

bugFixRate uses Laplace smoothing with Jeffreys prior (α = 0.5) to handle small sample sizes:

bugFixRate = round(((bugFixCommits + 0.5) / (totalCommits + 1.0)) * 100)

This prevents extreme values: a file with 0 fixes out of 1 commit gets 25% (not 0%), while a file with 1 fix out of 1 commit gets 75% (not 100%). The smoothing effect diminishes as commit count grows.
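As a quick check of the formula, here is a direct TypeScript transcription. The function name mirrors the metric and the loop over sample inputs is illustrative only.

```typescript
// Smoothed bug-fix percentage: adds 0.5 pseudo-fixes and 1.0 pseudo-commits.
function bugFixRate(bugFixCommits: number, totalCommits: number): number {
  return Math.round(((bugFixCommits + 0.5) / (totalCommits + 1.0)) * 100);
}

// Small samples are pulled toward 50%; large samples stay close to the raw ratio.
const samples: Array<[number, number]> = [
  [0, 0], // no history at all
  [0, 1],
  [1, 1],
  [10, 100],
];
for (const [fixes, total] of samples) {
  console.log(`${fixes}/${total} -> ${bugFixRate(fixes, total)}%`);
}
```

With no commits at all, the formula yields 50%, which reflects that nothing is known about the file. At 10 fixes out of 100 commits, the result is 10%, the same as the unsmoothed ratio after rounding.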

Chunk-level: chunkBugFixRate uses the same detection logic, but only counts commits whose diff hunks overlap the chunk's line range. An offset tracker corrects for line drift caused by insertions/deletions above the chunk in earlier commits.
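The overlap test and drift correction can be sketched as follows. This is a simplified illustration, not the tea-rags offset tracker: the `Hunk` and `LineRange` shapes and the function names are assumptions. It walks commits from newest to oldest, only shifts the range for hunks that lie entirely above the chunk, and ignores size changes caused by hunks that overlap the chunk itself.

```typescript
// Hypothetical hunk shape, mirroring a unified diff header: @@ -oldStart,oldLines +newStart,newLines @@
interface Hunk {
  oldStart: number;
  oldLines: number;
  newStart: number;
  newLines: number;
}

// 1-based, inclusive line range of a chunk.
interface LineRange {
  start: number;
  end: number;
}

// Last line a hunk occupies in the post-commit file (pure deletions count as one position).
function hunkEnd(h: Hunk): number {
  return h.newStart + Math.max(h.newLines, 1) - 1;
}

function hunkTouchesRange(h: Hunk, r: LineRange): boolean {
  return h.newStart <= r.end && hunkEnd(h) >= r.start;
}

// Translate a range from post-commit coordinates to pre-commit coordinates.
function rangeBeforeCommit(r: LineRange, hunks: Hunk[]): LineRange {
  let delta = 0;
  for (const h of hunks) {
    // Lines added or removed above the chunk shift it down or up.
    if (hunkEnd(h) < r.start) delta += h.newLines - h.oldLines;
  }
  return { start: r.start - delta, end: r.end - delta };
}

// Counts commits whose hunks overlap the chunk, following the chunk back through history.
function countChunkCommits(current: LineRange, commitsNewestFirst: Hunk[][]): number {
  let range = current;
  let count = 0;
  for (const hunks of commitsNewestFirst) {
    if (hunks.some((h) => hunkTouchesRange(h, range))) count++;
    range = rangeBeforeCommit(range, hunks);
  }
  return count;
}
```

Without the shift, a commit that inserted lines above the chunk would cause every older hunk to be compared against the wrong line numbers, which would miscount both chunkCommitCount and chunkBugFixRate.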

Note: The detection is designed to minimize false positives. Cosmetic patterns (fix typo, fix lint, fix tests, etc.) are explicitly excluded. Merge commits are handled separately via branch resolution: their child commits inherit the fix classification even when their individual messages don't mention "fix".

Use Cases

Show me files with high churn rate

Find code with a single dominant author

What code changed in the last week?

Find hot functions that change frequently

Show me legacy code with high bug-fix rates

For detailed scenarios — hotspot detection, knowledge silo analysis, tech debt assessment, incident-driven search, security audit, and more — see Git Enrichment Use Cases.

Reranking Presets

All presets automatically prefer chunk-level data when available (e.g., chunkCommitCount over commitCount for churn signals).

| Preset | Signals | Use case |
|---|---|---|
| hotspots | chunkChurn + chunkRelativeChurn + burstActivity + bugFix + volatility | Bug-prone areas at function granularity |
| techDebt | age + churn + bugFix + volatility | Legacy assessment with fix-rate indicator |
| codeReview | recency + burstActivity + density + chunkChurn | Recent changes with activity intensity |
| stable | low churn | Reliable implementations |
| ownership | ownership + knowledgeSilo | Knowledge transfer, bus factor analysis |
| refactoring | chunkChurn + relativeChurnNorm + chunkSize + volatility + bugFix + age | Refactor candidates at chunk level |
| securityAudit | age + ownership + bugFix + pathRisk + volatility | Old critical code in sensitive paths |
| onboarding | documentation + stability | Entry points for new team members |

Scoring Weights Reference

Available weight keys for custom reranking:

| Key | Signal | Source |
|---|---|---|
| similarity | Embedding similarity score | Vector search |
| recency | Inverse of ageDays (prefers chunk-level) | git |
| stability | Inverse of commitCount (prefers chunk-level) | git |
| churn | Direct commitCount (prefers chunk-level) | git |
| age | Direct ageDays (prefers chunk-level) | git |
| ownership | Author concentration via dominantAuthorPct | git |
| chunkSize | Lines of code in chunk | chunk metadata |
| documentation | Is documentation file | chunk metadata |
| imports | Import/dependency count | file metadata |
| bugFix | bugFixRate (prefers chunk-level) | git |
| volatility | churnVolatility (stddev of commit gaps) | git |
| density | changeDensity (commits/month) | git |
| chunkChurn | chunkCommitCount | git chunk-level |
| relativeChurnNorm | relativeChurn normalized (churn relative to file size) | git |
| burstActivity | recencyWeightedFreq (recent burst of changes) | git |
| pathRisk | Security-sensitive path pattern match (0 or 1) | file metadata |
| knowledgeSilo | Single-contributor flag (1 / 0.5 / 0) | git |
| chunkRelativeChurn | chunkChurnRatio (chunk's share of file churn) | git chunk-level |
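To illustrate how weight keys like these could combine into a single score, here is a minimal TypeScript sketch covering a subset of them. Everything beyond the key names and the chunk-level preference is an assumption: the `ChunkPayload` shape, the normalization caps, and the normalized linear combination are illustrative choices, since the combination tea-rags actually uses is not described here.

```typescript
// Subset of the weight keys above, chosen for illustration.
type WeightKey = "similarity" | "churn" | "age" | "bugFix" | "ownership";

// Hypothetical payload: similarity from vector search plus optional git metrics.
interface ChunkPayload {
  similarity: number;
  commitCount?: number;
  chunkCommitCount?: number;
  ageDays?: number;
  chunkAgeDays?: number;
  bugFixRate?: number;
  chunkBugFixRate?: number;
  dominantAuthorPct?: number;
}

// Assumed caps used to scale raw counts into the 0-1 range.
const MAX_COMMITS = 50;
const MAX_AGE_DAYS = 365;

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Derive 0-1 signals, preferring chunk-level data when it is present.
function toSignals(p: ChunkPayload): Record<WeightKey, number> {
  const commits = p.chunkCommitCount ?? p.commitCount ?? 0;
  const ageDays = p.chunkAgeDays ?? p.ageDays ?? 0;
  const bugFix = p.chunkBugFixRate ?? p.bugFixRate ?? 0;
  return {
    similarity: clamp01(p.similarity),
    churn: clamp01(commits / MAX_COMMITS),
    age: clamp01(ageDays / MAX_AGE_DAYS),
    bugFix: clamp01(bugFix / 100),
    ownership: clamp01((p.dominantAuthorPct ?? 0) / 100),
  };
}

// Weighted average of the selected signals (assumed combination rule).
function rerankScore(p: ChunkPayload, weights: Partial<Record<WeightKey, number>>): number {
  const signals = toSignals(p);
  let total = 0;
  let weightSum = 0;
  for (const key of Object.keys(weights) as WeightKey[]) {
    const w = weights[key] ?? 0;
    total += w * signals[key];
    weightSum += w;
  }
  return weightSum > 0 ? total / weightSum : 0;
}
```

In this sketch, a call such as `rerankScore(payload, { similarity: 0.5, churn: 0.5 })` gives relevance and churn equal weight, and a chunk with its own chunkCommitCount is scored on that value rather than on the file-wide commitCount.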

Environment Variables

Git enrichment configuration

| Variable | Default | Description |
|---|---|---|
| CODE_ENABLE_GIT_METADATA | "false" | Enable git enrichment during indexing |
| GIT_LOG_MAX_AGE_MONTHS | 12 | Time window for file-level git analysis (months). 0 = no age limit (safety depth still applies). |
| GIT_LOG_TIMEOUT_MS | 30000 | Timeout for isomorphic-git; falls back to native CLI on expiry |
| GIT_LOG_SAFETY_DEPTH | 10000 | Max commits for isomorphic-git depth and CLI --max-count |
| GIT_CHUNK_ENABLED | "true" | Enable chunk-level churn analysis |
| GIT_CHUNK_MAX_AGE_MONTHS | 6 | Time window for chunk-level churn analysis (months). 0 = no age limit. |
| GIT_CHUNK_CONCURRENCY | 10 | Parallel commit processing for chunk churn |
| GIT_CHUNK_MAX_FILE_LINES | 10000 | Skip files larger than this for chunk analysis |
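For reference, the defaults above can be expressed as a small configuration loader. This is a sketch under assumptions: the `GitEnrichmentConfig` shape, the helper names, and the parsing rules (case-insensitive "true", fallback to the default on unparseable numbers) are hypothetical and may not match how tea-rags reads its environment.

```typescript
// Hypothetical config shape; field names are illustrative.
interface GitEnrichmentConfig {
  enableGitMetadata: boolean;
  logMaxAgeMonths: number;
  logTimeoutMs: number;
  logSafetyDepth: number;
  chunkEnabled: boolean;
  chunkMaxAgeMonths: number;
  chunkConcurrency: number;
  chunkMaxFileLines: number;
}

type Env = Record<string, string | undefined>;

// Parse an integer variable, falling back to the default when unset or invalid.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number.parseInt(raw, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Parse a boolean variable; only "true" (any case) enables it (assumed rule).
function boolFromEnv(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name];
  return raw === undefined ? fallback : raw.trim().toLowerCase() === "true";
}

function loadGitEnrichmentConfig(env: Env): GitEnrichmentConfig {
  return {
    enableGitMetadata: boolFromEnv(env, "CODE_ENABLE_GIT_METADATA", false),
    logMaxAgeMonths: intFromEnv(env, "GIT_LOG_MAX_AGE_MONTHS", 12),
    logTimeoutMs: intFromEnv(env, "GIT_LOG_TIMEOUT_MS", 30000),
    logSafetyDepth: intFromEnv(env, "GIT_LOG_SAFETY_DEPTH", 10000),
    chunkEnabled: boolFromEnv(env, "GIT_CHUNK_ENABLED", true),
    chunkMaxAgeMonths: intFromEnv(env, "GIT_CHUNK_MAX_AGE_MONTHS", 6),
    chunkConcurrency: intFromEnv(env, "GIT_CHUNK_CONCURRENCY", 10),
    chunkMaxFileLines: intFromEnv(env, "GIT_CHUNK_MAX_FILE_LINES", 10000),
  };
}
```

In a Node.js process, the loader would typically be called with `process.env`. A value of 0 for either age window is passed through unchanged, so it still means "no age limit".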

Next Steps