Git Enrichments

tea-rags enriches every indexed code chunk with 20 git-derived quality signals — churn, stability, authorship, bug-fix rates, code age — at function-level granularity. These signals power filtering and reranking, so your AI agent finds not just relevant code, but code that is stable, well-owned, and battle-tested.

Git enrichment runs concurrently with embedding and does not increase indexing time.

Enabling Git Enrichment

Set the environment variable when configuring your MCP server:

claude mcp add tea-rags -s user -e CODE_ENABLE_GIT_METADATA=true \
  -- node /path/to/tea-rags/build/index.js

What You Get

tea-rags computes metrics at two levels:

File-level — shared by all chunks of a file (commitCount, relativeChurn, bugFixRate, authors, etc.)
Chunk-level — per-function granularity (chunkCommitCount, chunkChurnRatio, chunkBugFixRate, etc.)

For detailed metric definitions, formulas, and research context, see Code Churn: Theory & Research.

Metrics at a Glance

Metric	Level	What it tells you
`commitCount`	File	How often this file changes
`relativeChurn`	File	Churn normalized by file size (stronger defect signal)
`recencyWeightedFreq`	File	Recent activity burst (exponential decay)
`changeDensity`	File	Commits per month
`churnVolatility`	File	Regularity of changes (stddev of commit gaps)
`bugFixRate`	File	Smoothed percentage of bug-fix commits (see Bug-Fix Commit Detection)
`contributorCount`	File	Number of unique authors
`dominantAuthor`	File	Author with most commits
`dominantAuthorPct`	File	Ownership concentration (0-100)
`ageDays`	File	Days since last modification
`taskIds`	File	Extracted ticket IDs (JIRA, GitHub, etc.)
`chunkCommitCount`	Chunk	Commits touching this specific function/block
`chunkChurnRatio`	Chunk	This chunk's share of file churn (0-1)
`chunkContributorCount`	Chunk	Authors who touched this chunk
`chunkBugFixRate`	Chunk	Bug-fix rate for this chunk specifically
`chunkAgeDays`	Chunk	Days since this chunk was last modified
`chunkTaskIds`	Chunk	Ticket IDs from commits touching this chunk
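To make the file-level definitions concrete, here is a minimal TypeScript sketch that derives a few of them (commitCount, contributorCount, dominantAuthor, dominantAuthorPct, changeDensity, churnVolatility) from a list of commits. The `CommitInfo` shape, the 30-day month, the one-month floor on the observed time span, and the population standard deviation over gaps measured in days are all assumptions for illustration; the exact formulas are in Code Churn: Theory & Research. relativeChurn and recencyWeightedFreq are left out because they need inputs (line counts, a decay constant) that this page does not specify.

```typescript
// Hypothetical commit record; field names are assumptions for this sketch.
interface CommitInfo {
  author: string;
  timestamp: number; // Unix seconds
}

interface FileSignals {
  commitCount: number;
  contributorCount: number;
  dominantAuthor: string | null;
  dominantAuthorPct: number; // 0-100
  changeDensity: number; // commits per 30-day month (assumption)
  churnVolatility: number; // population stddev of commit gaps, in days (assumption)
}

const DAY = 86400;

function fileSignals(commits: CommitInfo[]): FileSignals {
  const n = commits.length;

  // Count commits per author to find the dominant author.
  const perAuthor = new Map<string, number>();
  for (const c of commits) perAuthor.set(c.author, (perAuthor.get(c.author) ?? 0) + 1);
  let dominantAuthor: string | null = null;
  let top = 0;
  perAuthor.forEach((count, author) => {
    if (count > top) {
      top = count;
      dominantAuthor = author;
    }
  });

  // Gaps (in days) between consecutive commits, oldest first.
  const times = commits.map((c) => c.timestamp).sort((a, b) => a - b);
  const gaps: number[] = [];
  for (let i = 1; i < times.length; i++) gaps.push((times[i] - times[i - 1]) / DAY);
  const mean = gaps.length ? gaps.reduce((s, g) => s + g, 0) / gaps.length : 0;
  const variance = gaps.length ? gaps.reduce((s, g) => s + (g - mean) ** 2, 0) / gaps.length : 0;

  // Observed span in months, floored at one month so tiny spans don't explode the density.
  const spanMonths = n > 1 ? Math.max((times[n - 1] - times[0]) / (30 * DAY), 1) : 1;

  return {
    commitCount: n,
    contributorCount: perAuthor.size,
    dominantAuthor,
    dominantAuthorPct: n ? Math.round((top / n) * 100) : 0,
    changeDensity: n / spanMonths,
    churnVolatility: Math.sqrt(variance),
  };
}
```

Regular gaps give a volatility of zero; irregular bursts of commits push it up, which is the intuition behind churnVolatility.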

Bug-Fix Commit Detection

bugFixRate and chunkBugFixRate rely on a heuristic that classifies each commit as a bug fix or not. The detection combines two independent mechanisms: branch-level resolution of merge commits (Layer 0) and per-commit message classification (Layer 1).

Layer 0: Merge Branch Resolution

Before analyzing individual commit messages, tea-rags identifies gitflow fix branches by inspecting merge commits and traversing the parent graph.

When a merge commit matches one of these patterns:

Merge branch 'fix/...'
Merge branch 'hotfix/...'
Merge branch 'bugfix/...'
Merge pull request #N from user/fix-...
Merge pull request #N from user/hotfix/...
Merge pull request #N from user/bugfix-...

All commits on the merged branch, meaning those reachable from the branch tip (the merge commit's second parent), are marked as bug-fix commits via a breadth-first (BFS) traversal of the parent graph. This is critical for gitflow workflows, where commits on a fix branch often don't contain "fix" in their message: for example, refactor: extract validation inside a fix/TD-123-crash branch is still correctly classified as a bug fix.
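The traversal can be sketched in TypeScript over an in-memory commit graph. The `GraphCommit` type, the `reachable` and `fixBranchCommits` helpers, and the combined `FIX_MERGE` regex are hypothetical. The page does not say where the BFS stops, so this sketch assumes it stops at commits already reachable from the merge's first parent (the mainline), which limits marking to commits that exist only on the fix branch.

```typescript
// Hypothetical in-memory commit graph node.
interface GraphCommit {
  message: string;
  parents: string[]; // parents[0] = mainline side, parents[1] = merged branch tip
}

// One pattern covering the merge subjects listed above (an assumption, not tea-rags' exact regex).
const FIX_MERGE =
  /^Merge (?:branch '(?:hot|bug)?fix[\/-]|pull request #\d+ from [^\/\s]+\/(?:hot|bug)?fix[\/-])/i;

// Breadth-first walk over parent links from `start`, never entering SHAs in `stop`.
function reachable(graph: Map<string, GraphCommit>, start: string, stop: Set<string>): Set<string> {
  const seen = new Set<string>();
  const queue: string[] = [start];
  while (queue.length > 0) {
    const sha = queue.shift()!;
    if (seen.has(sha) || stop.has(sha)) continue;
    const commit = graph.get(sha);
    if (!commit) continue; // outside the loaded history window
    seen.add(sha);
    queue.push(...commit.parents);
  }
  return seen;
}

function fixBranchCommits(graph: Map<string, GraphCommit>): Set<string> {
  const marked = new Set<string>();
  graph.forEach((commit) => {
    if (commit.parents.length < 2 || !FIX_MERGE.test(commit.message)) return;
    // Assumption: the walk stops where the branch rejoins the mainline (first-parent ancestry).
    const mainline = reachable(graph, commit.parents[0], new Set<string>());
    reachable(graph, commit.parents[1], mainline).forEach((sha) => marked.add(sha));
  });
  return marked;
}
```

This sketch recomputes the mainline ancestry for every fix merge, which keeps it short but is quadratic in the worst case.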

Layer 1: Commit Message Classification

Each commit message is then tested against a six-rule pipeline applied in order (Rule 1 filters out merge commits). The first matching rule wins.

Rule 1 — Skip merge commits:

/^Merge\b/i → return false

Merge commits are not classified by message — their branches are already resolved in Layer 0.

Rule 2 — Exclude cosmetic/infrastructure fixes (false positive filter):

Checked against the full commit body:

/\bfix(?:e[sd])?\s+(?:typo|lint|linter|format|formatting|style|whitespace|
  indentation|imports?|tests?|specs?|flaky|rubocop|eslint|prettier|ci|
  pipeline|migration|review|code\s*review|conflicts?)\b/i

/\btext\s+fix(?:es)?\b/i

These are not real bug fixes — they are maintenance commits that happen to contain the word "fix".

Rule 3 — Conventional commit prefix (subject line only):

/^(?:hot)?fix(?:\([^)]+\))?!?:/i

Matches: fix: ..., fix(auth): ..., hotfix: ..., fix(scope)!: ...

Rule 4 — Explicit tag (subject line only):

/^\[(?:Fix|Bug|Hotfix|Bugfix)\]/i

Matches: [Fix] null pointer, [Bug] race condition, [HOTFIX] production crash

Rule 5 — Ticket + Fix verb (subject line only):

/^\[?[A-Z]+-\d+\]?\s+(?:fix|fixed|fixes)\b/i

Matches: [TD-123] Fix crash on login, PROJ-456 fixed timeout, [ABC-789] fixes edge case

Rule 6 — GitHub/GitLab closing keywords (full body):

/\b(?:fix|fixe[sd]|resolve[sd]?|close[sd]?)\s+#\d+/i

Matches: fixes #123, resolves #456, closes #789, Resolved #42

Default: If no rule matches → not a bug fix.
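Put together, the six rules amount to a short classifier. The TypeScript sketch below reuses the regexes above verbatim, rebuilding the Rule 2 pattern from a word list because it is wrapped across lines on this page. `isBugFixMessage` is a hypothetical name, and taking the first line of the message as the subject is an assumption.

```typescript
// Rule 2 word list, copied from the wrapped pattern above.
const COSMETIC_WORDS = [
  "typo", "lint", "linter", "format", "formatting", "style", "whitespace",
  "indentation", "imports?", "tests?", "specs?", "flaky", "rubocop", "eslint",
  "prettier", "ci", "pipeline", "migration", "review", "code\\s*review", "conflicts?",
];
const COSMETIC_FIX = new RegExp(`\\bfix(?:e[sd])?\\s+(?:${COSMETIC_WORDS.join("|")})\\b`, "i");
const TEXT_FIX = /\btext\s+fix(?:es)?\b/i;
const MERGE = /^Merge\b/i;
const CONVENTIONAL = /^(?:hot)?fix(?:\([^)]+\))?!?:/i;
const EXPLICIT_TAG = /^\[(?:Fix|Bug|Hotfix|Bugfix)\]/i;
const TICKET_FIX = /^\[?[A-Z]+-\d+\]?\s+(?:fix|fixed|fixes)\b/i;
const CLOSING_KEYWORD = /\b(?:fix|fixe[sd]|resolve[sd]?|close[sd]?)\s+#\d+/i;

function isBugFixMessage(message: string): boolean {
  // Assumption: the subject is the first line of the commit message.
  const subject = message.split("\n", 1)[0];
  if (MERGE.test(subject)) return false; // Rule 1: merges are resolved in Layer 0
  if (COSMETIC_FIX.test(message) || TEXT_FIX.test(message)) return false; // Rule 2 (full message)
  if (CONVENTIONAL.test(subject)) return true; // Rule 3
  if (EXPLICIT_TAG.test(subject)) return true; // Rule 4
  if (TICKET_FIX.test(subject)) return true; // Rule 5
  return CLOSING_KEYWORD.test(message); // Rule 6 (full message); otherwise not a fix
}
```

Because Rule 2 runs before the positive rules and scans the whole message, a cosmetic phrase anywhere in the body vetoes the commit, which is how the pipeline trades a few missed fixes for fewer false positives.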

Classification Examples

Commit message	Detected?	Rule
`fix: crash on null input`	Yes	Rule 3 — conventional prefix
`fix(auth): token expiration`	Yes	Rule 3 — conventional prefix
`hotfix: urgent payment bug`	Yes	Rule 3 — conventional prefix
`[Fix] null pointer in handler`	Yes	Rule 4 — explicit tag
`[Bug] race condition`	Yes	Rule 4 — explicit tag
`[TD-123] Fix crash on login`	Yes	Rule 5 — ticket + fix verb
(body contains `fixes #123`)	Yes	Rule 6 — closing keyword
`fix typo in readme`	No	Rule 2 — cosmetic exclusion
`fix lint errors`	No	Rule 2 — cosmetic exclusion
`fix tests`	No	Rule 2 — cosmetic exclusion
`fix code review comments`	No	Rule 2 — cosmetic exclusion
`text fixes`	No	Rule 2 — cosmetic exclusion
`chore: update deps`	No	No rule matched
`Merge branch 'fix/auth'`	No	Rule 1 — merge (but children are marked via Layer 0)
child commit inside `fix/auth` branch	Yes	Layer 0 — merge branch resolution

Formula

bugFixRate uses additive (Laplace-style) smoothing with a Jeffreys prior (α = 0.5) to handle small sample sizes:

bugFixRate = round(((bugFixCommits + 0.5) / (totalCommits + 1.0)) * 100)

This prevents extreme values: a file with 0 fixes out of 1 commit gets 25% (not 0%), while a file with 1 fix out of 1 commit gets 75% (not 100%). The smoothing effect diminishes as the commit count grows.
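The formula translates directly into code. This hypothetical `bugFixRate` helper reproduces it and evaluates it for a few commit counts, showing how the pull toward 50% fades as history accumulates.

```typescript
// Smoothed bug-fix rate: (fixes + 0.5) / (total + 1), expressed as a rounded percentage.
function bugFixRate(bugFixCommits: number, totalCommits: number): number {
  return Math.round(((bugFixCommits + 0.5) / (totalCommits + 1.0)) * 100);
}

bugFixRate(0, 1); // 25: a single clean commit is not proof of 0%
bugFixRate(1, 1); // 75: a single fix is not proof of 100%
bugFixRate(0, 9); // 5: with more history the estimate approaches the raw rate
bugFixRate(50, 100); // 50
```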

Chunk-level: chunkBugFixRate uses the same detection logic, but only counts commits whose diff hunks overlap the chunk's line range. An offset tracker corrects for line drift caused by insertions/deletions above the chunk in earlier commits.
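The overlap test and the offset tracking can be sketched for a single commit. Assumptions: hunks come from unified-diff headers (`@@ -oldStart,oldLines +newStart,newLines @@`), history is walked from newest to oldest, and the chunk's range is expressed in the line numbers after the commit. `traceChunk` and its types are hypothetical, and partial overlaps are handled with a deliberate simplification.

```typescript
// Hypothetical hunk parsed from a unified-diff header.
interface Hunk {
  oldStart: number;
  oldLines: number;
  newStart: number;
  newLines: number;
}

interface LineRange {
  start: number; // 1-based, inclusive
  end: number; // inclusive
}

// Decides whether this commit touched the chunk, and maps the chunk's range back to the
// line numbers it had before the commit, ready for checking the next-older commit.
function traceChunk(range: LineRange, hunks: Hunk[]): { touched: boolean; previous: LineRange } {
  let touched = false;
  let startShift = 0;
  let endShift = 0;
  for (const h of hunks) {
    // A pure deletion (newLines = 0) is treated as sitting on line newStart.
    const hunkStart = h.newStart;
    const hunkEnd = h.newStart + Math.max(h.newLines, 1) - 1;
    const delta = h.oldLines - h.newLines; // extra lines the file had before this hunk
    if (hunkEnd < range.start) {
      // Entirely above the chunk: both edges drift by delta.
      startShift += delta;
      endShift += delta;
    } else if (hunkStart <= range.end) {
      touched = true;
      // Ends inside the chunk: only the end edge drifts (simplification for partial overlaps).
      if (hunkEnd <= range.end) endShift += delta;
    }
  }
  return { touched, previous: { start: range.start + startShift, end: range.end + endShift } };
}
```

Counting `touched` commits through this walk, and classifying each one with the message rules, yields the inputs for chunkBugFixRate under these assumptions.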

The detection is designed to minimize false positives. Cosmetic patterns (fix typo, fix lint, fix tests, etc.) are explicitly excluded. Merge commits are handled separately via branch resolution: the commits on a merged fix branch inherit the fix classification even when their individual messages don't mention "fix".

Use Cases

Show me files with high churn rate

Find code with a single dominant author

What code changed in the last week?

Find hot functions that change frequently

Show me legacy code with high bug-fix rates

For detailed scenarios — hotspot detection, knowledge silo analysis, tech debt assessment, incident-driven search, security audit, and more — see Git Enrichment Use Cases.

Reranking Presets

All presets automatically prefer chunk-level data when available (e.g., chunkCommitCount over commitCount for churn signals).

Preset	Signals	Use case
`hotspots`	chunkChurn + chunkRelativeChurn + burstActivity + bugFix + volatility	Bug-prone areas at function granularity
`techDebt`	age + churn + bugFix + volatility	Legacy assessment with fix-rate indicator
`codeReview`	recency + burstActivity + density + chunkChurn	Recent changes with activity intensity
`stable`	low churn	Reliable implementations
`ownership`	ownership + knowledgeSilo	Knowledge transfer, bus factor analysis
`refactoring`	chunkChurn + relativeChurnNorm + chunkSize + volatility + bugFix + age	Refactor candidates at chunk level
`securityAudit`	age + ownership + bugFix + pathRisk + volatility	Old critical code in sensitive paths
`onboarding`	documentation + stability	Entry points for new team members

Scoring Weights Reference

Available weight keys for custom reranking:

Key	Signal	Source
`similarity`	Embedding similarity score	Vector search
`recency`	Inverse of ageDays (prefers chunk-level)	git
`stability`	Inverse of commitCount (prefers chunk-level)	git
`churn`	Direct commitCount (prefers chunk-level)	git
`age`	Direct ageDays (prefers chunk-level)	git
`ownership`	Author concentration via dominantAuthorPct	git
`chunkSize`	Lines of code in chunk	chunk metadata
`documentation`	Is documentation file	chunk metadata
`imports`	Import/dependency count	file metadata
`bugFix`	bugFixRate (prefers chunk-level)	git
`volatility`	churnVolatility (stddev of commit gaps)	git
`density`	changeDensity (commits/month)	git
`chunkChurn`	chunkCommitCount	git chunk-level
`relativeChurnNorm`	relativeChurn normalized (churn relative to file size)	git
`burstActivity`	recencyWeightedFreq — recent burst of changes	git
`pathRisk`	Security-sensitive path pattern match (0 or 1)	file metadata
`knowledgeSilo`	Single-contributor flag (1 / 0.5 / 0)	git
`chunkRelativeChurn`	chunkChurnRatio — chunk's share of file churn	git chunk-level
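A custom weight map can be read as a weighted sum of normalized signals. The sketch below covers four keys (similarity, churn, recency, bugFix) and applies the chunk-level preference described under Reranking Presets. The `Hit` and `GitMeta` shapes, the normalization caps (`MAX_COMMITS`, `MAX_AGE_DAYS`), and the plain unnormalized sum are assumptions; tea-rags' actual scaling is not documented on this page.

```typescript
// Hypothetical metadata shape for one search hit (field names follow the metrics table).
interface GitMeta {
  commitCount?: number;
  chunkCommitCount?: number;
  ageDays?: number;
  chunkAgeDays?: number;
  bugFixRate?: number; // 0-100
  chunkBugFixRate?: number; // 0-100
}

interface Hit {
  similarity: number; // 0-1 from vector search
  git: GitMeta;
}

type WeightKey = "similarity" | "churn" | "recency" | "bugFix";
type Weights = Partial<Record<WeightKey, number>>;

// Assumed normalization caps.
const MAX_COMMITS = 50;
const MAX_AGE_DAYS = 365;

const clamp01 = (x: number): number => Math.min(Math.max(x, 0), 1);

function signals(hit: Hit): Record<WeightKey, number> {
  const g = hit.git;
  // Prefer chunk-level values and fall back to file-level ones.
  const commits = g.chunkCommitCount ?? g.commitCount ?? 0;
  const age = g.chunkAgeDays ?? g.ageDays ?? MAX_AGE_DAYS;
  const fixRate = g.chunkBugFixRate ?? g.bugFixRate ?? 0;
  return {
    similarity: hit.similarity,
    churn: clamp01(commits / MAX_COMMITS),
    recency: 1 - clamp01(age / MAX_AGE_DAYS), // inverse of age
    bugFix: fixRate / 100,
  };
}

// Weighted sum over the keys present in the weight map.
function score(hit: Hit, weights: Weights): number {
  const s = signals(hit);
  let total = 0;
  for (const key of Object.keys(weights) as WeightKey[]) {
    total += (weights[key] ?? 0) * s[key];
  }
  return total;
}

function rerank(hits: Hit[], weights: Weights): Hit[] {
  return [...hits].sort((a, b) => score(b, weights) - score(a, weights));
}
```

With weights such as `{ similarity: 0.5, churn: 0.3, bugFix: 0.2 }`, a slightly less similar chunk that changes often and has a high fix rate can outrank a closer but quiet match, which is the behavior the hotspot-style presets aim for.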

Environment Variables

Git enrichment configuration

Variable	Default	Description
`CODE_ENABLE_GIT_METADATA`	`"false"`	Enable git enrichment during indexing
`GIT_LOG_MAX_AGE_MONTHS`	`12`	Time window for file-level git analysis (months). `0` = no age limit (safety depth still applies).
`GIT_LOG_TIMEOUT_MS`	`30000`	Timeout (ms) for isomorphic-git; falls back to the native git CLI when it expires
`GIT_LOG_SAFETY_DEPTH`	`10000`	Max commits for isomorphic-git `depth` and CLI `--max-count`
`GIT_CHUNK_ENABLED`	`"true"`	Enable chunk-level churn analysis
`GIT_CHUNK_MAX_AGE_MONTHS`	`6`	Time window for chunk-level churn analysis (months). `0` = no age limit.
`GIT_CHUNK_CONCURRENCY`	`10`	Parallel commit processing for chunk churn
`GIT_CHUNK_MAX_FILE_LINES`	`10000`	Skip files larger than this for chunk analysis
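For reference, here is a TypeScript sketch of reading these variables with the defaults from the table. `loadGitConfig`, `sinceDate`, and the parsing rules (booleans compared case-insensitively against "true", unparsable integers falling back to the default) are hypothetical. `sinceDate` illustrates how a `0` age limit could translate into "no cutoff", leaving only the safety depth to bound the log.

```typescript
interface GitEnrichmentConfig {
  enabled: boolean;
  logMaxAgeMonths: number;
  logTimeoutMs: number;
  logSafetyDepth: number;
  chunkEnabled: boolean;
  chunkMaxAgeMonths: number;
  chunkConcurrency: number;
  chunkMaxFileLines: number;
}

type Env = Record<string, string | undefined>;

// Integer variable; falls back to the documented default when unset or not a number.
function intVar(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  const parsed = raw === undefined ? Number.NaN : Number.parseInt(raw, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Boolean variable given as the string "true" or "false".
function boolVar(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name];
  return raw === undefined ? fallback : raw.toLowerCase() === "true";
}

function loadGitConfig(env: Env): GitEnrichmentConfig {
  return {
    enabled: boolVar(env, "CODE_ENABLE_GIT_METADATA", false),
    logMaxAgeMonths: intVar(env, "GIT_LOG_MAX_AGE_MONTHS", 12),
    logTimeoutMs: intVar(env, "GIT_LOG_TIMEOUT_MS", 30000),
    logSafetyDepth: intVar(env, "GIT_LOG_SAFETY_DEPTH", 10000),
    chunkEnabled: boolVar(env, "GIT_CHUNK_ENABLED", true),
    chunkMaxAgeMonths: intVar(env, "GIT_CHUNK_MAX_AGE_MONTHS", 6),
    chunkConcurrency: intVar(env, "GIT_CHUNK_CONCURRENCY", 10),
    chunkMaxFileLines: intVar(env, "GIT_CHUNK_MAX_FILE_LINES", 10000),
  };
}

// `0` months means no age limit: no cutoff date, so only the safety depth bounds the log.
// Date#setMonth rolls over when the target month is shorter (e.g. Mar 31 minus one month).
function sinceDate(maxAgeMonths: number, now: Date): Date | undefined {
  if (maxAgeMonths === 0) return undefined;
  const cutoff = new Date(now.getTime());
  cutoff.setMonth(cutoff.getMonth() - maxAgeMonths);
  return cutoff;
}
```

In Node.js this would be called as `loadGitConfig(process.env)`.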

Next Steps

Filters — filter syntax, git churn filters, filterable fields reference
Code Churn: Theory & Research — metric formulas, research basis, and academic references
Git Enrichment Pipeline — architecture, design decisions, and performance characteristics
Search Strategies — how agents use reranking presets for different tasks
Configuration Variables — full list of all configuration options
