Deep Codebase Analysis

TeaRAGs exposes git-derived signals at two granularity levels — file and chunk (function). Understanding when to use which level is the key to meaningful analysis. This page covers metric interpretation, threshold tables, and decision frameworks — what the numbers mean and how to read them.

For which tools and presets to use for each task, see Search Strategies. For how agents should use these signals during code generation, see Agentic Data-Driven Engineering.

File-Level vs Chunk-Level Metrics: When to Use Each

Every indexed chunk carries both file-level and chunk-level git metrics. They measure different things and answer different questions.

File-level metrics

File-level metrics (commitCount, relativeChurn, bugFixRate, ageDays, dominantAuthor) describe the file as a whole. All chunks within the same file share identical file-level values.

Use file-level metrics when:

  • Scanning for general hotspots: "which files change most?" is a coarse but fast signal. A file with commitCount >= 20 is worth investigating further.
  • Ownership analysis: dominantAuthor and contributorCount are inherently file-scoped. Git tracks commits per file, not per function.
  • Relative churn assessment: relativeChurn (lines changed / file size) is the strongest single defect predictor according to Nagappan & Ball (2005). It normalizes for file size, so a 50-line file with 100 lines changed (relativeChurn = 2.0) ranks higher than a 2000-line file with the same changes (relativeChurn = 0.05).
  • Task traceability: taskIds are extracted from commit messages at file level.
  • Legacy code discovery: ageDays at file level tells you when the file was last touched, regardless of which function inside it changed.
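
The relative churn arithmetic in the list above can be reproduced with a few lines of Python. This is a minimal sketch, assuming relativeChurn is simply lines changed divided by the file's size in lines; the function name relative_churn is hypothetical and not part of TeaRAGs.

```python
def relative_churn(lines_changed: int, file_size_lines: int) -> float:
    """Lines changed divided by file size (assumed definition, taken from the text)."""
    if file_size_lines <= 0:
        raise ValueError("file_size_lines must be positive")
    return lines_changed / file_size_lines


# The same 100 changed lines weigh very differently depending on file size.
small_file = relative_churn(lines_changed=100, file_size_lines=50)
large_file = relative_churn(lines_changed=100, file_size_lines=2000)

print(f"50-line file:   relativeChurn = {small_file}")
print(f"2000-line file: relativeChurn = {large_file}")
```

Ranking by this ratio rather than by raw lines changed is what lets a small, heavily rewritten file surface above a large file with the same absolute change.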

Limitations: A 500-line file with 30 commits may have one function that absorbed 28 of them. File-level commitCount = 30 makes the whole file look churny, but only one function is the problem. You need chunk-level metrics to see this.

Chunk-level metrics

Chunk-level metrics (chunkCommitCount, chunkChurnRatio, chunkBugFixRate, chunkAgeDays) describe a specific function, method, or code block within a file. They are computed by mapping diff hunks to chunk line ranges.
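
The idea of mapping diff hunks to chunk line ranges can be illustrated with a simple overlap test. The sketch below is not TeaRAGs' implementation: the Chunk and Hunk types, the function names, and the sample data are all hypothetical, and it assumes hunk line numbers are already expressed in the same coordinates as the chunk ranges (a real implementation has to account for line numbers shifting across history).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    name: str
    start_line: int  # inclusive
    end_line: int    # inclusive


@dataclass(frozen=True)
class Hunk:
    commit: str
    start_line: int  # inclusive
    end_line: int    # inclusive


def overlaps(chunk: Chunk, hunk: Hunk) -> bool:
    """True when the hunk touches at least one line of the chunk."""
    return hunk.start_line <= chunk.end_line and hunk.end_line >= chunk.start_line


def chunk_commit_counts(chunks: list[Chunk], hunks: list[Hunk]) -> dict[str, int]:
    """Count distinct commits whose hunks overlap each chunk."""
    touched: dict[str, set[str]] = {c.name: set() for c in chunks}
    for hunk in hunks:
        for chunk in chunks:
            if overlaps(chunk, hunk):
                touched[chunk.name].add(hunk.commit)
    return {name: len(commits) for name, commits in touched.items()}


chunks = [Chunk("parse", 1, 40), Chunk("render", 41, 90)]
hunks = [
    Hunk("a1", 10, 12),
    Hunk("a1", 20, 25),  # same commit, second hunk: counted once for "parse"
    Hunk("b2", 38, 45),  # straddles both chunks
    Hunk("c3", 60, 61),
]
counts = chunk_commit_counts(chunks, hunks)
print(counts)
```

Counting distinct commits rather than hunks keeps a single commit with several hunks in one function from inflating that function's count.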

Use chunk-level metrics when:

  • Pinpointing the exact problem: chunkCommitCount tells you which function inside a churny file is actually causing the churn. A file with commitCount = 25 might have one function with chunkCommitCount = 22 and another with chunkCommitCount = 1.
  • Refactoring prioritization: chunkChurnRatio (chunk commits / file commits) close to 1.0 means this one function is responsible for nearly all of the file's churn. That function is the refactoring target, not the file.
  • Function-level bug density: chunkBugFixRate at 60% means most commits to this specific function were bug fixes. The file-level bugFixRate might be only 30% because other functions dilute the signal.
  • Stable code inside unstable files: chunkAgeDays = 180 inside a file with ageDays = 2 means this function hasn't been touched in 6 months, even though the file was modified yesterday. This function is stable and reliable as a template.
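
To make the chunkChurnRatio reading concrete, the sketch below applies the ratio (chunk commits / file commits) to the example numbers from the list above. The function name chunk_churn_ratio and the chunk names are hypothetical, chosen only for illustration.

```python
def chunk_churn_ratio(chunk_commit_count: int, file_commit_count: int) -> float:
    """Share of the file's commits that touched this chunk."""
    if file_commit_count == 0:
        return 0.0
    return chunk_commit_count / file_commit_count


file_commit_count = 25
# Hypothetical chunk names; the counts come from the example in the text.
chunk_commits = {"process_order": 22, "format_receipt": 1}

ratios = {
    name: chunk_churn_ratio(count, file_commit_count)
    for name, count in chunk_commits.items()
}
# The chunk with the highest ratio is the first refactoring candidate.
target = max(ratios, key=ratios.get)

for name, ratio in ratios.items():
    print(f"{name}: chunkChurnRatio = {ratio:.2f}")
print(f"refactoring target: {target}")
```

Here one function accounts for 22 of the file's 25 commits, so the file-level count alone would overstate how much of the file is unstable.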

Limitations: Chunk-level metrics require the GIT_CHUNK_ENABLED=true setting (on by default) and only cover commits within the GIT_CHUNK_MAX_AGE_MONTHS window (default: 6 months). Older commits fall back to file-level data.

Decision guide

| Question | Use | Key metric |
|---|---|---|
| Which files change most? | File | commitCount, relativeChurn |
| Which function changes most? | Chunk | chunkCommitCount, chunkChurnRatio |
| Is this file a defect predictor? | File | relativeChurn (Nagappan: 89% accuracy) |
| Is this function buggy? | Chunk | chunkBugFixRate |
| Who owns this area? | File | dominantAuthor, dominantAuthorPct |
| Who last touched this function? | Chunk | chunkAgeDays, chunkContributorCount |
| Is the churn healthy or pathological? | Both | Compare commitCount vs bugFixRate: high commits + low bugfix = healthy iteration; high commits + high bugfix = pathological |
| What should I refactor first? | Chunk | chunkChurnRatio + chunkBugFixRate + chunk size |
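
The healthy-versus-pathological comparison in the decision guide can be written as a small classifier. This is only a sketch: the commitCount cutoff of 20 reuses the "worth investigating" figure mentioned earlier on this page, while the 50% bug-fix cutoff, the percentage scale for bugFixRate, and the function name classify_churn are assumptions made for illustration, not documented TeaRAGs thresholds.

```python
def classify_churn(
    commit_count: int,
    bug_fix_rate_pct: float,
    high_commits: int = 20,         # cutoff borrowed from the hotspot guidance above
    high_bugfix_pct: float = 50.0,  # assumed cutoff, not a documented threshold
) -> str:
    """Label churn by comparing commit volume against bug-fix share."""
    if commit_count < high_commits:
        return "low churn"
    if bug_fix_rate_pct >= high_bugfix_pct:
        return "pathological"
    return "healthy iteration"


print(classify_churn(commit_count=30, bug_fix_rate_pct=10.0))
print(classify_churn(commit_count=30, bug_fix_rate_pct=60.0))
print(classify_churn(commit_count=5, bug_fix_rate_pct=80.0))
```

The same function works at either level: pass commitCount and bugFixRate for a file, or chunkCommitCount and chunkBugFixRate for a function, adjusting the cutoffs to the codebase.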