Custom Rerank Strategies

Preset reranking covers common scenarios, but custom weights unlock precise, task-specific analysis. The key is combining orthogonal signals — each weight should add unique information that the others don't have.

Strategy 1: Multi-signal risk scoring

Combine orthogonal signals to create a composite risk score:

Find code ranked by a combination of churn, bug fixes, blast radius, and volatility

Tool parameters — "What should we test more?"
{
  "rerank": {
    "custom": {
      "chunkChurn": 0.25,
      "bugFix": 0.3,
      "imports": 0.25,
      "volatility": 0.2
    }
  }
}

Why this works: chunkChurn identifies hot functions, bugFix adds a quality signal, imports approximates blast radius (it counts fan-out only; see Known Limitations), and volatility adds unpredictability. Each signal adds information the others don't have.

Anti-pattern: Don't combine churn + chunkChurn + density. All three measure change frequency, so combining them just triple-weights the same underlying signal.
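To make the weighting concrete, here is a minimal Python sketch of a composite score computed as a weighted average. It assumes every signal has already been normalized to the 0..1 range and that signals combine linearly; the real scoring function is not described on this page, so the `composite_score` helper and the sample signal values are hypothetical.

```python
def composite_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of normalized signals (each assumed to be in 0..1)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Missing signals count as 0.0 so that absent metadata never boosts a result.
    return sum(w * signals.get(name, 0.0) for name, w in weights.items()) / total


# Weights from Strategy 1.
weights = {"chunkChurn": 0.25, "bugFix": 0.3, "imports": 0.25, "volatility": 0.2}

# Hypothetical, already-normalized signal values for two code chunks.
hot_and_buggy = {"chunkChurn": 0.9, "bugFix": 0.8, "imports": 0.2, "volatility": 0.5}
hot_only = {"chunkChurn": 0.9, "bugFix": 0.1, "imports": 0.2, "volatility": 0.5}

print(round(composite_score(hot_and_buggy, weights), 3))
print(round(composite_score(hot_only, weights), 3))
```

Both chunks have the same churn, but only the first also has a high bug-fix share, so it ranks higher. That difference is the extra information an orthogonal signal contributes.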

Strategy 2: Inverse scoring for safe code

Sometimes you need the opposite of a hotspot — stable, well-owned, low-bug code:

Find stable, battle-tested implementations with distributed ownership to use as reference

Tool parameters — "Safe reference code"
{
  "rerank": {
    "custom": {
      "stability": 0.3,
      "age": 0.2,
      "similarity": 0.3,
      "ownership": 0.2
    }
  }
}

Why this works: stability (inverse of commitCount) boosts low-churn code. age (direct ageDays) boosts old code; old + stable = battle-tested. ownership boosts distributed authorship, meaning several people have worked on the code. similarity keeps the results semantically relevant to the query.
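The difference between an inverse signal (stability) and a direct signal (age) can be sketched as follows. This is an illustration only: the max-based normalization and the helper names are assumptions, not the tool's documented behavior.

```python
def normalize(value: float, max_value: float) -> float:
    """Scale a raw metric into 0..1 against the largest value in the result set."""
    return 0.0 if max_value <= 0 else min(value / max_value, 1.0)


def stability(commit_count: int, max_commit_count: int) -> float:
    """Inverse signal: fewer commits means a higher score."""
    return 1.0 - normalize(commit_count, max_commit_count)


def age(age_days: float, max_age_days: float) -> float:
    """Direct signal: older code means a higher score."""
    return normalize(age_days, max_age_days)


# Hypothetical candidates: name -> (commitCount, ageDays)
candidates = {"legacy_parser": (3, 900), "new_endpoint": (40, 30)}
max_commits = max(commits for commits, _ in candidates.values())
max_age = max(days for _, days in candidates.values())

for name, (commits, days) in candidates.items():
    print(name, round(stability(commits, max_commits), 3), round(age(days, max_age), 3))
```

In this sample the rarely touched, long-lived chunk scores high on both signals, which is the "old + stable" combination the strategy looks for.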

Strategy 3: Activity pulse

Track where active development is happening right now:

Find code with recent burst activity and high change density

Tool parameters — "Active development zones"
{
  "rerank": {
    "custom": {
      "burstActivity": 0.4,
      "density": 0.3,
      "recency": 0.3
    }
  }
}

Why this works: burstActivity (recencyWeightedFreq) uses exponential decay — a commit from today counts ~10x more than one from 3 weeks ago. Combined with density (commits/month) and recency, this surfaces the most actively worked-on code.
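The decay behind burstActivity can be illustrated with a short Python sketch. The formula and constant below are assumptions chosen only to reproduce the "~10x over 3 weeks" ratio quoted above; the actual recencyWeightedFreq implementation may use a different decay rate or form.

```python
import math

# Assumption: weight(age) = exp(-DECAY_RATE * age_days), calibrated so that a
# commit from today weighs 10x more than one from 21 days ago.
DECAY_RATE = math.log(10) / 21  # per day


def commit_weight(age_days: float) -> float:
    """Exponentially decayed weight of a single commit."""
    return math.exp(-DECAY_RATE * age_days)


def recency_weighted_freq(commit_ages_days: list[float]) -> float:
    """Sum of decayed weights over all commits that touched a chunk."""
    return sum(commit_weight(age) for age in commit_ages_days)


# Half-life implied by the calibration above.
half_life_days = math.log(2) / DECAY_RATE

print(round(commit_weight(0) / commit_weight(21), 2))
print(round(half_life_days, 2))
# Three commits this week outweigh five commits from two or more months ago.
print(recency_weighted_freq([1, 2, 3]) > recency_weighted_freq([60, 90, 120, 150, 180]))
```

Under this calibration the implied half-life is roughly 6.3 days, so a short recent burst dominates a longer history of older commits.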

Use case: Sprint planning — see where engineering effort is concentrated. Compare against roadmap priorities.

Strategy 4: Cross-signal anomaly detection

Find unusual patterns that individual presets miss:

Find code that is both a knowledge silo and a hotspot in security-sensitive paths

Tool parameters — "Dangerous silos"
{
  "rerank": {
    "custom": {
      "knowledgeSilo": 0.3,
      "chunkChurn": 0.25,
      "bugFix": 0.25,
      "pathRisk": 0.2
    }
  },
  "pathPattern": "{**/auth/**,**/payment/**,**/crypto/**}"
}

Why this works: No single preset combines silo risk + churn + bug-fix rate + security path. Custom weights let you search for this specific intersection: the most dangerous code in the codebase.

Guidelines for building custom weights

| Guideline | Explanation |
| --- | --- |
| Keep weights summing to ~1.0 | Weights are normalized internally, but 1.0 makes intent clearer |
| Use 3-5 signals maximum | More signals dilute each other; focus on what matters for this specific question |
| Don't overlap signals | churn + chunkChurn + density all measure change frequency; pick one |
| Include similarity at 0.2-0.4 | Unless doing pure metadata analysis, some semantic relevance prevents nonsensical matches |
| Test with metaOnly: true first | See the scoring before downloading full code content |
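The first guideline says weights are normalized internally. A plausible reading, sketched below, is that each weight is divided by the sum of all weights; this is an assumption, since the page does not state the normalization formula, and `normalize_weights` is a hypothetical helper.

```python
def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Rescale weights so they sum to 1.0 while preserving their ratios."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: w / total for name, w in weights.items()}


# Weights that do not sum to 1.0 express the same ratios as their normalized form.
loose = {"chunkChurn": 2, "bugFix": 1, "imports": 1}
print(normalize_weights(loose))
```

If normalization works this way, writing 2/1/1 and 0.5/0.25/0.25 yields the same ranking. Keeping the sum near 1.0 simply makes each signal's share readable at a glance.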

Signal Overlap Reference

Signals that measure similar things — avoid combining within the same custom rerank:

| Signal group | Members | Pick one |
| --- | --- | --- |
| Churn frequency | churn, chunkChurn, density, burstActivity | chunkChurn for function-level, burstActivity for recency-weighted |
| Churn magnitude | relativeChurnNorm, chunkRelativeChurn | chunkRelativeChurn for function-level |
| Age/freshness | age, recency | recency if you want recent code, age if you want old code |
| Ownership | ownership, knowledgeSilo | knowledgeSilo for binary silo detection, ownership for gradient |
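The overlap groups above can be turned into a small pre-flight check that runs before a request is sent. The sketch below is a hypothetical client-side helper, not part of the tool; the group names and members are copied from the table.

```python
# Overlap groups as listed in the Signal Overlap Reference table.
OVERLAP_GROUPS = {
    "churn frequency": {"churn", "chunkChurn", "density", "burstActivity"},
    "churn magnitude": {"relativeChurnNorm", "chunkRelativeChurn"},
    "age/freshness": {"age", "recency"},
    "ownership": {"ownership", "knowledgeSilo"},
}


def find_overlaps(weights: dict[str, float]) -> dict[str, list[str]]:
    """Return every overlap group from which more than one signal was selected."""
    overlaps = {}
    for group, members in OVERLAP_GROUPS.items():
        chosen = sorted(members.intersection(weights))
        if len(chosen) > 1:
            overlaps[group] = chosen
    return overlaps


# Strategy 1 picks at most one signal per group, so nothing is flagged.
print(find_overlaps({"chunkChurn": 0.25, "bugFix": 0.3, "imports": 0.25, "volatility": 0.2}))

# Three change-frequency signals at once are flagged as one overlapping group.
print(find_overlaps({"churn": 0.4, "chunkChurn": 0.3, "density": 0.3}))
```

An empty result means the weight set takes no more than one signal from each group.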

Signals that are orthogonal and combine well:

| Signal A | + Signal B | Combined meaning |
| --- | --- | --- |
| chunkChurn | bugFix | Frequently changed + mostly bug fixes = quality problem |
| knowledgeSilo | imports | Single owner + high import fan-out = dangerous silo |
| stability | age | Low churn + old = battle-tested code |
| burstActivity | pathRisk | Recent activity in security paths = needs review |
| chunkRelativeChurn | volatility | Function absorbs disproportionate churn + irregular pattern = structural problem |

Known Limitations

  1. Schema gap: The ScoringWeightsSchema in the MCP tool definitions does not yet expose the newer weight keys (relativeChurnNorm, burstActivity, pathRisk, knowledgeSilo, chunkRelativeChurn). Agents using preset strings are unaffected; agents constructing custom weights for these signals will need the schema updated.

  2. No cross-search chaining: Each search is independent. The agent must manually chain results from one search into filters for the next. There is no built-in "find all files that import results from my previous search."

  3. Git metadata required: All reranking presets except relevance require CODE_ENABLE_GIT_METADATA=true during indexing. Without git enrichment, non-relevance presets silently degrade to similarity-only scoring.

  4. Chunk-level data is partial: Chunk-level metrics (chunkCommitCount, chunkBugFixRate, etc.) are only available for files with multiple chunks and recent commits within the GIT_CHUNK_MAX_AGE_MONTHS window. Single-chunk files and old-only commits fall back to file-level metrics.

  5. No fan-in (importedBy) data yet: The current impactAnalysis preset uses only fan-out (imports count). Fan-in metrics and the blastRadius preset are planned.
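For limitation 2, manual chaining can be as simple as turning the file paths from one search into the pathPattern of the next. The sketch below assumes a result shape (a list of objects with a "path" key) and uses made-up file paths; it only builds the parameters and does not call the tool.

```python
def build_path_pattern(paths: list[str]) -> str:
    """Combine result paths into one brace pattern such as "{a.ts,b.ts}"."""
    unique = sorted(set(paths))
    if not unique:
        raise ValueError("no paths to chain into the next search")
    # A single path needs no braces; several paths become one alternation.
    return unique[0] if len(unique) == 1 else "{" + ",".join(unique) + "}"


# Hypothetical result shape from a first search.
first_results = [
    {"path": "src/auth/session.ts"},
    {"path": "src/auth/token.ts"},
    {"path": "src/auth/session.ts"},
]

# Parameters for the follow-up search, restricted to the files found above.
next_params = {
    "rerank": {"custom": {"chunkChurn": 0.5, "bugFix": 0.5}},
    "pathPattern": build_path_pattern([r["path"] for r in first_results]),
}
print(next_params["pathPattern"])
```

Paths that contain commas or braces would need escaping before being joined this way, and a very long result list may be better split across several follow-up searches.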