Agentic Data-Driven Engineering
Standard code RAG retrieves by similarity: "find code that looks like X." The agent copies the first match without knowing if that code is stable, bug-prone, or has been rewritten five times. With trajectory enrichment, every search result carries 19 git-derived quality signals — and the agent can reason about what to copy, what to avoid, and how to generate before writing a single line.
This page is the practical guide: the five strategies, generation modes, danger zone checks, ripgrep verification, and ready-to-paste agent configurations. For the mental model behind this approach, see How to Think with TeaRAGs. For metric interpretation and threshold tables, see Deep Codebase Analysis. For preset/tool selection and custom weight strategies, see Search Strategies.
The Five Strategies
1. Template Selection — What to Copy
The first search hit is not necessarily the best template. A function with a 60% bug-fix rate may look right semantically, yet it is the worst possible example to copy. Template selection should be driven by chunk-level quality signals, not similarity alone.
Why chunk-level, not file-level: A file may have commitCount = 25 and bugFixRate = 35% — looks risky. But inside it, formatReceipt() has chunkCommitCount = 1, chunkBugFixRate = 0%, chunkAgeDays = 120. That function is a perfect template. File-level metrics would have hidden it. Always prefer chunk-level signals for template evaluation — see decision guide.
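The chunk-over-file preference can be sketched as a small lookup helper. This is a minimal illustration, assuming each search result exposes its signals as a flat dictionary; the helper name `effective_signal` and the payload shape are hypothetical, and only the signal names come from this page.

```python
from typing import Any, Mapping, Optional


def effective_signal(payload: Mapping[str, Any], name: str) -> Optional[Any]:
    """Return the chunk-level value of a signal, falling back to file level.

    `name` is the file-level signal name, e.g. "commitCount". The chunk-level
    counterpart is assumed to follow the naming seen on this page:
    commitCount -> chunkCommitCount, bugFixRate -> chunkBugFixRate.
    """
    chunk_key = "chunk" + name[0].upper() + name[1:]
    value = payload.get(chunk_key)
    # Compare against None explicitly: a chunk-level value of 0 is valid data.
    return value if value is not None else payload.get(name)


# The formatReceipt() example from the paragraph above (rates in percent).
format_receipt = {
    "commitCount": 25,
    "bugFixRate": 35,
    "chunkCommitCount": 1,
    "chunkBugFixRate": 0,
    "chunkAgeDays": 120,
}

commit_count = effective_signal(format_receipt, "commitCount")
bug_fix_rate = effective_signal(format_receipt, "bugFixRate")
```

With this payload the helper reports the function's own history (1 commit, 0% fixes) rather than the file's, which is the behavior the paragraph argues for.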
Find stable, low-churn implementations of request validation to use as a template
Tool parameters
{
  "query": "request validation pattern",
  "rerank": "stable",
  "pathPattern": "src/services/**",
  "limit": 10
}
Quality criteria for a good template:
| Signal | Good template | Mediocre | Avoid |
|---|---|---|---|
| chunkCommitCount | 1-3 | 4-7 | > 8 |
| chunkBugFixRate | 0-15% | 15-35% | > 40% |
| chunkAgeDays | > 60 | 30-60 | < 14 |
| churnVolatility | < 5 | 5-15 | > 20 |
| dominantAuthorPct | > 70% | 40-70% | < 40% |
The ideal template: Low churn + old age + low bug rate + strong ownership = code that was written well the first time, survived production, and has a consistent style to follow.
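The quality criteria can be applied mechanically before the agent picks a template. The following is a rough sketch, not part of TeaRAGs: the function name `rate_template` and the flat `signals` dictionary are assumptions, and percentages are assumed to be expressed as numbers from 0 to 100. Values that fall in the gaps between the table's bands (for example, exactly 8 commits) are treated as mediocre here.

```python
from typing import Mapping


def rate_template(signals: Mapping[str, float]) -> str:
    """Classify one chunk as 'good', 'mediocre', or 'avoid' as a template."""
    # Any single signal in the "Avoid" column disqualifies the chunk.
    avoid = (
        signals["chunkCommitCount"] > 8
        or signals["chunkBugFixRate"] > 40
        or signals["chunkAgeDays"] < 14
        or signals["churnVolatility"] > 20
        or signals["dominantAuthorPct"] < 40
    )
    if avoid:
        return "avoid"

    # Every signal must sit in the "Good template" column.
    good = (
        signals["chunkCommitCount"] <= 3
        and signals["chunkBugFixRate"] <= 15
        and signals["chunkAgeDays"] > 60
        and signals["churnVolatility"] < 5
        and signals["dominantAuthorPct"] > 70
    )
    return "good" if good else "mediocre"


stable_chunk = {
    "chunkCommitCount": 1,
    "chunkBugFixRate": 0,
    "chunkAgeDays": 120,
    "churnVolatility": 2,
    "dominantAuthorPct": 90,
}
rating = rate_template(stable_chunk)
```

Treating "avoid" as an any-signal veto and "good" as an all-signals requirement is a conservative design choice; a weighted score would be a reasonable alternative.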
2. Anti-Pattern Detection — What NOT to Copy
Equally important: find code that should not be used as a reference. These are functions that look right semantically but have a history of instability.
Find high-churn, frequently-patched implementations of job processing
Tool parameters
{
  "query": "job processing workflow",
  "rerank": "hotspots",
  "limit": 10
}
Red flags in search results:
| Signal and threshold | What it indicates | Problem |
|---|---|---|
| chunkBugFixRate > 50% | More than half of commits are fixes | Structurally fragile: patches don't stick |
| chunkChurnRatio > 0.7 | One function absorbs 70%+ of file churn | This function is the root cause of the file's instability |
| churnVolatility > 20 | Irregular bursts of patching | Reactive maintenance: "we only touch it when it breaks" |
| relativeChurn > 5.0 | Code has been rewritten multiple times | Design problem, not implementation problem |
When the agent encounters an anti-pattern, it should note the specific issues — nested feature-flag checks, complex branching, mixed responsibilities — and explicitly avoid replicating them in generated code.
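The red-flag thresholds above can be checked with a short filter. This is an illustrative sketch only: `RED_FLAGS`, `red_flags`, and the flat `signals` dictionary are hypothetical names, with chunkBugFixRate assumed to be a percentage and chunkChurnRatio a fraction between 0 and 1.

```python
from typing import List, Mapping, Tuple

# (signal name, threshold, problem) taken from the red-flag table.
RED_FLAGS: List[Tuple[str, float, str]] = [
    ("chunkBugFixRate", 50, "structurally fragile: patches don't stick"),
    ("chunkChurnRatio", 0.7, "root cause of the file's instability"),
    ("churnVolatility", 20, "reactive maintenance"),
    ("relativeChurn", 5.0, "design problem, not implementation problem"),
]


def red_flags(signals: Mapping[str, float]) -> List[Tuple[str, str]]:
    """Return (signal, problem) pairs for every threshold the chunk exceeds.

    Signals missing from the payload are skipped rather than flagged.
    """
    return [
        (name, problem)
        for name, threshold, problem in RED_FLAGS
        if name in signals and signals[name] > threshold
    ]


patched_often = {
    "chunkBugFixRate": 60,
    "chunkChurnRatio": 0.8,
    "churnVolatility": 3,
    "relativeChurn": 1.0,
}
flags = red_flags(patched_often)
```

An empty list means none of the four thresholds was crossed; it does not by itself make the chunk a good template.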
3. Style Consistency — Match the Domain Owner
Code generated in a vacuum can pass tests and still fail code review. The dominantAuthor signal tells the agent whose patterns to match.
Find the dominant author and coding patterns in the workflow pipeline services
Tool parameters
{
  "query": "workflow pipeline service",
  "rerank": "ownership",
  "pathPattern": "src/services/workflow/**",
  "metaOnly": true,
  "limit": 15
}
How to use ownership signals:
| Ownership profile | Agent behavior |
|---|---|
| dominantAuthorPct > 80%, contributorCount = 1 | Strong silo: match this author's style exactly. Don't refactor, don't introduce new patterns. The author will review this code. |
| dominantAuthorPct 50-80%, contributorCount = 2-3 | Clear owner: follow the dominant author's patterns with minor flexibility. |
| dominantAuthorPct < 40%, contributorCount > 4 | Shared area: generate neutral, conventional code. Follow project-wide conventions rather than any single author's style. Opportunity to introduce unifying patterns. |
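The ownership table maps naturally onto a small classifier. The sketch below is an assumption-laden illustration: the function name and the returned labels are hypothetical, and because the table does not cover every combination of values, anything outside its three rows is returned as "unclassified" instead of being guessed.

```python
def ownership_profile(dominant_author_pct: float, contributor_count: int) -> str:
    """Map ownership signals to one of the profiles in the table above."""
    if dominant_author_pct > 80 and contributor_count == 1:
        return "strong_silo"      # match the author's style exactly
    if 50 <= dominant_author_pct <= 80 and 2 <= contributor_count <= 3:
        return "clear_owner"      # follow the dominant author, minor flexibility
    if dominant_author_pct < 40 and contributor_count > 4:
        return "shared_area"      # neutral, project-wide conventions
    # Combinations the table does not describe are left for the agent to judge.
    return "unclassified"


profile = ownership_profile(dominant_author_pct=92, contributor_count=1)
```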
What to match from the domain owner:
- Error handling patterns (custom error classes vs generic rescue)
- Parameter extraction style (constructor vs method)
- Transaction boundaries and nesting
- Naming conventions for the specific domain
4. Historical Context — Why Code Exists
taskIds extracted from commit messages reveal the feature's evolution — critical context that similarity search cannot provide.
Show metadata for workflow job services to understand feature context through ticket IDs
Tool parameters
{
  "query": "workflow job service",
  "pathPattern": "src/services/workflow/jobs/**",
  "metaOnly": true,
  "limit": 15
}
Interpreting taskIds patterns:
| Pattern | taskIds | Commits | Bug rate | Meaning | Agent action |
|---|---|---|---|---|---|
| Simple feature | 1 | 1-2 | < 10% | Built once, works | Safe to refactor freely |
| Evolving feature | 4+ | 5+ | 15-30% | Multiple iterations | Read all tickets before modifying |
| Bug-prone feature | 1 | 3+ | > 50% | Harder than expected | Add defensive tests, check edge cases |
| Coordinated change | Shared across files | Any | Any | Multi-file feature | Modify all related files together |
Shared taskIds across files are the strongest coordination signal. If move.rb and update.rb share taskId TD-72004, changes to one likely require changes to the other. The agent should flag this before generating code that touches only one of the pair.
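One way to surface this coordination signal is to group search results by taskId and compare each group with the set of files the agent plans to edit. This is a minimal sketch under assumptions: the result shape (a `path` string and a `taskIds` list per result) and the function name `missing_coordinated_files` are hypothetical, not a documented TeaRAGs format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set


def missing_coordinated_files(
    results: Iterable[Mapping], planned: Set[str]
) -> Dict[str, List[str]]:
    """For each taskId, list files that share it with a planned file
    but are not themselves part of the planned change."""
    files_by_task: Dict[str, Set[str]] = defaultdict(set)
    for result in results:
        for task_id in result.get("taskIds", []):
            files_by_task[task_id].add(result["path"])

    missing: Dict[str, List[str]] = {}
    for task_id, paths in files_by_task.items():
        # Flag only tasks that touch a planned file and at least one other file.
        if paths & planned and not paths <= planned:
            missing[task_id] = sorted(paths - planned)
    return missing


results = [
    {"path": "move.rb", "taskIds": ["TD-72004"]},
    {"path": "update.rb", "taskIds": ["TD-72004"]},
]
to_flag = missing_coordinated_files(results, planned={"move.rb"})
```

A non-empty result is a prompt for the agent to raise the issue before generating code, not proof that the other files must change.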
5. Risk Assessment — How to Modify Safely
When modifying existing code with high-risk signals, the agent should switch from "write the best code" to "write the safest change."
Find legacy code with high bug-fix rates in the recurrence processing area
Tool parameters
{
  "query": "recurrence processing job creation",
  "rerank": "techDebt",
  "limit": 10
}
Risk indicators and defensive actions:
| Risk profile | Signals | Defensive strategy |
|---|---|---|
| Legacy fragile | ageDays > 90 + bugFixRate > 40% + commitCount > 5 | Use wrapper pattern — keep old code intact, add new code path behind feature flag. Gradual rollout: 10% → 50% → 100%. |
| Knowledge silo + churn | dominantAuthorPct > 85% + chunkCommitCount > 5 | Request review from the owner before merging. Do not restructure without knowledge transfer. |
| High blast radius | imports count > 10 + chunkBugFixRate > 30% | Stabilize first — fix existing bugs before adding features. Run full integration tests. |
| Pathological churn | churnVolatility > 30 + bugFixRate > 40% | This code needs redesign, not patching. Propose a rewrite plan rather than incremental changes. |
Never directly modify code with ageDays > 90 + bugFixRate > 50% + relativeChurn > 5.0. This code has been rewritten multiple times and keeps breaking. Any new patch has a high probability of introducing another bug. Use a wrapper pattern or propose a rewrite.
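The risk profiles and the hard stop above can be expressed as two small checks that an agent runs before editing. This is a sketch under stated assumptions: the function names, the flat `signals` dictionary, and the key `importsCount` (standing in for "imports count") are hypothetical; rates are assumed to be percentages, and missing signals default to 0 so they never trigger a profile.

```python
from typing import List, Mapping


def risk_profiles(signals: Mapping[str, float]) -> List[str]:
    """Return every risk profile from the table that the signals match."""
    get = lambda key: signals.get(key, 0)
    profiles = []
    if get("ageDays") > 90 and get("bugFixRate") > 40 and get("commitCount") > 5:
        profiles.append("legacy_fragile")        # wrapper + feature flag
    if get("dominantAuthorPct") > 85 and get("chunkCommitCount") > 5:
        profiles.append("knowledge_silo_churn")  # owner review before merging
    if get("importsCount") > 10 and get("chunkBugFixRate") > 30:
        profiles.append("high_blast_radius")     # stabilize first
    if get("churnVolatility") > 30 and get("bugFixRate") > 40:
        profiles.append("pathological_churn")    # propose a rewrite plan
    return profiles


def direct_edit_allowed(signals: Mapping[str, float]) -> bool:
    """Apply the hard stop: old, bug-prone, repeatedly rewritten code."""
    return not (
        signals.get("ageDays", 0) > 90
        and signals.get("bugFixRate", 0) > 50
        and signals.get("relativeChurn", 0) > 5.0
    )


legacy = {
    "ageDays": 200,
    "bugFixRate": 55,
    "commitCount": 12,
    "relativeChurn": 6.2,
    "churnVolatility": 35,
}
matched = risk_profiles(legacy)
can_edit = direct_edit_allowed(legacy)
```

When `direct_edit_allowed` returns False, the agent would fall back to the wrapper pattern or a rewrite proposal, as described above.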