
Filters

Filters narrow search results by metadata — language, file path, code structure, git churn, authorship — before semantic ranking. Without filters, semantic search returns the best matches from the entire index. With filters, you restrict the candidate set first, then rank within that subset.

Why it matters: In a large codebase with thousands of chunks, a query like "error handling" could return results from tests, documentation, utilities, and production code all mixed together. Filters let you say "only TypeScript, only in src/api/, only recently changed" — so every result is relevant to the task at hand.

Filters work with all three search modes: Code Search, Semantic Search, and Hybrid Search.

Filter Syntax

TeaRAGs uses Qdrant's native filter syntax based on boolean logic. Every filter is an object with one or more boolean operators:

{
  "must": [],     // AND: all conditions must be true
  "should": [],   // OR: at least one condition must be true
  "must_not": []  // NOT: none of the conditions may be true
}

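You can combine all three in a single filter. Conditions inside must are ANDed together, conditions inside should are ORed, and every condition inside must_not has to be false. A chunk passes only if it satisfies every clause that is present.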
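To make the boolean semantics concrete, here is a minimal in-memory sketch that checks a single chunk payload against a filter. It is an illustration only, not TeaRAGs or Qdrant code: the names Payload, getField, checkCondition, and matches are hypothetical, the sample payload is invented, and array-valued fields are not handled.

```typescript
// Illustrative evaluator for the must / should / must_not semantics.
// All type and function names here are hypothetical.
type Payload = Record<string, unknown>;

type Condition =
  | { key: string; match: { value: string | number | boolean } }
  | { key: string; match: { text: string } }
  | { key: string; range: { gt?: number; gte?: number; lt?: number; lte?: number } };

interface Filter {
  must?: Condition[];
  should?: Condition[];
  must_not?: Condition[];
}

// Resolve a dotted key such as "git.commitCount" against a nested payload.
function getField(payload: Payload, key: string): unknown {
  return key.split(".").reduce<unknown>(
    (node, part) =>
      node !== null && typeof node === "object" ? (node as Payload)[part] : undefined,
    payload,
  );
}

function checkCondition(payload: Payload, cond: Condition): boolean {
  const actual = getField(payload, cond.key);
  if ("range" in cond) {
    if (typeof actual !== "number") return false;
    const { gt, gte, lt, lte } = cond.range;
    return (
      (gt === undefined || actual > gt) &&
      (gte === undefined || actual >= gte) &&
      (lt === undefined || actual < lt) &&
      (lte === undefined || actual <= lte)
    );
  }
  const m = cond.match;
  if ("text" in m) {
    // Text match is modeled as a plain substring check.
    return typeof actual === "string" && actual.includes(m.text);
  }
  return actual === m.value;
}

function matches(payload: Payload, filter: Filter): boolean {
  const must = filter.must ?? [];
  const should = filter.should ?? [];
  const mustNot = filter.must_not ?? [];
  return (
    must.every((c) => checkCondition(payload, c)) &&
    (should.length === 0 || should.some((c) => checkCondition(payload, c))) &&
    !mustNot.some((c) => checkCondition(payload, c))
  );
}

const chunk: Payload = {
  language: "typescript",
  relativePath: "src/api/users.ts",
  isDocumentation: false,
  git: { commitCount: 12 },
};

const filter: Filter = {
  must: [{ key: "git.commitCount", range: { gte: 5, lte: 50 } }],
  should: [
    { key: "language", match: { value: "typescript" } },
    { key: "language", match: { value: "javascript" } },
  ],
  must_not: [{ key: "isDocumentation", match: { value: true } }],
};

console.log(matches(chunk, filter)); // true
```

Changing the sample chunk's language to "python" or setting isDocumentation to true makes matches return false, because the should and must_not clauses are each checked in addition to must.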

Match Filter

Exact match on a metadata field. Use when you know the exact value — a specific language, a specific author, a chunk type.

{ "key": "language", "match": { "value": "typescript" } }

When to use: filtering by language, chunk type (function, class, interface), boolean flags (isDocumentation), or exact author name.

Text Match

Partial or substring match. Use when you want to match part of a string — a directory name inside a path, a keyword in a symbol name.

{ "key": "relativePath", "match": { "text": "auth" } }

When to use: filtering by path fragments when pathPattern glob syntax is not enough, or matching partial symbol names.

Range Filter

Numeric comparison. Use for metrics — commit counts, age in days, churn ratios, bug-fix percentages.

{ "key": "git.commitCount", "range": { "gte": 5, "lte": 50 } }

Available operators: gt (greater than), gte (greater or equal), lt (less than), lte (less or equal).

When to use: finding high-churn code, recent changes, old legacy code, ownership thresholds.
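If you build filters in code rather than by hand, small typed helpers can keep the three condition shapes consistent. The sketch below is one possible approach; matchValue, matchText, and range are hypothetical helper names, not part of TeaRAGs.

```typescript
// Hypothetical helpers that produce the three condition shapes shown above.
type Scalar = string | number | boolean;

interface RangeBounds {
  gt?: number;
  gte?: number;
  lt?: number;
  lte?: number;
}

const matchValue = (key: string, value: Scalar) => ({ key, match: { value } });
const matchText = (key: string, text: string) => ({ key, match: { text } });
const range = (key: string, bounds: RangeBounds) => ({ key, range: bounds });

const filter = {
  must: [
    matchValue("language", "typescript"),
    matchText("relativePath", "auth"),
    range("git.commitCount", { gte: 5, lte: 50 }),
  ],
};

// Prints the same JSON structure you would otherwise write by hand.
console.log(JSON.stringify(filter, null, 2));
```

Because RangeBounds only accepts numbers, passing a string such as "5" as a bound fails at compile time instead of producing a filter that silently matches nothing.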

Practical Examples

Find TypeScript error handling

Narrow a semantic query to a single language:

{
  "must": [{ "key": "language", "match": { "value": "typescript" } }]
}

Example prompt: "Find error handling in TypeScript files only"

Find functions in a specific directory

Combine path and structure filters to find only function-level chunks inside your API layer:

{
  "must": [
    { "key": "relativePath", "match": { "text": "src/api" } },
    { "key": "chunkType", "match": { "value": "function" } }
  ]
}

Search across multiple languages (OR)

Use should to include results from several languages at once:

{
  "should": [
    { "key": "language", "match": { "value": "typescript" } },
    { "key": "language", "match": { "value": "javascript" } }
  ]
}

Exclude test and documentation files

Remove noise from results when you only care about production code:

{
  "must_not": [
    { "key": "isDocumentation", "match": { "value": true } },
    { "key": "relativePath", "match": { "text": "test" } }
  ]
}
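A short fragment like "test" can exclude more than intended. The sketch below assumes text matching behaves as a plain substring check, as described in the Text Match section, and uses invented sample paths. isTestPath is a hypothetical stricter check, shown only for contrast.

```typescript
// Invented sample paths for illustration.
const paths = [
  "src/api/users.ts",
  "src/api/users.test.ts",
  "test/helpers.ts",
  "src/contest/ranking.ts",
  "docs/latest/changelog.md",
];

// Substring semantics: any path containing "test" anywhere is excluded.
const excludedBySubstring = paths.filter((p) => p.includes("test"));

// Hypothetical stricter check: a directory named "test" or a *.test.* file.
const isTestPath = (p: string): boolean =>
  p.split("/").includes("test") || /\.test\.[^/]+$/.test(p);
const excludedByStrictCheck = paths.filter(isTestPath);

console.log(excludedBySubstring.length);   // 4 ("contest" and "latest" are caught too)
console.log(excludedByStrictCheck.length); // 2
```

When that kind of over-matching matters, a longer fragment such as "/test/" or a pathPattern exclusion is usually a safer choice.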

Combined filter: recent TypeScript functions (AND + OR + NOT)

A real-world example — find recently changed functions in TypeScript or JavaScript, excluding docs:

{
  "must": [
    { "key": "chunkType", "match": { "value": "function" } },
    { "key": "git.ageDays", "range": { "lte": 14 } }
  ],
  "should": [
    { "key": "language", "match": { "value": "typescript" } },
    { "key": "language", "match": { "value": "javascript" } }
  ],
  "must_not": [
    { "key": "isDocumentation", "match": { "value": true } }
  ]
}

Path Pattern Filtering

The pathPattern parameter provides glob-style file path filtering as a simpler alternative to constructing Qdrant path filters. It uses picomatch syntax.

When to use: restricting search to a specific directory, file type, or excluding certain paths. Simpler than building a Qdrant filter for path matching.

pathPattern: "**/workflow/**"        # All files in workflow directories
pathPattern: "src/**/*.ts"           # TypeScript files in src/
pathPattern: "{models,services}/**"  # Multiple directories at once
pathPattern: "!**/test/**"           # Exclude test directories

Example prompts:

"Search for request validation in the API directory"
"Find authentication logic excluding test files"

Tip: pathPattern is available in all three search modes and is often the fastest way to scope a search. Use it before reaching for full Qdrant filter syntax.
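To build intuition for what the four example patterns select, here is a deliberately simplified glob matcher. It is a stand-in for illustration, not picomatch: it only understands `**`, `*`, `{a,b}`, and a leading `!`, and edge cases may differ from the real library. globToRegExp and matchesPath are hypothetical names.

```typescript
// Simplified glob matching for illustration only. Real pathPattern
// matching is done by picomatch, which supports far more syntax.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function globToRegExp(glob: string): RegExp {
  let source = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      source += "(?:.*/)?"; // zero or more leading directories
      i += 3;
    } else if (glob.startsWith("**", i)) {
      source += ".*"; // anything, including slashes
      i += 2;
    } else if (glob.charAt(i) === "*") {
      source += "[^/]*"; // anything within a single path segment
      i += 1;
    } else if (glob.charAt(i) === "{" && glob.indexOf("}", i) !== -1) {
      const close = glob.indexOf("}", i);
      const options = glob.slice(i + 1, close).split(",").map(escapeRegExp);
      source += "(?:" + options.join("|") + ")"; // one of the alternatives
      i = close + 1;
    } else {
      source += escapeRegExp(glob.charAt(i));
      i += 1;
    }
  }
  return new RegExp("^" + source + "$");
}

function matchesPath(pattern: string, path: string): boolean {
  const negated = pattern.startsWith("!");
  const regex = globToRegExp(negated ? pattern.slice(1) : pattern);
  return regex.test(path) !== negated;
}

console.log(matchesPath("**/workflow/**", "src/workflow/run.ts"));            // true
console.log(matchesPath("src/**/*.ts", "src/api/users.ts"));                  // true
console.log(matchesPath("{models,services}/**", "services/auth/login.ts"));   // true
console.log(matchesPath("!**/test/**", "src/test/login.ts"));                 // false
```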

Git Churn Filters

These filters require CODE_ENABLE_GIT_METADATA=true during indexing. See Git Enrichments for metric descriptions.

Finding high-churn code

High commit count signals frequently modified code — potential hotspots or areas under active development.

{ "key": "git.commitCount", "range": { "gte": 10 } }

Use case: identifying areas that change too often, candidates for stabilization or refactoring.

Finding high relative churn

Relative churn normalizes commit count by file size — a 50-line file with 20 commits is more concerning than a 2000-line file with the same count.

{ "key": "git.relativeChurn", "range": { "gte": 2.0 } }

Use case: a stronger signal for defect-prone code than raw commit count alone.

Finding recent changes

Filter by age to find code modified within a specific window — useful during incidents, code reviews, or sprint retrospectives.

{ "key": "git.ageDays", "range": { "lte": 7 } }

Example prompt: "Show me code changed in the last week"

Use case: incident response ("what changed recently near the failure?"), code review preparation.

Finding legacy code

Old code that hasn't been touched in months may need review, especially if it's in a critical path.

{ "key": "git.ageDays", "range": { "gte": 90 } }

Use case: tech debt discovery, security audit of stale code.

Finding buggy code

High bug-fix rate means a large percentage of commits to this file were fixes — a quality signal.

{ "key": "git.bugFixRate", "range": { "gte": 30 } }

Use case: quality assessment, identifying areas that need redesign rather than more patches.

Ownership filters

Single-owner code is a bus-factor risk. Dominant author percentage shows how concentrated knowledge is.

// Knowledge silos — one person owns 90%+ of commits
{ "key": "git.dominantAuthorPct", "range": { "gte": 90 } }

// Code by a specific author
{ "key": "git.dominantAuthor", "match": { "value": "Alice" } }

Example prompt: "Find code with a single dominant author"

Use case: knowledge transfer planning, onboarding risk assessment, identifying areas that need cross-training.
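As a rough mental model for the two ownership fields, the sketch below derives a dominant author and their share of commits from a list of commit authors. It only illustrates the idea of ownership concentration. How TeaRAGs actually computes git.dominantAuthor and git.dominantAuthorPct is described under Git Enrichments, and dominantAuthor here is a hypothetical function.

```typescript
// Hypothetical helper: one entry in `authors` per commit touching a file.
function dominantAuthor(authors: string[]): { author: string; pct: number } | null {
  if (authors.length === 0) return null;

  // Count commits per author.
  const counts = new Map<string, number>();
  authors.forEach((a) => counts.set(a, (counts.get(a) ?? 0) + 1));

  // Pick the author with the most commits (first seen wins ties).
  let top = "";
  let topCount = 0;
  counts.forEach((count, author) => {
    if (count > topCount) {
      top = author;
      topCount = count;
    }
  });

  return { author: top, pct: (topCount * 100) / authors.length };
}

// Nine commits by Alice and one by Bob: Alice holds 90% of the commits,
// so this file would pass a dominantAuthorPct >= 90 filter.
const commits = ["Alice", "Alice", "Alice", "Alice", "Alice", "Alice", "Alice", "Alice", "Alice", "Bob"];
console.log(dominantAuthor(commits)); // { author: "Alice", pct: 90 }
```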

Chunk-level churn

Function-level churn is more precise than file-level — a stable file may contain one hot function.

// Hot functions (high function-level churn)
{ "key": "git.chunkCommitCount", "range": { "gte": 5 } }

// Functions that are mostly bug fixes
{ "key": "git.chunkBugFixRate", "range": { "gte": 50 } }

Use case: pinpointing the exact function that causes problems, not just the file.

Combined churn filters

Real-world scenarios often combine multiple signals:

Stable functions inside churny files — the function is reliable even though the file changes a lot:

{
  "must": [
    { "key": "git.commitCount", "range": { "gte": 20 } },
    { "key": "git.chunkCommitCount", "range": { "lte": 3 } }
  ]
}

Old high-churn TypeScript code — tech debt candidates:

{
  "must": [
    { "key": "git.ageDays", "range": { "gte": 90 } },
    { "key": "git.commitCount", "range": { "gte": 5 } },
    { "key": "language", "match": { "value": "typescript" } }
  ]
}

Example prompt: "Show me high-churn code in the auth directory"

Filterable Fields Reference

Code metadata

| Field | Type | Description | Example use |
|---|---|---|---|
| relativePath | string | Relative file path | Filter by directory or filename |
| fileExtension | string | File extension (e.g., .ts) | Target specific file types |
| language | string | Programming language | Narrow to one language |
| startLine | integer | Chunk start line | Find chunks in a specific line range |
| endLine | integer | Chunk end line | Find chunks in a specific line range |
| chunkIndex | integer | Position within file | Target the Nth chunk in a file |
| isDocumentation | boolean | True for markdown, README, etc. | Include or exclude docs |

Chunk structure

| Field | Type | Description | Example use |
|---|---|---|---|
| name | string | Symbol name (e.g., MyClass) | Find a specific named chunk |
| chunkType | string | function, class, interface, block | Narrow by code structure |
| parentName | string | Parent class or module name | Find methods of a specific class |
| parentType | string | Parent type (class, module, etc.) | Find all class methods vs module functions |
| symbolId | string | Unique symbol identifier | Target exact symbol (e.g., MyClass.processData) |
| imports | string[] | File-level imports | Find files importing a specific module |

Git metadata

Requires CODE_ENABLE_GIT_METADATA=true during indexing.

| Field | Type | Description | Example use |
|---|---|---|---|
| git.commitCount | integer | Commits touching this file | High-churn detection |
| git.ageDays | integer | Days since last modification | Recent changes or legacy code |
| git.relativeChurn | number | Churn normalized by file size | Stronger defect signal |
| git.bugFixRate | number | Bug-fix percentage (0-100) | Quality assessment |
| git.dominantAuthor | string | Author with most commits | Filter by author |
| git.dominantAuthorPct | number | Ownership concentration (0-100) | Knowledge silo detection |
| git.authors | string[] | All contributors | Multi-author queries |
| git.contributorCount | integer | Unique author count | Bus factor analysis |
| git.taskIds | string[] | Ticket IDs (JIRA, GitHub, etc.) | Trace code to tickets |
| git.lastModifiedAt | timestamp | Unix timestamp of last change | Precise date filtering |
| git.firstCreatedAt | timestamp | Unix timestamp of first commit | Find when code was introduced |
| git.chunkCommitCount | integer | Commits touching this chunk | Function-level churn |
| git.chunkChurnRatio | number | Chunk's share of file churn (0-1) | Hotspot within a file |
| git.chunkBugFixRate | number | Chunk bug-fix rate (0-100) | Function-level quality |
| git.chunkAgeDays | integer | Days since chunk was last modified | Function-level age |
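For date filtering on the timestamp fields, you need to convert a calendar date into a Unix timestamp before putting it in a range condition. The sketch below assumes the timestamps are stored in seconds rather than milliseconds. That is an assumption, so check a stored value before relying on it. modifiedSince is a hypothetical helper.

```typescript
// Hypothetical helper. ASSUMPTION: git.lastModifiedAt is stored in seconds.
function modifiedSince(since: Date) {
  return {
    key: "git.lastModifiedAt",
    range: { gte: Math.floor(since.getTime() / 1000) },
  };
}

const condition = modifiedSince(new Date("2024-01-01T00:00:00Z"));
console.log(JSON.stringify(condition));
// {"key":"git.lastModifiedAt","range":{"gte":1704067200}}
```

Unlike git.ageDays, which is relative to the time of indexing, an absolute timestamp keeps selecting the same commits no matter when the index was built.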

Filter + Rerank Combinations

Filters and reranking presets are complementary: filters narrow the candidate set, reranking scores relevance within it.

| Goal | Filter | Rerank | Why this combination |
|---|---|---|---|
| Recent bugs in auth | git.ageDays <= 14 + pathPattern: **/auth/** | hotspots | Narrow to recent auth code, then rank by bug signals |
| Old single-owner code | git.ageDays >= 90 + git.commitCount >= 5 | ownership | Find stale churny code, rank by knowledge concentration |
| Recently active TypeScript | language: typescript + git.ageDays <= 30 | codeReview | Scope to TS, rank by recent activity intensity |
| Large stable functions | chunkType: function + git.commitCount <= 3 | onboarding | Find reliable entry points for new team members |
| High-churn security code | git.commitCount >= 10 + security path pattern | securityAudit | Target volatile security-sensitive areas |
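The table uses shorthand such as git.ageDays <= 14. As a sketch of how the first two rows might expand into full requests, the example below uses a hypothetical SearchRequest shape. Only pathPattern, the filter syntax, and the preset names come from this page. The parameter names query, filter, and rerank and the query strings are assumptions made for illustration.

```typescript
// Hypothetical request shape. `query`, `filter`, and `rerank` are assumed
// parameter names; check the search tool reference for the real ones.
interface RangeCondition {
  key: string;
  range: { gt?: number; gte?: number; lt?: number; lte?: number };
}

interface SearchRequest {
  query: string;
  filter?: { must?: RangeCondition[] };
  pathPattern?: string;
  rerank?: string;
}

// Row 1: recent bugs in auth.
const recentAuthBugs: SearchRequest = {
  query: "token validation errors",
  filter: { must: [{ key: "git.ageDays", range: { lte: 14 } }] },
  pathPattern: "**/auth/**",
  rerank: "hotspots",
};

// Row 2: old single-owner code. Both conditions go in `must` because
// the "+" in the table means AND.
const staleChurnyCode: SearchRequest = {
  query: "payment reconciliation",
  filter: {
    must: [
      { key: "git.ageDays", range: { gte: 90 } },
      { key: "git.commitCount", range: { gte: 5 } },
    ],
  },
  rerank: "ownership",
};

console.log(JSON.stringify([recentAuthBugs, staleChurnyCode], null, 2));
```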

Best Practices

  1. Start broad, then narrow — add filter conditions one at a time. Over-filtering returns zero results and gives no diagnostic signal.
  2. Use pathPattern for paths — simpler than constructing Qdrant path filters. Covers most directory and extension-based filtering.
  3. Combine filters with semantic search — filters narrow scope, vectors rank relevance. Neither alone is as powerful as both together.
  4. Use consistent types — don't pass a string where a number is expected. git.commitCount is an integer, not "5".
  5. Test filters incrementally — validate a simple filter works before building complex boolean logic.
  6. Prefer chunk-level git filters: git.chunkCommitCount is more precise than git.commitCount for identifying problem spots.
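Practice 4 is easy to check mechanically when a filter comes from user input or a config file. The sketch below is one possible pre-flight check, not a TeaRAGs feature. findTypeProblems is a hypothetical function that reports range bounds that are not numbers, along with unknown range operators.

```typescript
// Hypothetical pre-flight check for range conditions in a filter.
interface LooseCondition {
  key: string;
  range?: Record<string, unknown>;
  match?: Record<string, unknown>;
}

interface LooseFilter {
  must?: LooseCondition[];
  should?: LooseCondition[];
  must_not?: LooseCondition[];
}

const RANGE_OPERATORS: string[] = ["gt", "gte", "lt", "lte"];

function findTypeProblems(filter: LooseFilter): string[] {
  const problems: string[] = [];
  const conditions = [
    ...(filter.must ?? []),
    ...(filter.should ?? []),
    ...(filter.must_not ?? []),
  ];
  for (const cond of conditions) {
    for (const [op, bound] of Object.entries(cond.range ?? {})) {
      if (RANGE_OPERATORS.indexOf(op) === -1) {
        problems.push(`${cond.key}: unknown range operator "${op}"`);
      } else if (typeof bound !== "number") {
        problems.push(`${cond.key}: ${op} must be a number, got ${typeof bound}`);
      }
    }
  }
  return problems;
}

const suspicious: LooseFilter = {
  must: [
    { key: "git.commitCount", range: { gte: "5" } }, // string instead of number
    { key: "git.ageDays", range: { lte: 14 } },      // fine
  ],
};

console.log(findTypeProblems(suspicious));
// [ 'git.commitCount: gte must be a number, got string' ]
```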

Next Steps