Adding Embedding Providers
This guide explains how to add a new embedding provider, such as a new cloud API, a different local runtime, or a private in-house model service. Five providers ship built-in (ONNX, Ollama, OpenAI, Cohere, Voyage), and all of them implement the same contract.
The EmbeddingProvider Interface
Source: src/core/adapters/embeddings/base.ts
interface EmbeddingProvider {
  embed: (text: string) => Promise<EmbeddingResult>;
  embedBatch: (texts: string[]) => Promise<EmbeddingResult[]>;
  getDimensions: () => number;
  getModel: () => string;
  /** Lightweight health check: returns true if the provider is reachable. */
  checkHealth: () => Promise<boolean>;
  /** Provider identifier (e.g. "ollama", "onnx", "openai"). */
  getProviderName: () => string;
  /** Base URL for remote providers. Undefined for local (e.g. ONNX). */
  getBaseUrl?: () => string;
  /** Resolve model capabilities (context length, dimensions) from provider API. */
  resolveModelInfo?: () => Promise<{ model: string; contextLength: number; dimensions: number } | undefined>;
}
Required methods: embed, embedBatch, getDimensions, getModel, checkHealth, getProviderName.
Optional: getBaseUrl (for remote APIs), resolveModelInfo (for model-aware sizing).
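As a quick illustration of the contract, the sketch below implements only the required methods with an in-memory fake, which can also serve as a test double. It is a minimal sketch, not project code: the `EmbeddingResult` shape (`embedding`, `dimensions`) is an assumption because the real type lives in base.ts, and `FakeEmbeddings` is a hypothetical name.

```typescript
// Assumed shape: the real EmbeddingResult is defined in base.ts and may differ.
interface EmbeddingResult {
  embedding: number[];
  dimensions: number;
}

// Local copy of the contract so the sketch compiles on its own.
interface EmbeddingProvider {
  embed: (text: string) => Promise<EmbeddingResult>;
  embedBatch: (texts: string[]) => Promise<EmbeddingResult[]>;
  getDimensions: () => number;
  getModel: () => string;
  checkHealth: () => Promise<boolean>;
  getProviderName: () => string;
  getBaseUrl?: () => string;
}

// Hypothetical in-memory provider: deterministic vectors, no network access.
class FakeEmbeddings implements EmbeddingProvider {
  private readonly dimensions: number;

  constructor(dimensions = 4) {
    this.dimensions = dimensions;
  }

  async embed(text: string): Promise<EmbeddingResult> {
    const [result] = await this.embedBatch([text]);
    return result;
  }

  async embedBatch(texts: string[]): Promise<EmbeddingResult[]> {
    return texts.map((text) => ({
      // Each slot holds a small number derived from the text length.
      embedding: Array.from({ length: this.dimensions }, (_, i) => (text.length + i) % 7),
      dimensions: this.dimensions,
    }));
  }

  getDimensions() { return this.dimensions; }
  getModel() { return "fake-embed"; }
  getProviderName() { return "fake"; }
  async checkHealth() { return true; }
  // getBaseUrl and resolveModelInfo are left out because both are optional.
}
```

Because `getBaseUrl` is optional, callers are expected to check for its presence before invoking it.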
Implementation Checklist
1. Create the adapter class
Place under src/core/adapters/embeddings/{provider-name}.ts. Mirror the existing cloud adapters (OpenAI is the simplest template).
// src/core/adapters/embeddings/acme.ts
import Bottleneck from "bottleneck";
import type { EmbeddingProvider, EmbeddingResult, RateLimitConfig } from "./base.js";
import { retryWithBackoff } from "./retry.js";
import { AcmeRateLimitError, AcmeAuthError } from "./acme/errors.js";

export class AcmeEmbeddings implements EmbeddingProvider {
  private readonly limiter: Bottleneck;

  constructor(
    private readonly apiKey: string,
    private readonly model: string,
    private readonly dimensions: number,
    private readonly rateLimit: RateLimitConfig,
    private readonly baseUrl = "https://api.acme.example.com/v1",
  ) {
    // Refill the reservoir every minute so requests stay within the RPM limit.
    this.limiter = new Bottleneck({
      reservoir: rateLimit.maxRequestsPerMinute,
      reservoirRefreshAmount: rateLimit.maxRequestsPerMinute,
      reservoirRefreshInterval: 60_000,
    });
  }

  async embed(text: string): Promise<EmbeddingResult> {
    const [result] = await this.embedBatch([text]);
    return result;
  }

  async embedBatch(texts: string[]): Promise<EmbeddingResult[]> {
    return this.retryWithBackoff(() => this.limiter.schedule(() => this.rawEmbed(texts)));
  }

  getDimensions() { return this.dimensions; }
  getModel() { return this.model; }
  getProviderName() { return "acme"; }
  getBaseUrl() { return this.baseUrl; }

  async checkHealth() {
    try { await this.rawEmbed(["health check"]); return true; }
    catch { return false; }
  }

  // ── Private ────────────────────────────────────────────
  // Throws AcmeRateLimitError on 429 and AcmeAuthError on 401.
  private async rawEmbed(texts: string[]): Promise<EmbeddingResult[]> { /* HTTP call */ }
  // Delegates to the retryWithBackoff helper imported from ./retry.js.
  private async retryWithBackoff<T>(fn: () => Promise<T>): Promise<T> { /* see existing retry.ts */ }
}
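The retry wrapper is left as a stub above. The following is a minimal sketch of one way such a wrapper could behave, assuming exponential backoff and a rate-limit error that carries the server's `Retry-After` hint in milliseconds. `RateLimitError`, `RetryOptions`, and `retryWithBackoffSketch` are hypothetical names; the project's actual helper in retry.ts may have a different signature and policy.

```typescript
// Hypothetical stand-in for AcmeRateLimitError; the real class lives in acme/errors.ts.
class RateLimitError extends Error {
  readonly retryAfterMs?: number;

  constructor(retryAfterMs?: number) {
    super("rate limited");
    this.retryAfterMs = retryAfterMs;
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  /** Injectable so tests can record delays instead of waiting. */
  sleep?: (ms: number) => Promise<void>;
}

const realSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function retryWithBackoffSketch<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? realSleep;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Only rate-limit errors are retried; auth and other failures propagate immediately.
      if (!(err instanceof RateLimitError) || attempt >= opts.maxAttempts) throw err;
      // Prefer the server-provided wait, otherwise double the delay on each attempt.
      const delayMs = err.retryAfterMs ?? opts.baseDelayMs * 2 ** (attempt - 1);
      await sleep(delayMs);
    }
  }
}
```

Injecting `sleep` keeps unit tests fast, since they can assert on the requested delays without waiting for them.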
2. Register in the factory
Edit src/core/adapters/embeddings/factory.ts → EmbeddingProviderFactory.create:
case "acme":
  if (!config.acmeApiKey) {
    throw new ConfigValueMissingError("apiKey", "ACME_API_KEY");
  }
  return new AcmeEmbeddings(
    config.acmeApiKey,
    model || "acme-embed-v1",
    dimensions,
    rateLimitConfig,
  );
Add "acme" to the ConfigValueInvalidError allowed-values list at the bottom of the switch.
3. Wire the config
Edit src/bootstrap/config/parse.ts to read ACME_API_KEY → EmbeddingConfig.acmeApiKey. Follow the pattern for openaiApiKey, cohereApiKey, etc.
Add "acme" to the allowed EMBEDDING_PROVIDER values in src/core/contracts/types/config.ts.
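The config wiring could look roughly like the sketch below. It is written under assumptions: `EMBEDDING_PROVIDERS`, `EmbeddingConfigSketch`, and `parseEmbeddingConfig` are hypothetical stand-ins for whatever parse.ts and config.ts actually define, the "onnx" default is a guess, and a plain `Error` is used where the project would throw its own config error.

```typescript
// Hypothetical stand-in for the allowed-values list in src/core/contracts/types/config.ts.
const EMBEDDING_PROVIDERS = ["onnx", "ollama", "openai", "cohere", "voyage", "acme"] as const;
type EmbeddingProviderName = (typeof EMBEDDING_PROVIDERS)[number];

interface EmbeddingConfigSketch {
  provider: EmbeddingProviderName;
  acmeApiKey?: string;
}

function isProviderName(value: string): value is EmbeddingProviderName {
  return (EMBEDDING_PROVIDERS as readonly string[]).includes(value);
}

function parseEmbeddingConfig(env: Record<string, string | undefined>): EmbeddingConfigSketch {
  // The "onnx" default is an assumption made for this sketch.
  const provider = env["EMBEDDING_PROVIDER"] ?? "onnx";
  if (!isProviderName(provider)) {
    throw new Error(`EMBEDDING_PROVIDER must be one of: ${EMBEDDING_PROVIDERS.join(", ")}`);
  }
  // Treat a blank value as unset so the factory's missing-key check still fires.
  const acmeApiKey = env["ACME_API_KEY"]?.trim() || undefined;
  return { provider, acmeApiKey };
}
```

Passing the environment in as an argument, rather than reading `process.env` directly, keeps the parser easy to test.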
4. Typed errors
Create src/core/adapters/embeddings/acme/errors.ts with at minimum:
- AcmeRateLimitError extends InfraError with code INFRA_EMBEDDING_ACME_RATE_LIMIT
- AcmeAuthError extends InfraError with code INFRA_EMBEDDING_ACME_AUTH
Follow the exact pattern of src/core/adapters/embeddings/openai/errors.ts. Error codes are how the agent surfaces provider problems to users — don't skip this.
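The two error classes might be sketched as follows. The `InfraError` base here is a local stand-in with an assumed `(code, message)` constructor, and `errorForStatus` is a hypothetical helper; the real base class and the OpenAI error file should be treated as the source of truth.

```typescript
// Stand-in for the project's InfraError; the real constructor may take different arguments.
class InfraError extends Error {
  readonly code: string;

  constructor(code: string, message: string) {
    super(message);
    this.code = code;
    this.name = new.target.name;
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class AcmeRateLimitError extends InfraError {
  readonly retryAfterMs?: number;

  constructor(retryAfterMs?: number) {
    super("INFRA_EMBEDDING_ACME_RATE_LIMIT", "Acme API rate limit exceeded");
    this.retryAfterMs = retryAfterMs;
  }
}

class AcmeAuthError extends InfraError {
  constructor() {
    super("INFRA_EMBEDDING_ACME_AUTH", "Acme API rejected the API key");
  }
}

// Map an HTTP status to a typed error; returns undefined for statuses this sketch does not cover.
// Only the delta-seconds form of Retry-After is handled, not the HTTP-date form.
function errorForStatus(status: number, retryAfterHeader: string | null): InfraError | undefined {
  if (status === 429) {
    const seconds = Number(retryAfterHeader);
    const hasHint = retryAfterHeader !== null && Number.isFinite(seconds);
    return new AcmeRateLimitError(hasHint ? seconds * 1000 : undefined);
  }
  if (status === 401) return new AcmeAuthError();
  return undefined;
}
```

Keeping the status-to-error mapping in one function makes it straightforward to cover every code path in the adapter tests.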
5. Default batch size + rate limit
Add your provider's tuned defaults to src/bootstrap/config/defaults.ts:
- batchSize: API-limited texts per request (e.g. Cohere = 96, OpenAI = 2048)
- maxRequestsPerMinute: provider's RPM tier
Users override via EMBEDDING_TUNE_BATCH_SIZE and EMBEDDING_TUNE_MAX_REQUESTS_PER_MINUTE.
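To make the relationship between defaults, overrides, and batching concrete, here is a small sketch. The numbers in `ACME_DEFAULTS` are placeholders, `resolveTuning` and `chunk` are hypothetical helpers, and falling back to the default on an invalid override is an assumption about behaviour rather than something defaults.ts is known to do.

```typescript
interface ProviderTuning {
  batchSize: number;
  maxRequestsPerMinute: number;
}

// Placeholder numbers; take the real limits from the provider's API documentation.
const ACME_DEFAULTS: ProviderTuning = { batchSize: 128, maxRequestsPerMinute: 300 };

// Accept only positive integers; anything else yields undefined.
function positiveInt(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const value = Number(raw);
  return Number.isInteger(value) && value > 0 ? value : undefined;
}

// Environment overrides win over defaults when they parse as positive integers.
function resolveTuning(defaults: ProviderTuning, env: Record<string, string | undefined>): ProviderTuning {
  return {
    batchSize: positiveInt(env["EMBEDDING_TUNE_BATCH_SIZE"]) ?? defaults.batchSize,
    maxRequestsPerMinute:
      positiveInt(env["EMBEDDING_TUNE_MAX_REQUESTS_PER_MINUTE"]) ?? defaults.maxRequestsPerMinute,
  };
}

// Split texts into request-sized groups so no call exceeds batchSize.
function chunk<T>(items: readonly T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

For example, `chunk(["a", "b", "c", "d", "e"], 2)` yields three batches: `["a", "b"]`, `["c", "d"]`, and `["e"]`.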
6. Tests
Place under tests/core/adapters/embeddings/acme.test.ts. Mock the HTTP client with msw or vi.fn(). Cover:
- Successful batch embed → returns array of vectors with correct dimensions
- Rate-limit 429 → retries after the Retry-After header
- Auth 401 → throws AcmeAuthError (no retry)
- Health check reachable / unreachable
Follow tests/core/adapters/embeddings/openai.test.ts as template.
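One framework-neutral way to make these cases testable is to pass the HTTP function into the code under test and replay scripted responses. The sketch below assumes a hypothetical wire format (POST to `{baseUrl}/embeddings`, response body `{ embeddings: number[][] }`) and hypothetical helper names (`rawEmbedSketch`, `scriptedFetch`); plain errors tagged with a `code` stand in for the typed error classes.

```typescript
// Minimal response surface the sketch needs.
interface HttpResponse {
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<HttpResponse>;

// Hypothetical wire format: POST {baseUrl}/embeddings returning { embeddings: number[][] }.
async function rawEmbedSketch(
  fetchFn: FetchLike,
  baseUrl: string,
  apiKey: string,
  model: string,
  texts: string[],
): Promise<number[][]> {
  const res = await fetchFn(`${baseUrl}/embeddings`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model, input: texts }),
  });
  if (res.status === 401) {
    throw Object.assign(new Error("Acme auth failed"), { code: "INFRA_EMBEDDING_ACME_AUTH" });
  }
  if (res.status === 429) {
    throw Object.assign(new Error("Acme rate limit"), {
      code: "INFRA_EMBEDDING_ACME_RATE_LIMIT",
      retryAfter: res.headers.get("Retry-After"),
    });
  }
  if (res.status !== 200) throw new Error(`Unexpected status ${res.status}`);
  const body = (await res.json()) as { embeddings: number[][] };
  return body.embeddings;
}

// Test double: replays the scripted responses in order and records request bodies.
function scriptedFetch(script: Array<{ status: number; retryAfter?: string; embeddings?: number[][] }>) {
  const requests: string[] = [];
  let next = 0;
  const fetchFn: FetchLike = async (_url, init) => {
    requests.push(init.body);
    const step = script[next++];
    if (step === undefined) throw new Error("scriptedFetch: no more scripted responses");
    return {
      status: step.status,
      headers: {
        get: (name) => (name.toLowerCase() === "retry-after" ? (step.retryAfter ?? null) : null),
      },
      json: async () => ({ embeddings: step.embeddings ?? [] }),
    };
  };
  return { fetchFn, requests };
}
```

A test can then script `[{ status: 429, retryAfter: "3" }, { status: 200, embeddings: [[0.1, 0.2]] }]` and assert on both the thrown codes and the recorded request bodies, whether the runner is vitest with msw or something else.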
7. Documentation
Add a new page under website/docs/config/providers/acme.md mirroring the structure of openai.md: type/price/scale table, setup, configuration, available models, rate limits, when to use.
Add a row to the comparison table in website/docs/config/providers/index.md.
Local vs Remote
Remote (cloud API) — pattern used by OpenAI, Cohere, Voyage:
- HTTP client (openai, cohere-ai, raw fetch)
- Rate limiter via bottleneck
- Retry that honours Retry-After
- getBaseUrl() returns configured endpoint
- Typed errors for rate-limit / auth / quota
Local (on-device) — pattern used by ONNX, Ollama:
- Process/daemon lifecycle management
- No rate limiting (backpressure via provider's own concurrency)
- Fallback behaviour if local service crashes (see OllamaEmbeddings#switchToFallback)
- getBaseUrl() returns undefined (ONNX) or local socket (http://localhost:11434 for Ollama)
Copy the closest template to your situation.
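Whichever template is used, checkHealth is meant to stay lightweight, and a hung endpoint or crashed local daemon should not block startup. One possible approach, shown here as a sketch with a hypothetical helper name (`checkHealthWithTimeout`) and not taken from the existing adapters, is to race the probe against a timer.

```typescript
// Resolves true if the probe succeeds in time, false if it fails or times out.
async function checkHealthWithTimeout(
  probe: () => Promise<unknown>,
  timeoutMs: number,
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("health check timed out")), timeoutMs);
  });
  try {
    // Whichever settles first decides the outcome.
    await Promise.race([probe(), timeout]);
    return true;
  } catch {
    return false;
  } finally {
    // Clear the timer so it cannot keep the process alive after the probe wins.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Inside an adapter this could wrap the existing probe, for example `checkHealthWithTimeout(() => this.rawEmbed(["health check"]), 5_000)`, where the five-second budget is an arbitrary illustrative value.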
Testing the Integration End-to-End
After registration:
export EMBEDDING_PROVIDER=acme
export ACME_API_KEY=...
npm run build
# then in Claude Code, re-connect the MCP server and:
Call index_codebase with forceReindex: true on a small test directory. If the provider succeeds, you should see a new collection {name}_{model}_{schemaVersion} with the new dimensions. Confirm via get_collection_info.
If the health check fails at startup, the agent receives INFRA_EMBEDDING_ACME_* error codes. Check that they are mapped correctly in your errors.ts.
Related
- Embedding Providers Overview — user-facing provider comparison
- Failure Model — how retries and fallbacks work
- Data Model — what embeddings populate (the dense vector)