
Embedding Providers

TeaRAGs supports five embedding providers — from zero-config local inference to high-throughput cloud APIs. Choose based on your codebase size, privacy requirements, and budget.

Provider Comparison

| Provider | Type | Price | Scale* | Key Feature |
|---|---|---|---|---|
| ONNX | Local | 🟢 Free | ~700k LoC | Zero-config, built-in runtime, adaptive GPU batching |
| Ollama | Local | 🟢 Free | ~8M+ LoC (depends on hardware) | GPU acceleration, 100+ models |
| OpenAI | Cloud | 🟡 Pay-per-use ($0.02/1M tokens) | ~800k–8M LoC (depends on API tier) | Highest quality, easy setup |
| Cohere | Cloud | 🟡 Pay-per-use ($0.10/1M tokens) | ~1M LoC | Multilingual support |
| Voyage | Cloud | 🟡 Pay-per-use ($0.12/1M tokens) | ~2.4M LoC | Code-specialized models |

* Estimated lines of code for initial full indexing within 45 minutes. Benchmarked on Apple M3 Pro with WebGPU — actual throughput depends on your hardware. Incremental reindexing is fast on any provider — typically only 1–5% of files change between runs.

How to Choose

Want zero setup? Start with ONNX — no external services, no API keys, works out of the box. Best for small-to-medium projects.

Have a GPU? Use Ollama — free, private, and handles millions of lines of code. The default choice for serious local development.

Need cloud scale or quality? Pick OpenAI for the best embedding quality and familiar API. Consider Voyage if your codebase is code-heavy — their models are trained specifically on source code. Choose Cohere if you need multilingual embeddings.

Privacy matters? ONNX and Ollama keep everything local. No data leaves your machine.
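Once you have picked a provider, switching is a matter of setting the variables described under Common Configuration below. A minimal sketch — the `export` form is an assumption; adapt it to however you pass environment variables to TeaRAGs:

```shell
# Switch from the default (ollama) to the zero-config local ONNX provider.
export EMBEDDING_PROVIDER=onnx

# Or pin a specific model instead of the provider default
# ("some-model-name" is a placeholder — model names are provider-specific).
export EMBEDDING_MODEL=some-model-name
```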

Common Configuration

All providers share these tuning variables:

| Variable | Description | Default |
|---|---|---|
| `EMBEDDING_PROVIDER` | Provider name: `onnx`, `ollama`, `openai`, `cohere`, `voyage` | `ollama` |
| `EMBEDDING_MODEL` | Model name (provider-specific) | Provider default |
| `EMBEDDING_DIMENSIONS` | Vector dimensions (auto-detected from model) | Auto |
| `EMBEDDING_TUNE_BATCH_SIZE` | Texts per embedding batch | Provider-specific (see below) |
| `EMBEDDING_TUNE_RETRY_ATTEMPTS` | Retry count on failure | 3 |
| `EMBEDDING_TUNE_RETRY_DELAY_MS` | Initial retry delay (exponential backoff) | 1000 |
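The two retry variables together define a backoff schedule. A sketch of the waits produced by the defaults (3 attempts, 1000 ms initial delay), assuming the delay doubles on each failed attempt — the doubling factor is an assumption, since the docs only say "exponential backoff":

```shell
# Retry schedule implied by the defaults above
# (assumed: classic exponential backoff, delay doubling per attempt).
delay=1000                 # EMBEDDING_TUNE_RETRY_DELAY_MS
for attempt in 1 2 3; do   # EMBEDDING_TUNE_RETRY_ATTEMPTS=3
  echo "attempt ${attempt}: wait ${delay}ms before retrying"
  delay=$((delay * 2))
done
# → waits of 1000ms, 2000ms, 4000ms
```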

Default Batch Sizes

EMBEDDING_TUNE_BATCH_SIZE is set automatically per provider — you don't need to configure it unless you want to override it. Defaults are tuned to each provider's API limits and throughput characteristics:

| Provider | Default Batch Size | Rationale |
|---|---|---|
| ONNX | Auto-calibrated | GPU probe sets optimal batch size at startup |
| Ollama | 1024 | GPU-optimized, native batch API |
| OpenAI | 2048 | Max texts per API request |
| Cohere | 96 | API limit: 96 texts per request |
| Voyage | 128 | Balanced for the 120k tokens/request limit |

Override with EMBEDDING_TUNE_BATCH_SIZE if needed.
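For example, to force smaller batches — the value 512 below is purely illustrative, not a recommendation from these docs:

```shell
# Cap batches below the provider default, e.g. if requests are
# hitting size limits or timing out on a slow connection.
export EMBEDDING_TUNE_BATCH_SIZE=512
```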

Pipeline Concurrency

INGEST_PIPELINE_CONCURRENCY controls pipeline worker concurrency (default: 1). The pipeline already handles parallelism via batch accumulation, and increasing concurrency adds complexity without improving throughput for most providers. Leave at 1 unless you have a specific reason to change it.
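Spelled out as an environment sketch (the `export` form is an assumption, as above):

```shell
# Default: a single pipeline worker. Batch accumulation already
# provides parallelism, so raising this rarely improves throughput.
export INGEST_PIPELINE_CONCURRENCY=1
```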

See individual provider pages for provider-specific variables and setup instructions.