RAG Cost Explained: Indexing, Embeddings & Generation

Retrieval-augmented generation has a reputation for being expensive. In practice the cost is concentrated in one place — generation — while embeddings and storage are cheap. Here's the breakdown.

Updated 2026-06-19

Four cost components

  1. One-time indexing — embedding your documents once. Usually a few dollars even for large corpora.
  2. Query embeddings — embedding each incoming query. Tiny, because queries are short.
  3. Vector database — storing and searching vectors. Modest; often $0–$70/month.
  4. Generation — the LLM answering with retrieved context. This dominates.
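To see why the first two components stay small, here is a minimal sketch of the embedding arithmetic. The 100M-token corpus, the 100,000 queries of 20 tokens each, and the $0.02-per-million-token price are illustrative assumptions, not measured figures. The function names are hypothetical helpers.

```python
# Hypothetical helpers for the two embedding-side components.
# All inputs below are ASSUMED example values; prices are dollars per 1M tokens.

def indexing_cost(corpus_tokens: int, embed_price_per_m: float) -> float:
    """One-time dollar cost to embed the whole corpus."""
    return corpus_tokens * embed_price_per_m / 1_000_000


def query_embedding_cost(queries: int, query_tokens: int, embed_price_per_m: float) -> float:
    """Monthly dollar cost to embed every incoming query."""
    return queries * query_tokens * embed_price_per_m / 1_000_000


print(f"index once:       ${indexing_cost(100_000_000, 0.02):.2f}")
print(f"query embeddings: ${query_embedding_cost(100_000, 20, 0.02):.2f}/month")
```

Under these assumptions indexing comes to about $2 once, and a month of query embeddings to a few cents.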

Why generation dominates

Embedding models cost ~$0.02–$0.13 per million tokens, while generation models cost dollars per million output tokens. Each query sends the retrieved chunks as input and produces an answer as output, and both are billed at generation rates. Retrieving 8 large chunks when 3 would do therefore inflates the input bill of every query.
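A rough per-query comparison makes the gap concrete. This is a sketch under assumed prices ($0.02/M for embeddings, $3/M input and $15/M output for generation) and assumed sizes (500-token chunks, a 300-token system prompt, a 20-token query, a 300-token answer). Following the article's formula, only the system prompt and chunks are counted as generation input.

```python
# Per-query cost under ASSUMED prices, in dollars per 1M tokens.
EMBED_PRICE = 0.02    # assumed embedding price
INPUT_PRICE = 3.00    # assumed generation input price
OUTPUT_PRICE = 15.00  # assumed generation output price


def per_query_cost(chunks: int, chunk_tokens: int = 500, system_tokens: int = 300,
                   query_tokens: int = 20, answer_tokens: int = 300) -> tuple[float, float]:
    """Return (embedding cost, generation cost) in dollars for one query."""
    embed = query_tokens * EMBED_PRICE / 1e6
    input_tokens = system_tokens + chunks * chunk_tokens  # context sent to the LLM
    generation = (input_tokens * INPUT_PRICE + answer_tokens * OUTPUT_PRICE) / 1e6
    return embed, generation


for k in (8, 3):
    embed, gen = per_query_cost(k)
    print(f"{k} chunks: embedding ${embed:.7f}, generation ${gen:.4f}")
```

With these numbers the query embedding is a fraction of a millionth of a dollar, while generation costs about $0.0174 with 8 chunks and $0.0099 with 3.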

The RAG cost formula

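monthly = re-index + query_embeddings + generation + reranker + vector_db
generation = queries × ((system + chunks×chunk_tokens)×input_price + answer_tokens×output_price) ÷ 1e6

Here system, chunk_tokens and answer_tokens are token counts per query, and input_price and output_price are dollars per million tokens (hence the ÷ 1e6).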
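The formula translates directly into code. The sketch below mirrors it term by term. The field names and the example inputs (100,000 queries, 3 chunks of 500 tokens, $3/$15 per million tokens, $70/month for the vector database) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RagInputs:
    """Hypothetical container for the formula's inputs."""
    queries: int             # queries per month
    system_tokens: int       # system prompt tokens per query
    chunks: int              # chunks retrieved per query
    chunk_tokens: int        # tokens per chunk
    answer_tokens: int       # output tokens per answer
    input_price: float       # $ per 1M generation input tokens
    output_price: float      # $ per 1M generation output tokens
    reindex: float = 0.0           # $ per month spent re-embedding
    query_embeddings: float = 0.0  # $ per month
    reranker: float = 0.0          # $ per month
    vector_db: float = 0.0         # $ per month


def generation_cost(x: RagInputs) -> float:
    input_tokens = x.system_tokens + x.chunks * x.chunk_tokens
    return x.queries * (input_tokens * x.input_price + x.answer_tokens * x.output_price) / 1e6


def monthly_cost(x: RagInputs) -> float:
    return x.reindex + x.query_embeddings + generation_cost(x) + x.reranker + x.vector_db


example = RagInputs(queries=100_000, system_tokens=300, chunks=3, chunk_tokens=500,
                    answer_tokens=300, input_price=3.0, output_price=15.0,
                    reindex=1.0, query_embeddings=0.04, vector_db=70.0)
print(f"generation: ${generation_cost(example):.2f}")
print(f"monthly:    ${monthly_cost(example):.2f}")
```

In this example generation is about $990 of a roughly $1,061 monthly total, which matches the point that generation dominates.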

How to keep it cheap

Retrieve fewer, tighter chunks; add a reranker so the few chunks you keep are the most relevant ones; cap answer length; cache the system prompt and hot chunks; use a smaller generation model for routine questions; and re-index incrementally, re-embedding only documents that changed. Model your numbers in the RAG cost calculator.
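Incremental re-indexing can be as simple as remembering a content hash per document and re-embedding only what changed. This is a minimal sketch, assuming documents are keyed by a stable ID and that the previous run's hashes are kept somewhere (the `seen` dict stands in for that store). `plan_reindex` is a hypothetical helper, not any vector database's API.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(docs: dict[str, str], seen: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (doc IDs to embed, doc IDs to delete from the index).

    docs: current doc ID -> text
    seen: doc ID -> hash recorded at the last indexing run (hypothetical store)
    """
    # New documents and documents whose text changed since the last run.
    to_embed = [doc_id for doc_id, text in docs.items()
                if seen.get(doc_id) != content_hash(text)]
    # Documents that were indexed before but no longer exist.
    to_delete = [doc_id for doc_id in seen if doc_id not in docs]
    return to_embed, to_delete


seen = {"a": content_hash("alpha"), "b": content_hash("beta"), "c": content_hash("gamma")}
docs = {"a": "alpha", "b": "beta v2", "d": "delta"}
print(plan_reindex(docs, seen))
```

Here only "b" (edited) and "d" (new) are re-embedded and "c" is dropped, so the embedding bill scales with what changed rather than with the size of the corpus. After embedding, the new hashes would be written back to the store.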
