Four cost components
- One-time indexing — embedding your documents once. Usually a few dollars even for large corpora.
- Query embeddings — embedding each incoming query. Tiny, because queries are short.
- Vector database — storing and searching vectors. Modest; often $0–$70/month.
- Generation — the LLM answering with retrieved context. This dominates.
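To see why the first two items stay small, here is a quick back-of-the-envelope sketch in Python. The corpus size, query volume and query length are hypothetical numbers chosen for illustration, not measurements. The two prices are the ends of the embedding price range quoted in the next section.

```python
def embedding_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of embedding `tokens` tokens at a $/million-token price."""
    return tokens * price_per_million / 1_000_000

# Hypothetical workload (assumptions, not measured figures).
CORPUS_TOKENS = 50_000_000      # one-time indexing of a fairly large corpus
MONTHLY_QUERIES = 100_000
QUERY_TOKENS = 20               # queries are short

for price in (0.02, 0.13):      # low and high end of embedding prices
    index = embedding_cost(CORPUS_TOKENS, price)
    queries = embedding_cost(MONTHLY_QUERIES * QUERY_TOKENS, price)
    print(f"${price}/M: indexing ${index:.2f}, queries ${queries:.2f}/month")
```

Under these assumptions, indexing comes to $1.00 to $6.50 once, and query embeddings cost a few cents a month ($0.04 to $0.26).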
Why generation dominates
Embedding models cost roughly $0.02–$0.13 per million tokens, while generation models cost dollars per million output tokens. Each query sends the retrieved chunks to the generation model as input and gets the answer back as output, and both are billed at that model's input and output rates. Retrieving 8 large chunks when 3 would do therefore inflates the input bill roughly in proportion to the extra chunk tokens.
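The gap is easiest to see per query. The sketch below uses hypothetical values for everything except the $0.02 embedding price: a 20-token query, a 300-token system prompt, 500-token "large" chunks, a 300-token answer, and a generation model priced at $3 input / $15 output per million tokens. Substitute your own model's prices.

```python
EMBED_PRICE = 0.02      # $/M tokens, low end of the embedding range above
INPUT_PRICE = 3.00      # $/M input tokens, hypothetical generation model
OUTPUT_PRICE = 15.00    # $/M output tokens, hypothetical generation model

QUERY_TOKENS = 20       # assumed query length
SYSTEM_TOKENS = 300     # assumed system prompt size
CHUNK_TOKENS = 500      # assumed size of a "large" chunk
ANSWER_TOKENS = 300     # assumed answer length

def query_embedding_cost() -> float:
    """Dollar cost of embedding one query."""
    return QUERY_TOKENS * EMBED_PRICE / 1_000_000

def generation_cost(chunks: int) -> float:
    """Dollar cost of one answer given `chunks` retrieved chunks as context."""
    input_tokens = SYSTEM_TOKENS + chunks * CHUNK_TOKENS
    return (input_tokens * INPUT_PRICE + ANSWER_TOKENS * OUTPUT_PRICE) / 1_000_000

print(f"query embedding: ${query_embedding_cost():.7f}")
for k in (8, 3):
    print(f"generation with {k} chunks: ${generation_cost(k):.4f}")
```

With these assumptions, one query embedding costs $0.0000004, while generation costs about $0.0174 with 8 chunks and $0.0099 with 3. Generation is tens of thousands of times the embedding cost, and cutting chunks nearly halves it.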
The RAG cost formula
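monthly = re-index + query_embeddings + generation + reranker + vector_db
generation = queries × ((system + chunks × chunk_tokens) × input_price + answer_tokens × output_price) ÷ 1e6
Here system and answer_tokens are tokens per query, chunk_tokens is tokens per chunk, and input_price and output_price are the generation model's prices in dollars per million tokens (hence ÷ 1e6).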
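The formula translates directly into code. Below is a minimal Python sketch. The `RagCostInputs`, `generation_cost` and `monthly_cost` names are hypothetical. The non-generation terms are entered as flat monthly dollar amounts, which is an assumption since the formula does not break them down further. The example numbers are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class RagCostInputs:
    queries: int                    # queries per month
    system_tokens: int              # system prompt tokens per request
    chunks: int                     # retrieved chunks sent per query
    chunk_tokens: int               # average tokens per chunk
    answer_tokens: int              # average output tokens per answer
    input_price: float              # generation model, $ per million input tokens
    output_price: float             # generation model, $ per million output tokens
    reindex: float = 0.0            # $ per month re-embedding changed documents
    query_embeddings: float = 0.0   # $ per month embedding queries
    reranker: float = 0.0           # $ per month for reranking, if used
    vector_db: float = 0.0          # $ per month for vector storage and search

def generation_cost(c: RagCostInputs) -> float:
    """queries × ((system + chunks×chunk_tokens)×input_price + answer_tokens×output_price) ÷ 1e6"""
    input_tokens = c.system_tokens + c.chunks * c.chunk_tokens
    per_query = input_tokens * c.input_price + c.answer_tokens * c.output_price
    return c.queries * per_query / 1e6

def monthly_cost(c: RagCostInputs) -> float:
    """Sum of all five monthly terms."""
    return c.reindex + c.query_embeddings + generation_cost(c) + c.reranker + c.vector_db

# Hypothetical example workload.
example = RagCostInputs(
    queries=50_000, system_tokens=400, chunks=4, chunk_tokens=400,
    answer_tokens=250, input_price=0.5, output_price=1.5,
    reindex=1.0, query_embeddings=0.05, vector_db=25.0,
)
print(f"generation: ${generation_cost(example):.2f}")
print(f"monthly total: ${monthly_cost(example):.2f}")
```

For this example, generation comes to $68.75 of a $94.80 monthly total. Even with a cheap generation model, that is the largest term.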
How to keep it cheap
Retrieve fewer, tighter chunks; add a reranker so only the most relevant chunks reach the prompt; cap answer length; cache the system prompt and frequently retrieved ("hot") chunks; use a smaller generation model for routine questions; and re-index incrementally, re-embedding only documents that changed. Model your numbers in the RAG cost calculator.
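Incremental re-indexing can be as simple as fingerprinting each document and re-embedding only those whose fingerprint changed. Here is a minimal sketch. It assumes documents are keyed by a stable id and that a SHA-256 hash of each document's text was saved at the previous indexing run. The function names are hypothetical, and the actual embed and delete calls depend on your vector database.

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_reindex(docs: dict[str, str],
                 stored_hashes: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare current docs with the hashes recorded at the last indexing run.

    Returns (ids_to_embed, ids_to_delete): new or changed documents need fresh
    embeddings; documents that no longer exist should be removed from the index.
    """
    to_embed = [doc_id for doc_id, text in docs.items()
                if stored_hashes.get(doc_id) != content_hash(text)]
    to_delete = [doc_id for doc_id in stored_hashes if doc_id not in docs]
    return to_embed, to_delete

stored = {"a": content_hash("alpha"), "b": content_hash("beta"), "c": content_hash("gamma")}
current = {"a": "alpha", "b": "beta, revised", "d": "delta"}
print(plan_reindex(current, stored))  # (['b', 'd'], ['c'])
```

Only the changed document "b" and the new document "d" are re-embedded. The unchanged "a" costs nothing, so the monthly re-index term tracks how much of the corpus changes, not its total size.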