The formula
One-time indexing = documents × tokens/doc × embedding price ÷ 1,000,000. Monthly = re-index cost + query embeddings + generation + reranker + vector DB storage. Generation per query = (system + chunks × chunk tokens) × input price + answer tokens × output price, all ÷ 1,000,000.
Questions
How much does a RAG app cost?
A small one-time indexing cost (often a few dollars) plus an ongoing per-query cost dominated by answer generation. Embeddings and vector storage are usually cheap by comparison.
Is embedding or generation more expensive?
Generation, by a wide margin. Embedding models are roughly 100× cheaper per token, so optimisation should focus on the generation model and the amount of context you retrieve.
How much does a vector database cost?
Managed vector DBs typically run $0–$70/month for small-to-mid workloads; self-hosted options cost only the underlying compute and storage.