The formula
Monthly cost = ((requests × (1 + tool calls) × (1 + retry%) × input tokens ÷ 1,000,000) × input price) + (same for output × output price) + embeddings + human review + storage, then × (1 + safety margin%). Cached tokens are billed at the cached input rate; the batch share gets the provider's batch discount.
Questions
Why are output tokens more expensive than input tokens?
Generating tokens is sequential and compute-intensive, whereas prompt tokens can be processed in parallel. Providers price output 2–5× higher to reflect that, so shortening answers saves more than shortening prompts.
Are AI API costs predictable?
They are predictable per request, but total spend scales with usage, retries and context length. Add a safety margin and monitor token usage in production; a single long-context feature can multiply costs.
How do I reduce AI API costs?
Use a smaller model for simple steps, cap output length, enable prompt caching for repeated context, batch non-urgent jobs, trim system prompts, and only retrieve the RAG chunks you actually need.