The core formula
Every LLM API bills two things: input tokens (your prompt) and output tokens (the model's reply), each at a price quoted per million tokens. The base monthly cost is:
monthly = requests × (input_tokens × input_price + output_tokens × output_price) ÷ 1,000,000
That's it for a naive estimate. The art is in getting the inputs right and adding what production actually costs.
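As a minimal sketch, the formula above maps directly onto a small function. The function name, parameter names, and the sample workload are illustrative assumptions, not figures from any provider:

```python
def monthly_cost(requests: int, input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Naive monthly cost in dollars. Prices are dollars per 1M tokens."""
    # Cost of one request, still scaled by 1M because of how prices are quoted.
    per_request = input_tokens * input_price + output_tokens * output_price
    return requests * per_request / 1_000_000


# Hypothetical workload: 10,000 requests, 1,000 input + 200 output tokens,
# at made-up prices of $1.00 (input) and $2.00 (output) per 1M tokens.
example = monthly_cost(10_000, 1_000, 200, 1.00, 2.00)
print(f"${example:.2f}")
```

Keeping token counts per request and prices per million mirrors how providers quote their rates, so numbers can be copied in without conversion.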
Step 1 — Estimate tokens, not words
One token is roughly ¾ of an English word (~4 characters). Count your typical prompt and typical response. Don't forget the system prompt: it's sent on every request, so a 1,000-token system prompt across a million requests is a billion input tokens. Use the token counter to measure real text.
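For a quick first pass before measuring with a real tokenizer, the 4-characters-per-token rule of thumb can be sketched as below. The helper names are hypothetical, and the heuristic is only a rough guide for English text; actual counts depend on the model's tokenizer:

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb from the text, not a tokenizer guarantee


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (English text only)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def monthly_input_tokens(system_tokens: int, user_tokens: int, requests: int) -> int:
    """Total input tokens per month; the system prompt is billed on every request."""
    return (system_tokens + user_tokens) * requests


# The system prompt alone: 1,000 tokens sent on each of 1,000,000 requests.
system_overhead = monthly_input_tokens(1_000, 0, 1_000_000)
print(f"{system_overhead:,} input tokens")
```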
Step 2 — Pick the right price
Prices can differ by as much as 100× across models, and a frontier model can cost 50× as much as a small model that may handle your task just as well. Always check current pricing; rates change often.
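To see what that gap means for a budget, the same workload can be priced against several candidates. This is a sketch under stated assumptions: the "small" entry uses the GPT-4o mini rates quoted in this guide's worked example, while "frontier-50x" is a hypothetical model priced at exactly 50× those rates, not a real price list:

```python
# Prices in dollars per 1M tokens, as (input, output).
PRICES = {
    "small": (0.15, 0.60),          # GPT-4o mini rates from the worked example
    "frontier-50x": (7.50, 30.00),  # hypothetical: 50x the small model
}


def monthly_cost(requests, input_tokens, output_tokens, input_price, output_price):
    """Naive monthly cost in dollars for one model."""
    return requests * (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Same workload for every model: 200,000 requests, 1,500 input + 500 output tokens.
costs = {
    name: monthly_cost(200_000, 1_500, 500, p_in, p_out)
    for name, (p_in, p_out) in PRICES.items()
}

for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name:>14}: ${cost:,.2f}/month")
```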
Step 3 — Add the multipliers production introduces
- Retries: failed or invalid calls are re-run. A 10% retry rate adds roughly 10% to the bill, slightly more if retries can themselves fail.
- Tool calls: each tool call is an extra round-trip that re-sends context.
- Embeddings: RAG apps pay to embed queries and documents.
- Human review: regulated or high-risk output needs a person to check it.
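The multipliers above can be folded into the base formula roughly as follows. This is a simplified sketch, not a billing model: it assumes each tool call re-sends the full input context exactly once, ignores tool results growing the context, and treats embedding and human-review costs as flat monthly dollar amounts that you supply. All names and the $5 embedding figure are illustrative:

```python
def production_cost(requests, input_tokens, output_tokens,
                    input_price, output_price,
                    retry_rate=0.0, tool_calls=0,
                    embedding_cost=0.0, review_cost=0.0):
    """Monthly cost in dollars with production overheads applied."""
    # Assumption: every tool call is one extra round-trip that re-sends the input.
    billed_input = input_tokens * (1 + tool_calls)
    per_request = (billed_input * input_price + output_tokens * output_price) / 1_000_000
    # Assumption: a retry costs the same as the original call.
    api_cost = requests * per_request * (1 + retry_rate)
    # Embeddings and human review are added as flat monthly amounts.
    return api_cost + embedding_cost + review_cost


# 10% retries, one tool call per request, and a hypothetical $5/month for embeddings.
total = production_cost(200_000, 1_500, 500, 0.15, 0.60,
                        retry_rate=0.10, tool_calls=1, embedding_cost=5.0)
print(f"${total:.2f}/month")
```

Human review is usually priced per hour rather than per token, so passing it in as a separate dollar figure keeps it visible instead of burying it in a multiplier.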
Step 4 — Subtract the discounts
Prompt caching slashes the cost of repeated context; the Batch API takes ~50% off asynchronous jobs. Model both in the API cost calculator.
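One way to model both discounts is sketched below. The 50% cache discount is a placeholder assumption (actual cached-token rates differ by provider), and the sketch assumes caching applies only to input cost and that the batch discount stacks on top of it, which may not hold for every provider:

```python
def discounted_cost(input_cost, output_cost,
                    cached_fraction=0.0, cache_discount=0.5,
                    batch_fraction=0.0, batch_discount=0.5):
    """Monthly cost in dollars after caching and batch discounts.

    cached_fraction: share of input tokens served from the prompt cache.
    cache_discount:  price reduction on cached input (placeholder value).
    batch_fraction:  share of traffic sent through the batch endpoint.
    batch_discount:  price reduction on batched work (~50% per the text).
    """
    # Caching only reduces the input side of the bill.
    input_after_cache = input_cost * (1 - cached_fraction * cache_discount)
    subtotal = input_after_cache + output_cost
    # Assumption: the batch discount applies to input and output alike.
    return subtotal * (1 - batch_fraction * batch_discount)


# $45 input + $60 output, with 60% of input tokens hitting the cache.
with_cache = discounted_cost(45.0, 60.0, cached_fraction=0.6)
# The same bill with no caching but everything sent as batch jobs.
all_batch = discounted_cost(45.0, 60.0, batch_fraction=1.0)
print(f"cached: ${with_cache:.2f}, batched: ${all_batch:.2f}")
```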
Worked example
200,000 requests/month, 1,500 input + 500 output tokens, GPT-4o mini ($0.15 / $0.60 per 1M):
- Input: 200,000 × 1,500 ÷ 1e6 × $0.15 = $45.00
- Output: 200,000 × 500 ÷ 1e6 × $0.60 = $60.00
- Base total: $105.00/month, before retries, caching and margin.
Add a 15% safety margin ($105.00 × 1.15 = $120.75) and you'd budget about $121/month. Now you have a defensible number.
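The arithmetic of the worked example can be reproduced in a few lines, which makes it easy to swap in your own volumes and prices. Variable names are illustrative; the figures are the ones used above:

```python
REQUESTS = 200_000
INPUT_TOKENS, OUTPUT_TOKENS = 1_500, 500
INPUT_PRICE, OUTPUT_PRICE = 0.15, 0.60  # dollars per 1M tokens
SAFETY_MARGIN = 0.15

# Total tokens per month, converted to millions, times the per-million price.
input_cost = REQUESTS * INPUT_TOKENS / 1e6 * INPUT_PRICE
output_cost = REQUESTS * OUTPUT_TOKENS / 1e6 * OUTPUT_PRICE
base_total = input_cost + output_cost
budget = base_total * (1 + SAFETY_MARGIN)

print(f"Input:  ${input_cost:.2f}")
print(f"Output: ${output_cost:.2f}")
print(f"Base:   ${base_total:.2f}")
print(f"Budget: ${budget:.2f}")
```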