There is no single cheapest provider
Each provider has budget, mid and premium tiers, and input and output tokens are priced separately. The right comparison is therefore tier-to-tier, weighted by your own input/output mix. An output-heavy workload favours models with low output prices; an input-heavy RAG workload favours low input prices and good caching.
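To make the input/output mix concrete, here is a minimal sketch of the comparison. The two models and their per-million-token prices are made-up placeholders, not real quotes from any provider, and the helper names monthly_cost and rank are hypothetical. The point is only that the ranking can flip when the mix changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float   # USD per 1M input tokens (placeholder value)
    output_price: float  # USD per 1M output tokens (placeholder value)


def monthly_cost(model: Model, input_tokens: int, output_tokens: int) -> float:
    """Monthly cost in USD for the given token volumes."""
    total = input_tokens * model.input_price + output_tokens * model.output_price
    return total / 1_000_000


def rank(models: list[Model], input_tokens: int, output_tokens: int) -> list[Model]:
    """Return the models sorted from cheapest to most expensive for this usage."""
    return sorted(models, key=lambda m: monthly_cost(m, input_tokens, output_tokens))


# Hypothetical models: A has the cheaper output, B has the cheaper input.
MODEL_A = Model("model-a", input_price=0.50, output_price=1.50)
MODEL_B = Model("model-b", input_price=0.20, output_price=3.00)

if __name__ == "__main__":
    workloads = [
        ("output-heavy", 10_000_000, 50_000_000),
        ("input-heavy RAG", 200_000_000, 5_000_000),
    ]
    for label, inp, out in workloads:
        ranked = rank([MODEL_A, MODEL_B], inp, out)
        print(label, [(m.name, round(monthly_cost(m, inp, out), 2)) for m in ranked])
```

With these placeholder prices, model-a is cheaper for the output-heavy workload and model-b is cheaper for the input-heavy one, which is why a single "cheapest" label is misleading.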
Budget tier
GPT-4o mini / 4.1 nano, Claude Haiku, and Gemini Flash / Flash-Lite are all inexpensive and capable for classification, extraction and simple chat. Price differences at this tier are small, so pick on quality and latency for your task.
Mid tier
GPT-4.1 mini, Claude Sonnet and Gemini Flash trade a little cost for noticeably stronger reasoning. This tier is the sweet spot for most production features.
Premium tier
GPT-4o / o-series, Claude Opus and Gemini Pro are the frontier models and the most expensive, especially on output tokens. Use them where the extra quality measurably improves the result.
Caching and batch matter too
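A model with a higher sticker price but a steep discount on cached input can be cheaper for context-heavy apps that resend the same prompt prefix. Batch endpoints, where a provider offers them, usually discount requests that can wait for delayed results. Always model the full picture in the API cost calculator, which compares every model on your exact usage.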
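The caching effect can be sketched the same way. This is a simplified model under stated assumptions: all prices are invented placeholders, cache_hit_rate is the share of input tokens billed at the cached rate, and any separate charge for writing to the cache is ignored. The function name monthly_cost_with_cache is hypothetical.

```python
def monthly_cost_with_cache(
    input_tokens: int,
    output_tokens: int,
    input_price: float,         # USD per 1M uncached input tokens (placeholder)
    cached_input_price: float,  # USD per 1M cached input tokens (placeholder)
    output_price: float,        # USD per 1M output tokens (placeholder)
    cache_hit_rate: float,      # fraction of input tokens served from cache, 0..1
) -> float:
    """Monthly cost in USD when part of the input is billed at a cached rate."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached_tokens = input_tokens * cache_hit_rate
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * input_price
        + cached_tokens * cached_input_price
        + output_tokens * output_price
    )
    return total / 1_000_000


# Hypothetical context-heavy app: 500M input tokens, 10M output tokens per month.
INPUT_TOKENS, OUTPUT_TOKENS = 500_000_000, 10_000_000

# Model X: higher sticker price, but cached input is heavily discounted.
cost_x = monthly_cost_with_cache(INPUT_TOKENS, OUTPUT_TOKENS, 3.00, 0.30, 15.00, 0.9)
# Model Y: lower sticker price, no caching discount (cached rate equals input rate).
cost_y = monthly_cost_with_cache(INPUT_TOKENS, OUTPUT_TOKENS, 2.00, 2.00, 10.00, 0.9)

if __name__ == "__main__":
    print(f"model X: ${cost_x:,.2f}  model Y: ${cost_y:,.2f}")
```

Under these assumptions, model X comes out cheaper at a 90% cache hit rate even though every one of its list prices is higher. Setting cache_hit_rate to 0 reverses the result, so the hit rate you can realistically achieve matters as much as the discount.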
See live numbers
Enter your usage in any calculator and the model comparison table ranks every model by your real monthly cost. Current per-model prices are on the data sources page.