One task, many calls
A chatbot answers a message with one model call. An agent completes a task with many: it plans, picks a tool, reads the result, decides the next step, maybe retries, and finally summarises. Five to fifty calls per task is normal — and each call re-sends the system prompt, tool schemas and accumulated context.
The three multipliers
Steps × calls-per-step sets the base call count. Each tool call adds a round-trip, because the model has to be called again to read the result. Retries multiply everything when tools fail or output is invalid. Together, these three factors explain why agent bills surprise people.
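The three multipliers can be combined into a rough call-count estimate. This is a minimal sketch, not a standard formula. The function name and parameters are hypothetical, and it assumes each call fails independently with probability `retry_rate` and is retried until it succeeds, so each call costs 1 / (1 - retry_rate) attempts on average.

```python
def expected_calls_per_task(
    steps: int,
    calls_per_step: float,
    tool_calls_per_step: float,
    retry_rate: float,
) -> float:
    """Rough expected number of model calls for one task (illustrative model only)."""
    if not 0 <= retry_rate < 1:
        raise ValueError("retry_rate must be in [0, 1)")
    base = steps * calls_per_step  # steps x calls-per-step
    tool_round_trips = steps * tool_calls_per_step  # one extra call to read each tool result
    # Assumption: independent failures, each retried until success,
    # so expected attempts per call = 1 / (1 - retry_rate).
    return (base + tool_round_trips) / (1 - retry_rate)


if __name__ == "__main__":
    # Hypothetical task: 10 steps, 1 call and 1 tool call per step, 20% retry rate.
    print(expected_calls_per_task(10, 1, 1, 0.2))
```

With these made-up inputs, 10 base calls plus 10 tool round-trips become 25 expected calls once a 20% retry rate is applied; the retry term scales the whole sum, which is why it hurts most.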
Context is the silent cost
Because every call re-sends the growing context, input tokens dominate the bill. This is also the biggest opportunity: caching the stable prefix (instructions, tool schemas, reference docs), which many providers bill at a reduced rate when it is reused, can cut input cost dramatically.
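To see why input tokens dominate, the sketch below sums the input cost of a task where call i re-sends the stable prefix plus everything added by the i earlier calls. The prices are in arbitrary units and entirely hypothetical. It assumes only the prefix is cached, that the first call pays the full rate, and that later calls pay a flat cached rate; real providers differ (cache-write surcharges, expiry, minimum cacheable sizes), so treat it as an illustration rather than a billing model.

```python
from typing import Optional


def input_cost_per_task(
    calls: int,
    prefix_tokens: int,
    added_tokens_per_call: int,
    price_per_token: float,
    cached_price_per_token: Optional[float] = None,
) -> float:
    """Input-token cost of one task whose context grows on every call."""
    total = 0.0
    for i in range(calls):
        # Context accumulated by the i earlier calls, re-sent in full every time.
        grown_context = i * added_tokens_per_call
        if cached_price_per_token is None or i == 0:
            # No caching, or first call: the prefix is billed at the full rate.
            prefix_cost = prefix_tokens * price_per_token
        else:
            # Assumption: later calls hit the cache and pay the discounted rate.
            prefix_cost = prefix_tokens * cached_price_per_token
        total += prefix_cost + grown_context * price_per_token
    return total


if __name__ == "__main__":
    # Hypothetical: 20 calls, 10,000-token prefix, 500 new tokens per call,
    # cached rate set to 10% of the full rate.
    full = input_cost_per_task(20, 10_000, 500, 1.0)
    cached = input_cost_per_task(20, 10_000, 500, 1.0, cached_price_per_token=0.1)
    print(full, cached, 1 - cached / full)
```

With these made-up numbers the uncached task costs 295,000 units and the cached one 124,000, a saving of roughly 58%. Of the cached total, 95,000 units are accumulated context that this sketch does not cache, so the larger the stable prefix relative to per-step growth, the bigger the win.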
Model routing
Use a capable model for planning and the final answer, and a cheap model for mechanical sub-steps. Counter-intuitively, routing everything to the cheapest model can cost more overall, because weaker planning causes more wrong turns and retries.
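The routing trade-off can be made concrete with a toy comparison. Everything here is hypothetical: the two models, their per-call costs, the split of 4 planning and 16 mechanical calls, and especially the `overhead` factor, which stands in for the extra calls caused by wrong turns and retries and is assumed (not measured) to be higher when a weak model plans.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    cost_per_call: float  # hypothetical average cost of one call, in dollars


def task_cost(planner: Model, worker: Model, planning_calls: int,
              mechanical_calls: int, overhead: float) -> float:
    """Cost of one task under a routing strategy.

    overhead >= 1 scales every call to account for wrong turns and retries.
    """
    planned = (planning_calls * planner.cost_per_call
               + mechanical_calls * worker.cost_per_call)
    return overhead * planned


strong = Model("strong", 0.02)  # hypothetical capable model
cheap = Model("cheap", 0.004)   # hypothetical cheap model, 5x cheaper per call

routed = task_cost(strong, cheap, 4, 16, overhead=1.2)
all_strong = task_cost(strong, strong, 4, 16, overhead=1.2)
all_cheap = task_cost(cheap, cheap, 4, 16, overhead=2.5)  # weaker planning, more wrong turns

for label, cost in [("routed", routed), ("all strong", all_strong), ("all cheap", all_cheap)]:
    print(f"{label:>10}: ${cost:.3f} per task")
```

Under these assumptions routing costs about $0.17 per task, all-cheap about $0.20 and all-strong about $0.48. The all-cheap option only loses because of its larger overhead; if the cheap planner's overhead fell below 2.16, it would win, which is why this is worth measuring rather than assuming.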
Failure-adjusted cost
If tasks succeed 80% of the time, you pay for the 20% that fail too. Always reason in cost-per-successful-task. Improving reliability lowers cost twice: fewer retries and fewer wasted runs.
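Cost per successful task is total spend divided by the number of tasks that succeed. The sketch below, with a hypothetical function and made-up figures, shows both effects of better reliability: a lower retry rate shrinks the cost of each run, and a higher success rate spreads that cost over more useful results. It assumes each call fails independently and is retried until it succeeds.

```python
def cost_per_successful_task(calls_per_run: float, cost_per_call: float,
                             retry_rate: float, success_rate: float) -> float:
    """Spread the cost of every run, failed ones included, over successful tasks."""
    if not 0 <= retry_rate < 1:
        raise ValueError("retry_rate must be in [0, 1)")
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    # Assumption: independent call failures retried until success,
    # so expected attempts per call = 1 / (1 - retry_rate).
    cost_per_run = calls_per_run * cost_per_call / (1 - retry_rate)
    # Failed runs are still paid for, so divide by the fraction that succeed.
    return cost_per_run / success_rate


baseline = cost_per_successful_task(20, 0.01, retry_rate=0.2, success_rate=0.8)
improved = cost_per_successful_task(20, 0.01, retry_rate=0.1, success_rate=0.9)
print(f"baseline ${baseline:.4f}, improved ${improved:.4f} per successful task")
```

Here the same $0.20 of planned calls costs about $0.31 per successful task at 20% retries and 80% success, and about $0.25 at 10% retries and 90% success.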
Model it before you build
Use the AI agent cost calculator to estimate cost per task and compare routing strategies before committing to an architecture.