Use case · Agents
AI agent cost estimator
Agents chain multiple LLM calls with tool outputs reinserted into prompts. A “single user task” can hide five or ten billable requests.
This page helps you expose that structure to finance and set sane guardrails.
Token usage patterns for agents
Each tool call serializes arguments and results back into context. Failed validations trigger retries that multiply tokens silently.
Agent-shaped scenarios
| Scenario | Prompt tokens | Output tokens | Model (est.) | Cost / request |
|---|---|---|---|---|
| Research agent (3 tools) | 8000 | 1200 | GPT-4o | $0.0320 |
| Ops agent with SQL | 5200 | 700 | Claude 3.5 Sonnet | $0.0261 |
| Coding agent patch | 11000 | 2500 | GPT-4o | $0.0525 |
Figures use rates from config/models.php; confirm against your provider before billing decisions.
Monthly estimates
-
Internal automation
900 agent tasks per weekday.
- Per request
- $0.0178
- Monthly (900 req/day × 22 days)
- $351.45
Infrastructure considerations
Sandboxed tool execution, secret management, and durable execution frameworks add baseline COGS beyond tokens.
Model recommendations
Use reasoning models only on steps that need planning; cache stable plans for repeatable workflows.
Optimization recommendations
Cap max steps, require structured tool schemas, and deduplicate observations before re-prompting.
ROI examples
Agents shine when they replace hours of analyst time—express savings in fully loaded hourly rates, not tokens alone.
API budget planning
Create per-workflow budgets with automatic circuit breakers when token counts exceed expectations mid-run.
FAQ: AI agent inference pricing
Short answers mirror the structured data on this page for search engines and readers.
- Why are agent tasks expensive at night?
- Batch jobs often run uncapped with large contexts—set schedules and limits.
- How do I attribute tokens to customers?
- Propagate a tenant id through spans and aggregate in your warehouse nightly.
- Are smaller models viable for agents?
- Sometimes, for narrow tools—but plan for escalation paths when tool errors rise.
- What about human approvals?
- Approval waits do not reduce tokens already spent—optimize earlier steps first.