Use case · Chatbots
Chatbot API cost calculator
Customer-facing chatbots mix short user messages with variable-length answers, tool calls, and sometimes retrieved documents. That variability makes “one number” pricing misleading unless you anchor estimates to measured token histograms.
This page walks through realistic chatbot token patterns, shows how they translate into daily and monthly bills, and links to optimization tactics your team can implement without hurting UX.
Use the calculator with the embedded preset tuned for support-style chats, then adjust completion caps to match your brand voice.
Expected token usage patterns
Most production chatbots spend input tokens on system instructions, the latest user turn, and a sliding window of recent history. Retrieval adds chunks—often the biggest surprise when finance first reviews logs.
Output tokens track answer length, empathetic padding, and structured payloads like JSON buttons. Streaming responses do not reduce billed tokens versus batch responses.
Pattern checklist
- Measure p50 and p95 prompt sizes per locale.
- Separate “free tier” user behavior from paid users.
- Account for moderator or safety classifier passes if they call the same model.
Example chatbot workloads
| Scenario | Prompt tokens | Output tokens | Model (est.) | Cost / request |
|---|---|---|---|---|
| Retail FAQ | 380 | 140 | GPT-4o mini | $0.0001 |
| Technical support with logs | 2400 | 320 | GPT-4o | $0.0092 |
| Sales concierge | 900 | 260 | Claude 3.5 Haiku | $0.0018 |
Figures use rates from config/models.php; confirm against your provider before billing decisions.
Daily and monthly API estimates
-
Growth-stage SaaS
8,000 sessions per weekday.
- Per request
- $0.0002
- Monthly (8000 req/day × 22 days)
- $35.90
-
Weekend-heavy consumer app
25,000 shorter turns.
- Per request
- $0.0001
- Monthly (25000 req/day × 30 days)
- $72.00
Infrastructure considerations
Beyond raw tokens, budget for retries, shadow traffic in canary deployments, and duplicate calls from mobile clients with poor connectivity. Observability sinks (logging tokens to your warehouse) also have a cost—usually tiny versus model bills but worth noting.
Model recommendations
Start with a mini or Haiku-class default for breadth, route frustration signals or high LTV accounts to larger models, and keep human handoff paths cheap by truncating context intelligently.
Optimization recommendations
Summarize stale turns instead of sending entire transcripts, store stable persona text server-side, and test tighter max_tokens settings with product design so answers stay concise without feeling curt.
ROI estimation examples
If a chatbot deflects twenty percent of tickets costing twelve dollars each, model spend can often be justified even at flagship pricing—provided you measure deflection honestly with QA sampling.
API budget planning guidance
Build a simple spreadsheet: tokens per session × sessions per month × blended rate. Add ten to twenty percent buffer for growth and experimentation, then reconcile weekly during launch months.
FAQ: Chatbot API pricing
Short answers mirror the structured data on this page for search engines and readers.
- Why did our chatbot bill double after adding retrieval?
- Each retrieved chunk adds input tokens. Measure average chunk count and size after launch.
- Should we use the same model for free and paid users?
- Often no—tier models by SLA and features, but disclose differences ethically if users compare quality.
- How do handoffs to humans affect cost?
- They reduce future model spend on long threads but may increase CRM storage; track both.
- What metric best predicts chatbot cost?
- Output tokens per resolved session, paired with resolution rate—not raw message count alone.