Tokens · Pricing
How AI token pricing works
Token pricing sounds opaque until you separate three ideas: how text becomes tokens, how providers price input versus output, and how your traffic multiplies both.
Token calculation explanation
Providers bill tokens, not words. Tokenizers determine the split; different models tokenize the same string differently.
Words-to-token examples
Use heuristics for proposals, tokenizer APIs for production forecasts.
Prompt optimization tips
Tight prompts reduce input tokens; tight answer formats reduce output tokens.
Token reduction techniques
Summaries, caches, retrieval, and routing are engineering tools with direct token payoffs.
Context window explanation
Windows set hard caps; approaching them increases failure rates and sometimes cost if you chunk work into extra calls.
Real pricing examples
Per-request cost = prompt_tokens/1000 * input_rate + completion_tokens/1000 * output_rate. Monthly cost = per-request * daily_requests * days.
Sample configured rates
| Model | Provider | Input | Output |
|---|---|---|---|
| GPT-4o | OpenAI | $0.0025 / 1K in | $0.0100 / 1K out |
| GPT-4o mini | OpenAI | $0.0002 / 1K in | $0.0006 / 1K out |
| Claude 3.5 Sonnet | Anthropic | $0.0030 / 1K in | $0.0150 / 1K out |
| DeepSeek Chat | DeepSeek | $0.0001 / 1K in | $0.0003 / 1K out |
FAQ: Token pricing mechanics
Short answers mirror the structured data on this page for search engines and readers.
- Why is output pricing higher?
- Generation allocates incremental compute per token; providers reflect that in rates.
- Are there taxes or fees on top?
- Cloud marketplaces may add line items—read invoices holistically.
- Do batch APIs change unit price?
- Often yes—if latency allows, batching can improve effective economics.
- Can I negotiate pricing?
- At scale, often—document usage forecasts before conversations.