Input vs output token cost
If you only watch prompt size, you will be surprised by invoices—output tokens often drive the majority of spend on assistant-style workloads.
Token calculation explanation
Input tokens are everything you send before generation; output tokens are everything the model emits, including formatting markup the reader never sees rendered.
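As a minimal sketch of how the two counts become a per-request cost, assuming prices are quoted per million tokens: the function name `request_cost` and the rates used below are hypothetical, not any provider's actual pricing.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of one request.

    Prices are dollars per one million tokens (illustrative values only).
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Hypothetical rates: $3 per million input tokens, $15 per million output tokens.
cost = request_cost(input_tokens=500, output_tokens=800,
                    input_price_per_m=3.0, output_price_per_m=15.0)
print(f"${cost:.4f}")  # prints $0.0135
```

Here the 800 output tokens account for $0.0120 of the $0.0135 total, even though the two counts are of similar size.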
Words-to-token examples
A verbose ten-paragraph answer can cost more than a massive prompt with a one-sentence reply—profile both.
Prompt optimization tips
Focus on input when you control large contexts; focus on output when users ask for “detailed” answers.
Token reduction techniques
Cap generation length with max_tokens, request structured formats that leave little room for filler, and nudge users toward concise modes in the UI.
Context window explanation
Large inputs leave less room for outputs inside the same window—plan generation headroom explicitly.
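One way to plan that headroom is to compute the output budget from the window size before each call. This is a sketch that assumes input and output share a single window, as described above; `output_headroom` and the numbers in the usage line are hypothetical.

```python
def output_headroom(context_window: int, input_tokens: int,
                    desired_max_output: int) -> int:
    """Return the largest output budget that fits alongside the input."""
    remaining = context_window - input_tokens
    if remaining <= 0:
        raise ValueError("input alone fills or exceeds the context window")
    # Never ask for more output than the window can still hold.
    return min(desired_max_output, remaining)


# Hypothetical 8,000-token window with a 7,200-token prompt.
print(output_headroom(8_000, 7_200, 1_500))  # prints 800
```

The returned value could then be passed as the max_tokens setting, so a large prompt shrinks the reply budget instead of causing a failed or truncated request.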
Real pricing examples
If output price is double input price, a fifty-fifty token split is not a fifty-fifty cost split—weight by rates.
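The weighting can be checked with a few lines of arithmetic. In this sketch the function name `cost_shares` and the unit prices (1.0 for input, 2.0 for output) are assumptions chosen only to reproduce the "output is double input" case.

```python
def cost_shares(input_tokens: int, output_tokens: int,
                input_price: float, output_price: float) -> tuple[float, float]:
    """Return (input share, output share) of total cost as fractions."""
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    total = input_cost + output_cost
    if total == 0:
        raise ValueError("no tokens or zero prices: nothing to split")
    return input_cost / total, output_cost / total


# Fifty-fifty token split, output priced at twice the input rate.
in_share, out_share = cost_shares(1_000, 1_000, 1.0, 2.0)
print(f"input {in_share:.1%}, output {out_share:.1%}")  # prints input 33.3%, output 66.7%
```

With equal token counts, output ends up at two thirds of the bill.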
Output-heavy vs input-heavy months
Output-heavy assistant: 2,000 calls/day, short prompts, long answers.
- Per request: $0.0180
- Monthly (2,000 req/day × 22 days): $792.00
Input-heavy RAG: 600 calls/day, huge prompts, tight answers.
- Per request: $0.0325
- Monthly (600 req/day × 22 days): $429.00
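The monthly figures follow from per-request cost × requests per day × working days. A short sketch that reproduces them, where `monthly_cost` is a hypothetical helper and the 22-day month comes from the figures above:

```python
def monthly_cost(per_request: float, requests_per_day: int, days: int = 22) -> float:
    """Project monthly spend from a per-request cost and daily volume."""
    return per_request * requests_per_day * days


output_heavy = monthly_cost(0.0180, 2_000)  # output-heavy assistant
input_heavy = monthly_cost(0.0325, 600)     # input-heavy RAG
print(f"${output_heavy:.2f} vs ${input_heavy:.2f}")  # prints $792.00 vs $429.00
```

The RAG workload costs more per request, but the assistant's higher call volume makes it the larger monthly bill.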
FAQ: Input vs output pricing
- Which side should I optimize first?
- Whichever contributes more dollars in your histogram—measure before guessing.
- Do retries double output tokens?
- They can add to them: a failed attempt may still bill its partial output, and the retry is billed again, so monitor failure paths.
- Are system prompts input tokens?
- Yes—every included instruction counts as input.
- Does streaming change the split?
- No. Streaming changes how tokens are delivered, not how they are counted; billing still tracks the tokens generated.
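To answer "which side first" from data rather than intuition, logged token counts can be totaled in dollars per side. This is a sketch under assumptions: the `Usage` record, the `dollar_split` helper, and the per-million prices are hypothetical, and the input counts are taken to include system prompts.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int   # includes system prompt and context
    output_tokens: int


def dollar_split(records: list[Usage], input_price_per_m: float,
                 output_price_per_m: float) -> tuple[float, float]:
    """Return (input dollars, output dollars) summed over all records."""
    total_in = sum(r.input_tokens for r in records)
    total_out = sum(r.output_tokens for r in records)
    return (total_in / 1_000_000 * input_price_per_m,
            total_out / 1_000_000 * output_price_per_m)


# Two hypothetical logged requests at $3 / $15 per million tokens.
log = [Usage(2_000, 300), Usage(1_500, 900)]
in_dollars, out_dollars = dollar_split(log, 3.0, 15.0)
side = "output" if out_dollars > in_dollars else "input"
print(f"input ${in_dollars:.4f}, output ${out_dollars:.4f}: optimize {side} first")
```

In this sample the log holds almost three times as many input tokens as output tokens, yet output still carries more of the cost.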