Do tokens include spaces?

Yes. Whitespace is usually represented as its own token or attached to neighboring tokens, depending on the tokenizer. That is why extra blank lines can inflate counts. Trim unnecessary formatting in automated prompts when you want to keep bills lean.
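
As a rough illustration of trimming formatting before a prompt is sent, the sketch below removes trailing spaces and collapses long runs of blank lines. The function name compact_whitespace is hypothetical, and the rules are deliberately conservative: indentation inside lines is left alone because it can carry meaning in code or YAML. How many tokens this actually saves depends on the tokenizer.

```python
import re


def compact_whitespace(prompt: str) -> str:
    """Trim trailing spaces and collapse long runs of blank lines."""
    # Remove spaces and tabs that sit directly before a newline.
    text = re.sub(r"[ \t]+\n", "\n", prompt)
    # Collapse three or more consecutive newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Drop leading and trailing whitespace around the whole prompt.
    return text.strip()


raw = "Summarize the report.   \n\n\n\n\nUse three bullets.\n\n"
print(repr(compact_whitespace(raw)))
# prints 'Summarize the report.\n\nUse three bullets.'
```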

Why does my cost differ from a word-count estimate?

Word counts ignore subword splits and special symbols. Provider tokenizers can merge frequent pairs or split rare strings. Always verify with the official tokenizer or usage metrics instead of dividing character counts by four.

Are embedding models billed in tokens too?

Most embedding endpoints also meter by tokens on the input text. There is typically no completion side. Check whether batch APIs price per million tokens differently from interactive calls.

How are images handled as tokens?

Vision inputs are converted to patches or latent representations and billed with their own token rules. Do not assume the text tokenizer guidance applies to multimodal inputs without reading provider charts.

Can I know tokens before calling the API?

Many SDKs expose local counting helpers built on the same tokenizer vocabularies that providers publish. Offline counts should closely match, though hidden template or special tokens can add a small difference, and server-side usage logs remain authoritative for billing.
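
A pre-flight check might look like the sketch below. It assumes you can obtain an encode function from your provider's tokenizer library; the fits_context helper, the toy_encode stand-in, and the window sizes are all hypothetical and exist only to keep the example self-contained.

```python
from typing import Callable, Sequence, Tuple


def fits_context(
    prompt: str,
    encode: Callable[[str], Sequence[int]],
    context_window: int,
    reserved_output: int,
) -> Tuple[int, bool]:
    """Return (prompt_tokens, fits) for a prompt before sending it."""
    prompt_tokens = len(encode(prompt))
    # The prompt and the room reserved for the reply share one window.
    fits = prompt_tokens + reserved_output <= context_window
    return prompt_tokens, fits


def toy_encode(text: str) -> list:
    # Placeholder only: one ID per whitespace-separated word.
    # This is NOT a real tokenizer; swap in the provider's encoder.
    return list(range(len(text.split())))


count, ok = fits_context(
    "Explain token billing in two sentences.", toy_encode, 8192, 1024
)
print(count, ok)  # prints 6 True with the placeholder encoder
```

Passing the encoder in as a parameter keeps the check independent of any one SDK, so the same function can be reused when you compare model families.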

Does fine-tuning change token definitions?

Fine-tuning changes model weights, not usually the public tokenizer for that family. You still count tokens the same way, but quality improvements may let you shorten prompts while keeping accuracy.

FAQ guide

What is an AI token?

Quick answer

An AI token is a small chunk of text that large language models process as a unit. Providers bill API usage in tokens rather than characters or words because models operate on token sequences. English text often maps to roughly four characters or three-quarters of a word per token on average, but counts vary by language and tokenizer. Tokens include visible words, punctuation, and many pieces of whitespace or formatting.

In large language model APIs, a token is the fundamental billing and context unit. Unlike a human word, a token might be a short fragment like "ing" or a full word like "pricing." The tokenizer splits input and output into these pieces before the neural network runs. That is why identical prompts can yield different token totals across model families, which matters when you compare dashboards and cost calculators.

Understanding tokens helps you reason about limits, latency, and spend. Context windows, rate limits, and per-million-token prices all refer to these tokenizer outputs. When you optimize prompts or cache repeated prefixes, you are primarily changing how many tokens you send on each request.

Why providers use tokens for billing

Providers do not charge by page views or seconds of wall-clock time the way web hosts do. Inference cost scales with how much text moves through the transformer stack, and token count tracks that workload closely. Providers publish separate input and output prices because generating new tokens is often more expensive than reading prompt tokens.

Token-based pricing also aligns with hardware utilization. Each token participates in attention computations against prior context, so longer prompts increase compute even if the answer is short. This is separate from training, which is capital intensive and not billed per chat message in typical API products.

Tokens versus words in practice

Rough heuristics say one hundred English words typically become around one hundred thirty tokens, and more when the text has rare vocabulary or heavy punctuation. Code, JSON, and non-Latin scripts can diverge further because compressibility differs. Always measure with the provider tokenizer when precision matters.
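
To see how loosely the rules of thumb in this guide agree with each other (about four characters or three-quarters of a word per token), the sketch below applies both to the same text. The function name rough_token_estimates is hypothetical, and neither figure is a substitute for the provider tokenizer.

```python
def rough_token_estimates(text: str) -> dict:
    """Apply two common English-text heuristics; both are approximations."""
    words = len(text.split())
    chars = len(text)
    return {
        # About three-quarters of a word per token.
        "from_words": round(words / 0.75),
        # About four characters per token.
        "from_chars": round(chars / 4),
    }


prose = "word " * 100  # 100 short words, 500 characters
print(rough_token_estimates(prose))
# prints {'from_words': 133, 'from_chars': 125}

payload = '{"id":1,"tags":["a","b"],"ok":true}'
print(rough_token_estimates(payload))
```

The two estimates already disagree on plain prose, and the gap widens on compact JSON, where a word count sees a single "word."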

User interfaces sometimes show character counts, but API logs almost always show tokens. Bridging that gap prevents surprises when finance reviews the bill.

Special tokens

Systems may add hidden delimiter tokens for roles, tools, or formatting. Those still count toward limits even if the user did not type them.

Simple token intuition

If a prompt is two hundred tokens and the model returns one hundred tokens, you pay for three hundred total, usually split across input and output rates. Cached prompt tokens may discount the input side when a provider supports prompt caching.

Estimated cost ≈ (input_tokens × input_price_per_million ÷ 1e6) + (output_tokens × output_price_per_million ÷ 1e6).
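
The formula above can be sketched as a small function. The prices in the example are placeholders rather than any vendor's real rates, and the cached-token handling is a simplifying assumption: it treats cached tokens as a subset of the input billed at a single discounted rate, while real caching rules vary by provider.

```python
from typing import Optional


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
    cached_input_tokens: int = 0,
    cached_price_per_million: Optional[float] = None,
) -> float:
    """Estimate the cost of one request in the same currency as the prices."""
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    if cached_price_per_million is None:
        # Without a cache rate, bill cached tokens at the normal input rate.
        cached_price_per_million = input_price_per_million
    uncached = input_tokens - cached_input_tokens
    return (
        uncached * input_price_per_million / 1e6
        + cached_input_tokens * cached_price_per_million / 1e6
        + output_tokens * output_price_per_million / 1e6
    )


# The 200-in, 100-out example with placeholder prices of 3 and 15 per million.
print(f"{estimate_cost(200, 100, 3.0, 15.0):.6f}")  # prints 0.002100

# Same request, assuming 150 of the input tokens hit a cache billed at 0.30.
print(f"{estimate_cost(200, 100, 3.0, 15.0, 150, 0.30):.6f}")  # prints 0.001695
```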

Common misconceptions

Equating one token with one English word across every model leads to bad forecasts when you switch models or languages.
Ignoring system prompts, tool schemas, and chat templates omits tokens that silently consume context.
Assuming shorter answers are always cheaper even when the prompt is huge; in that case input tokens often dominate total cost.
Forgetting that retries or client reconnects during streaming can bill partial output tokens more than once.

Practical tips

Log the token usage fields returned in API responses and reconcile them weekly against invoices.
Prototype token budgets per feature and alert when a route exceeds its ceiling.
Reuse stable instruction blocks so prompt caching can shrink repeated input tokens.
Compare tokenizer outputs before migrating to a new model family, not just list price.
Document assumptions in your runbooks so finance understands token-based forecasts.
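
The logging and budgeting tips above could be prototyped along these lines. Everything here is hypothetical: the TokenBudgetMonitor class, the route name, and the ceiling are illustrative, and the input and output counts are assumed to come from whatever usage fields your provider returns.

```python
from collections import defaultdict
from typing import Dict


class TokenBudgetMonitor:
    """Track token usage per route and flag requests over a ceiling."""

    def __init__(self, per_request_ceilings: Dict[str, int]):
        self.ceilings = per_request_ceilings
        # Running totals per route, kept for later reconciliation.
        self.input_totals = defaultdict(int)
        self.output_totals = defaultdict(int)

    def record(self, route: str, input_tokens: int, output_tokens: int) -> bool:
        """Log one request; return True if it exceeded the route's ceiling."""
        self.input_totals[route] += input_tokens
        self.output_totals[route] += output_tokens
        ceiling = self.ceilings.get(route)
        # Routes without a configured ceiling never raise an alert.
        return ceiling is not None and input_tokens + output_tokens > ceiling


monitor = TokenBudgetMonitor({"/summarize": 1500})
print(monitor.record("/summarize", 900, 300))   # prints False (1200 tokens)
print(monitor.record("/summarize", 1400, 250))  # prints True (1650 tokens)
print(monitor.input_totals["/summarize"], monitor.output_totals["/summarize"])
# prints 2300 550
```

Keeping input and output totals separate makes the weekly reconciliation easier, since invoices usually price the two sides differently.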
