FAQ guide
How do AI models count tokens?
Quick answer
Models count tokens by running a deterministic tokenizer that maps UTF-8 text to integer IDs using a pretrained vocabulary and merge table, or an equivalent structure. Each ID is one token, and the number of IDs determines both billing and tensor shapes. The count is reproducible offline with the same tokenizer version. Special tokens that mark boundaries or roles also count toward totals. Multimodal inputs pass through separate encoders but still yield billable units under each provider's rules.
Introduction
Counting is syntactic and statistical, not semantic. That is why near-duplicate sentences that differ only in their Unicode representation (for example, a precomposed accented letter versus a base letter plus a combining mark) can tokenize differently. Many pipelines normalize text first, then apply subword splits until the entire string is covered.
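A minimal sketch of the Unicode point above, using only Python's standard library. The sample strings are illustrative, and NFC is just one possible normalization form; which form (if any) a given tokenizer applies is an assumption to verify against its documentation.

```python
import unicodedata

composed = "caf\u00e9"      # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent (U+0301)

def utf8_len(text: str) -> int:
    """Number of UTF-8 bytes, the raw input a byte-level tokenizer sees."""
    return len(text.encode("utf-8"))

# The two strings usually render identically but differ at the byte level.
print(composed == decomposed)                    # False
print(utf8_len(composed), utf8_len(decomposed))  # 5 6

# Normalizing both to NFC makes them identical before tokenization.
nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)                            # True
```

Because the byte sequences differ, a byte-level tokenizer can return different IDs and counts for the two spellings unless both sides normalize the same way.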
Engineering teams rely on identical counting between CI environments and production to prevent silent budget drift. Pin tokenizer assets like any other dependency.
When you change SDK versions or operating systems, revalidate counts on a canary workload before trusting old dashboards. Subtle clipboard or filesystem normalization differences have surprised teams that assumed byte-identical prompts across laptops.
Algorithm sketch
Starting from bytes or characters guarantees open-vocabulary coverage: any input can be encoded, even if rare text falls back to very small units. Merge operations learned on large corpora group the most frequent adjacent pairs first, which lowers the average number of tokens per character for common language.
Special tokens are reserved integer IDs that signal sequence start, padding, or tool delimiters, depending on the ecosystem.
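The merge idea can be sketched with a toy byte-level BPE trainer. This is a simplified illustration, not any vendor's tokenizer: the corpus, the function names, and the choice to assign merged IDs starting at 256 are all assumptions made for the example.

```python
from collections import Counter

def apply_merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace each non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(corpus: bytes, num_merges: int) -> list[tuple[int, int]]:
    """Learn merge rules. IDs 0-255 are raw bytes; merge k gets ID 256 + k."""
    ids = list(corpus)
    merges: list[tuple[int, int]] = []
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        if pairs[best] < 2:
            break  # nothing repeats, so further merges would not help
        ids = apply_merge(ids, best, 256 + len(merges))
        merges.append(best)
    return merges

def encode(text: str, merges: list[tuple[int, int]]) -> list[int]:
    """Encode text by starting from UTF-8 bytes and replaying merges in order."""
    ids = list(text.encode("utf-8"))
    for rank, pair in enumerate(merges):
        ids = apply_merge(ids, pair, 256 + rank)
    return ids

merges = train_bpe(b"low lower lowest low low", num_merges=10)
print(len(encode("low", merges)))  # 1: the frequent word collapsed into one ID
print(len(encode("zzz", merges)))  # 3: unseen bytes stay as single-byte tokens
```

The frequent word shrinks to one ID while unseen text still encodes byte by byte, which is the open-vocabulary property in miniature.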
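To illustrate how special tokens add to totals, here is a small sketch of a chat wrapper. The token names, their ID values, and the one-ID-per-byte stand-in encoder are all hypothetical; real chat templates differ by provider and model.

```python
# Hypothetical reserved IDs; real tokenizers define their own names and values.
SPECIAL = {"<bos>": 100000, "<user>": 100001, "<assistant>": 100002, "<eot>": 100003}

def encode_text(text: str) -> list[int]:
    """Stand-in for a real tokenizer: one ID per UTF-8 byte."""
    return list(text.encode("utf-8"))

def encode_chat(messages: list[tuple[str, str]]) -> list[int]:
    """Wrap each message with a role marker and an end-of-turn marker."""
    ids = [SPECIAL["<bos>"]]
    for role, content in messages:
        ids.append(SPECIAL[f"<{role}>"])
        ids.extend(encode_text(content))
        ids.append(SPECIAL["<eot>"])
    return ids

messages = [("user", "hi"), ("assistant", "hello")]
content_tokens = sum(len(encode_text(content)) for _, content in messages)
total_tokens = len(encode_chat(messages))
print(content_tokens, total_tokens)  # 7 12
```

The five extra IDs come entirely from the template, which is one reason a client-side count of the visible text can undershoot the server's usage figure.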
Operational implications
When counts mismatch between client and server, first check for differences in normalization, hidden template injection, or model endpoints.
Regression tests should include edge cases like ligatures, emoji sequences, and mixed-language rows.
Version drift
Upgrading libraries without pinning tokenizer resources can change counts overnight even if model weights are unchanged.
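One way to detect drift is to fingerprint the tokenizer assets and compare the value across environments. The sketch below assumes the vocabulary and merges are available as plain Python structures; the function name and the sample data are hypothetical.

```python
import hashlib
import json

def tokenizer_fingerprint(
    vocab: dict[str, int], merges: list[tuple[str, str]], version: str
) -> str:
    """SHA-256 over a canonical JSON serialization of the tokenizer assets."""
    payload = json.dumps(
        {"version": version, "vocab": vocab, "merges": merges},
        sort_keys=True,          # key order must not affect the hash
        ensure_ascii=True,       # stable output regardless of platform encoding
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Illustrative data only.
vocab = {"to": 0, "ken": 1, "token": 2}
merges = [("to", "ken")]

pinned = tokenizer_fingerprint(vocab, merges, version="1.0.0")
changed = tokenizer_fingerprint(vocab, merges + [("token", "s")], version="1.0.0")
print(pinned == changed)  # False: an extra merge rule changes the fingerprint
```

Comparing this value in CI and in production turns a silent count change into an explicit, reviewable diff.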
Quick example
The phrase "token budget" might split into two or three tokens, depending on whether "budget" merges into a single common token in your vocabulary snapshot.
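The effect of the vocabulary snapshot can be sketched with a greedy longest-match segmenter. This is a simplification (real BPE replays merges rather than matching longest pieces), and both vocabularies are invented for the example.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match segmentation; single characters are always allowed."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first and fall back to one character.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

# Two hypothetical vocabulary snapshots.
vocab_a = {"token", " budget"}       # " budget" exists as one piece
vocab_b = {"token", " bud", "get"}   # " budget" must be built from two pieces

print(greedy_tokenize("token budget", vocab_a))  # ['token', ' budget']
print(greedy_tokenize("token budget", vocab_b))  # ['token', ' bud', 'get']
```

The same string yields two tokens under one snapshot and three under the other, so counts are only comparable when the vocabulary is pinned.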
Counting mistakes
- Equating PHP string length (bytes from strlen or characters from mb_strlen) with token count.
- Stripping markdown locally but not on the server worker path.
- Assuming all OpenAI-compatible proxies return identical usage accounting.
- Embedding unnormalized user HTML directly so tags and entities inflate tokens beyond what authors see.
- Sampling only short strings in tests while production drags in multi-kilobyte JSON blobs.
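Several of these mistakes come from treating character or byte length as a proxy for tokens. The sketch below uses only the standard library to show how far those measures diverge on edge-case strings and how raw HTML inflates the input beyond the visible text. The sample strings and the regex-based tag stripping are illustrative assumptions, not a production HTML sanitizer.

```python
import html
import re

samples = {
    "ascii": "token budget",
    "ligature": "\ufb01nance",                    # U+FB01 LATIN SMALL LIGATURE FI
    "emoji_zwj": "\U0001F469\u200D\U0001F4BB",    # woman + zero-width joiner + laptop
    "mixed": "\u4fa1\u683c price",                # two CJK characters, then English
}

def measures(text: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes); neither is a token count."""
    return len(text), len(text.encode("utf-8"))

for name, text in samples.items():
    print(name, measures(text))

# Raw HTML carries tags and entities that authors never see on the page.
raw_html = "<p>Caf&eacute; &amp; bar</p>"
visible = html.unescape(re.sub(r"<[^>]+>", "", raw_html))
print(len(raw_html), len(visible))
```

The emoji sequence displays as one glyph but spans three code points and eleven bytes, which is why tests built only on short ASCII strings miss these cases.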
Tips for engineers
- Wrap tokenizer calls behind a service interface to swap implementations safely.
- Log a hash of the tokenizer config alongside major releases.
- Provide CLI utilities for writers to preview counts interactively.
- Add CI checks on maximum prompt sizes using real tokenizers.
- Document multilingual caveats for support teams.
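The interface and CI tips above could be combined along these lines. Everything here is a hypothetical sketch: `TokenCounter`, `Utf8ByteCounter`, and `check_prompt_budget` are invented names, and the byte-based counter is a placeholder to be replaced by an adapter around the real tokenizer you deploy.

```python
from typing import Protocol

class TokenCounter(Protocol):
    """Interface that call sites depend on, so implementations can be swapped."""
    def count(self, text: str) -> int: ...

class Utf8ByteCounter:
    """Placeholder implementation: one token per UTF-8 byte."""
    def count(self, text: str) -> int:
        return len(text.encode("utf-8"))

def check_prompt_budget(
    counter: TokenCounter, prompts: dict[str, str], max_tokens: int
) -> list[str]:
    """Return the names of prompts whose token count exceeds max_tokens."""
    return [name for name, text in prompts.items() if counter.count(text) > max_tokens]

# Example CI-style check with illustrative prompts and an illustrative limit.
prompts = {"greeting": "hi", "report": "x" * 50}
over_budget = check_prompt_budget(Utf8ByteCounter(), prompts, max_tokens=10)
print(over_budget)  # ['report']
```

A CI job can fail the build when the returned list is non-empty, and because call sites depend only on the interface, moving to a different tokenizer does not require touching them.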
Turn these ideas into concrete dollars
Compare models, simulate monthly traffic, and export shareable estimates in seconds. Numbers follow your config/models.php rates so you can mirror vendor tables before you commit to architecture.