Are system messages prompt tokens?

Yes, they are part of the context the model reads before completion begins.

Do reasoning models label differently?

Some products expose additional fields for internal chains. Map them carefully when comparing to classic chat metrics.

Can completion include hidden tokens?

Formatting overhead may appear in counts even if the UI hides it. Treat usage logs as ground truth.

Roles guide behavior and can change tokenization boundaries when serialization formats insert markers.

Is fine-tuning about prompts?

Fine-tuning changes model weights so prompts can be shorter while preserving behavior, indirectly shifting token economics.

Does batch JSON change categories?

Batch jobs still distinguish input and output tokens for completions endpoints; always read batch-specific docs.

FAQ guide

What is the difference between prompt and completion tokens?

Quick answer

Prompt tokens measure all text the model processes as context for a request, including system and developer instructions and any history you attach. Completion tokens measure newly generated text emitted after that context up to the stop condition. APIs bill them on separate counters with potentially different rates. The names align with ChatML-style roles but map cleanly to input and output in usage objects.

Older documentation used prompt and completion language while newer dashboards may say input and output. Conceptually they refer to the same split unless a provider documents an exception for niche endpoints. Consistency in your internal wiki reduces onboarding friction.

Function calling introduces structured snippets that belong to the prompt side before tools return results, which become additional prompt material in subsequent steps.

Lifecycle of a chat call

You assemble messages, convert them to tokens, and submit them. The model samples completion tokens until hitting a limit or stop sequence. Logged totals should reconcile with tokenizer rehearsals plus any server-side additions.

When you store transcripts, label whether each segment originated from the user, tool, or assistant to debug future token growth.

Billing nuances

If output pricing exceeds input pricing, compressing verbose answers yields outsized savings. If prompt pricing dominates due to retrieval, invest in evidence selection quality.

Partial completions still bill for emitted tokens even if the client disconnects mid-stream unless policies state otherwise.

Minimal illustration

With a two-thousand-token prompt and a three-hundred-token reply, usage charts should display two thousand prompt tokens and three hundred completion tokens for that interaction.

Common mix-ups

Counting assistant messages in history as completions for the current billable call.
Forgetting tool outputs feed the next prompt, not the prior completion bucket.
Mislabeling embeddings calls which may only expose an input-style metric.
Bookmarking dashboards that combine generations across unrelated keys and then misattributing completion spend.
Ignoring that partial streams still bill completion tokens for text emitted before cancellation fired.

Operational tips

Mirror official usage field names in your telemetry to ease support tickets.
Build dashboards that compare prompt versus completion across environments.
Educate PMs with concrete examples from your own logs, not generic diagrams.
Automate anomaly detection on sudden prompt inflation.
Version templates to know which copy correlated with token spikes.

Continue exploring

Internal links connect calculators, blog guides, and related FAQ articles for stronger topical coverage.

Core tools

Blog & related FAQs

Turn these ideas into concrete dollars

Compare models, simulate monthly traffic, and export shareable estimates in seconds. Numbers follow your config/models.php rates so you can mirror vendor tables before you commit to architecture.

Open calculator OpenAI view Claude view