Use case · Documents

Document processing AI cost guide

Document AI stacks combine OCR, chunking, embeddings, and LLM summarization. Tokens accumulate across stages, so isolate each stage in your ledger.

Token patterns in document pipelines

Chunking a long PDF spreads it across multiple LLM calls. Each call repeats the instruction template, so keep chunk templates short.
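To make the per-chunk overhead concrete, here is a minimal sketch that counts prompt tokens for a chunked document. The function name, the fixed-size sliding-window chunking, and the example sizes are all assumptions for illustration; a real pipeline may split on pages or headings instead.

```python
import math


def chunked_prompt_tokens(doc_tokens: int, chunk_size: int,
                          overlap: int, template_tokens: int) -> tuple[int, int]:
    """Return (number_of_calls, total_prompt_tokens) for a chunked document."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - overlap  # new tokens introduced by each later chunk
    calls = max(1, math.ceil((doc_tokens - overlap) / stride))
    # Every chunk boundary re-sends `overlap` tokens; every call re-sends the template.
    total = doc_tokens + (calls - 1) * overlap + calls * template_tokens
    return calls, total


# Hypothetical 12,000-token document, 4,000-token chunks, 150-token template.
print(chunked_prompt_tokens(12_000, 4_000, overlap=0, template_tokens=150))
print(chunked_prompt_tokens(12_000, 4_000, overlap=200, template_tokens=150))
```

With these illustrative numbers, a 200-token overlap pushes the document from three calls to four and raises billed prompt tokens from 12,450 to 13,200, which is why both the overlap and the template length belong in the ledger.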

Processing scenarios

Scenario	Prompt tokens	Output tokens	Model (est.)	Cost / request
Contract clause map	12000	900	Claude 3.5 Sonnet	$0.0495
Invoice field extract	3500	400	GPT-4o	$0.0128
Daily news digest	6000	500	gemini-2.5-flash	$0.0006

Figures use rates from config/models.php; confirm against your provider before billing decisions.

Monthly estimates

Back-office batch

350 large docs per weekday.

Per request: $0.0383

Monthly (350 req/day × 22 days): $294.53
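The monthly figure is the per-request cost times 350 requests times 22 days. A short sketch of that arithmetic follows, using `Decimal` so cents round predictably. Note that the displayed $0.0383 yields $294.91 rather than $294.53; one possible explanation, which is an assumption and not something the page states, is that the page multiplies an unrounded per-request cost near $0.03825.

```python
from decimal import Decimal, ROUND_HALF_UP


def monthly_cost(cost_per_request: Decimal, requests_per_day: int,
                 working_days: int = 22) -> Decimal:
    """Project a monthly bill and round it to whole cents."""
    total = cost_per_request * requests_per_day * working_days
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(monthly_cost(Decimal("0.0383"), 350))   # displayed (rounded) per-request cost
print(monthly_cost(Decimal("0.03825"), 350))  # hypothetical unrounded per-request cost
```

The first call prints 294.91 and the second 294.53, so rounding the per-request cost before multiplying shifts the monthly total by about 38 cents at this volume.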

Infrastructure considerations

Object storage, OCR vendors, and GPU preprocessing belong in the same business case as LLM tokens.

Model recommendations

Use flash or mini tiers for first-pass extraction, and reserve premium models for adjudicating the cases the cheaper pass cannot resolve.

Optimization recommendations

Deduplicate documents, store intermediate JSON, and avoid re-sending unchanged sections.
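One way to avoid re-sending unchanged sections is to key stored results by a hash of the section text. The sketch below assumes an in-memory dictionary as the cache and hypothetical helper names; a production system would more likely persist the JSON in a database or object store.

```python
import hashlib
import json


def section_digest(text: str) -> str:
    """Stable fingerprint for a section's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sections_to_send(sections: dict[str, str], cache: dict[str, dict]) -> list[str]:
    """Return the names of sections whose text has no cached result yet."""
    return [name for name, text in sections.items()
            if section_digest(text) not in cache]


def store_result(text: str, result: dict, cache: dict[str, dict]) -> None:
    """Keep the intermediate result keyed by content hash."""
    json.dumps(result)  # raises TypeError early if the result is not JSON-serializable
    cache[section_digest(text)] = result


cache: dict[str, dict] = {}
v1 = {"parties": "Acme and Beta", "term": "12 months"}
for text in v1.values():
    store_result(text, {"summary": text.upper()}, cache)  # stand-in for an LLM result

v2 = {"parties": "Acme and Beta", "term": "24 months"}  # only "term" changed
print(sections_to_send(v2, cache))
```

Only the `term` section is returned for the second version, so only that section's tokens would be billed again.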

ROI examples

Compare manual review hours avoided versus model cost—legal and finance teams often have ready hourly benchmarks.

Budget guidance

Pilot with stratified samples across document types so medians reflect messy real-world scans.

FAQ: Document AI pricing

Short answers mirror the structured data on this page for search engines and readers.

Do embeddings count as LLM tokens? Embedding models have their own pricing, so track them separately from chat completions.
How does chunk overlap affect cost? Overlapping tokens are sent, and billed, in more than one call. Use the smallest overlap that still preserves context.
What about handwritten notes? Handwriting lowers OCR quality, which leads to more retries. Budget for higher variance in completion tokens.
Can on-device OCR reduce bills? It can reduce cloud OCR fees but may shift the cost to engineering, so compare total cost rather than the OCR line item alone.

Calculate document LLM costs

Tune prompt tokens to reflect average pages ingested per job.

Prefilled for this page’s scenario. Pricing loads from config/models.php and /api/pricing.

Calculator

Cost = (prompt ÷ 1000 × P_in) + (completion ÷ 1000 × P_out), then × requests. P_in and P_out are the primary model's input and output prices per 1,000 tokens.
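The formula translates directly into a small function. In this sketch the function name is hypothetical, and the rates are assumptions chosen because they reproduce two rows of the scenario table; they are not read from config/models.php and should be checked against your provider.

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 p_in: float, p_out: float, requests: int = 1) -> float:
    """Cost = (prompt / 1000 * P_in) + (completion / 1000 * P_out), times requests."""
    per_request = prompt_tokens / 1000 * p_in + completion_tokens / 1000 * p_out
    return per_request * requests


# Assumed USD rates per 1,000 tokens as (input, output).
ASSUMED_RATES = {
    "Claude 3.5 Sonnet": (0.003, 0.015),
    "GPT-4o": (0.0025, 0.01),
}

# Contract clause map: 12,000 prompt tokens, 900 output tokens.
print(round(request_cost(12_000, 900, *ASSUMED_RATES["Claude 3.5 Sonnet"]), 4))
# Invoice field extract: 3,500 prompt tokens, 400 output tokens.
print(round(request_cost(3_500, 400, *ASSUMED_RATES["GPT-4o"]), 5))
```

Under these assumed rates the two requests cost about $0.0495 and $0.01275, in line with the table's $0.0495 and $0.0128.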

Fields: primary model, prompt tokens, completion tokens, requests, currency, usage presets.

Multi-model comparison

Toggle models to compare the same workload. The cheapest option is highlighted.
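The comparison amounts to pricing one workload under several rate pairs and picking the minimum. A minimal sketch follows; the model names and rates are placeholders for illustration, not real provider pricing.

```python
# Hypothetical USD rates per 1,000 tokens as (input, output).
RATES = {
    "model-a": (0.003, 0.015),
    "model-b": (0.0025, 0.01),
    "model-c": (0.000075, 0.0003),
}


def compare(prompt_tokens: int, completion_tokens: int,
            rates: dict[str, tuple[float, float]]) -> tuple[dict[str, float], str]:
    """Return per-model cost for one request and the name of the cheapest model."""
    costs = {
        name: prompt_tokens / 1000 * p_in + completion_tokens / 1000 * p_out
        for name, (p_in, p_out) in rates.items()
    }
    cheapest = min(costs, key=costs.get)
    return costs, cheapest


costs, cheapest = compare(6_000, 500, RATES)
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    marker = " (cheapest)" if name == cheapest else ""
    print(f"{name}: ${cost:.4f}{marker}")
```

Price alone does not settle the choice: a cheaper model that needs more retries or a second adjudication pass can end up costing more per accepted document.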

Monthly cost simulator

Project from average daily requests (uses tokens above).

Fields: average requests per day, working days per month. Projections use the primary model's rates.

Token estimator

Rough heuristic: ~4 characters ≈ 1 token for Latin text (indicative only).
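The heuristic is a one-line calculation. This sketch rounds up, which is an assumption; the page does not say how it rounds. A real tokenizer will give different counts, especially for non-Latin scripts, code, or OCR noise.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count: about 4 characters per token for Latin text."""
    return math.ceil(len(text) / chars_per_token)


print(estimate_tokens("Summarize the attached invoice."))
```

Treat the result as a planning number only, and calibrate `chars_per_token` against your provider's reported usage once you have real traffic.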

Input: pasted prompt or completion text. Outputs: estimated tokens and cost at the primary model's rates.

API budget planner

Set a monthly cap to see how many identical requests fit (primary model).

Input: monthly budget (USD). Output: approximate maximum number of requests.
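The planner reduces to dividing the cap by the per-request cost and rounding down. A minimal sketch, with a hypothetical function name and example values:

```python
import math


def max_requests(monthly_budget_usd: float, cost_per_request_usd: float) -> int:
    """How many identical requests fit under a monthly cap (approximate)."""
    if cost_per_request_usd <= 0:
        raise ValueError("cost per request must be positive")
    return math.floor(monthly_budget_usd / cost_per_request_usd)


# A $100 cap at roughly $0.0495 per request.
print(max_requests(100.0, 0.0495))
```

Because this uses floating-point division, the result can be off by one when the budget is an exact multiple of the cost, which is consistent with the page labelling the figure as approximate.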

Prompt optimization analyzer

Collapse whitespace and tighten wording to preview savings at the primary model.

Input: draft prompt. Outputs: suggested shorter form, token delta, and estimated savings per 1,000 calls.
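The whitespace half of this analyzer is easy to sketch; tightening wording needs human or model judgment and is left out. The helper names below are hypothetical, and token counts use the rough 4-characters-per-token heuristic rather than a real tokenizer.

```python
import math
import re


def collapse_whitespace(prompt: str) -> str:
    """Replace every run of whitespace with one space and trim the ends."""
    return re.sub(r"\s+", " ", prompt).strip()


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / 4)  # rough heuristic, not a tokenizer


def savings_per_1k_calls(draft: str, p_in_per_1k: float) -> tuple[str, int, float]:
    """Return (shorter prompt, tokens saved per call, USD saved per 1,000 calls)."""
    shorter = collapse_whitespace(draft)
    delta = estimate_tokens(draft) - estimate_tokens(shorter)
    # Tokens saved per call, priced per 1,000 tokens, over 1,000 calls.
    savings = delta / 1000 * p_in_per_1k * 1000
    return shorter, delta, savings


draft = "Extract   the invoice total.\n\n\n   Return    JSON   only.   "
shorter, delta, savings = savings_per_1k_calls(draft, p_in_per_1k=0.0025)
print(repr(shorter), delta, savings)
```

Real tokenizers often merge runs of spaces into a single token, so the actual saving from whitespace alone is usually smaller than this character-based estimate suggests.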

Fine-tuning cost sketch

Order-of-magnitude helper: training tokens × epochs × rate + storage.

Inputs: training tokens (billions), epochs, USD per 1M training tokens, checkpoint storage (GB), storage price in USD per GB per month. Output: estimated training cost plus one month of storage.
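The unit conversions are the main thing to get right here: tokens are entered in billions but priced per million. A sketch under that reading of the fields, with hypothetical parameter names and example values:

```python
def fine_tuning_estimate(training_tokens_billions: float, epochs: int,
                         usd_per_million_tokens: float,
                         checkpoint_gb: float,
                         storage_usd_per_gb_month: float) -> float:
    """Training cost plus one month of checkpoint storage, in USD."""
    millions_of_tokens = training_tokens_billions * 1000  # 1 billion = 1,000 million
    training = millions_of_tokens * epochs * usd_per_million_tokens
    storage = checkpoint_gb * storage_usd_per_gb_month  # one month only
    return training + storage


# 0.5B tokens, 3 epochs, $8 per 1M tokens, 20 GB stored at $0.02 per GB per month.
print(fine_tuning_estimate(0.5, 3, 8.0, 20, 0.02))
```

In this example training comes to $12,000 and storage to $0.40, which shows why the estimate is dominated by tokens × epochs and why it is only an order-of-magnitude figure.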

Team usage calculator

Multiply per-person daily volume by team size (primary model).

Inputs: team members, requests per person per day. Output: team monthly cost over 22 working days.

Cost per feature

Price a single product surface (e.g., one chat turn or one generated article).

Inputs: feature label, uses per day. Each use is priced with the calculator's prompt and completion tokens for one invocation. Outputs: cost per use and monthly cost at that cadence.

Share & export

Serialize inputs in the URL hash or copy a text summary.
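As an illustration of the idea, inputs can be round-tripped through a URL fragment with standard query-string encoding. The key names and the fragment layout below are assumptions; the page's actual format lives in its own script and is not shown here.

```python
from urllib.parse import parse_qsl, urlencode


def to_hash(inputs: dict) -> str:
    """Encode calculator inputs as a URL fragment."""
    return "#" + urlencode(inputs)


def from_hash(fragment: str) -> dict:
    """Decode a fragment produced by to_hash back into typed inputs."""
    pairs = dict(parse_qsl(fragment.lstrip("#")))
    # Everything comes back as strings; restore the numeric fields.
    for key in ("prompt", "completion", "requests"):
        if key in pairs:
            pairs[key] = int(pairs[key])
    return pairs


inputs = {"model": "gpt-4o", "prompt": 3500, "completion": 400, "requests": 1}
fragment = to_hash(inputs)
print(fragment)
print(from_hash(fragment) == inputs)
```

Keeping the state in the fragment means a shared link reproduces the scenario without the server ever seeing the inputs.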

Calculation history

Stored in your browser only (LocalStorage).

Primary results

Reported values: cost per request, input share, output share, total (batch), monthly (simulator), and yearly (simulator).

Comparison table

Columns: Model, $/req, Batch.

Currency note

FX rates are static snapshots for display only, not live trading data. USD is the base currency in app.js; adjust as needed.