How expensive are embedding refreshes?

They scale with corpus size and dimensions—schedule incremental updates.

Does reranking add tokens?

Cross-encoder rerankers may use GPUs instead of LLM tokens; account separately.

What chunk size is cheapest?

Smaller chunks reduce per-query tokens but can hurt answer quality—test retrieval metrics.

Can caching help RAG?

Yes, for repeated FAQs—cache final answers with invalidation rules.

Home
AI token cost calculator
RAG system cost estimator

Use case · RAG

RAG system cost estimator

RAG systems bill across embeddings, vector search infrastructure, and chat completions. Teams that only model chat underestimate true spend.

Token usage patterns in RAG

Each query might retrieve five chunks of six hundred tokens plus a system prompt. Re-querying without caching repeats that cost.

RAG scenarios

Scenario	Prompt tokens	Output tokens	Model (est.)	Cost / request
Support knowledge base	3800	350	GPT-4o mini	$0.0008
Legal research assistant	9500	700	GPT-4o	$0.0308
Developer docs bot	5200	500	Claude 3.5 Haiku	$0.0062

Figures use rates from config/models.php; confirm against your provider before billing decisions.

Monthly estimates

Production RAG assistant

7,000 questions per weekday.

Per request

$0.0009

Monthly (7000 req/day × 22 days)

$140.91

Infrastructure considerations

Vector index size, replication factor, and embedding dimensions drive hosting bills independent of chat tokens.

Model recommendations

Mini tiers suffice when retrieval quality is high; upgrade models when answers need synthesis across conflicting chunks.

Optimization recommendations

Tune top-k, use rerankers sparingly, compress chunks, and deduplicate sources at index time.

ROI examples

RAG wins when it reduces time-to-answer for specialists—quantify that time and error reduction.

Budget guidance

Separate one-time ingestion from steady-state query costs; ingestion spikes during content migrations.

Related calculators & guides

Explore adjacent workflows and long-tail pricing topics without losing your place.

FAQ: RAG operating costs

Short answers mirror the structured data on this page for search engines and readers.

How expensive are embedding refreshes?: They scale with corpus size and dimensions—schedule incremental updates.
Does reranking add tokens?: Cross-encoder rerankers may use GPUs instead of LLM tokens; account separately.
What chunk size is cheapest?: Smaller chunks reduce per-query tokens but can hurt answer quality—test retrieval metrics.
Can caching help RAG?: Yes, for repeated FAQs—cache final answers with invalidation rules.

Estimate RAG chat token costs

Set prompt tokens near your average retrieved chunk total plus instructions.

Prefilled for this page’s scenario. Pricing loads from config/models.php and /api/pricing.

Calculator

Cost = (prompt ÷ 1000 × P_in) + (completion ÷ 1000 × P_out), then × requests.

Primary model

Prompt tokens

Completion tokens

Requests

Currency

Usage presets

Multi-model comparison

Toggle models to compare the same workload. The cheapest option is highlighted.

Monthly cost simulator

Project from average daily requests (uses tokens above).

Avg. requests / day

Working days / month

Uses primary model rates for projections.

Token estimator

Rough heuristic: ~4 characters ≈ 1 token for Latin text (indicative only).

Paste prompt or completion

Estimated tokens: 0 · Cost @ primary: —

API budget planner

Set a monthly cap to see how many identical requests fit (primary model).

Monthly budget (USD)

Max requests (approx): —

Prompt optimization analyzer

Collapse whitespace and tighten wording to preview savings at the primary model.

Draft prompt

Suggested shorter form:

Token delta: 0 · Est. savings / 1k calls: —

Fine-tuning cost sketch

Order-of-magnitude helper: training tokens × epochs × rate + storage.

Training tokens (billions)

Epochs

USD / 1M train tokens

Checkpoint storage (GB)

Storage USD / GB / mo

Est. training + 1 mo storage: —

Team usage calculator

Multiply per-person daily volume by team size (primary model).

Team members

Requests / person / day

Team monthly (22d): —

Cost per feature

Price a single product surface (e.g., one chat turn or one generated article).

Feature label

Uses / day

Uses prompt & completion tokens from the calculator for one invocation.

Cost per use: — · Monthly @ that cadence: —

Share & export

Serialize inputs in the URL hash or copy a text summary.

Calculation history

Stored in your browser only (LocalStorage).

Primary results

Cost / request: —
Input share: —
Output share: —
Total (batch): —
Monthly (simulator): —
Yearly (simulator): —

Comparison table

Model	$/req	Batch

Optimization insights

Currency note

FX rates are static snapshots for UX (not trading data). USD is the base in app.js; adjust as needed.