Cost Savings Methodology

A transparent explanation of how Plexor calculates and reports cost savings. We believe users deserve to understand exactly how their savings are computed.

Overview

Plexor reduces LLM costs through intelligent provider routing. When you send a request through Plexor, we route it to the most cost-effective provider that can handle your task while maintaining quality. Your cost savings are the difference between what the same request would have cost on our reference model, Claude Sonnet, and what you actually paid through Plexor's optimized routing.

Key Principle

We calculate savings against what you would typically pay using Claude directly. This represents the real-world value of using Plexor instead of calling Anthropic's API yourself.

Reference Model Methodology

To calculate meaningful savings, we need a baseline: "What would this request cost without Plexor?" We use Claude Sonnet as our reference model because:

It's the most commonly used production model for agentic coding tasks
It represents what most developers would use as their default
Its pricing is well-documented and stable

Reference Model Pricing (January 2026)

Model	Input (per 1M tokens)	Output (per 1M tokens)	Role
`claude-sonnet-4.5`	$3.00	$15.00	Reference Baseline

Why Not Use the Requested Model?

If we used your requested model as the baseline, routing to a cheaper provider would show zero savings (since baseline = actual). The reference model approach shows the true value: "You saved X compared to using Claude directly."

Calculation Formula

For each API request, we calculate cost savings using these formulas:

baseline_cost = (input_tokens / 1,000,000) × $3.00 + (output_tokens / 1,000,000) × $15.00

Baseline: What Claude Sonnet would cost for this request

actual_cost = (input_tokens / 1,000,000) × provider_input_price + (output_tokens / 1,000,000) × provider_output_price

Actual: What the routed provider charged

savings = baseline_cost - actual_cost

Savings: The difference you didn't have to pay

savings_percent = (savings / baseline_cost) × 100

Savings %: Your cost reduction percentage
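The four formulas above can be sketched in Python as follows. This is a minimal illustration, not Plexor's actual implementation: the names `request_cost`, `compute_savings`, and `SavingsResult` are hypothetical, and returning 0% for a zero-token request is an assumption the formulas do not cover.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000

# Reference baseline (claude-sonnet-4.5), USD per 1M tokens, as listed on this page.
BASELINE_INPUT_PRICE = 3.00
BASELINE_OUTPUT_PRICE = 15.00


@dataclass(frozen=True)
class SavingsResult:
    baseline_cost: float
    actual_cost: float
    savings: float
    savings_percent: float


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Return the USD cost of one request at the given per-1M-token prices."""
    input_cost = (input_tokens / TOKENS_PER_MILLION) * input_price
    output_cost = (output_tokens / TOKENS_PER_MILLION) * output_price
    return input_cost + output_cost


def compute_savings(input_tokens: int, output_tokens: int,
                    provider_input_price: float,
                    provider_output_price: float) -> SavingsResult:
    """Apply the baseline, actual, savings, and savings_percent formulas."""
    baseline = request_cost(input_tokens, output_tokens,
                            BASELINE_INPUT_PRICE, BASELINE_OUTPUT_PRICE)
    actual = request_cost(input_tokens, output_tokens,
                          provider_input_price, provider_output_price)
    savings = baseline - actual
    # Assumption: a zero-token request has no defined percentage; report 0.0.
    percent = (savings / baseline) * 100 if baseline > 0 else 0.0
    return SavingsResult(baseline, actual, savings, percent)


# 1,000 input and 500 output tokens at deepseek-chat prices ($0.14 / $0.28).
result = compute_savings(1_000, 500, 0.14, 0.28)
print(f"{result.savings:.5f} USD saved ({result.savings_percent:.1f}%)")
```

Because the amounts are fractions of a cent, floating-point results should be compared with a small tolerance rather than with exact equality.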

Pricing Data Sources

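We maintain current pricing data for all supported providers. The table below lists the key providers and their prices in USD per million tokens. The "vs Claude Sonnet" column is approximate: the exact figure depends on a request's ratio of input to output tokens, because most models discount input and output prices by different amounts.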

Provider	Model	Input	Output	vs Claude Sonnet
Anthropic	`claude-sonnet-4.5`	$3.00	$15.00	Baseline (0%)
Anthropic	`claude-haiku-4.5`	$1.00	$5.00	~67% cheaper
OpenAI	`gpt-4o`	$2.50	$10.00	~29% cheaper
OpenAI	`gpt-4o-mini`	$0.15	$0.60	~95% cheaper
DeepSeek	`deepseek-chat`	$0.14	$0.28	~95% cheaper
Mistral	`ministral-3b`	$0.04	$0.04	~99% cheaper
Google	`gemini-2.0-flash`	$0.10	$0.40	~97% cheaper

Sources: Pricing data is sourced directly from each provider's official documentation and updated regularly. See our src/plexor/pricing.py file for the complete, up-to-date pricing configuration.
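To see how a single "vs Claude Sonnet" percentage relates to the two per-token prices, the sketch below recomputes it for a given token mix. The `PRICING` dictionary and the `percent_cheaper` helper are hypothetical names that mirror the table; they are not the contents of `src/plexor/pricing.py`, and the table does not state which token mix it assumes.

```python
# Per-1M-token prices in USD as (input, output), copied from the table above.
PRICING = {
    "claude-sonnet-4.5": (3.00, 15.00),
    "claude-haiku-4.5": (1.00, 5.00),
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "deepseek-chat": (0.14, 0.28),
    "ministral-3b": (0.04, 0.04),
    "gemini-2.0-flash": (0.10, 0.40),
}
REFERENCE_MODEL = "claude-sonnet-4.5"


def percent_cheaper(model: str, input_tokens: int, output_tokens: int) -> float:
    """Percentage saved versus the reference model for one token mix."""
    if input_tokens + output_tokens <= 0:
        raise ValueError("at least one token is required")

    def weighted_price(name: str) -> float:
        input_price, output_price = PRICING[name]
        # The per-million divisor cancels out in the ratio, so it is omitted.
        return input_tokens * input_price + output_tokens * output_price

    return (1 - weighted_price(model) / weighted_price(REFERENCE_MODEL)) * 100


# The same model looks different depending on the input:output ratio.
for mix in [(1, 0), (2, 1), (0, 1)]:
    print(mix, round(percent_cheaper("gpt-4o", *mix), 1))
```

For `gpt-4o` this gives about 17% on input-only traffic, about 29% at a 2:1 input-to-output ratio, and about 33% on output-only traffic, so a single-column figure is best read as a rough guide.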

Worked Examples

Example 1: Simple Query Routed to DeepSeek

Request: Code explanation task (eco mode)

Input tokens: 1,000
Output tokens: 500
Routed to: deepseek-chat
Baseline cost (Claude): $0.0105
Actual cost (DeepSeek): $0.00028
Your savings: $0.01022 (97.3%)

# Calculation breakdown
baseline = (1000 / 1_000_000) × $3.00 + (500 / 1_000_000) × $15.00
         = $0.003 + $0.0075
         = $0.0105

actual   = (1000 / 1_000_000) × $0.14 + (500 / 1_000_000) × $0.28
         = $0.00014 + $0.00014
         = $0.00028

savings  = $0.0105 - $0.00028 = $0.01022 (97.3% saved)

Example 2: Complex Task Staying on Claude

Request: Complex refactoring (quality mode)

Input tokens: 10,000
Output tokens: 5,000
Routed to: claude-sonnet-4.5
Baseline cost (Claude): $0.105
Actual cost (Claude): $0.105
Your savings: $0.00 (0%)

Quality Mode Note

In quality mode or for complex tasks, Plexor may route to Claude to ensure the best results. In these cases, savings may be zero. This is by design: we prioritize quality over cost savings when appropriate.

Limitations & Caveats

What Our Savings Calculation Does NOT Include

Token optimization: We report raw token counts. Some requests may benefit from prompt optimization, but we don't claim savings from compression.
Quality differences: Cheaper models may produce different (sometimes lower-quality) outputs. Savings are purely financial.
Latency costs: Some cheaper providers may have higher latency. We don't factor opportunity cost into savings.
Your actual Claude contract: If you have negotiated pricing with Anthropic, your real savings may differ from our baseline.

Potential Biases

Reference model choice: Using Claude Sonnet as baseline maximizes apparent savings when routing to cheaper providers. If you typically use cheaper models, your real savings would be lower.
Task mix: Savings vary significantly by task type. Simple tasks routed to cheap providers show high savings; complex tasks may show zero.
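Both biases can be made concrete by aggregating savings over several requests and swapping the baseline. The sketch below is illustrative only: `Request` and `aggregate_savings_percent` are hypothetical names, and the two sample requests reuse the token counts and costs from the worked examples on this page.

```python
from typing import NamedTuple, Sequence


class Request(NamedTuple):
    input_tokens: int
    output_tokens: int
    actual_cost: float  # USD charged by the routed provider


def aggregate_savings_percent(requests: Sequence[Request],
                              baseline_input_price: float,
                              baseline_output_price: float) -> float:
    """Dollar-weighted savings percentage across requests for a chosen baseline."""
    baseline_total = sum(
        (r.input_tokens * baseline_input_price
         + r.output_tokens * baseline_output_price) / 1_000_000
        for r in requests
    )
    actual_total = sum(r.actual_cost for r in requests)
    if baseline_total == 0:
        return 0.0  # Assumption: no traffic means nothing to report.
    return (baseline_total - actual_total) / baseline_total * 100


requests = [
    Request(1_000, 500, 0.00028),   # simple query routed to deepseek-chat
    Request(10_000, 5_000, 0.105),  # complex task kept on claude-sonnet-4.5
]

# Baseline used on this page: claude-sonnet-4.5 at $3.00 / $15.00.
sonnet_view = aggregate_savings_percent(requests, 3.00, 15.00)
# Alternative baseline for a user who would otherwise default to claude-haiku-4.5.
haiku_view = aggregate_savings_percent(requests, 1.00, 5.00)
print(f"vs Sonnet: {sonnet_view:.1f}%  vs Haiku: {haiku_view:.1f}%")
```

Against the Sonnet baseline the combined figure is roughly 8.8%, far from the 97.3% of the simple request alone, because the larger request dominates the dollar total. Against a Haiku baseline the same traffic shows a negative figure, which illustrates how strongly the reference model choice shapes the reported number.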

Audit Your Own Data

We encourage you to audit your usage data. Your dashboard shows per-request breakdowns with actual provider used, token counts, and costs. The raw data is available via our API at GET /v1/stats.
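One way to audit exported data is to recompute each request's savings from its token counts and compare the result with the reported value. The sketch below makes no network call and assumes nothing about the real response of GET /v1/stats: the record keys `input_tokens`, `output_tokens`, `actual_cost`, and `reported_savings` are hypothetical placeholders to adapt to the fields you actually receive.

```python
import math

# Reference baseline (claude-sonnet-4.5), USD per 1M tokens, from this page.
BASELINE_INPUT_PRICE = 3.00
BASELINE_OUTPUT_PRICE = 15.00


def recompute_savings(record: dict) -> float:
    """Recompute savings for one record using the baseline formula."""
    baseline = (record["input_tokens"] * BASELINE_INPUT_PRICE
                + record["output_tokens"] * BASELINE_OUTPUT_PRICE) / 1_000_000
    return baseline - record["actual_cost"]


def find_mismatches(records: list, abs_tol: float = 1e-9) -> list:
    """Return (record, expected_savings) pairs where the reported value differs."""
    mismatches = []
    for record in records:
        expected = recompute_savings(record)
        reported = record["reported_savings"]
        if not math.isclose(expected, reported, rel_tol=0.0, abs_tol=abs_tol):
            mismatches.append((record, expected))
    return mismatches


# Hand-written sample records (hypothetical shape, not real API output).
sample_records = [
    {"input_tokens": 1_000, "output_tokens": 500,
     "actual_cost": 0.00028, "reported_savings": 0.01022},
    {"input_tokens": 10_000, "output_tokens": 5_000,
     "actual_cost": 0.105, "reported_savings": 0.05},  # deliberately wrong
]

for record, expected in find_mismatches(sample_records):
    print(f"reported {record['reported_savings']}, recomputed {expected:.5f}")
```

An absolute tolerance is used because the amounts are tiny and are computed in floating point; if you have negotiated pricing, substitute your own baseline prices.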