Cost Savings Methodology

A transparent explanation of how Plexor calculates and reports cost savings. We believe users deserve to understand exactly how their savings are computed.

Overview

Plexor reduces LLM costs through intelligent provider routing. When you send a request through Plexor, we route it to the most cost-effective provider that can handle the task while maintaining quality. Your cost savings are the difference between what the same request would have cost at the reference model's list price (Claude Sonnet, described below) and what you actually paid through Plexor's routing.

Key Principle
We calculate savings against what you would typically pay using Claude directly. This represents the real-world value of using Plexor instead of calling Anthropic's API yourself.

Reference Model Methodology

To calculate meaningful savings, we need a baseline: "What would this request cost without Plexor?" We use Claude Sonnet as our reference model because:

  • It's the most commonly used production model for agentic coding tasks
  • It represents what most developers would use as their default
  • Its pricing is well-documented and stable

Reference Model Pricing (January 2026)

Model              Input (per 1M tokens)  Output (per 1M tokens)  Role
claude-sonnet-4.5  $3.00                  $15.00                  Reference Baseline

Why Not Use the Requested Model?
If we used your requested model as the baseline, routing to a cheaper provider would show zero savings (since baseline = actual). The reference model approach shows the true value: "You saved X compared to using Claude directly."

Calculation Formula

For each API request, we calculate cost savings using these formulas:

Baseline: what Claude Sonnet would cost for this request
    baseline_cost = (input_tokens / 1,000,000) × $3.00 + (output_tokens / 1,000,000) × $15.00

Actual: what the routed provider charged
    actual_cost = (input_tokens / 1,000,000) × provider_input_price + (output_tokens / 1,000,000) × provider_output_price

Savings: the difference you didn't have to pay
    savings = baseline_cost - actual_cost

Savings %: your cost reduction percentage
    savings_percent = (savings / baseline_cost) × 100
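
The four formulas can be sketched in a few lines of Python. This is an illustrative sketch, not Plexor's implementation: the names `Savings`, `request_cost`, and `compute_savings` are hypothetical, and the zero-baseline guard is an assumption added so that a request with no tokens does not divide by zero.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000

# Reference baseline: claude-sonnet-4.5 list prices in USD per 1M tokens.
BASELINE_INPUT_PRICE = 3.00
BASELINE_OUTPUT_PRICE = 15.00


@dataclass(frozen=True)
class Savings:
    baseline_cost: float
    actual_cost: float
    savings: float
    savings_percent: float


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD for one request, given per-1M-token prices."""
    return ((input_tokens / TOKENS_PER_MILLION) * input_price
            + (output_tokens / TOKENS_PER_MILLION) * output_price)


def compute_savings(input_tokens: int, output_tokens: int,
                    provider_input_price: float,
                    provider_output_price: float) -> Savings:
    """Apply the baseline, actual, savings, and savings % formulas."""
    baseline = request_cost(input_tokens, output_tokens,
                            BASELINE_INPUT_PRICE, BASELINE_OUTPUT_PRICE)
    actual = request_cost(input_tokens, output_tokens,
                          provider_input_price, provider_output_price)
    saved = baseline - actual
    # Assumption: report 0% when the baseline is zero (no tokens billed).
    percent = (saved / baseline) * 100 if baseline > 0 else 0.0
    return Savings(baseline, actual, saved, percent)


# 1,000 input and 500 output tokens at $0.14 / $0.28 per 1M tokens.
result = compute_savings(1000, 500, 0.14, 0.28)
print(f"{result.savings_percent:.1f}%")  # 97.3%
```

Note that a negative `savings` value is possible under these formulas if a request is routed to a provider priced above the baseline.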

Pricing Data Sources

We maintain current pricing data for all supported providers. Here are the key providers and their pricing (per million tokens):

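Provider   Model              Input   Output   vs Claude Sonnet
Anthropic  claude-sonnet-4.5  $3.00   $15.00   Baseline (0%)
Anthropic  claude-haiku-4.5   $1.00   $5.00    ~67% cheaper
OpenAI     gpt-4o             $2.50   $10.00   ~29% cheaper
OpenAI     gpt-4o-mini        $0.15   $0.60    ~95% cheaper
DeepSeek   deepseek-chat      $0.14   $0.28    ~95% cheaper
Mistral    ministral-3b       $0.04   $0.04    ~99% cheaper
Google     gemini-2.0-flash   $0.10   $0.40    ~97% cheaper

The percentages in the last column are approximate. Because input and output prices differ by different ratios across providers, the exact figure for a given request depends on its mix of input and output tokens.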

Sources: Pricing data is sourced directly from each provider's official documentation and updated regularly. See our src/plexor/pricing.py file for the complete, up-to-date pricing configuration.
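
To see how a "vs Claude Sonnet" figure follows from the listed prices, the sketch below copies the table into a dictionary and computes the relative discount for a chosen token mix. It is a hypothetical illustration and does not reflect the actual contents or structure of src/plexor/pricing.py; the names `PRICING`, `REFERENCE_MODEL`, and `percent_cheaper` are assumptions.

```python
# Prices in USD per 1M tokens as (input, output), copied from the table.
PRICING = {
    "claude-sonnet-4.5": (3.00, 15.00),
    "claude-haiku-4.5": (1.00, 5.00),
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "deepseek-chat": (0.14, 0.28),
    "ministral-3b": (0.04, 0.04),
    "gemini-2.0-flash": (0.10, 0.40),
}

REFERENCE_MODEL = "claude-sonnet-4.5"


def percent_cheaper(model: str, input_tokens: int, output_tokens: int) -> float:
    """Percent cost reduction of `model` versus the reference model
    for a request with the given token counts."""
    def weighted_cost(name: str) -> float:
        input_price, output_price = PRICING[name]
        # The per-1M divisor cancels out in the ratio, so it is omitted.
        return input_tokens * input_price + output_tokens * output_price

    return (1 - weighted_cost(model) / weighted_cost(REFERENCE_MODEL)) * 100


# A request with twice as many input tokens as output tokens.
for name in PRICING:
    print(f"{name:20s} {percent_cheaper(name, 1000, 500):5.1f}% cheaper")
```

With a 2:1 input-to-output mix this gives roughly 29% for gpt-4o and roughly 97% for deepseek-chat. Changing the mix shifts the result for any model whose input and output discounts differ.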

Worked Examples

Example 1: Simple Query Routed to DeepSeek

Request: Code explanation task (eco mode)

Input tokens             1,000
Output tokens            500
Routed to                deepseek-chat
Baseline cost (Claude)   $0.0105
Actual cost (DeepSeek)   $0.00028
Your savings             $0.01022 (97.3%)

# Calculation breakdown
baseline = (1000 / 1_000_000) × $3.00 + (500 / 1_000_000) × $15.00
         = $0.003 + $0.0075
         = $0.0105

actual   = (1000 / 1_000_000) × $0.14 + (500 / 1_000_000) × $0.28
         = $0.00014 + $0.00014
         = $0.00028

savings  = $0.0105 - $0.00028 = $0.01022 (97.3% saved)

Example 2: Complex Task Staying on Claude

Request: Complex refactoring (quality mode)

Input tokens             10,000
Output tokens            5,000
Routed to                claude-sonnet-4.5
Baseline cost (Claude)   $0.105
Actual cost (Claude)     $0.105
Your savings             $0.00 (0%)

Quality Mode Note
In quality mode or for complex tasks, Plexor may route to Claude to ensure the best results. In these cases, savings may be zero. This is by design: we prioritize quality over cost savings when appropriate.

Limitations & Caveats

What Our Savings Calculation Does NOT Include

  • Token optimization: We report raw token counts. Some requests may benefit from prompt optimization, but we don't claim savings from compression.
  • Quality differences: Cheaper models may produce different (sometimes lower quality) outputs. Savings are purely financial.
  • Latency costs: Some cheaper providers may have higher latency. We don't factor opportunity cost into savings.
  • Your actual Claude contract: If you have negotiated pricing with Anthropic, your real savings may differ from our baseline.
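
If your own Claude pricing differs from list price, you can recompute savings against your own baseline. The sketch below is a hypothetical helper, not part of Plexor: `savings_against` is an assumed name, and the 20% discount is an invented figure used only for illustration.

```python
TOKENS_PER_MILLION = 1_000_000


def savings_against(baseline_input_price: float, baseline_output_price: float,
                    input_tokens: int, output_tokens: int,
                    actual_cost: float) -> tuple[float, float]:
    """Return (savings in USD, savings percent) against a custom baseline."""
    baseline_cost = ((input_tokens / TOKENS_PER_MILLION) * baseline_input_price
                     + (output_tokens / TOKENS_PER_MILLION) * baseline_output_price)
    savings = baseline_cost - actual_cost
    percent = (savings / baseline_cost) * 100 if baseline_cost > 0 else 0.0
    return savings, percent


# Hypothetical negotiated rate: 20% below list price (illustrative only).
DISCOUNT = 0.20
negotiated_input = 3.00 * (1 - DISCOUNT)
negotiated_output = 15.00 * (1 - DISCOUNT)

# Example 1's request: 1,000 input / 500 output tokens, actual cost $0.00028.
list_savings, list_pct = savings_against(3.00, 15.00, 1000, 500, 0.00028)
own_savings, own_pct = savings_against(negotiated_input, negotiated_output,
                                       1000, 500, 0.00028)

print(f"list-price baseline: ${list_savings:.5f} ({list_pct:.1f}%)")
print(f"negotiated baseline: ${own_savings:.5f} ({own_pct:.1f}%)")
```

Under the assumed discount, the dollar savings for this request fall from $0.01022 to $0.00812. The same function accepts any other model's prices as the baseline.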

Potential Biases

  • Reference model choice: Using Claude Sonnet as the baseline produces large apparent savings whenever a request is routed to a cheaper provider. If you would otherwise use a cheaper model, your real savings are lower.
  • Task mix: Savings vary significantly by task type. Simple tasks routed to cheap providers show high savings; complex tasks may show zero.
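
The effect of task mix shows up when requests are combined. The sketch below blends the two worked examples by summing costs before computing a percentage. This is one reasonable way to aggregate, stated as an assumption; it is not a description of how Plexor's dashboard computes totals, and `blended_savings_percent` is a hypothetical name.

```python
def blended_savings_percent(requests: list[tuple[float, float]]) -> float:
    """Overall savings percent for a list of (baseline_cost, actual_cost)
    pairs, weighted by cost rather than averaged per request."""
    total_baseline = sum(baseline for baseline, _ in requests)
    total_actual = sum(actual for _, actual in requests)
    if total_baseline == 0:
        return 0.0
    return (total_baseline - total_actual) / total_baseline * 100


requests = [
    (0.0105, 0.00028),  # Example 1: routed to deepseek-chat, 97.3% saved
    (0.105, 0.105),     # Example 2: stayed on claude-sonnet-4.5, 0% saved
]

print(f"{blended_savings_percent(requests):.1f}%")  # 8.8%
```

The larger request dominates the total, so the blended figure is far below the simple average of the two per-request percentages (about 48.7%).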

Audit Your Own Data

We encourage you to audit your usage data. Your dashboard shows per-request breakdowns with actual provider used, token counts, and costs. The raw data is available via our API at GET /v1/stats.
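
One way to audit exported usage data is to recompute each request's cost from its token counts and compare it with the reported cost. The sketch below makes no network calls and assumes a record shape that this page does not specify: the field names `model`, `input_tokens`, `output_tokens`, and `actual_cost` are placeholders, so adapt them to whatever GET /v1/stats actually returns.

```python
import math


def find_cost_mismatches(records: list[dict],
                         pricing: dict[str, tuple[float, float]],
                         tolerance: float = 1e-9) -> list[tuple[dict, float]]:
    """Return (record, expected_cost) pairs where the reported cost differs
    from the cost recomputed from token counts and per-1M-token prices."""
    mismatches = []
    for record in records:
        input_price, output_price = pricing[record["model"]]
        expected = ((record["input_tokens"] / 1_000_000) * input_price
                    + (record["output_tokens"] / 1_000_000) * output_price)
        if not math.isclose(expected, record["actual_cost"],
                            rel_tol=0.0, abs_tol=tolerance):
            mismatches.append((record, expected))
    return mismatches


# Hypothetical records; the field names are assumptions, not a documented schema.
pricing = {"deepseek-chat": (0.14, 0.28), "claude-sonnet-4.5": (3.00, 15.00)}
records = [
    {"model": "deepseek-chat", "input_tokens": 1000,
     "output_tokens": 500, "actual_cost": 0.00028},
    {"model": "claude-sonnet-4.5", "input_tokens": 10000,
     "output_tokens": 5000, "actual_cost": 0.2},  # deliberately wrong
]

for record, expected in find_cost_mismatches(records, pricing):
    print(f"{record['model']}: reported {record['actual_cost']}, "
          f"expected {expected:.5f}")
```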