
HTTP API Reference

Complete reference for the Plexor REST API. Drop-in compatible with both Anthropic's Messages API and OpenAI's Chat Completions API, with intelligent routing and cost optimization features.

Base URL

All API requests should be made to the Plexor API server:

Base URL
https://api.plexor.dev

Authentication

Plexor accepts multiple authentication methods for maximum compatibility with existing SDKs and tooling. All authentication is done via HTTP headers.

Supported Authentication Headers

| Header | Format | Description |
|---|---|---|
| Authorization | Bearer <API_KEY> | Standard Bearer token authentication. Works with both Plexor API keys and JWT tokens. |
| x-api-key | <API_KEY> | The Anthropic SDK's default header. Fully supported for compatibility with Claude Code and other Anthropic tools. |
| X-Plexor-Key | <API_KEY> | Plexor's native authentication header. Takes highest precedence when multiple auth headers are present. |

API Key Format

Plexor API keys follow the format plx_<user_id>_<secret>. You can create API keys from your API Keys dashboard.

Example API Key
plx_user_abc123_a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6

Keep Your API Key Secret
Never expose your API key in client-side code, public repositories, or logs. Treat it like a password.

Anthropic-Compatible Endpoint

The primary endpoint for Plexor's intelligent LLM routing. This endpoint is fully compatible with Anthropic's Messages API, making it a drop-in replacement for existing Anthropic integrations.

POST /gateway/anthropic/v1/messages

When you send a request to this endpoint, Plexor analyzes the request complexity and routes it to the optimal provider based on your configured mode (eco, balanced, quality, or passthrough).

OpenAI-Compatible Endpoint

For applications using the OpenAI SDK or expecting OpenAI's response format, Plexor provides a fully compatible Chat Completions endpoint.

POST /gateway/openai/v1/chat/completions

This endpoint accepts OpenAI's request format and returns OpenAI-compatible responses, while still providing Plexor's intelligent routing and cost optimization.

Additional OpenAI Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | /v1/models | List all available models across all providers |
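The models endpoint can be wrapped in a short helper like the sketch below. The `list_models` name is ours, and the response is assumed to follow the OpenAI-style shape with a top-level `data` array of model objects:

```python
import requests

BASE_URL = "https://api.plexor.dev"

def list_models(api_key: str) -> list:
    """Fetch the IDs of all available models across providers (sketch)."""
    resp = requests.get(
        f"{BASE_URL}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumes an OpenAI-style list response: {"data": [{"id": ...}, ...]}
    return [m["id"] for m in resp.json().get("data", [])]

# Example usage:
# print(list_models("YOUR_API_KEY"))
```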

Routing Headers

Plexor provides special headers to control routing behavior. These are optional and allow you to fine-tune how requests are processed.

X-Plexor-Mode

Controls the cost/quality tradeoff for request routing. If not specified, defaults to balanced.

| Mode | Behavior | Best For |
|---|---|---|
| eco | Routes to the cheapest capable model. Maximizes cost savings. | Simple queries, drafts, brainstorming, high-volume tasks |
| balanced | Intelligently balances cost and quality based on request complexity. Default mode. | General use, most applications |
| quality | Prioritizes response quality. Uses premium models for complex tasks. | Complex reasoning, production code, critical analysis |
| passthrough | Routes directly to the requested model with no optimization. Bypasses intelligent routing. | Testing, benchmarks, specific model requirements |
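Because the mode is just a request header, switching between these tradeoffs is a per-call decision. A minimal sketch (the `ask` helper and the client-side mode check are illustrative, not part of the API):

```python
import requests

# The four modes documented above
VALID_MODES = {"eco", "balanced", "quality", "passthrough"}

def ask(prompt: str, api_key: str, mode: str = "balanced") -> dict:
    """Send a single-turn request with an explicit routing mode (sketch)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    resp = requests.post(
        "https://api.plexor.dev/gateway/anthropic/v1/messages",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Plexor-Mode": mode,
        },
        json={
            "model": "claude-3-5-sonnet-20241022",
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

# Cheap draft:        ask("Brainstorm blog titles", "YOUR_API_KEY", mode="eco")
# Critical analysis:  ask("Review this contract clause", "YOUR_API_KEY", mode="quality")
```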

X-Plexor-Provider / X-Force-Provider

Force requests to be routed to a specific provider, overriding intelligent routing. Both header names are supported for compatibility.

| Provider | Description |
|---|---|
| auto | Let Plexor choose the optimal provider (default) |
| claude or anthropic | Force routing to Anthropic's Claude models |
| openai | Force routing to OpenAI models |
| deepseek | Force routing to DeepSeek models |
| mistral | Force routing to Mistral AI models |
| gemini | Force routing to Google Gemini models |

Additional Routing Headers

| Header | Type | Description |
|---|---|---|
| X-Plexor-Session-Id | String | Session ID for conversation continuity. Format: gw_<ULID> or sess_<ULID> |
| X-Plexor-Skip-Optimization | Boolean | Set to true to skip prompt optimization |

Request Body

The request body follows Anthropic's Messages API format. Below is a complete reference of all supported fields.

Required Fields

| Field | Type | Description |
|---|---|---|
| model | string | The model to use. Examples: claude-3-5-sonnet-20241022, claude-opus-4-5, gpt-4o |
| messages | array | Array of message objects. The first message must have role user. |
| max_tokens | integer | Maximum tokens to generate. Range: 1-200000. Default: 1024 |

Message Object

| Field | Type | Description |
|---|---|---|
| role | string | The role: user, assistant, system, or tool |
| content | string or array | Message content. Can be a string or an array of content blocks (text, image, tool_use, tool_result) |

Optional Fields

| Field | Type | Default | Description |
|---|---|---|---|
| system | string or array | null | System prompt to guide the model's behavior |
| temperature | float | 1.0 | Sampling temperature (0.0-1.0). Lower values are more deterministic. |
| top_p | float | null | Nucleus sampling parameter (0.0-1.0) |
| top_k | integer | null | Top-k sampling parameter |
| stop_sequences | array | null | Sequences that will stop generation (max 4) |
| stream | boolean | false | Enable streaming responses (SSE format) |
| tools | array | null | List of tools the model can use |
| tool_choice | string or object | null | How to use tools: auto, any, or a specific tool |

Plexor Extension Fields

You can also include Plexor-specific fields in the request body as an alternative to headers:

| Field | Type | Description |
|---|---|---|
| plexor_mode | string | Same as the X-Plexor-Mode header |
| plexor_provider | string | Same as the X-Plexor-Provider header |
| plexor_session_id | string | Same as the X-Plexor-Session-Id header |

Response Format

Responses follow Anthropic's Messages API format with additional Plexor metadata.

Standard Response Fields

| Field | Type | Description |
|---|---|---|
| id | string | Unique message ID (format: msg_<random>) |
| type | string | Always "message" |
| role | string | Always "assistant" |
| content | array | Array of content blocks (text and/or tool_use) |
| model | string | The model that was requested |
| stop_reason | string | Why generation stopped: end_turn, max_tokens, stop_sequence, tool_use |
| usage | object | Token usage: { input_tokens, output_tokens } |

Plexor Extension Fields

| Field | Type | Description |
|---|---|---|
| plexor_provider_used | string | The actual provider that handled the request (e.g., deepseek, anthropic) |
| plexor_session_id | string | Session ID for this conversation |
| plexor_cost_usd | float | Actual cost incurred for this request in USD |
| plexor_savings_usd | float | Cost savings compared to using Claude directly |

Response Headers

Plexor also returns detailed metrics in response headers:

| Header | Description |
|---|---|
| X-Plexor-Request-Id | Unique request identifier for debugging |
| X-Plexor-Session-Id | Session ID for conversation continuity |
| X-Plexor-Mode | The routing mode that was used |
| X-Plexor-Provider-Used | The provider that handled the request |
| X-Plexor-Actual-Cost | Actual cost in USD |
| X-Plexor-Baseline-Cost | What the request would have cost with Claude |
| X-Plexor-Savings | Cost savings in USD |
| X-Plexor-Savings-Percent | Percentage cost savings |
| X-Plexor-Latency-Ms | Total request latency in milliseconds |
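These headers are plain strings, so a small formatter is enough to surface the routing metrics per request. A sketch (the `routing_summary` helper is ours; it takes any mapping, including a `requests` response's `.headers`):

```python
def routing_summary(headers) -> str:
    """Build a one-line summary from Plexor's cost/latency response headers (sketch)."""
    provider = headers.get("X-Plexor-Provider-Used", "unknown")
    cost = float(headers.get("X-Plexor-Actual-Cost", 0))
    savings = float(headers.get("X-Plexor-Savings", 0))
    latency = headers.get("X-Plexor-Latency-Ms", "?")
    return f"{provider}: ${cost:.6f} (saved ${savings:.6f}) in {latency} ms"

# With a requests response:
# print(routing_summary(response.headers))
```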

Example Response

JSON Response
{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello! I'm doing well, thank you for asking. How can I help you today?"
    }
  ],
  "model": "claude-3-5-sonnet-20241022",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 24
  },
  // Plexor extension fields
  "plexor_provider_used": "deepseek",
  "plexor_session_id": "gw_01HQVX8K9JNMKQP3R4STUVWX",
  "plexor_cost_usd": 0.00012,
  "plexor_savings_usd": 0.00288
}

Streaming

Plexor supports streaming responses via Server-Sent Events (SSE). Enable streaming by setting stream: true in your request body.

Streaming Request

cURL with Streaming
curl -X POST https://api.plexor.dev/gateway/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a haiku about programming"}
    ]
  }'

Stream Events

Streaming responses include the following event types:

| Event Type | Description |
|---|---|
| message_start | Initial message metadata |
| content_block_start | Start of a content block |
| content_block_delta | Incremental text content |
| content_block_stop | End of a content block |
| message_delta | Final usage information |
| message_stop | End of message |
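If you consume the SSE stream without an SDK, you pair each `event:` line with the `data:` line that follows it. A minimal parser sketch (the helper name is ours; the delta payload shape follows Anthropic's streaming format):

```python
import json

def parse_sse_events(lines):
    """Group raw SSE lines into (event_type, payload) pairs (minimal sketch)."""
    event, out = None, []
    for line in lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            out.append((event, json.loads(line[len("data:"):].strip())))
    return out

# Extract incremental text from content_block_delta events:
raw = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hi"}}',
]
for event, data in parse_sse_events(raw):
    if event == "content_block_delta":
        print(data["delta"]["text"], end="")  # prints: Hi
```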

Error Handling

Plexor uses standard HTTP status codes and returns detailed error information in a consistent format.

HTTP Status Codes

| Code | Meaning | Description |
|---|---|---|
| 200 | OK | Request succeeded |
| 400 | Bad Request | Invalid request parameters or body format |
| 401 | Unauthorized | Missing or invalid API key |
| 403 | Forbidden | API key valid but lacks required permissions |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Server error; retry with exponential backoff |
| 502 | Bad Gateway | Upstream provider error |
| 503 | Service Unavailable | Service temporarily unavailable |

Error Response Format

Error Response
{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "Invalid model name: 'gpt-5-ultra'. See supported models at https://docs.plexor.ai/models"
  }
}

Error Types

| Type | Description |
|---|---|
| authentication_error | Invalid or missing API key |
| invalid_request_error | Malformed request or invalid parameters |
| rate_limit_error | Too many requests |
| api_error | Internal server error |
| overloaded_error | Service temporarily overloaded |
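A client can branch on these types to decide whether a retry makes sense. A sketch of one such mapping (the `classify_error` helper and the retry policy are illustrative, not prescribed by the API):

```python
def classify_error(body: dict) -> str:
    """Map an error response body to a coarse action (sketch).

    Transient errors are worth retrying; auth and request errors are not.
    """
    etype = body.get("error", {}).get("type", "api_error")
    if etype in ("rate_limit_error", "overloaded_error", "api_error"):
        return "retry"
    if etype == "authentication_error":
        return "fix-credentials"
    return "fix-request"

# With the example error response above:
# classify_error({"type": "error", "error": {"type": "invalid_request_error", ...}})
# returns "fix-request"
```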

Rate Limiting

Plexor implements rate limiting to ensure fair usage and service stability. Rate limits are applied per API key.

Rate Limit Headers

Rate limit information is returned in response headers:

| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed per time window |
| X-RateLimit-Remaining | Remaining requests in current window |
| X-RateLimit-Reset | Unix timestamp when the rate limit resets |
| Retry-After | Seconds to wait before retrying (only on 429) |

Handling Rate Limits
Implement exponential backoff when you receive a 429 response. Start with a 1 second delay and double it on each retry, up to a maximum of 60 seconds.
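The backoff recipe above can be sketched as follows, with Retry-After taking precedence when the server sends it (the helper names are ours; the jitter is a common addition, not required by the API):

```python
import random
import time

import requests

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt`: 1s, 2s, 4s, ... capped at 60s."""
    return min(base * (2 ** attempt), cap)

def post_with_backoff(url: str, headers: dict, payload: dict, max_retries: int = 6):
    """POST, retrying on 429 and honoring Retry-After when present (sketch)."""
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload, timeout=60)
        if resp.status_code != 429:
            return resp
        wait = float(resp.headers.get("Retry-After", backoff_delay(attempt)))
        # Small random jitter avoids synchronized retries across clients
        time.sleep(min(wait, 60) + random.random())
    raise RuntimeError("still rate limited after retries")
```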

Code Examples

Complete examples for making API requests in various languages.

cURL

cURL
curl -X POST https://api.plexor.dev/gateway/anthropic/v1/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "X-Plexor-Mode: balanced" \
  -d '{
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "messages": [
      {
        "role": "user",
        "content": "Explain the concept of recursion in programming."
      }
    ],
    "system": "You are a helpful programming tutor. Explain concepts clearly with examples."
  }'

Python

Python (requests)
import requests

# Configuration
API_KEY = "YOUR_API_KEY"
BASE_URL = "https://api.plexor.dev"

# Make the request
response = requests.post(
    f"{BASE_URL}/gateway/anthropic/v1/messages",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
        "X-Plexor-Mode": "balanced",
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }
)

# Handle the response
if response.status_code == 200:
    data = response.json()
    print(f"Response: {data['content'][0]['text']}")
    print(f"Provider used: {data.get('plexor_provider_used', 'unknown')}")
    print(f"Cost: ${data.get('plexor_cost_usd', 0):.6f}")
else:
    print(f"Error: {response.status_code} - {response.text}")

Python with Anthropic SDK

Python (Anthropic SDK)
import anthropic

# Point the Anthropic SDK at Plexor
client = anthropic.Anthropic(
    api_key="YOUR_PLEXOR_API_KEY",
    base_url="https://api.plexor.dev/gateway/anthropic"
)

# Use the SDK normally - requests are routed through Plexor
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

print(message.content[0].text)

Node.js

Node.js (fetch)
const API_KEY = 'YOUR_API_KEY';
const BASE_URL = 'https://api.plexor.dev';

async function chat(message) {
  const response = await fetch(
    `${BASE_URL}/gateway/anthropic/v1/messages`,
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${API_KEY}`,
        'X-Plexor-Mode': 'balanced',
      },
      body: JSON.stringify({
        model: 'claude-3-5-sonnet-20241022',
        max_tokens: 1024,
        messages: [
          { role: 'user', content: message }
        ]
      })
    }
  );

  if (!response.ok) {
    throw new Error(`HTTP error: ${response.status}`);
  }

  const data = await response.json();

  console.log('Response:', data.content[0].text);
  console.log('Provider:', data.plexor_provider_used);
  console.log('Cost: $' + (data.plexor_cost_usd || 0).toFixed(6));

  return data;
}

// Example usage
chat('Explain JavaScript closures in simple terms.')
  .catch(console.error);

Node.js with OpenAI SDK

Node.js (OpenAI SDK)
import OpenAI from 'openai';

// Point the OpenAI SDK at Plexor's OpenAI-compatible endpoint
const client = new OpenAI({
  apiKey: 'YOUR_PLEXOR_API_KEY',
  baseURL: 'https://api.plexor.dev/gateway/openai/v1',
});

async function main() {
  const completion = await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      { role: 'user', content: 'Write a haiku about coding.' }
    ],
  });

  console.log(completion.choices[0].message.content);
}

main();

Streaming Example (Python)

Python (Streaming)
import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_PLEXOR_API_KEY",
    base_url="https://api.plexor.dev/gateway/anthropic"
)

# Stream the response
with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short story about a robot learning to paint."}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()  # Final newline

SDK Compatibility
Plexor is designed to work seamlessly with the official Anthropic and OpenAI SDKs. Simply change the base_url to point to Plexor, and all SDK features including streaming, tool use, and vision work automatically.