Cudator docs — One API for every model

Introduction

Cudator is an OpenAI-compatible gateway. Point your base URL at https://api.cudator.ai/v1, use a Cudator key, and every request is routed to the best available model for your policy, cost, and latency. Residency, spend limits, and request-level auditing are built in.

Drop-in compatible with the OpenAI SDKs — change two lines.
Route by policy with a single header — no provider plumbing.
Zero payload retention; every call logged with model, region, and cost.

Quickstart

Create a workspace in the console, generate a key, and send your first request. Setting the model to "auto" lets Cudator pick the best provider for the prompt.

route.py

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cudator.ai/v1",
    api_key="cud_live_••••••••",
)

resp = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user",
               "content": "Summarise this record."}],
)
print(resp.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.cudator.ai/v1",
  apiKey: "cud_live_••••••••",
});

const resp = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Summarise this record." }],
});
console.log(resp.choices[0].message.content);

curl https://api.cudator.ai/v1/chat/completions \
  -H "Authorization: Bearer cud_live_••••••••" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Summarise this record."}]
  }'

Already built on OpenAI? Change the base_url and key — your existing SDK calls work unchanged.

Authentication

Authenticate with a workspace key passed as a bearer token. Keys are scoped to a workspace and carry their own spend limit and residency policy — rotate or revoke them anytime in the console.

header
Authorization: Bearer cud_live_••••••••

cud_live_ keys hit live providers and meter to your wallet. cud_test_ keys return mocked responses, free of charge.
Never ship keys in client code. Treat them like passwords.
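One way to keep keys out of source is to read the key from an environment variable at startup and check its prefix, so a test key never reaches production by accident. This is a minimal sketch: the variable name `CUDATOR_API_KEY` and the helper names are hypothetical, and only the `cud_live_` and `cud_test_` prefixes come from this page.

```python
import os

LIVE_PREFIX = "cud_live_"  # hits live providers, metered to the wallet
TEST_PREFIX = "cud_test_"  # mocked responses, free of charge


def load_key(env_var: str = "CUDATOR_API_KEY", require_live: bool = False) -> str:
    """Read a Cudator key from the environment (the variable name is hypothetical)."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    if not key.startswith((LIVE_PREFIX, TEST_PREFIX)):
        raise ValueError("not a Cudator key: expected a cud_live_ or cud_test_ prefix")
    if require_live and not key.startswith(LIVE_PREFIX):
        raise ValueError("a cud_live_ key is required here")
    return key


def auth_header(key: str) -> dict:
    """Build the bearer-token header shown above."""
    return {"Authorization": f"Bearer {key}"}
```

With the OpenAI SDK, pass the result straight through: `OpenAI(base_url="https://api.cudator.ai/v1", api_key=load_key())`.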

Chat completions

POST /v1/chat/completions mirrors the OpenAI schema. The only Cudator-specific addition is an optional policy header that decides how the request is routed.

Request parameters

Parameter	Type	Required	Description
model	string	yes	A specific model ID (e.g. `claude-sonnet-4`) or `"auto"` to let Cudator choose by policy, cost, and latency.
messages	array	yes	The conversation so far, as a list of role/content objects, identical to the OpenAI format.
stream	boolean	no	When `true`, tokens are streamed back as server-sent events. Defaults to `false`.
X-Cudator-Policy	header	no	Routing policy to apply, e.g. `sovereign-eu`. Falls back to the workspace default when omitted.
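The parameters above map onto a plain HTTP request. For environments without the OpenAI SDK, the sketch below builds one with Python's standard library. It only constructs the request; pass it to `urllib.request.urlopen` to send it. The helper name `build_chat_request` is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.cudator.ai/v1"


def build_chat_request(api_key, messages, model="auto", stream=False, policy=None):
    """Build (but do not send) a POST /v1/chat/completions request."""
    if not model:
        raise ValueError("model is required (a model ID or 'auto')")
    if not messages:
        raise ValueError("messages is required and must not be empty")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if policy is not None:
        # Optional: when omitted, the workspace default policy applies.
        headers["X-Cudator-Policy"] = policy
    body = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_chat_request(
    "cud_test_example",
    [{"role": "user", "content": "Summarise this record."}],
    policy="sovereign-eu",
)
print(req.get_method(), req.full_url)  # → POST https://api.cudator.ai/v1/chat/completions
```

Using a `cud_test_` key while wiring this up keeps experiments free, since test keys return mocked responses.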

Routing policies

A policy turns "where does this run?" into a rule rather than a leap of faith. Attach one with the X-Cudator-Policy header and Cudator selects only from credentials and regions that satisfy it.

policy.py

resp = client.chat.completions.create(
    model="auto",
    extra_headers={"X-Cudator-Policy": "sovereign-eu"},
    messages=[{"role": "user",
               "content": "Summarise this record."}],
)

# → served by self-hosted vLLM · eu-west-1
# → $0.0004 metered to wallet · 38ms

const resp = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Summarise this record." }],
}, {
  headers: { "X-Cudator-Policy": "sovereign-eu" },
});

// → served by self-hosted vLLM · eu-west-1

curl https://api.cudator.ai/v1/chat/completions \
  -H "Authorization: Bearer cud_live_••••••••" \
  -H "X-Cudator-Policy: sovereign-eu" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Summarise this record."}]}'

Common policies

sovereign-eu — restrict to EU regions and self-hosted endpoints.
cheapest — optimise for lowest cost per token within quality bounds.
fastest — minimise time-to-first-token across the pool.
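To reason about what each policy does, it can help to model the routing pool locally. The sketch below is a toy model, not Cudator's router. The `Endpoint` fields, regions, prices, and latencies are made up, `sovereign-eu` is simplified to an EU-region filter, and quality bounds are ignored. It illustrates the two ideas in this section: a restricting policy removes endpoints that do not satisfy it, a ranking policy orders what remains, and an empty pool corresponds to the 409 error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    # Hypothetical fields, for illustration only.
    model: str
    region: str
    usd_per_mtok: float  # made-up price per million tokens
    ttft_ms: float       # made-up time-to-first-token


class NoEligibleModel(Exception):
    """Toy stand-in for a 409: nothing in the pool satisfies the policy."""


# Step 1: policies that restrict the pool (simplified reading of sovereign-eu).
FILTERS = {
    "sovereign-eu": lambda e: e.region.startswith("eu-"),
}

# Step 2: policies that rank whatever remains.
RANKINGS = {
    "cheapest": lambda e: e.usd_per_mtok,
    "fastest": lambda e: e.ttft_ms,
}


def route(pool, policy):
    keep = FILTERS.get(policy, lambda e: True)
    eligible = [e for e in pool if keep(e)]
    if not eligible:
        raise NoEligibleModel(policy)
    # Toy default: rank by cost when the policy defines no ranking of its own.
    rank = RANKINGS.get(policy, RANKINGS["cheapest"])
    return min(eligible, key=rank)


pool = [
    Endpoint("gpt-4o", "us-east-1", 5.0, 90.0),
    Endpoint("claude-sonnet-4", "us-west-2", 3.0, 200.0),
    Endpoint("llama-3.1-70b", "eu-west-1", 0.9, 150.0),
]
print(route(pool, "sovereign-eu").model)  # → llama-3.1-70b
print(route(pool, "fastest").model)       # → gpt-4o
```

Because filtering happens before ranking, a restricting policy such as `sovereign-eu` can never be traded away for a cheaper or faster out-of-region endpoint.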

Streaming

Set stream: true to receive tokens as server-sent events. The chunk format matches OpenAI's, so existing streaming handlers work without changes.

stream.py

stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

const stream = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Write a haiku." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0].delta.content ?? "");
}
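If you consume the stream without an SDK, the events can be parsed by hand. This sketch assumes the OpenAI wire format that the chunk schema mirrors: each event is a `data:` line carrying one JSON chunk, and the stream ends with `data: [DONE]`. The sample lines are illustrative, not captured output.

```python
import json


def iter_deltas(lines):
    """Yield content deltas from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Autumn"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " leaves"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # → Autumn leaves
```

With a live HTTP response, decode each line from bytes (`raw.decode("utf-8")`) before passing it in.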

Models & providers

Reference a provider model by ID, or use "auto" and let routing decide. Any OpenAI-compatible endpoint plugs in — add your own model behind a base URL in minutes.

Model ID	Source
gpt-4o	OpenAI · frontier
claude-sonnet-4	Anthropic · frontier
gemini-2.5-pro	Google · frontier
llama-3.1-70b	Self-hosted · vLLM

Fetch the live list any time with GET /v1/models — it reflects exactly what your workspace policy allows.
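Because the list reflects your workspace policy, it can be used to check a preferred model before calling it. A minimal sketch, assuming the response follows the OpenAI list shape (`{"object": "list", "data": [{"id": ...}, ...]}`); the helper names and the sample body are hypothetical. With the OpenAI Python SDK, `client.models.list()` returns the same IDs.

```python
import json


def available_ids(models_response: dict) -> set:
    """Collect model IDs from an OpenAI-style GET /v1/models response."""
    return {m["id"] for m in models_response.get("data", [])}


def pick_model(models_response: dict, preferred: str) -> str:
    """Use the preferred model if the policy allows it, else fall back to "auto"."""
    return preferred if preferred in available_ids(models_response) else "auto"


# Illustrative response body, not real output.
body = json.loads('{"object": "list", "data": [{"id": "llama-3.1-70b", "object": "model"}]}')
print(pick_model(body, "claude-sonnet-4"))  # → auto
print(pick_model(body, "llama-3.1-70b"))    # → llama-3.1-70b
```

Falling back to `"auto"` rather than failing keeps the request inside whatever the policy permits.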

Sovereignty & residency

Bind a workspace to approved regions and self-hosted endpoints. Out-of-region credentials are simply not in the routing pool — residency is enforced, not requested.

Keep regulated traffic on your VPC or on-prem GPUs; let the frontier handle the rest.
Every call is logged with model, region, and cost. Payloads are never stored.
Export the request-level audit trail for GDPR and your own compliance reviews.

Billing & usage

Every request is metered to a single wallet, regardless of which provider served it. Read usage programmatically or consolidate spend across subsidiaries into one invoice in the console.

usage
curl "https://api.cudator.ai/v1/usage?period=month" \
  -H "Authorization: Bearer cud_live_••••••••"

# → { "spend_usd": 1842.55, "requests": 91204,
#     "by_provider": { "anthropic": 812.10, … } }
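The usage payload can feed your own reporting. A minimal sketch, assuming the response contains the `spend_usd`, `requests`, and `by_provider` fields shown above (the full field list is elided there). The helper name is hypothetical, and the sample reuses only the illustrative values shown.

```python
def summarise_usage(usage: dict) -> dict:
    """Average cost per request and each provider's share of spend (percent)."""
    spend = usage["spend_usd"]
    requests = usage["requests"]
    shares = {
        name: round(100 * amount / spend, 1) if spend else 0.0
        for name, amount in usage.get("by_provider", {}).items()
    }
    return {
        "avg_usd_per_request": spend / requests if requests else 0.0,
        "provider_share_pct": shares,
    }


usage = {"spend_usd": 1842.55, "requests": 91204, "by_provider": {"anthropic": 812.10}}
summary = summarise_usage(usage)
print(summary["provider_share_pct"])  # → {'anthropic': 44.1}
```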

Errors

Cudator uses conventional HTTP status codes. Error bodies follow the OpenAI shape, with a code field for programmatic handling.

Status	Meaning
401	Invalid or revoked key.
402	Wallet balance exhausted, or the request exceeds the key's spend limit.
409	No model in the pool satisfies the requested policy (e.g. residency conflict).
429	Rate limit exceeded. Wait for the delay given in the `Retry-After` response header, then retry.
503	All eligible providers are temporarily unavailable. Cudator already retried the pool.
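The table translates into a small dispatch on status codes. This is a minimal sketch. The action names are hypothetical labels for your own handling, and the body parsing assumes the OpenAI error shape (`{"error": {"message": ..., "type": ..., "code": ...}}`). The specific `code` values Cudator returns are not listed on this page.

```python
import json

# Status code to caller action (labels are illustrative).
ACTIONS = {
    401: "check_key",      # invalid or revoked key
    402: "top_up_wallet",  # balance exhausted or key spend limit reached
    409: "change_policy",  # no model satisfies the requested policy
    429: "retry_later",    # rate limited; honour Retry-After
    503: "retry_later",    # eligible providers temporarily unavailable
}


def handle_error(status: int, body: str) -> tuple:
    """Return (action, error code) for a failed Cudator response."""
    try:
        code = json.loads(body).get("error", {}).get("code")
    except (ValueError, AttributeError):
        code = None  # non-JSON or unexpectedly shaped body
    return ACTIONS.get(status, "raise"), code


print(handle_error(409, '{"error": {"message": "no eligible model"}}'))  # → ('change_policy', None)
```

Retrying a 401, 402, or 409 unchanged would fail again, so only 429 and 503 map to a retry.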

Rate limits

Limits are set per workspace and surfaced on every response so you can throttle proactively rather than reactively.

X-RateLimit-Remaining and X-RateLimit-Reset headers are returned on every response.
On 429, honour Retry-After and apply exponential backoff.
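The retry rule above can be sketched as follows. When `Retry-After` holds a number of seconds, that delay is used as-is; otherwise the sketch falls back to exponential backoff with full jitter. The `send` callable, its `(status, headers, body)` return shape, and the `base`/`cap` defaults are hypothetical. An HTTP-date `Retry-After` is not parsed here and simply triggers the fallback.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return float(retry_after)  # server-provided delay wins
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to backoff
    # Full jitter: a random delay in [0, min(cap, base * 2**attempt)].
    return random.uniform(0, min(cap, base * 2 ** attempt))


def with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers, body
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
```

In production, `send` would wrap your HTTP call. Checking `X-RateLimit-Remaining` on each response lets you slow down before a 429 occurs at all.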

