Cudator API

One endpoint for every model — routed by policy, kept on ground you control, and settled to a single wallet. If you've used the OpenAI API, you already know how this works.

Introduction

Cudator is an OpenAI-compatible gateway. Point your base URL at https://api.cudator.ai/v1, use a Cudator key, and every request is routed to the best available model, with residency rules, spend limits, and request-level auditing applied with no extra integration work.

  • Drop-in compatible with the OpenAI SDKs — change two lines.
  • Route by policy with a single header — no provider plumbing.
  • Zero payload retention; every call logged with model, region, and cost.

Quickstart

Create a workspace in the console, generate a key, and send your first request. The default "auto" model lets Cudator pick the best provider for the prompt.

route.py
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cudator.ai/v1",
    api_key="cud_live_••••••••",
)

resp = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user",
               "content": "Summarise this record."}],
)
print(resp.choices[0].message.content)

Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.cudator.ai/v1",
  apiKey: "cud_live_••••••••",
});

const resp = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Summarise this record." }],
});
console.log(resp.choices[0].message.content);

curl
curl https://api.cudator.ai/v1/chat/completions \
  -H "Authorization: Bearer cud_live_••••••••" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Summarise this record."}]
  }'

Already built on OpenAI? Change the base URL and key; your existing SDK calls work unchanged.

Authentication

Authenticate with a workspace key passed as a bearer token. Keys are scoped to a workspace and carry their own spend limit and residency policy — rotate or revoke them anytime in the console.

header
Authorization: Bearer cud_live_••••••••

  • cud_live_ keys hit live providers and meter to your wallet. cud_test_ keys return mocked responses, free of charge.
  • Never ship keys in client code. Treat them like passwords.
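
One way to keep keys out of source code is to read them from the environment at startup. The sketch below assumes a hypothetical CUDATOR_API_KEY variable name (Cudator does not define one here) and checks only the two key prefixes described above.

```python
import os


def auth_headers(env=os.environ):
    """Build the Authorization header from an environment mapping.

    CUDATOR_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = env.get("CUDATOR_API_KEY", "")
    # Only the two documented prefixes are accepted.
    if not key.startswith(("cud_live_", "cud_test_")):
        raise ValueError("expected a key starting with cud_live_ or cud_test_")
    return {"Authorization": f"Bearer {key}"}


def is_test_key(key):
    """True for cud_test_ keys, which return mocked responses."""
    return key.startswith("cud_test_")
```

Failing fast on a missing or malformed key surfaces configuration mistakes before the first request is sent.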

Chat completions

POST /v1/chat/completions mirrors the OpenAI schema. The only Cudator-specific addition is an optional policy header that decides how the request is routed.

Request parameters

Parameter         Type     Required  Description
model             string   yes       A specific model ID (e.g. claude-sonnet-4) or "auto" to let Cudator choose by policy, cost, and latency.
messages          array    yes       The conversation so far, as a list of role/content objects, identical to the OpenAI format.
stream            boolean  no        When true, tokens are streamed back as server-sent events. Defaults to false.
X-Cudator-Policy  header   no        Routing policy to apply, e.g. sovereign-eu. Falls back to the workspace default when omitted.
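
For callers not using an SDK, the parameters above map onto a request roughly as follows. This is a minimal sketch using only the standard library; build_chat_request is a hypothetical helper name, and the Authorization header is left out for brevity.

```python
import json


def build_chat_request(model, messages, stream=False, policy=None):
    """Return (headers, body) for POST /v1/chat/completions."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    headers = {"Content-Type": "application/json"}
    if policy is not None:
        # Omitting the header falls back to the workspace default policy.
        headers["X-Cudator-Policy"] = policy
    body = {"model": model, "messages": messages}
    if stream:
        body["stream"] = True  # left out otherwise, since false is the default
    return headers, json.dumps(body)
```

Note that the policy travels as a header rather than in the JSON body, which is what keeps the body identical to the OpenAI schema.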

Routing policies

A policy turns "where does this run?" into a rule rather than a leap of faith. Attach one with the X-Cudator-Policy header and Cudator selects only from credentials and regions that satisfy it.

policy.py
resp = client.chat.completions.create(
    model="auto",
    extra_headers={"X-Cudator-Policy": "sovereign-eu"},
    messages=[{"role": "user",
               "content": "Summarise this record."}],
)

# → served by self-hosted vLLM · eu-west-1
# → $0.0004 metered to wallet · 38ms

Node.js
const resp = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Summarise this record." }],
}, {
  headers: { "X-Cudator-Policy": "sovereign-eu" },
});

// → served by self-hosted vLLM · eu-west-1

curl
curl https://api.cudator.ai/v1/chat/completions \
  -H "Authorization: Bearer cud_live_••••••••" \
  -H "X-Cudator-Policy: sovereign-eu" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Summarise this record."}]}'

Common policies

  • sovereign-eu — restrict to EU regions and self-hosted endpoints.
  • cheapest — optimise for lowest cost per token within quality bounds.
  • fastest — minimise time-to-first-token across the pool.
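
Because the policy is chosen per request, an application can pick one from its own knowledge of the workload. The sketch below is illustrative only: the regulated and latency_sensitive flags are hypothetical application-level inputs, and the mapping is one possible choice rather than anything Cudator prescribes.

```python
def choose_policy(regulated, latency_sensitive=False):
    """Pick one of the policy names listed above for a single request.

    `regulated` and `latency_sensitive` are hypothetical application flags.
    """
    if regulated:
        return "sovereign-eu"  # residency takes priority over cost or speed
    if latency_sensitive:
        return "fastest"
    return "cheapest"


def policy_headers(regulated, latency_sensitive=False):
    """Wrap the chosen policy in the X-Cudator-Policy header."""
    return {"X-Cudator-Policy": choose_policy(regulated, latency_sensitive)}
```

With the OpenAI Python SDK, the result can be passed as extra_headers=policy_headers(regulated=True).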

Streaming

Set stream: true to receive tokens as server-sent events. The chunk format matches OpenAI's, so existing streaming handlers work without changes.

stream.py
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

Node.js
const stream = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Write a haiku." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0].delta.content ?? "");
}
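
If you consume the stream without an SDK, the events have to be decoded by hand. The sketch below assumes the wire format is the same as OpenAI's: each event is a "data:" line carrying a JSON chunk, and the stream ends with a "data: [DONE]" sentinel. iter_deltas is a hypothetical helper name.

```python
import json


def iter_deltas(lines):
    """Yield content fragments from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Chunks whose delta carries no content (for example a role-only first chunk) are skipped, which mirrors the `or ""` guard in the SDK example above.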

Models & providers

Reference a provider model by ID, or use "auto" and let routing decide. Any OpenAI-compatible endpoint plugs in — add your own model behind a base URL in minutes.

Model ID         Provider
gpt-4o           OpenAI · frontier
claude-sonnet-4  Anthropic · frontier
gemini-2.5-pro   Google · frontier
llama-3.1-70b    Self-hosted · vLLM

Fetch the live list any time with GET /v1/models — it reflects exactly what your workspace policy allows.
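
A minimal sketch of reading that list with the standard library follows. It assumes the response mirrors OpenAI's list shape (an object with a "data" array of entries that each have an "id"); that shape is not confirmed by this page, so treat model_ids as illustrative.

```python
import json
import urllib.request

BASE_URL = "https://api.cudator.ai/v1"


def model_ids(payload):
    """Extract model IDs from a list response shaped like OpenAI's (assumed)."""
    return sorted(item["id"] for item in payload.get("data", []))


def fetch_model_ids(api_key):
    """Call GET /v1/models and return the IDs (performs a network request)."""
    req = urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return model_ids(json.load(resp))
```

Checking a model ID against this list before sending a request is one way to catch a policy mismatch early.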

Sovereignty & residency

Bind a workspace to approved regions and self-hosted endpoints. Out-of-region credentials are simply not in the routing pool — residency is enforced, not requested.

  • Keep regulated traffic on your VPC or on-prem GPUs; let the frontier handle the rest.
  • Every call is logged with model, region, and cost. Payloads are never stored.
  • Export the request-level audit trail for SOC 2, HIPAA, GDPR, and ISO 27001 reviews.

Billing & usage

Every request is metered to a single wallet, regardless of which provider served it. Read usage programmatically or consolidate spend across subsidiaries into one invoice in the console.

usage
curl "https://api.cudator.ai/v1/usage?period=month" \
  -H "Authorization: Bearer cud_live_••••••••"

# → { "spend_usd": 1842.55, "requests": 91204,
#     "by_provider": { "anthropic": 812.10, … } }
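
Once the usage response is parsed, per-provider spend can be turned into shares of the total. This sketch relies only on the spend_usd and by_provider fields visible in the sample output above; provider_shares is a hypothetical helper name.

```python
def provider_shares(usage):
    """Return each provider's fraction of total spend, rounded to 4 places."""
    total = usage.get("spend_usd", 0.0)
    if total <= 0:
        return {}  # nothing spent, so no meaningful shares
    return {
        name: round(amount / total, 4)
        for name, amount in usage.get("by_provider", {}).items()
    }
```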

Errors

Cudator uses conventional HTTP status codes. Error bodies follow the OpenAI shape, with a code field for programmatic handling.

Status  Meaning
401     Invalid or revoked key.
402     Wallet balance exhausted, or the request exceeds the key's spend limit.
409     No model in the pool satisfies the requested policy (e.g. residency conflict).
429     Rate limit exceeded. Back off and retry after the interval given in the Retry-After header.
503     All eligible providers are temporarily unavailable. Cudator already retried the pool.
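
A client can use the status code to decide whether a retry makes sense and the code field to branch on the specific failure. The sketch below assumes the OpenAI error envelope, {"error": {"message": ..., "code": ...}}, and treats 429 and 503 as retryable; that classification is a reading of the table above, not a documented rule.

```python
import json

RETRYABLE = {429, 503}  # assumed transient; 401, 402, and 409 need a fix first


def parse_error(status, body):
    """Return (code, message, retryable) from an OpenAI-shaped error body."""
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}  # non-JSON body, e.g. from an intermediate proxy
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        err = {}
    return err.get("code"), err.get("message"), status in RETRYABLE
```

Falling back to (None, None, ...) on an unparseable body keeps the retry decision driven by the status code alone.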

Rate limits

Limits are set per workspace and surfaced on every response so you can throttle proactively rather than reactively.

  • X-RateLimit-Remaining and X-RateLimit-Reset headers are returned with each response.
  • On 429, honour Retry-After and apply exponential backoff.
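
The retry rule above could be sketched as follows. The design choice here is an assumption: a numeric Retry-After value takes precedence, and exponential backoff with full jitter is used only when the header is absent or not a number of seconds. retry_delay and its base and cap defaults are hypothetical.

```python
import random


def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # server hint wins
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to backoff
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()  # full jitter: uniform in [0, ceiling)
```

A caller would sleep for retry_delay(attempt, response_headers.get("Retry-After")) between attempts and give up after a fixed number of tries.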