Cudator API
One endpoint for every model — routed by policy, kept on ground you control, and settled to a single wallet. If you've used the OpenAI API, you already know how this works.
Introduction
Cudator is an OpenAI-compatible gateway. Point your base URL at https://api.cudator.ai/v1, authenticate with a Cudator key, and each request is routed by policy to the best available model, with residency, spend limits, and request-level auditing applied automatically.
- Drop-in compatible with the OpenAI SDKs — change two lines.
- Route by policy with a single header — no provider plumbing.
- Zero payload retention; every call logged with model, region, and cost.
Quickstart
Create a workspace in the console, generate a key, and send your first request. The default "auto" model lets Cudator pick the best provider for the prompt.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cudator.ai/v1",
    api_key="cud_live_••••••••",
)

resp = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user",
               "content": "Summarise this record."}],
)
print(resp.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.cudator.ai/v1",
  apiKey: "cud_live_••••••••",
});

const resp = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Summarise this record." }],
});
console.log(resp.choices[0].message.content);

curl https://api.cudator.ai/v1/chat/completions \
  -H "Authorization: Bearer cud_live_••••••••" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Summarise this record."}]
  }'

Only base_url and the API key differ from a stock OpenAI setup; the rest of your SDK calls work unchanged.
Authentication
Authenticate with a workspace key passed as a bearer token. Keys are scoped to a workspace and carry their own spend limit and residency policy — rotate or revoke them anytime in the console.
Authorization: Bearer cud_live_••••••••
- cud_live_ keys hit live providers and are metered to your wallet.
- cud_test_ keys return mocked responses, free of charge.
- Never ship keys in client code. Treat them like passwords.
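One way to keep keys out of source code is to read them from the environment at startup. The sketch below assumes an environment variable named `CUDATOR_API_KEY`; that name is hypothetical, chosen for illustration, and is not something Cudator defines. The prefix check relies only on the `cud_live_` and `cud_test_` key types described above.

```python
import os

# Hypothetical variable name; use whatever your secret manager injects.
ENV_VAR = "CUDATOR_API_KEY"

LIVE_PREFIX = "cud_live_"  # hits live providers, metered to the wallet
TEST_PREFIX = "cud_test_"  # mocked responses, free of charge


def load_key(env=os.environ, allow_live=True):
    """Return the workspace key from the environment, rejecting unknown prefixes."""
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set")
    if key.startswith(TEST_PREFIX):
        return key
    if key.startswith(LIVE_PREFIX):
        if not allow_live:
            raise RuntimeError("live key supplied where only test keys are allowed")
        return key
    raise ValueError("value does not look like a Cudator workspace key")


def auth_header(key):
    """Build the bearer-token Authorization header."""
    return {"Authorization": f"Bearer {key}"}
```

Passing `allow_live=False` in CI is a simple guard that keeps test runs on mocked, unmetered responses.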
Chat completions
POST /v1/chat/completions mirrors the OpenAI schema. The only Cudator-specific addition is an optional policy header that decides how the request is routed.
Request parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | yes | A specific model ID (e.g. claude-sonnet-4) or "auto" to let Cudator choose by policy, cost, and latency. |
| messages | array | yes | The conversation so far, as a list of role/content objects, identical to the OpenAI format. |
| stream | boolean | no | When true, tokens are streamed back as server-sent events. Defaults to false. |
| X-Cudator-Policy | header | no | Routing policy to apply, e.g. sovereign-eu. Falls back to the workspace default when omitted. |
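Without an SDK, the table above maps onto a plain JSON POST. This is a minimal stdlib sketch that only assembles the request and does not send it; the `build_chat_request` helper is hypothetical, and the non-empty `messages` check is an assumption borrowed from the OpenAI schema rather than documented Cudator behaviour.

```python
import json
import urllib.request

BASE_URL = "https://api.cudator.ai/v1"


def build_chat_request(key, model, messages, stream=False, policy=None):
    """Assemble a POST /v1/chat/completions request without sending it."""
    if not model:
        raise ValueError("model is required (a model ID or 'auto')")
    # Assumption: at least one message is required, as in the OpenAI schema.
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for m in messages:
        if "role" not in m or "content" not in m:
            raise ValueError("each message needs 'role' and 'content'")

    body = {"model": model, "messages": messages, "stream": bool(stream)}
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    if policy:
        # Leaving the header out falls back to the workspace default policy.
        headers["X-Cudator-Policy"] = policy

    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending is then `urllib.request.urlopen(req)`. Note that `urllib` stores header names as `X-cudator-policy` internally; HTTP header names are case-insensitive, so the server sees the same header.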
Routing policies
A policy turns "where does this run?" into a rule rather than a leap of faith. Attach one with the X-Cudator-Policy header and Cudator selects only from credentials and regions that satisfy it.
resp = client.chat.completions.create(
    model="auto",
    extra_headers={"X-Cudator-Policy": "sovereign-eu"},
    messages=[{"role": "user",
               "content": "Summarise this record."}],
)
# → served by self-hosted vLLM · eu-west-1
# → $0.0004 metered to wallet · 38ms

const resp = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Summarise this record." }],
}, {
  headers: { "X-Cudator-Policy": "sovereign-eu" },
});
// → served by self-hosted vLLM · eu-west-1

curl https://api.cudator.ai/v1/chat/completions \
  -H "Authorization: Bearer cud_live_••••••••" \
  -H "X-Cudator-Policy: sovereign-eu" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Summarise this record."}]}'

Common policies
- sovereign-eu: restrict to EU regions and self-hosted endpoints.
- cheapest: optimise for lowest cost per token within quality bounds.
- fastest: minimise time-to-first-token across the pool.
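To make "a policy is a filter over the pool" concrete, here is a toy model of the three common policies. It is not Cudator's router: the `Endpoint` fields, the reading of sovereign-eu as "EU region and self-hosted", and the choice to ignore quality bounds are all assumptions made for illustration. An empty candidate list corresponds to the 409 case in the error table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    region: str               # e.g. "eu-west-1"
    self_hosted: bool
    usd_per_1k_tokens: float
    ttft_ms: float            # observed time-to-first-token


class NoEligibleModel(Exception):
    """Toy analogue of a 409: nothing in the pool satisfies the policy."""


def eligible(pool, policy):
    """Filter the pool down to endpoints the policy permits."""
    if policy == "sovereign-eu":
        # Assumed reading: the endpoint must be both in an EU region and self-hosted.
        return [e for e in pool if e.region.startswith("eu-") and e.self_hosted]
    return list(pool)


def route(pool, policy):
    """Pick one endpoint; quality bounds are deliberately ignored in this toy."""
    candidates = eligible(pool, policy)
    if not candidates:
        raise NoEligibleModel(policy)
    if policy == "fastest":
        return min(candidates, key=lambda e: e.ttft_ms)
    # "cheapest", and every other policy in this toy, picks the lowest price.
    return min(candidates, key=lambda e: e.usd_per_1k_tokens)
```

The key property is the ordering: filtering happens before optimisation, so no amount of price or latency advantage can pull an out-of-policy endpoint back into the pool.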
Streaming
Set stream: true to receive tokens as server-sent events. The chunk format matches OpenAI's, so existing streaming handlers work without changes.
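If you consume the stream without an SDK, OpenAI-format events arrive as `data:` lines carrying JSON chunks and end with `data: [DONE]`. The sketch below pulls the text deltas out of such lines. It is a simplified reader, not a full server-sent-events parser: it assumes a single choice per request and ignores multi-line `data` fields and `event:`/`id:` fields.

```python
import json


def iter_sse_content(lines):
    """Yield text deltas from OpenAI-style server-sent event lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, ": keep-alive" comments, other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            # Assumes one choice per request; the first chunk may carry only a role.
            text = choices[0].get("delta", {}).get("content")
            if text:
                yield text
```

Because iterating a `urllib` response yields raw byte lines, `for text in iter_sse_content(urllib.request.urlopen(req)): print(text, end="")` is enough for a quick consumer.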
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Write a haiku."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

const stream = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Write a haiku." }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0].delta.content ?? "");
}

Models & providers
Reference a provider model by ID, or use "auto" and let routing decide. Any OpenAI-compatible endpoint plugs in — add your own model behind a base URL in minutes.
Fetch the live list any time with GET /v1/models — it reflects exactly what your workspace policy allows.
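Because the list reflects your workspace policy, it can drive model selection at startup. This sketch assumes the response has the OpenAI list shape (`{"object": "list", "data": [{"id": ...}, ...]}`); the helper names and the fallback to "auto" are illustrative choices, not Cudator behaviour.

```python
def model_ids(models_response):
    """Extract model IDs from an OpenAI-style GET /v1/models body."""
    return sorted(m["id"] for m in models_response.get("data", []))


def pick_model(models_response, preferred):
    """Return the first preferred ID the workspace allows, else fall back to 'auto'."""
    allowed = set(model_ids(models_response))
    for candidate in preferred:
        if candidate in allowed:
            return candidate
    return "auto"
```

Falling back to "auto" rather than failing keeps the request inside policy: if none of your preferred models are permitted, routing still picks from what the workspace allows.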
Sovereignty & residency
Bind a workspace to approved regions and self-hosted endpoints. Out-of-region credentials are simply not in the routing pool — residency is enforced, not requested.
- Keep regulated traffic on your VPC or on-prem GPUs; let the frontier handle the rest.
- Every call is logged with model, region, and cost. Payloads are never stored.
- Export the request-level audit trail for SOC 2, HIPAA, GDPR, and ISO 27001 reviews.
Billing & usage
Every request is metered to a single wallet, regardless of which provider served it. Read usage programmatically or consolidate spend across subsidiaries into one invoice in the console.
curl "https://api.cudator.ai/v1/usage?period=month" \
  -H "Authorization: Bearer cud_live_••••••••"
# → { "spend_usd": 1842.55, "requests": 91204,
# "by_provider": { "anthropic": 812.10, … } }
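A usage body like the one above is easy to turn into unit economics. This sketch uses only the `spend_usd`, `requests`, and `by_provider` fields shown in the sample response; any other fields the endpoint may return are ignored, and the `summarise_usage` helper is hypothetical.

```python
def summarise_usage(usage):
    """Derive cost per request and each provider's share of spend from a usage body."""
    spend = float(usage["spend_usd"])
    requests = int(usage["requests"])
    per_request = spend / requests if requests else 0.0
    by_provider = usage.get("by_provider", {})
    # Share of total spend per provider; guards against a zero-spend period.
    shares = {p: (amt / spend if spend else 0.0) for p, amt in by_provider.items()}
    return {"usd_per_request": per_request, "provider_share": shares}
```

For the sample response, the average works out to roughly $0.02 per request (1842.55 / 91204).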
Errors
Cudator uses conventional HTTP status codes. Error bodies follow the OpenAI shape, with a code field for programmatic handling.
| Status | Meaning |
|---|---|
| 401 | Invalid or revoked key. |
| 402 | Wallet balance exhausted, or the request exceeds the key's spend limit. |
| 409 | No model in the pool satisfies the requested policy (e.g. residency conflict). |
| 429 | Rate limit exceeded. Wait at least as long as the Retry-After header specifies, then retry with backoff. |
| 503 | All eligible providers are temporarily unavailable. Cudator already retried the pool. |
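The table suggests a simple split between errors worth retrying and errors that need a human. The sketch below encodes that split; treating 429 and 503 as retryable is a reading of the table rather than a documented contract, the hint strings are illustrative, and it assumes the OpenAI error shape `{"error": {"message": ..., "type": ..., "code": ...}}`.

```python
import json

# Assumed retry policy, derived from the status table above.
RETRYABLE = {429, 503}
HINTS = {
    401: "check, rotate, or replace the key",
    402: "top up the wallet or raise the key's spend limit",
    409: "relax the policy or choose a different model",
}


def classify_error(status, body):
    """Return (retryable, code, hint) for an error response body (str or bytes)."""
    try:
        err = json.loads(body).get("error", {})
    except (ValueError, TypeError, AttributeError):
        err = {}  # non-JSON or unexpected body
    code = err.get("code") if isinstance(err, dict) else None
    if status in RETRYABLE:
        return True, code, "back off and retry"
    return False, code, HINTS.get(status, "see the error message")
```

Branching on `code` rather than on the message text keeps handlers stable if wording changes.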
Rate limits
Limits are set per workspace and surfaced on every response so you can throttle proactively rather than reactively.
- X-RateLimit-Remaining and X-RateLimit-Reset headers ride along with every response.
- On a 429, honour Retry-After and apply exponential backoff.
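A minimal sketch of both halves of that advice: a retry delay that never undercuts Retry-After, and a proactive throttle check on X-RateLimit-Remaining. The base delay, cap, and threshold are arbitrary illustrative values. Retry-After may also be an HTTP date, which this sketch does not parse. X-RateLimit-Reset is left alone because its unit is not specified here.

```python
def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    backoff = min(cap, base * (2 ** attempt))  # exponential, capped
    if retry_after is not None:
        try:
            # Never retry sooner than the server asked; dates are not handled here.
            return max(float(retry_after), backoff)
        except ValueError:
            pass
    return backoff


def should_throttle(headers, threshold=5):
    """True when X-RateLimit-Remaining has dropped below `threshold`."""
    remaining = headers.get("X-RateLimit-Remaining")
    try:
        return remaining is not None and int(remaining) < threshold
    except ValueError:
        return False
```

With a plain dict the header lookup is case-sensitive; the `headers` object on a `urllib` response does case-insensitive lookups, so either works with the exact casing shown. Adding random jitter to `retry_delay` helps when many clients back off at once.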