Authentication

Tomoul uses bearer-token authentication, the same scheme as the OpenAI API. Every request to `https://api.tomoul.ai/v1` must carry an `Authorization: Bearer <key>` header.

Get a key

Create a key from the console. Each key is scoped to one organization and one set of models. Tomoul shows the full key only once, so copy it right away and store it in a secrets manager.

Heads-up.

Tomoul keys start with `tomoul_sk_`. If yours starts with `sk-`, that's an OpenAI key in the wrong env var.
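A startup check can catch this mix-up before the first request goes out. The following is a minimal sketch, not part of any Tomoul SDK: the `check_key` helper is a hypothetical name, and it relies only on the two prefixes mentioned in the note above.

```python
TOMOUL_PREFIX = "tomoul_sk_"  # prefix of Tomoul keys, per the note above
OPENAI_PREFIX = "sk-"         # prefix the note attributes to OpenAI keys


def check_key(key: str) -> str:
    """Return the key unchanged if it looks like a Tomoul key, else raise ValueError."""
    if key.startswith(TOMOUL_PREFIX):
        return key
    if key.startswith(OPENAI_PREFIX):
        raise ValueError("this looks like an OpenAI key (sk-...), not a Tomoul key")
    raise ValueError("key does not start with tomoul_sk_")
```

Calling `check_key(os.environ["TOMOUL_KEY"])` once at startup turns a confusing authentication failure into an explicit error message.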

Make a request

Set `TOMOUL_KEY` in your environment and call any endpoint with the bearer header. The example below hits chat completions; the same header works for embeddings, rerank, and audio.

curl https://api.tomoul.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOMOUL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tomoul/inkubalm-0.4b",
    "messages": [{"role": "user", "content": "Habari"}]
  }'
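For readers who prefer Python without an SDK, the same request can be sketched with the standard library. This is an illustration under stated assumptions: `build_chat_request` and `send` are hypothetical helper names, the endpoint and payload are copied from the curl example, and the shape of the response body is not assumed beyond it being JSON.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.tomoul.ai/v1/chat/completions"


def build_chat_request(key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request with the bearer header."""
    body = {
        "model": "tomoul/inkubalm-0.4b",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

With `TOMOUL_KEY` set, `send(build_chat_request(os.environ["TOMOUL_KEY"], "Habari"))` performs the same call as the curl command. Keeping construction separate from sending makes the header logic easy to test without network access.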

SDK swap from OpenAI

If you already use the OpenAI SDK, only the base URL and key change.

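import os

from openai import OpenAI

# Read the key from the environment; the literal string "$TOMOUL_KEY"
# is not expanded by Python and would be sent as-is.
client = OpenAI(
    api_key=os.environ["TOMOUL_KEY"],
    base_url="https://api.tomoul.ai/v1",
)
resp = client.chat.completions.create(
    model="tomoul/phi-4",
    messages=[{"role": "user", "content": "Hi"}],
)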

Rotating keys

Keys can be rotated at any time from the console. A key you mark stale keeps working for up to 24 hours, which leaves time to redeploy with its replacement. If you suspect compromise, hard-revoke is one click.

Per-key rate limits

Each key has its own rate-limit envelope, defaulting to your plan's ceiling. Buckets are per-key, per-region, per-model — a noisy key on eu-helsinki1 doesn't back-pressure another key on eu-frankfurt1.
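Because buckets are split three ways, client-side quota bookkeeping is easiest to get right when it uses the same triple as its lookup key. The sketch below is illustrative only and says nothing about how Tomoul implements buckets server-side. The function names and the `key_id` label are hypothetical, and it assumes the `X-RateLimit-Remaining` header carries an integer.

```python
from typing import Dict, Tuple

# One entry per (key id, region, model) bucket. Use a label for the key,
# not the secret itself.
Bucket = Tuple[str, str, str]
remaining: Dict[Bucket, int] = {}


def note_remaining(key_id: str, region: str, model: str, header_value: str) -> None:
    """Store the latest X-RateLimit-Remaining value seen for one bucket."""
    # Assumption: the header value is a plain integer.
    remaining[(key_id, region, model)] = int(header_value)


def has_headroom(key_id: str, region: str, model: str) -> bool:
    """True if the bucket has quota left, or if nothing is known about it yet."""
    return remaining.get((key_id, region, model), 1) > 0
```

An exhausted bucket for one key on eu-helsinki1 leaves `has_headroom` true for a different key on eu-frankfurt1, mirroring the isolation described above.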

Responses include `X-RateLimit-Remaining` and `X-RateLimit-Reset`. A throttled request returns a 429 with `Retry-After`, never a 503.
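A client can turn a throttle response into a wait time along these lines. This is a sketch under assumptions: `retry_delay` is a hypothetical helper, the page does not say whether Tomoul sends `Retry-After` as seconds or as an HTTP date (both forms are allowed by the HTTP specification, so both are handled), and the one-second fallback is an arbitrary choice.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

FALLBACK_SECONDS = 1.0  # assumed default when Retry-After is missing or unparseable


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the response is not a 429."""
    if status != 429:
        return None
    # Note: a plain dict is case-sensitive; most HTTP clients expose
    # case-insensitive header mappings.
    value = headers.get("Retry-After", "").strip()
    if value.isdigit():
        return float(value)  # delta-seconds form
    if not value:
        return FALLBACK_SECONDS
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return FALLBACK_SECONDS
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return max(0.0, (when - current).total_seconds())
```

A caller would sleep for the returned number of seconds and then retry the same request, ideally with a cap on the total number of attempts.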

Last updated 12 May 2026