Your first request

Three things to know before you call the API: the base URL, the auth header, and which model to ask for.

Where to send it

All endpoints live under https://api.tomoul.ai/v1. There is no staging hostname; we run the same fleet for everyone. To pin a region, add the header X-Tomoul-Region: eu-helsinki1 to the request (see Regions & residency).

How to authenticate

Bearer token, OpenAI-style:

Authorization: Bearer $TOMOUL_KEY

Keys never appear in URLs or query params. Full details: Authentication.
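
For callers not using curl, a minimal Python sketch of the same header set might look like this. The helper name build_headers and its region parameter are illustrative and not part of any Tomoul SDK; only the header names and the base URL come from this page.

```python
BASE_URL = "https://api.tomoul.ai/v1"


def build_headers(api_key, region=None):
    """Return request headers for one call (hypothetical helper)."""
    if not api_key:
        raise ValueError("API key is empty; set TOMOUL_KEY first")
    headers = {
        # The key travels only in this header, never in the URL.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if region is not None:
        # Optional region pin, such as "eu-helsinki1".
        headers["X-Tomoul-Region"] = region
    return headers
```

In practice the key would be read from the environment, for instance build_headers(os.environ["TOMOUL_KEY"]), rather than written into source.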

What to send

For chat completions:

curl https://api.tomoul.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOMOUL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tomoul/phi-4",
    "messages": [
      {"role": "user", "content": "Write a haiku about a bird called tomoul."}
    ]
  }'

For embeddings:

curl https://api.tomoul.ai/v1/embeddings \
  -H "Authorization: Bearer $TOMOUL_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "baai/bge-m3", "input": "Habari za asubuhi."}'
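
The chat request above can also be sketched with Python's standard library alone. This is one possible shape, not an official client: the function names build_chat_request and send are made up for illustration, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

BASE_URL = "https://api.tomoul.ai/v1"


def build_chat_request(api_key, prompt, model="tomoul/phi-4"):
    """Build (but do not send) a chat-completions POST request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Building the request separately from sending it makes the payload easy to inspect or log (minus the key) before any network call happens.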

What you get back

A JSON object that matches OpenAI's response schema for the endpoint.

  • Chat: the text lives at choices[0].message.content.
  • Embeddings: the vector lives at data[0].embedding.
  • Every response carries usage.prompt_tokens, usage.completion_tokens, and usage.total_tokens so you can reconcile billing against your invoice.

Every response also includes an X-Request-ID header. Quote it when you contact support — it's the fastest way to find your call in our logs.
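
The field paths listed above can be read with plain dictionary access once the body is decoded. The sketch below uses a made-up sample payload (the text and token counts are placeholders, not real output), and the helper names are illustrative.

```python
import json

# Made-up payload shaped like an OpenAI-style chat response.
SAMPLE_CHAT = json.loads("""
{
  "choices": [
    {"message": {"role": "assistant", "content": "placeholder text"}}
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 17, "total_tokens": 29}
}
""")


def chat_text(response):
    """Return the generated text from a chat-completions response."""
    return response["choices"][0]["message"]["content"]


def first_embedding(response):
    """Return the first vector from an embeddings response."""
    return response["data"][0]["embedding"]


def usage_totals(response):
    """Return (prompt, completion, total) token counts."""
    usage = response["usage"]
    return (
        usage["prompt_tokens"],
        usage["completion_tokens"],
        usage["total_tokens"],
    )
```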

Streaming

Add "stream": true to a chat-completions request and read the response as Server-Sent Events. Each chunk is a JSON object on a data: line; the stream ends with data: [DONE]. Full pattern with cancellation handling: Streaming completions guide.
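
A rough sketch of reading such a stream in Python, working on any iterable of lines (for example an open HTTP response). It assumes each chunk follows OpenAI's streaming shape, with new text at choices[0].delta.content; that path is not stated on this page, so treat it as an assumption. The function names are illustrative, and cancellation is left to the guide mentioned above.

```python
import json


def iter_stream_chunks(lines):
    """Yield parsed JSON chunks from an iterable of SSE lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)


def collect_text(lines):
    """Join the text deltas from a chat stream into one string."""
    parts = []
    for chunk in iter_stream_chunks(lines):
        # Assumption: OpenAI-style chunk layout.
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)
```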

Last updated 13 May 2026