Your first request
Three things to know before you call the API: the base URL, the auth header, and which model to ask for.
Where to send it
All endpoints live under https://api.tomoul.ai/v1. There is no staging
hostname: we run the same fleet for everyone. To pin a region, send the
X-Tomoul-Region header with your request, for example
X-Tomoul-Region: eu-helsinki1 (see Regions & residency).
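The following is a minimal Python sketch of the two rules above. It assumes you build requests yourself rather than through an SDK. The helper names `endpoint_url` and `region_header` are hypothetical. Only the base URL, the header name, and the `eu-helsinki1` value come from this page.

```python
from typing import Optional

# Single production hostname; there is no separate staging host.
BASE_URL = "https://api.tomoul.ai/v1"


def endpoint_url(path: str) -> str:
    """Join an endpoint path (with or without a leading slash) onto the base URL."""
    return f"{BASE_URL}/{path.lstrip('/')}"


def region_header(region: Optional[str]) -> dict[str, str]:
    """Return the optional region-pinning header; an empty dict means no pin."""
    return {"X-Tomoul-Region": region} if region else {}
```

Region pinning is opt-in, so merging `region_header(None)` into your headers adds nothing.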
How to authenticate
Bearer token, OpenAI-style:
Authorization: Bearer $TOMOUL_KEY
Keys never appear in URLs or query params. Full details: Authentication.
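This sketch builds the auth header from the same `TOMOUL_KEY` environment variable that the curl examples use. `auth_headers` is a hypothetical helper. If the key is missing, it raises an error up front rather than sending `Bearer ` with an empty token.

```python
import os


def auth_headers(env_var: str = "TOMOUL_KEY") -> dict[str, str]:
    """Return the Authorization header; the key goes in a header, never in the URL."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return {"Authorization": f"Bearer {key}"}
```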
What to send
For chat completions:
curl https://api.tomoul.ai/v1/chat/completions \
-H "Authorization: Bearer $TOMOUL_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "tomoul/phi-4",
  "messages": [
    {"role": "user", "content": "Write a haiku about a bird called tomoul."}
  ]
}'
For embeddings:
curl https://api.tomoul.ai/v1/embeddings \
-H "Authorization: Bearer $TOMOUL_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "baai/bge-m3", "input": "Habari za asubuhi."}'
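Here are the same two requests using only the Python standard library, as a sketch. `build_request`, `chat_request`, and `embeddings_request` are hypothetical helper names. The code builds `urllib.request.Request` objects but does not send them.

```python
import json
import os
import urllib.request
from typing import Any

BASE_URL = "https://api.tomoul.ai/v1"


def build_request(path: str, payload: dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST, like the curl examples."""
    headers = {
        # Read the key at call time from the variable the curl examples use.
        "Authorization": f"Bearer {os.environ['TOMOUL_KEY']}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def chat_request(prompt: str, model: str = "tomoul/phi-4") -> urllib.request.Request:
    """Single-turn chat completion with one user message."""
    return build_request("chat/completions", {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })


def embeddings_request(text: str, model: str = "baai/bge-m3") -> urllib.request.Request:
    """Embedding request for a single input string."""
    return build_request("embeddings", {"model": model, "input": text})
```

To send a request, pass it to `urllib.request.urlopen` and decode the body with `json.loads`.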
What you get back
A JSON object that matches OpenAI's response schema for the endpoint.
- Chat: the text lives at choices[0].message.content.
- Embeddings: the vector lives at data[0].embedding.
- Every response carries usage.prompt_tokens, usage.completion_tokens, and usage.total_tokens so you can reconcile billing against your invoice.
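This sketch pulls those fields out of a response body. It assumes the body has already been decoded with `json.loads`. The helper names are hypothetical. `add_usage` keeps a running token total that you could compare against your invoice.

```python
from typing import Any

USAGE_FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")


def chat_text(resp: dict[str, Any]) -> str:
    """Text of the first choice in a chat completion."""
    return resp["choices"][0]["message"]["content"]


def embedding_vector(resp: dict[str, Any]) -> list[float]:
    """Vector for the first input in an embeddings response."""
    return resp["data"][0]["embedding"]


def add_usage(totals: dict[str, int], resp: dict[str, Any]) -> dict[str, int]:
    """Accumulate the usage fields into `totals` and return it."""
    usage = resp["usage"]
    for field in USAGE_FIELDS:
        # Defensive: count a missing field as 0 instead of raising KeyError.
        totals[field] = totals.get(field, 0) + usage.get(field, 0)
    return totals
```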
Every response also includes an X-Request-ID header. Quote it when you
contact support — it's the fastest way to find your call in our logs.
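This sketch captures the request ID on every call, including failed ones, using the standard library. `call` and `request_id` are hypothetical helpers. Only the X-Request-ID header name comes from this page.

```python
import json
import urllib.error
import urllib.request
from email.message import Message
from typing import Any, Optional


def request_id(headers: Message) -> Optional[str]:
    """Return X-Request-ID; header lookup on Message objects is case-insensitive."""
    return headers.get("X-Request-ID")


def call(req: urllib.request.Request) -> tuple[Any, Optional[str]]:
    """Send `req` and return (decoded JSON body, request ID)."""
    try:
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read()), request_id(resp.headers)
    except urllib.error.HTTPError as err:
        # HTTPError carries the response headers too, so the ID survives failures.
        raise RuntimeError(
            f"HTTP {err.code}; quote X-Request-ID={request_id(err.headers)} to support"
        ) from err
```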
Streaming
Add "stream": true to a chat-completions request and read the response as
Server-Sent Events. Each chunk is a JSON object on a data: line; the stream
ends with data: [DONE]. Full pattern with cancellation handling:
Streaming completions guide.
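Below is a minimal parser for the stream format described above. It assumes each event carries its whole JSON chunk on a single data: line, as stated here. It also assumes chunks follow OpenAI's streaming schema, with new text at `choices[0].delta.content`. This page does not confirm that field path. `iter_chunks` and `stream_text` are hypothetical helpers, and neither handles cancellation.

```python
import json
from typing import Any, Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield each JSON chunk from an SSE stream, stopping at `data: [DONE]`."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            # Blank lines separate events; ":" lines are comments (often keep-alives).
            # Other fields such as event: or id: are ignored here.
            continue
        payload = line[len("data:"):]
        if payload.startswith(" "):
            payload = payload[1:]  # SSE strips a single leading space
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def stream_text(lines: Iterable[str]) -> str:
    """Concatenate the text deltas (assumed OpenAI-style) into the full reply."""
    parts = []
    for chunk in iter_chunks(lines):
        choices = chunk.get("choices") or []
        if choices:
            piece = choices[0].get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)
```

With `urllib`, iterating the open response yields raw byte lines, so decode each one first: `stream_text(line.decode("utf-8") for line in resp)`.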