Build with Tomoul.
An OpenAI-compatible inference cloud, hand-tuned in Zig. Embeddings, rerankers, and small models that get out of your way. EU at launch; Lagos and Nairobi land in H2 2026.
Most teams point their existing SDK at https://api.tomoul.ai/v1, swap the API key, and ship.
Read the 3-minute migration →
Choose your path
Local first, with the CLI
One static binary. Run bge-m3 on your laptop in 90 seconds.
Then flip to --cloud when you need it.
Cloud, OpenAI-compatible
Drop our base URL into the OpenAI SDK. Same chat.completions,
same embeddings, lower spend.
Build a RAG pipeline
End-to-end guide: embed with bge-m3, rerank with
bge-reranker, generate with phi-4.
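To give a feel for the shape of that pipeline before you open the guide, here is a minimal sketch of the three stages: embed, rerank, generate. It is not the guide's code. The `embed`, `rerank`, and `generate` callables are hypothetical placeholders for whatever client calls you wire up, and `recall_k` / `context_k` are illustrative defaults, not recommendations.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rag_answer(
    question: str,
    docs: Sequence[str],
    embed: Callable[[str], Vector],                           # hypothetical: text -> vector
    rerank: Callable[[str, Sequence[str]], Sequence[float]],  # hypothetical: one score per doc
    generate: Callable[[str], str],                           # hypothetical: prompt -> answer
    recall_k: int = 20,
    context_k: int = 3,
) -> str:
    # 1. Embed: score every document against the question, keep the closest recall_k.
    q_vec = embed(question)
    by_similarity = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    candidates = by_similarity[:recall_k]

    # 2. Rerank: rescore only the candidates, keep the best context_k.
    scores = rerank(question, candidates)
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    context = [doc for _, doc in ranked[:context_k]]

    # 3. Generate: hand the selected context and the question to the chat model.
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    return generate(prompt)
```

Passing the three stages in as callables keeps the sketch independent of any particular client. A real pipeline would embed documents once and store the vectors rather than re-embedding on every question.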
90-second quickstart
The fastest path is to point any OpenAI-compatible client at our base URL.
Get a key from the console, set TOMOUL_KEY, and:
# Embed something. No SDK required.
curl https://api.tomoul.ai/v1/embeddings \
  -H "Authorization: Bearer $TOMOUL_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "baai/bge-m3", "input": "Habari za asubuhi." }'
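If you would rather stay in Python, the same embeddings call can be sketched with the standard library alone. The URL, headers, model name, and input come from the curl command above. The response parsing is an assumption: it expects the OpenAI embeddings shape (`data[0].embedding`), which this page implies but does not spell out. The function names are ours, for illustration only.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.tomoul.ai/v1"  # base URL from this page


def build_embeddings_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the same POST /embeddings call as the curl example."""
    payload = {"model": "baai/bge-m3", "input": text}
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(text: str, api_key: str) -> list[float]:
    """Send the request and return the first embedding vector.

    Assumption: the response follows the OpenAI embeddings shape,
    i.e. {"data": [{"embedding": [...]}, ...]}.
    """
    with urllib.request.urlopen(build_embeddings_request(text, api_key)) as resp:
        body = json.load(resp)
    return body["data"][0]["embedding"]


if __name__ == "__main__":
    key = os.environ.get("TOMOUL_KEY")
    if key:
        vector = embed("Habari za asubuhi.", key)
        print(f"got {len(vector)} dimensions")
    else:
        print("set TOMOUL_KEY to send the request")
```

If you already use the OpenAI Python SDK, the equivalent move is passing `base_url` and `api_key` to the `OpenAI(...)` client constructor instead of building the request by hand.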
API reference
Every endpoint mirrors its counterpart in the OpenAI spec. A handful of Tomoul-native extensions add region pinning and deterministic seeds.
Where to get help
The fastest way to a real answer is our Discord (real engineers, on East Africa and EU hours) or [email protected]; a human reads it. Status & incident history at status.tomoul.com.