Build with Tomoul.

An OpenAI-compatible inference cloud, hand-tuned on Zig. Embeddings, rerankers, and small models that get out of your way. EU at launch — Lagos and Nairobi land H2 2026.

Migrating from OpenAI?

Most teams point their existing SDK at https://api.tomoul.ai/v1, swap the API key, and ship. Read the 3-minute migration →
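
As a minimal sketch of that swap, assuming the official OpenAI Python SDK (v1 or later) and a key stored in a TOMOUL_KEY environment variable, the only two client settings that change are the base URL and the key. The helper names below are illustrative and not part of any Tomoul SDK.

```python
import os

TOMOUL_BASE_URL = "https://api.tomoul.ai/v1"  # base URL quoted on this page


def client_settings(env=os.environ):
    """Return the two constructor arguments that change in a migration."""
    # Assumption: the key lives in an environment variable named TOMOUL_KEY.
    return {"base_url": TOMOUL_BASE_URL, "api_key": env["TOMOUL_KEY"]}


def make_client(env=os.environ):
    """Build an OpenAI SDK client that talks to Tomoul instead of OpenAI."""
    # Imported lazily so the settings helper works without the SDK installed.
    from openai import OpenAI  # pip install openai

    return OpenAI(**client_settings(env))
```

With a client from make_client(), existing calls such as client.embeddings.create(model="baai/bge-m3", input="Habari za asubuhi.") should need no other changes, provided the compatibility claim holds for the endpoints you use.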

Choose your path

Local first, with the CLI

One static binary. Run bge-m3 on your laptop in 90 seconds. Then flip to --cloud when you need it.

tomoul serve →

Cloud, OpenAI-compatible

Drop our base URL into the OpenAI SDK. Same chat.completions, same embeddings, lower spend.

First request →

Build a RAG pipeline

End-to-end guide: embed with bge-m3, rerank with bge-reranker, generate with phi-4.

RAG guide →
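
The shape of that pipeline can be sketched without committing to any request format. In the sketch below the three model calls are passed in as plain functions, so embed, rerank, and generate are placeholders you would back with bge-m3, bge-reranker, and phi-4; their signatures, and the recall_k and context_k parameters, are assumptions made for illustration rather than anything taken from the guide.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer(question, docs, embed, rerank, generate, recall_k=20, context_k=3):
    """Embed, shortlist by similarity, rerank the shortlist, then generate."""
    # 1. Embed the question and every document in one batch.
    vectors = embed([question, *docs])
    q_vec, doc_vecs = vectors[0], vectors[1:]

    # 2. Cheap first pass: keep the recall_k documents nearest to the question.
    nearest = sorted(
        range(len(docs)),
        key=lambda i: cosine(q_vec, doc_vecs[i]),
        reverse=True,
    )[:recall_k]
    candidates = [docs[i] for i in nearest]

    # 3. Rerank: one relevance score per candidate, higher is better.
    scores = rerank(question, candidates)
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    context = [doc for _, doc in ranked[:context_k]]

    # 4. Generate an answer grounded in the top context_k passages.
    prompt = (
        "Answer using only this context:\n"
        + "\n".join(context)
        + f"\n\nQuestion: {question}"
    )
    return generate(prompt)
```

Keeping the model calls injectable means the same function can run against local stubs in tests and against the hosted endpoints in production.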

90-second quickstart

The fastest path is to point any OpenAI-compatible client at our base URL. Get a key from the console, set TOMOUL_KEY, and:

# Embed something. No SDK required.
curl https://api.tomoul.ai/v1/embeddings \
-H "Authorization: Bearer $TOMOUL_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "baai/bge-m3",
  "input": "Habari za asubuhi."
}'
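
The same call can be sketched in Python with only the standard library. This version sends a list of inputs and reads vectors back in input order; it assumes the endpoint accepts a list for input and returns the OpenAI embeddings response shape (a data array of objects with index and embedding), which this page implies but does not spell out. The function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://api.tomoul.ai/v1"  # base URL quoted on this page


def build_embeddings_request(texts, api_key, model="baai/bge-m3"):
    """Build (but do not send) a POST /v1/embeddings request for several texts."""
    body = {"model": model, "input": list(texts)}
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def vectors_from(payload):
    """Return embeddings in input order from an OpenAI-style response body."""
    # Assumption: each row carries the position of its input in "index".
    rows = sorted(payload["data"], key=lambda row: row["index"])
    return [row["embedding"] for row in rows]


def embed(texts, api_key):
    """Send the request and return one vector per text (needs network and a key)."""
    request = build_embeddings_request(texts, api_key)
    with urllib.request.urlopen(request) as response:
        return vectors_from(json.load(response))
```

Calling embed(["Habari za asubuhi.", "Good morning."], os.environ["TOMOUL_KEY"]) after importing os would then return two vectors, one per input.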

API reference

Each OpenAI-compatible endpoint mirrors the OpenAI spec, and a handful of Tomoul-native extensions add region pinning and deterministic seeds.

POST /v1/chat/completions (stream)
POST /v1/embeddings (batch)
POST /v1/rerank (native)
POST /v1/audio/transcriptions
GET /v1/models (list)
GET /v1/usage (native)
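
For the streaming chat endpoint, a client has to reassemble text from a sequence of events. The sketch below assumes Tomoul follows the OpenAI streaming convention: server-sent event lines prefixed with data:, each carrying a JSON chunk with choices[].delta.content, and a final data: [DONE] sentinel. That format is inferred from the compatibility claim, not confirmed by this page, and iter_stream_text is a hypothetical helper name.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and comment lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

A response object returned by urllib.request.urlopen can be iterated line by line, so it can be passed straight to iter_stream_text once the request body sets "stream": true.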

Where to get help

The fastest way to a real answer is our Discord (real engineers, on East Africa and EU hours) or [email protected], where a human reads it. Status and incident history are at status.tomoul.com.

Last updated 12 May 2026