Models & deployments
Tomoul hosts open-weights models. Every model is identified by a `provider/model-name` slug — the same string you pass as `model` in any API call.
Model slugs
Tomoul model slugs have the form provider/model-name[-variant]:
baai/bge-m3
tomoul/inkubalm-0.4b
microsoft/phi-4
openai/gpt-oss-20b
The provider segment matches the model's origin (BAAI, Microsoft, Qwen) —
except for Tomoul exclusives, which use the tomoul/ prefix. Use
GET /v1/models to list every slug live.
The catalog
The Day-1 catalog spans three tiers:
| Tier | Examples | Why it's here |
|---|---|---|
| Embeddings & rerankers | baai/bge-m3, baai/bge-reranker-v2-m3, alibaba/gte-qwen2-1.5b | Under-served on OpenRouter; first-class here. |
| Trending small/mid LLMs | microsoft/phi-4, openai/gpt-oss-20b, qwen/qwen3-30b-a3b | Top-20-adjacent. Sparse providers. |
| Tomoul exclusives | tomoul/inkubalm-0.4b, partner-hosted African-language models | 1-of-1 hosting. The moat. |
Full live catalog: models page.
Shared vs dedicated
- Shared. Default. Pay per token. Cold-start handled for you. Best for most workloads.
- Dedicated (Phase 2). Reserved capacity, hourly billing, your own pod. Use when you need predictable latency or quantization-specific routing.
At launch, every endpoint runs on shared infra. Dedicated is on the roadmap — not yet bookable.
Versions and pinning
Each model has a base slug (microsoft/phi-4) and an optional version pin
(microsoft/phi-4@2025-12-01). The base slug always points at our recommended
version. Pin a version when you need bit-stable outputs across deploys.
curl https://api.tomoul.ai/v1/chat/completions \
-H "Authorization: Bearer $TOMOUL_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "microsoft/phi-4@2025-12-01",
"messages": [...]
}'
Lifecycle and deprecation
We give 90 days' notice before retiring any version. Deprecations show up in two places:
GET /v1/modelsincludes adeprecated_atfield on the affected version.- Calls to a deprecated model return a
Tomoul-Deprecatedresponse header.
Wire the header into your monitoring — by the time you notice the 90-day clock has run out, it's already running.