Migrating from OpenAI

Three minutes. Two changes. Most OpenAI codebases run against Tomoul after a base-URL swap and a key swap. Here's the exact diff.

The two-line swap

Python — set the base URL, swap the key, ship.

# Before
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# After
from openai import OpenAI
client = OpenAI(
  api_key=os.environ["TOMOUL_KEY"],
  base_url="https://api.tomoul.ai/v1",
)

Everything else — chat.completions.create, embeddings.create, streaming, function calling, JSON mode — works unchanged.

Picking a Tomoul model

There is no gpt-4o. Map by use case:

OpenAITomoul equivalentNotes
gpt-4o-minimicrosoft/phi-4Strong reasoning, 14B, much cheaper.
gpt-4oqwen/qwen3-30b-a3b or openai/gpt-oss-120bLarger context, top-20 quality.
text-embedding-3-smallbaai/bge-m3Multilingual, cheaper.
text-embedding-3-largeintfloat/e5-mistral-7b-instructPremium quality.
whisper-1openai/whisper-large-v3Same model, in EU.

The live catalog — with pricing, regions, and capability flags — is at GET /v1/models.

Behavioural differences

A handful of quirks to budget for:

  • No gpt- prefix. Models use provider/model slugs.
  • seed is deterministic only on Tomoul-exclusive models. Third-party models route on best-effort.
  • Token counts differ. Different tokenizers — plan for ±15% drift versus your current OpenAI bill on the same prompts.
  • Rate-limit headers match OpenAI's shape. See Rate limits.
  • Streaming is identical — Server-Sent Events, data: [DONE] terminator.
Heads-up.

Run your existing test suite against the Tomoul base URL on a feature branch before flipping production. Most teams find one or two spots that hard-coded an OpenAI-only model name.

What we don't do

A short list to plan migrations around:

  • No Assistants / Threads API. Stateful assistants aren't on the roadmap. Build state in your app, or use Files for blob storage.
  • No Realtime API. Voice / realtime is a separate roadmap item — not at launch.
  • No image generation. Use the audio transcription endpoint or wait for Phase 2.

Everything else — chat, embeddings, function calling, JSON mode, tools — works the same. The full list is in SDKs & clients.

Last updated 13 May 2026Edit this page on GitHub