Pricing & metering

Per-token billing. No subscriptions, no minimums, no hidden egress. You pay for what your requests actually consume.

What we meter

Chat completions — input tokens and output tokens, priced independently.
Embeddings — input tokens only.
Rerank — query tokens + document tokens (combined).
Audio transcription — audio seconds.

Every response carries a usage object with the exact counts that hit your invoice — reconcile against your own metering at any time.

Where to find rates

Live rates are on the pricing page. Programmatic access via GET /v1/models — every entry includes a pricing object with prompt_per_million, completion_per_million, and cache_read_per_million.

Prompt-cache pricing

When you send the same prefix twice within 5 minutes, the second call hits our prompt cache. Cached tokens are billed at ~10% of normal input rates and show up as a separate line on your invoice and in usage.cache_read_tokens.

Caching is automatic.

You don't opt in. If you want to opt out for a specific call, send X-Tomoul-Cache: no-store.

Billing cycles

We bill on the first of each month for the prior month's usage. Auto top-up converts a prepaid balance instead — useful if your procurement team prefers credits.

Invoices ship as PDF and JSON. The JSON form is available via the /v1/usage endpoint for programmatic reconciliation.

Currency

Invoices are in USD at launch. Local-currency display (NGN, KES, ZAR, EGP, GHS, EUR) ships with the local-rails wave (Flutterwave, Paystack, M-Pesa). Both invoice currency and payment method are independent settings in the console.

← Previous

Models

Rate limits

Last updated 13 May 2026Edit this page on GitHub