Errors & retries

Tomoul's error responses match the OpenAI error schema. The right retry strategy depends on the status code.

Error shape

{
  "error": {
    "type":    "rate_limit_error",
    "message": "Rate limit exceeded for model microsoft/phi-4",
    "code":    "rpm_exceeded",
    "param":   null
  }
}
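
If you want to branch on the error fields in client code, a small parser is usually enough. The following is a minimal sketch in Python that assumes only the four fields shown above; the `ApiError` class and `parse_error` function are hypothetical names, not part of any Tomoul SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    """Illustrative container for the four fields in the error shape above."""
    type: str
    message: str
    code: Optional[str]
    param: Optional[str]


def parse_error(body: str) -> ApiError:
    """Parse a JSON error body into an ApiError (hypothetical helper)."""
    err = json.loads(body)["error"]
    return ApiError(
        type=err.get("type", ""),
        message=err.get("message", ""),
        code=err.get("code"),    # e.g. "rpm_exceeded"
        param=err.get("param"),  # JSON null becomes None
    )
```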

Status codes

Code	Meaning	Retry?
`400`	Bad request (malformed JSON, invalid param)	No — fix the request.
`401`	Invalid or missing API key	No — fix the key.
`403`	Key revoked or scope insufficient	No.
`404`	Unknown model or endpoint	No.
`409`	Model not available in pinned region	No — change region or remove pin.
`422`	Validation error (e.g. context too long)	No.
`429`	Rate limited	Yes — honour `Retry-After`.
`499`	Client cancelled	No.
`500`	Internal error	Yes — bounded backoff.
`503`	We don't return this for capacity. If you see it, something is genuinely broken.	Yes — alert your team.
`504`	Upstream timeout	Yes.
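
Honouring `Retry-After` on a 429 means turning the header value into a wait time. The sketch below assumes the header follows the standard HTTP forms (a number of seconds or an HTTP date); this page does not say which form Tomoul sends, so the function handles both. `retry_after_seconds` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header value to a delay in seconds.

    Returns None when the header is missing or unparseable, so the caller
    can fall back to its own backoff.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        # Delta-seconds form, e.g. "3"
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative wait.
    return max(0.0, (when - now).total_seconds())
```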

Retry strategy

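Use exponential backoff with jitter: start at 250 ms, cap the delay at 8 s, and stop after 5 attempts. Retry only on 429, 500, 502, 503, and 504. The official OpenAI SDK retries with backoff automatically, so if you're using it, you don't need to write the loop. Its defaults may differ from the values above, so check its retry settings.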
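
If you aren't using an SDK, the loop is short. The following is a minimal sketch, not a reference implementation. It uses "full jitter" (a random delay between zero and the exponential ceiling), which is one common variant and an assumption here, since this page only says "with jitter". The `send` callable and its return shape are hypothetical: it stands in for whatever HTTP client you use.

```python
import random
import time
from typing import Callable, Optional, Tuple

RETRY_STATUSES = {429, 500, 502, 503, 504}
BASE_DELAY_S = 0.25   # starting delay: 250 ms
MAX_DELAY_S = 8.0     # cap: 8 s
MAX_ATTEMPTS = 5

# Hypothetical response shape: (status, Retry-After in seconds or None, body)
Response = Tuple[int, Optional[float], str]


def backoff_delay(attempt: int,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter delay for a 0-based attempt number."""
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * 2 ** attempt)
    return rng() * ceiling


def call_with_retries(send: Callable[[], Response],
                      sleep: Callable[[float], None] = time.sleep
                      ) -> Tuple[int, str]:
    """Call send() up to MAX_ATTEMPTS times, retrying only retryable codes."""
    for attempt in range(MAX_ATTEMPTS):
        status, retry_after, body = send()
        last_attempt = attempt == MAX_ATTEMPTS - 1
        if status not in RETRY_STATUSES or last_attempt:
            return status, body
        if status == 429 and retry_after is not None:
            # Prefer the server's hint over our own schedule.
            sleep(retry_after)
        else:
            sleep(backoff_delay(attempt))
    raise RuntimeError("unreachable: MAX_ATTEMPTS must be at least 1")
```

Passing `sleep` and `rng` as parameters keeps the loop easy to test without real waits.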

Idempotency

Send an Idempotency-Key: <uuid> header on POST requests to make them safe to retry. Tomoul deduplicates within a 24-hour window. The first response wins: later retries with the same key receive that same response and are not billed again.

# Generate the key once and reuse it for every retry of this request.
IDEMPOTENCY_KEY="$(uuidgen)"

curl https://api.tomoul.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOMOUL_KEY" \
  -H "Idempotency-Key: $IDEMPOTENCY_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "microsoft/phi-4", "messages": [...] }'
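
Deduplication only helps if every retry of a request carries the same key. A minimal Python sketch of that pattern follows: generate the key once per logical request, then build identical headers for each attempt. `build_headers` and the `sk-example` key are hypothetical placeholders.

```python
import uuid
from typing import Dict


def build_headers(api_key: str, idempotency_key: str) -> Dict[str, str]:
    """Headers for one attempt; the idempotency key is supplied by the caller."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Idempotency-Key": idempotency_key,
        "Content-Type": "application/json",
    }


# One key per logical request, created before the first attempt...
key = str(uuid.uuid4())

# ...and reused unchanged on every retry of that same request.
first_attempt = build_headers("sk-example", key)
retry_attempt = build_headers("sk-example", key)
```

A new logical request (a different prompt, say) should get a fresh key.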

Request IDs

Every response has an X-Request-ID header. Quote it when you contact support — it's the fastest way for us to find your call in our logs.

Log them.

Stash X-Request-ID alongside every API call in your application logs. When a customer reports a bad answer six weeks later, you'll be glad you did.
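
One way to do this, sketched in Python with the standard `logging` module. The logger name and the `log_call` helper are hypothetical; `headers` stands for whatever mapping of response headers your HTTP client returns.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("tomoul")  # hypothetical logger name


def request_id_from(headers: Mapping[str, str]) -> Optional[str]:
    """Find X-Request-ID in a header mapping, ignoring case."""
    for name, value in headers.items():
        if name.lower() == "x-request-id":
            return value
    return None


def log_call(model: str, status: int,
             headers: Mapping[str, str]) -> Optional[str]:
    """Log one API call together with its request ID, and return the ID."""
    request_id = request_id_from(headers)
    logger.info("tomoul call model=%s status=%d request_id=%s",
                model, status, request_id)
    return request_id
```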

Last updated 13 May 2026