Errors & retries

Tomoul's error responses match the OpenAI error schema. The right retry strategy depends on the status code.

Error shape

{
  "error": {
    "type":    "rate_limit_error",
    "message": "Rate limit exceeded for model microsoft/phi-4",
    "code":    "rpm_exceeded",
    "param":   null
  }
}

Status codes

CodeMeaningRetry?
400Bad request (malformed JSON, invalid param)No — fix the request.
401Invalid or missing API keyNo — fix the key.
403Key revoked or scope insufficientNo.
404Unknown model or endpointNo.
409Model not available in pinned regionNo — change region or remove pin.
422Validation error (e.g. context too long)No.
429Rate limitedYes — honour Retry-After.
499Client cancelledNo.
500Internal errorYes — bounded backoff.
503We don't return this for capacity. If you see it, something is genuinely broken.Yes — alert your team.
504Upstream timeoutYes.

Retry strategy

Exponential backoff with jitter, starting at 250 ms, capped at 8 s, max 5 attempts. Retry only on 429, 500, 502, 503, 504. The official OpenAI SDK does this for free — if you're using it, you don't need to write the loop.

Idempotency

Send Idempotency-Key: <uuid> on POST requests to make them safe to retry. Tomoul deduplicates inside a 24-hour window. The first response wins; subsequent retries with the same key get the same response, no extra billing.

curl https://api.tomoul.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOMOUL_KEY" \
  -H "Idempotency-Key: $(uuidgen)" \
  -H "Content-Type: application/json" \
  -d '{ "model": "microsoft/phi-4", "messages": [...] }'

Request IDs

Every response has an X-Request-ID header. Quote it when you contact support — it's the fastest way for us to find your call in our logs.

Log them.

Stash X-Request-ID alongside every API call in your application logs. When a customer reports a bad answer six weeks later, you'll be glad you did.

Last updated 13 May 2026Edit this page on GitHub