Limits & quotas
Account-level ceilings that are separate from per-request rate limits — how many keys, how many parallel deploys, how big a request body.
Account limits
| Limit | Free / PAYG | Scale |
|---|---|---|
| API keys | 10 | 100 |
| Webhook endpoints | 5 | 50 |
| Audit log retention | 90 days | 1 year |
| Spend cap | Self-set | Self-set |
Per-request limits
| Limit | Value |
|---|---|
| Request body | 10 MB |
| Embedding batch size | 2048 inputs |
| Chat messages array | 256 messages |
| File upload | 25 MB (audio), 100 MB (batch) |
| Streaming idle timeout | 60 s |
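Because these limits are fixed, a client can check them before sending anything. The sketch below is one way to do that, not an official client. The payload shape (a `messages` key) and the helper names `preflight_chat` and `embedding_batches` are hypothetical, and "10 MB" is read as 10,000,000 bytes, the more conservative interpretation.

```python
import json
from typing import Iterator, Sequence

# Values copied from the per-request limits table above.
# Assumption: "10 MB" means 10,000,000 bytes (the conservative reading).
MAX_BODY_BYTES = 10 * 1000 * 1000
MAX_EMBEDDING_BATCH = 2048
MAX_CHAT_MESSAGES = 256


def preflight_chat(payload: dict) -> list[str]:
    """Return a list of limit violations for a chat payload (empty if none).

    The "messages" key is a hypothetical payload shape used for illustration.
    """
    problems = []
    messages = payload.get("messages", [])
    if len(messages) > MAX_CHAT_MESSAGES:
        problems.append(
            f"messages array has {len(messages)} entries, limit is {MAX_CHAT_MESSAGES}"
        )
    # Measure the serialized size, since that is what goes over the wire.
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        problems.append(
            f"request body is {len(body)} bytes, limit is {MAX_BODY_BYTES}"
        )
    return problems


def embedding_batches(
    inputs: Sequence[str], size: int = MAX_EMBEDDING_BATCH
) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `inputs`, each at most `size` items long."""
    for start in range(0, len(inputs), size):
        yield inputs[start:start + size]
```

With these helpers, 5,000 embedding inputs would be sent as three requests of 2,048, 2,048, and 904 inputs.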
These are hard limits, enforced at the edge. Exceeding one returns HTTP 422 (not 400) with a precise error message.
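In practice, a 422 from a hard limit will fail again if the same request is resent, so retry logic should treat it as final. The sketch below shows one way to encode that. The function name `should_retry` is hypothetical, and treating 429 and 5xx as transient is a common HTTP convention assumed here, not something this page specifies.

```python
def should_retry(status_code: int) -> bool:
    """Decide whether resending the identical request could succeed."""
    # 422: a hard per-request limit was exceeded (per this page).
    # The same request would be rejected again, so fix it instead of retrying.
    if status_code == 422:
        return False
    # Assumption: 429 and 5xx are transient, following common HTTP practice.
    return status_code == 429 or 500 <= status_code <= 599
```

A caller would shrink or split the request on 422 and back off on a retryable status.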
Raising a limit
Contact us with:
- The limit you want lifted.
- The workload that's hitting it.
- The projected scale (peak, sustained, retention period).
Most reasonable requests are approved the day they're made. The exceptions are limits tied to abuse vectors (key count, webhook fan-out), where we may want to talk first.
For per-request rate limits (RPM / TPM), see Rate limits.
Last updated 13 May 2026