Embeddings
`POST /v1/embeddings` converts text to vectors. Tomoul's embeddings catalog is wider than that of most OpenAI-compatible providers and includes bge-m3, jina-v3, e5-mistral, gte-qwen, and mxbai.
Endpoint
POST https://api.tomoul.ai/v1/embeddings
Request body
{
  "model": "baai/bge-m3",
  "input": ["Habari za asubuhi.", "Good morning."]
}
`input` accepts a string or an array of strings. A request can carry at most
2048 inputs. The per-input token limit depends on the model (up to 8192 tokens;
see the table below or `context_length` in `GET /v1/models`).
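The request above could be built with Python's standard library roughly as follows. This is a minimal sketch: the `Authorization: Bearer` header is an assumption based on the usual OpenAI-compatible convention (this page does not state the auth scheme), and `build_embeddings_request` is a hypothetical helper name.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://api.tomoul.ai/v1/embeddings"
MAX_BATCH = 2048  # documented per-request input limit


def build_embeddings_request(model, inputs, api_key):
    """Build (but do not send) a POST request for the embeddings endpoint."""
    # The API accepts either a single string or a list of strings.
    batch = [inputs] if isinstance(inputs, str) else list(inputs)
    if not batch:
        raise ValueError("input must contain at least one string")
    if len(batch) > MAX_BATCH:
        raise ValueError(f"batch of {len(batch)} exceeds {MAX_BATCH} inputs")
    payload = {"model": model, "input": inputs if isinstance(inputs, str) else batch}
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme (OpenAI-style bearer token).
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_embeddings_request(
    "baai/bge-m3", ["Habari za asubuhi.", "Good morning."], api_key="YOUR_API_KEY"
)
```

Passing `req` to `urllib.request.urlopen` and decoding the body with `json.load` would send the request and parse the reply. Validating the batch size locally avoids a round trip for requests that exceed the documented limit.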
Response
{
  "object": "list",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.012, -0.034, ...] },
    { "object": "embedding", "index": 1, "embedding": [0.045, 0.011, ...] }
  ],
  "model": "baai/bge-m3",
  "usage": { "prompt_tokens": 7, "total_tokens": 7 }
}
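A short sketch of reading this response: each item carries an `index` that ties it back to its position in `input`. Sorting by `index` is a defensive choice here, since this page does not say whether `data` is guaranteed to arrive in input order. The helper name is hypothetical, and the sample vectors are cut to two components for readability.

```python
def vectors_in_input_order(response):
    """Return the embedding vectors ordered to match the request's input list."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Sample shaped like the response above, deliberately out of order.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.045, 0.011]},
        {"object": "embedding", "index": 0, "embedding": [0.012, -0.034]},
    ],
    "model": "baai/bge-m3",
    "usage": {"prompt_tokens": 7, "total_tokens": 7},
}

vectors = vectors_in_input_order(sample)
tokens_billed = sample["usage"]["total_tokens"]
```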
Available embedding models
| Slug | Dimensions | Max input (tokens) | Best for |
|---|---|---|---|
| baai/bge-m3 | 1024 | 8192 | Multilingual default. Strong on 100+ languages incl. Swahili, Yoruba. |
| baai/bge-large-en-v1.5 | 1024 | 512 | English-only, fastest. |
| jinaai/jina-embeddings-v3 | 1024 | 8192 | Multilingual text; supports Matryoshka truncation. |
| intfloat/e5-mistral-7b-instruct | 4096 | 4096 | Premium quality, retrieval-tuned. |
| alibaba/gte-qwen2-1.5b-instruct | 1536 | 8192 | Strong quality / cost. |
| mixedbread/mxbai-embed-large | 1024 | 512 | Compact, fast. |
Output dimensions
Matryoshka-style models (bge-m3, jina-v3) accept a `dimensions` parameter
that truncates the output vector. Fewer dimensions mean less storage at the
cost of slightly lower recall.
{ "model": "baai/bge-m3", "input": "...", "dimensions": 512 }
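To make the storage trade-off concrete, the sketch below mimics truncation on the client: keep the first `dimensions` components and rescale to unit length. Whether the server does exactly this for the `dimensions` parameter is an assumption, since this page does not describe the mechanism. `truncate_embedding` and `storage_bytes` are hypothetical helpers, and 4-byte float32 storage is assumed.

```python
import math


def truncate_embedding(vector, dimensions):
    """Keep the leading components and L2-normalize the result."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    head = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to rescale for an all-zero vector
    return [x / norm for x in head]


def storage_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw vector storage, assuming float32 values and no index overhead."""
    return num_vectors * dimensions * bytes_per_value


short = truncate_embedding([3.0, 4.0, 12.0], 2)  # toy 3-d vector cut to 2-d
full_size = storage_bytes(1_000_000, 1024)  # 4,096,000,000 bytes
half_size = storage_bytes(1_000_000, 512)   # 2,048,000,000 bytes
```

Under these assumptions, halving the dimensions of one million vectors from 1024 to 512 cuts raw storage from about 4.1 GB to about 2.0 GB.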
Batching
Send up to 2048 inputs per request. Tomoul packs them into the same GPU forward pass, so batching is substantially cheaper than sending one request per input.
For a typical RAG indexing job (~1M short documents), batches of 256–512 inputs hit the latency/throughput sweet spot. Larger batches are fine; smaller ones waste spend.
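A minimal sketch of splitting a corpus into request-sized batches. `chunk_inputs` is a hypothetical helper, and the default of 256 is simply the lower end of the range suggested above.

```python
MAX_BATCH = 2048  # documented per-request input limit


def chunk_inputs(items, batch_size=256):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if not 1 <= batch_size <= MAX_BATCH:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH}")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


docs = [f"doc {i}" for i in range(1000)]
batch_sizes = [len(batch) for batch in chunk_inputs(docs, 256)]
# batch_sizes == [256, 256, 256, 232]
```

Each yielded batch can be sent as the `input` array of one request; the last batch is smaller when the corpus size is not a multiple of the batch size.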