Reranking
`POST /v1/rerank`: score documents against a query. Native endpoint (not OpenAI-shaped). Often the quickest way to lift RAG quality.
Cohere-compatible shape.
`/v1/rerank` is a Tomoul extension. Its shape matches Cohere's rerank API,
so existing Cohere clients work after a base-URL swap.
Endpoint
POST https://api.tomoul.ai/v1/rerank
Request body
```json
{
  "model": "baai/bge-reranker-v2-m3",
  "query": "What is bge-m3?",
  "documents": [
    "bge-m3 is BAAI's multilingual embedding model.",
    "BGE stands for BAAI General Embedding.",
    "Helsinki is the capital of Finland."
  ],
  "top_n": 2
}
```
| Field | Type | Notes |
|---|---|---|
| `model` | string | Reranker slug (see below). |
| `query` | string | The query to score documents against. |
| `documents` | string[] | Up to 1000 candidates per call. |
| `top_n` | int (optional) | Return only the top N. Default: all candidates. |
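The constraints in the table can be checked client-side before a request is sent. The following is a minimal sketch, not part of any Tomoul SDK: `build_rerank_payload` and `MAX_DOCUMENTS` are hypothetical names, the 1000-candidate limit is taken from the table, and the non-empty checks are assumptions about what the API would reject.

```python
MAX_DOCUMENTS = 1000  # per-call candidate limit from the request table


def build_rerank_payload(model, query, documents, top_n=None):
    """Build a /v1/rerank request body after basic validation.

    Hypothetical helper; the API itself may enforce further rules.
    """
    if not isinstance(model, str) or not model:
        raise ValueError("model must be a non-empty string")
    if not isinstance(query, str) or not query:
        raise ValueError("query must be a non-empty string")

    documents = list(documents)
    if not documents:
        raise ValueError("documents must contain at least one candidate")
    if len(documents) > MAX_DOCUMENTS:
        raise ValueError(
            f"at most {MAX_DOCUMENTS} documents per call, got {len(documents)}"
        )
    if not all(isinstance(d, str) for d in documents):
        raise ValueError("every document must be a string")

    payload = {"model": model, "query": query, "documents": documents}
    if top_n is not None:
        if not isinstance(top_n, int) or top_n < 1:
            raise ValueError("top_n must be a positive integer")
        payload["top_n"] = top_n  # leave the key out to get all candidates back
    return payload
```

The returned dictionary can be passed directly as the `json=` argument of an HTTP client call.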
Response
```json
{
  "results": [
    { "index": 0, "relevance_score": 0.984 },
    { "index": 1, "relevance_score": 0.812 }
  ],
  "model": "baai/bge-reranker-v2-m3",
  "usage": { "total_tokens": 64 }
}
```
`results` is sorted by `relevance_score` descending. `index` refers back to
the position in the original `documents` array.
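Because each result carries only an `index` and a `relevance_score`, the client has to join results back to its own `documents` list. A minimal sketch using the sample request and response from this page; `attach_documents` and `min_score` are hypothetical names, and the score cutoff is an illustrative client-side choice rather than an API feature.

```python
def attach_documents(response, documents, min_score=0.0):
    """Pair each rerank result with the document text it refers to.

    `response` is the parsed JSON body; `documents` is the list that was sent.
    Results keep the order returned by the API (descending relevance_score).
    """
    ranked = []
    for hit in response["results"]:
        if hit["relevance_score"] >= min_score:
            ranked.append((documents[hit["index"]], hit["relevance_score"]))
    return ranked


documents = [
    "bge-m3 is BAAI's multilingual embedding model.",
    "BGE stands for BAAI General Embedding.",
    "Helsinki is the capital of Finland.",
]
sample_response = {
    "results": [
        {"index": 0, "relevance_score": 0.984},
        {"index": 1, "relevance_score": 0.812},
    ],
    "model": "baai/bge-reranker-v2-m3",
    "usage": {"total_tokens": 64},
}

for text, score in attach_documents(sample_response, documents, min_score=0.9):
    print(f"{score:.3f}  {text}")
# prints: 0.984  bge-m3 is BAAI's multilingual embedding model.
```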
Available rerankers
| Slug | Best for |
|---|---|
| `baai/bge-reranker-v2-m3` | Multilingual default. |
| `baai/bge-reranker-v2-gemma` | English, highest quality. |
| `mixedbread/mxbai-rerank-large-v1` | Compact, fast. |
RAG pattern
Embed → top-K retrieve → rerank → generate. The rerank step typically halves the LLM context needed for the same answer quality. Full walkthrough: RAG with bge-m3 guide.
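```python
import os

import requests


def rerank(query, candidates, top_n=3):
    """Return the top_n candidates, most relevant first."""
    r = requests.post(
        "https://api.tomoul.ai/v1/rerank",
        # The API key is read from the TOMOUL_KEY environment variable.
        headers={"Authorization": f"Bearer {os.environ['TOMOUL_KEY']}"},
        json={
            "model": "baai/bge-reranker-v2-m3",
            "query": query,
            "documents": candidates,
            "top_n": top_n,
        },
        timeout=30,
    )
    r.raise_for_status()  # surface HTTP errors before parsing the body
    # Map each hit back to its text via the index into the original list.
    return [candidates[hit["index"]] for hit in r.json()["results"]]
```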
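To show where the rerank call sits in the retrieve → rerank → generate flow, here is a rough sketch of the surrounding pipeline. Everything in it is an assumption for illustration: `answer_with_rag` is a hypothetical name, the three callables are supplied by the caller, and the prompt wording and the `top_k=50` default are arbitrary. The `rerank` function above has the signature expected for the `rerank` argument.

```python
def answer_with_rag(query, retrieve, rerank, generate, top_k=50, top_n=3):
    """Run retrieve -> rerank -> generate and return the generated answer.

    All three callables are placeholders supplied by the caller:
      retrieve(query, top_k)            -> list[str]  (embedding search)
      rerank(query, candidates, top_n)  -> list[str]  (e.g. a /v1/rerank wrapper)
      generate(prompt)                  -> str        (any LLM completion call)
    """
    candidates = retrieve(query, top_k)  # wide, cheap recall
    if not candidates:
        # Nothing to ground on; this sketch falls back to the bare query.
        return generate(query)

    passages = rerank(query, candidates, top_n)  # keep only the best few
    context = "\n\n".join(passages)
    prompt = (
        "Answer using only this context:\n\n"
        f"{context}\n\n"
        f"Question: {query}"
    )
    return generate(prompt)
```

Passing the stages in as functions keeps the sketch independent of any particular vector store or LLM client, and makes each stage easy to stub in tests.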
Last updated 13 May 2026