Reranking

`POST /v1/rerank` — score documents against a query. Native endpoint (not OpenAI-shaped). The fastest way to lift RAG quality.

Cohere-compatible shape.

/v1/rerank is a Tomoul extension — the shape matches Cohere's rerank API, so existing Cohere clients work after a base-URL swap.

Endpoint

POST https://api.tomoul.ai/v1/rerank

Request body

{
  "model": "baai/bge-reranker-v2-m3",
  "query": "What is bge-m3?",
  "documents": [
    "bge-m3 is BAAI's multilingual embedding model.",
    "BGE stands for BAAI General Embedding.",
    "Helsinki is the capital of Finland."
  ],
  "top_n": 2
}
FieldTypeNotes
modelstringReranker slug (see below).
querystringThe query to score documents against.
documentsstring[]Up to 1000 candidates per call.
top_nint (optional)Return only the top N. Default: all candidates.

Response

{
  "results": [
    { "index": 0, "relevance_score": 0.984 },
    { "index": 1, "relevance_score": 0.812 }
  ],
  "model": "baai/bge-reranker-v2-m3",
  "usage": { "total_tokens": 64 }
}

results is sorted by relevance_score descending. index refers back to the position in the original documents array.

Available rerankers

SlugBest for
baai/bge-reranker-v2-m3Multilingual default.
baai/bge-reranker-v2-gemmaEnglish, highest quality.
mixedbread/mxbai-rerank-large-v1Compact, fast.

RAG pattern

Embed → top-K retrieve → rerank → generate. The rerank step typically halves the LLM context needed for the same answer quality. Full walkthrough: RAG with bge-m3 guide.

import requests

def rerank(query, candidates, top_n=3):
  r = requests.post(
      "https://api.tomoul.ai/v1/rerank",
      headers={"Authorization": f"Bearer $TOMOUL_KEY"},
      json={
          "model": "baai/bge-reranker-v2-m3",
          "query": query,
          "documents": candidates,
          "top_n": top_n,
      },
  ).json()
  return [candidates[hit["index"]] for hit in r["results"]]
Last updated 13 May 2026Edit this page on GitHub