Reranking#

Learn how to use Termite's reranking capabilities to improve search result relevance.

Overview#

Reranking uses cross-encoder models to re-score search results based on query relevance. Unlike bi-encoder models (used for initial retrieval), cross-encoders process the query and document together for more accurate relevance scoring.

When to Use Reranking#

  • After initial retrieval - Rerank top-N results from vector search
  • Hybrid search - Combine BM25 and vector results, then rerank
  • RAG pipelines - Select the most relevant context for LLM prompts
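For the hybrid search case, the two candidate lists need to be combined into one before reranking. The following is a minimal sketch, assuming each hit is a dict with a unique `id` field; the `merge_candidates` helper and the hit shape are illustrative and not part of Termite or Antfly.

```python
from itertools import zip_longest


def merge_candidates(bm25_hits, vector_hits, limit=100):
    """Interleave two ranked hit lists and drop duplicate ids."""
    merged = []
    seen = set()
    for pair in zip_longest(bm25_hits, vector_hits):
        for hit in pair:
            # zip_longest pads the shorter list with None
            if hit is None or hit["id"] in seen:
                continue
            seen.add(hit["id"])
            merged.append(hit)
    return merged[:limit]


# Hypothetical hits from the two retrievers
bm25_hits = [{"id": 1}, {"id": 2}]
vector_hits = [{"id": 2}, {"id": 3}, {"id": 4}]
candidates = merge_candidates(bm25_hits, vector_hits)
```

The order of the merged list matters little here, since the reranker assigns the final scores; the cap only bounds how many pairs the cross-encoder has to process.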

Supported Models#

Model | Description
BAAI/bge-reranker-v2-m3 | Multilingual, high quality
BAAI/bge-reranker-base | Fast, English-focused
cross-encoder/ms-marco-MiniLM-L-6-v2 | Lightweight, fast

Quick Start#

curl -X POST http://localhost:8082/api/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-reranker-v2-m3",
    "query": "machine learning applications",
    "prompts": [
      "Introduction to Machine Learning: This guide covers...",
      "Deep Learning Fundamentals: Neural networks are...",
      "Cooking recipes for beginners: Start with..."
    ]
  }'
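The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the endpoint and JSON body shown in the curl command; the helper names `build_rerank_request` and `rerank` are illustrative and not part of any Termite client library.

```python
import json
import urllib.request

TERMITE_URL = "http://localhost:8082/api/rerank"  # endpoint from the curl example


def build_rerank_request(model, query, prompts, url=TERMITE_URL):
    """Build a POST request carrying the same JSON body as the curl example."""
    body = json.dumps({"model": model, "query": query, "prompts": prompts})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(model, query, prompts, url=TERMITE_URL, timeout=30.0):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_rerank_request(model, query, prompts, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `rerank("BAAI/bge-reranker-v2-m3", "machine learning applications", prompts)` requires a Termite instance listening on the assumed address.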

Response#

{
  "model": "BAAI/bge-reranker-v2-m3",
  "scores": [0.92, 0.87, 0.12]
}

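Each score indicates how relevant the corresponding prompt is to the query; a higher score means a more relevant document. Scores are returned in the same order as the prompts, so sort results by score in descending order to put the most relevant documents first.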
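Sorting by score can be sketched as follows, assuming the scores line up one-to-one with the prompts that were sent. The `rank_by_score` helper and the sample values are made up for illustration.

```python
def rank_by_score(prompts, scores):
    """Pair each prompt with its score and sort the pairs, highest score first."""
    if len(prompts) != len(scores):
        raise ValueError("expected exactly one score per prompt")
    return sorted(zip(prompts, scores), key=lambda pair: pair[1], reverse=True)


# Made-up prompts and scores, deliberately out of order
prompts = ["Cooking recipes", "Intro to ML", "Deep Learning"]
scores = [0.12, 0.92, 0.87]

ranked = rank_by_score(prompts, scores)
for text, score in ranked:
    print(f"{score:.2f}  {text}")
```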

Best Practices#

Limit Input Size#

Cross-encoders are computationally expensive because every query-document pair requires its own forward pass through the model. To keep latency manageable:

  1. Pre-filter - Use vector search to get the top 50-100 candidates
  2. Rerank top-N - Rerank only the top 10-20 results for the final ranking
  3. Truncate long documents - Most models have a 512-token limit
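The candidate cap and the truncation step can be applied on the client before calling the reranker. This is a rough sketch that uses whitespace-separated words as a stand-in for tokens. The 300-word budget is an assumption chosen to leave headroom, since a cross-encoder's input window is shared with the query and a tokenizer often splits one word into several tokens. For exact limits, count tokens with the model's own tokenizer instead. Both helper names are illustrative.

```python
def truncate_words(text, max_words=300):
    """Return text cut to at most max_words whitespace-separated words."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words])


def prepare_candidates(documents, max_candidates=100, max_words=300):
    """Cap the number of candidates, then cap the length of each one."""
    return [truncate_words(doc, max_words) for doc in documents[:max_candidates]]


# Tiny limits so the effect is visible
shortlist = prepare_candidates(
    ["one two three four", "short"], max_candidates=1, max_words=2
)
```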

Prompt Preparation#

The client is responsible for preparing prompts:

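# Hypothetical search results; in practice these come from your retrieval step
search_results = [
    {"title": "Intro to ML", "content": "This guide covers..."},
]

# Extract and format document fields before sending (content capped at 500 characters)
prompts = [
    f"Title: {doc['title']}\n{doc['content'][:500]}"
    for doc in search_results
]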

Quantized Models#

Use quantized model variants for faster inference by appending a variant suffix, such as `:i8`, to the model name:

{
  "model": "BAAI/bge-reranker-v2-m3:i8",
  "query": "...",
  "prompts": [...]
}

Integration with Antfly#

Configure Antfly to use Termite for reranking:

query:
  reranker:
    enabled: true
    termite_url: http://localhost:8082
    model: BAAI/bge-reranker-v2-m3
    top_n: 10

Example: Two-Stage Retrieval#

# `antfly` and `termite` stand for already-configured client objects
# Stage 1: Fast vector search
results = antfly.query(
    query="machine learning",
    limit=50
)

# Stage 2: Rerank top results
prompts = [r.content for r in results]
reranked = termite.rerank(
    model="BAAI/bge-reranker-v2-m3",
    query="machine learning",
    prompts=prompts
)

# Sort by reranking scores
final_results = sorted(
    zip(results, reranked.scores),
    key=lambda x: x[1],
    reverse=True
)[:10]
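The same two-stage pattern can be written as a function that takes the retrieval and scoring steps as callables, which makes it easy to try without running services. This is a sketch under stated assumptions: `two_stage_search`, `fake_retrieve`, and `fake_score` are hypothetical names, and the toy scorer only stands in for a real cross-encoder call.

```python
def two_stage_search(query, retrieve, score, candidates=50, top_n=10):
    """Retrieve `candidates` documents, score them, and keep the best `top_n`.

    `retrieve(query, limit)` must return a list of documents.
    `score(query, docs)` must return one float per document, in order.
    """
    docs = retrieve(query, candidates)
    if not docs:
        return []
    scores = score(query, docs)
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Stand-in stages so the sketch runs on its own
def fake_retrieve(query, limit):
    corpus = [
        "machine learning basics",
        "cooking for beginners",
        "deep learning fundamentals",
    ]
    return corpus[:limit]


def fake_score(query, docs):
    # Toy relevance: fraction of query words that appear in the document
    terms = query.lower().split()
    return [sum(t in d.lower().split() for t in terms) / len(terms) for d in docs]


top = two_stage_search("machine learning", fake_retrieve, fake_score, top_n=2)
```

In a real pipeline, `retrieve` would wrap the vector search and `score` would wrap the rerank request, returning the `scores` list from the response.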

Next Steps#