Reranking
Learn how to use Termite's reranking capabilities to improve search result relevance.
Overview
Reranking uses cross-encoder models to re-score search results based on query relevance. Unlike bi-encoder models (used for initial retrieval), cross-encoders process the query and document together for more accurate relevance scoring.
When to Use Reranking
- After initial retrieval - Rerank top-N results from vector search
- Hybrid search - Combine BM25 and vector results, then rerank
- RAG pipelines - Select the most relevant context for LLM prompts
Supported Models
| Model | Description |
|---|---|
| BAAI/bge-reranker-v2-m3 | Multilingual, high quality |
| BAAI/bge-reranker-base | Fast, English-focused |
| cross-encoder/ms-marco-MiniLM-L-6-v2 | Lightweight, fast |
Quick Start
curl -X POST http://localhost:8082/api/rerank \
-H "Content-Type: application/json" \
-d '{
"model": "BAAI/bge-reranker-v2-m3",
"query": "machine learning applications",
"prompts": [
"Introduction to Machine Learning: This guide covers...",
"Deep Learning Fundamentals: Neural networks are...",
"Cooking recipes for beginners: Start with..."
]
}'
Response
{
"model": "BAAI/bge-reranker-v2-m3",
"scores": [0.92, 0.87, 0.12]
}
The scores indicate relevance to the query. Sort results by score descending to get the most relevant documents first.
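As a minimal sketch, the response above can be joined back to its prompts and sorted in a few lines of Python. The prompt strings and score values are the illustrative ones from the Quick Start example; scores come back in the same order as the prompts you sent.

```python
# Prompts and scores from the Quick Start example above.
prompts = [
    "Introduction to Machine Learning: This guide covers...",
    "Deep Learning Fundamentals: Neural networks are...",
    "Cooking recipes for beginners: Start with...",
]
scores = [0.92, 0.87, 0.12]

# Scores are positionally aligned with prompts, so zip them
# together and sort descending by score.
ranked = sorted(zip(prompts, scores), key=lambda pair: pair[1], reverse=True)

for prompt, score in ranked:
    print(f"{score:.2f}  {prompt}")
```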
Best Practices
Limit Input Size
Cross-encoders are computationally expensive. Best practices:
- Pre-filter - Use vector search to get top 50-100 candidates
- Rerank top-N - Rerank only top 10-20 results for final ranking
- Truncate long documents - Most models have a 512-token input limit
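As a rough sketch of the truncation step, a word-based cutoff keeps prompts safely under the model's input limit. Exact token counts depend on the reranker's tokenizer, so the `max_words=400` default here is a conservative, hypothetical choice rather than an exact mapping to 512 tokens.

```python
def truncate_prompt(text: str, max_words: int = 400) -> str:
    """Crudely cap prompt length by word count.

    Token counts depend on the model's tokenizer, so this stays
    well below a typical 512-token limit instead of matching it exactly.
    """
    words = text.split()
    return text if len(words) <= max_words else " ".join(words[:max_words])

long_doc = "word " * 1000
print(len(truncate_prompt(long_doc).split()))
```

For precise truncation, tokenize with the same tokenizer the reranker uses and cut at the token level instead.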
Prompt Preparation
The client is responsible for preparing prompts:
# Extract and format document fields before sending
prompts = [
f"Title: {doc['title']}\n{doc['content'][:500]}"
for doc in search_results
]
Quantized Models
Use quantized model variants for faster inference:
{
"model": "BAAI/bge-reranker-v2-m3:i8",
"query": "...",
"prompts": [...]
}
Integration with Antfly
Configure Antfly to use Termite for reranking:
query:
reranker:
enabled: true
termite_url: http://localhost:8082
model: BAAI/bge-reranker-v2-m3
  top_n: 10
Example: Two-Stage Retrieval
# Stage 1: Fast vector search
results = antfly.query(
query="machine learning",
limit=50
)
# Stage 2: Rerank top results
prompts = [r.content for r in results]
reranked = termite.rerank(
model="BAAI/bge-reranker-v2-m3",
query="machine learning",
prompts=prompts
)
# Sort by reranking scores
final_results = sorted(
zip(results, reranked.scores),
key=lambda x: x[1],
reverse=True
)[:10]
Next Steps
- API Reference - Reranking API details
- Embedding Models - Initial retrieval
- Chunking - Prepare documents for search