- How do I install Antfly?
- How do I start Antfly for local development?
- How do I create my first table and index?
- How do I load data into Antfly?
- How do I run a search query?
- How do I use RAG with Termite for AI-powered answers?
AntflyDB is a document database built on Raft for distribution, Pebble for persistence, Bleve for full-text indexing, and SPANN for vector indexing. This guide will help you get started with setting up your AntflyDB instance.
Download Antfly and CLI
Visit the Downloads page and follow the instructions for your platform.
Download Models
Pull the embedding models and generation model using Termite (Antfly's built-in ML inference service):
antfly termite pull --variants i8 BAAI/bge-small-en-v1.5 openai/clip-vit-base-patch32 mixedbread-ai/mxbai-rerank-base-v1
antfly termite pull hf:onnx-community/gemma-3-270m-it-ONNX

- BAAI/bge-small-en-v1.5 is a small, fast text embedding model (384 dimensions) for semantic search, INT8 quantized for fast CPU inference
- openai/clip-vit-base-patch32 is a multimodal embedding model (512 dimensions) for image search
- mixedbread-ai/mxbai-rerank-base-v1 is a cross-encoder reranker for improving search result relevance
- onnx-community/gemma-3-270m-it-ONNX is a small (~270M parameter) Gemma 3 model for query classification
Model size matters for generation. The onnx-community/gemma-3-270m-it-ONNX model is small and fast to download, but only suitable for query classification (--classify). For generation features like --generate, --reasoning, and --followup, you'll need a larger model. See the RAG section below for examples using Gemini, Ollama, or a larger Termite model.
Termite runs ONNX-optimized models locally for fast CPU inference without external API dependencies. Models are stored in ~/.termite/models/ and auto-discovered when Antfly starts. For more options, see antfly termite list --remote or the Termite guide.
Start Antfly in Swarm Mode
Open up a new terminal and run:
antfly swarm 2>&1 | tee "antfly.log"

You should see logs indicating the metadata server, storage nodes, and Termite are starting. The API will be available at http://localhost:8080.
Swarm mode starts Antfly with metadata, data nodes, and Termite in a single process. Termite auto-discovers models from ~/.termite/models/ and serves them for embedding, chunking, reranking, and generation. For production deployments, consider running separate metadata and worker nodes.
You can also access the web dashboard at http://localhost:8080 to manage tables, run queries, and monitor your cluster.
Download Sample Data
In another terminal, download a sample of 10,000 Wikipedia articles enriched with thumbnail images:
# Download the sample data
curl -L -o wiki-articles.json https://cdn.antfly.io/datasets/wiki-articles-10k-v001.json
# Preview the data structure
head -n 1 wiki-articles.json | jq '.'

Each article has:
- url: Wikipedia URL
- title: Article title
- body: Article content
- thumbnail_url: Wikipedia thumbnail image (when available)
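As a quick sanity check, a line of the dataset can be parsed against the schema above with a few lines of Python. The record here is illustrative, not taken from the actual dataset:

```python
import json

# An illustrative record following the wiki-articles.json schema
# (one JSON object per line; values are made up for this sketch)
line = ('{"url": "https://en.wikipedia.org/wiki/Ant", "title": "Ant", '
        '"body": "Ants are eusocial insects.", '
        '"thumbnail_url": "https://upload.wikimedia.org/ant.jpg"}')

doc = json.loads(line)
required = {"url", "title", "body"}             # always present
assert required <= doc.keys()
has_thumbnail = bool(doc.get("thumbnail_url"))  # optional field
print(doc["title"], has_thumbnail)
```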
Create a Table
Create a table with two indexes: a text embedding index for semantic search and a CLIP image index for searching by thumbnail. Both use Termite for local ONNX inference.
antfly table create --table wikipedia \
--index '{
"name": "title_body",
"type": "embeddings",
"template": "{{title}} {{body}}",
"embedder": {
"provider": "termite",
"model": "BAAI/bge-small-en-v1.5"
},
"chunker": {
"provider": "antfly",
"text": {
"target_tokens": 200,
"overlap_tokens": 25
}
}
}' \
--index '{
"name": "thumbnail",
"type": "embeddings",
"template": "{{media url=thumbnail_url}}",
"embedder": {
"provider": "termite",
"model": "openai/clip-vit-base-patch32"
},
"dimension": 512,
"distance_metric": "cosine"
}'title_bodyusesBAAI/bge-small-en-v1.5(384 dimensions) for text semantic search with chunkingthumbnailuses CLIP (512 dimensions) for image search with cosine similarity — Antfly automatically fetches and embeds each article's Wikipedia thumbnail. Articles without athumbnail_urlare skipped.
For alternative models, run antfly termite list --remote or see the Termite guide.
Verify the table and indexes were created:

antfly table list
antfly index list --table wikipedia

Load the Sample Data
# Load the Wikipedia articles using the title as the document ID
antfly load --table wikipedia \
--file wiki-articles.json \
--id-field title

This will load 10,000 articles in 10 batches of 1000 documents each.
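The loader's batching behavior can be pictured with a simple generator (a sketch of the idea, not Antfly's loader; the batch size of 1000 comes from the note above):

```python
def batches(docs, size=1000):
    """Yield successive fixed-size batches from a document list."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

docs = [{"title": f"Article {i}"} for i in range(10_000)]
print(sum(1 for _ in batches(docs)))  # → 10
```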
Generating embeddings can take some time, especially for larger datasets.
You can monitor the progress in the logs or by checking active_vectors in the index:
antfly index list --table wikipedia

Query the Data
You can run these queries using the CLI (shown below) or the web dashboard at http://localhost:8080.
Full-Text Search
Search for articles containing specific terms (a full specification of the query syntax is available in the Bleve Query String Query documentation):
# Search for articles about Korea
antfly query --table wikipedia \
--full-text-search 'body:"Korea"' \
--fields "title,url" \
--limit 5

Semantic Similarity Search
Find articles semantically similar to a query:
# Find articles about anatomy and physiology
antfly query --table wikipedia \
--semantic-search "anatomy and physiology" \
--indexes "title_body" \
--fields "title,url" \
--limit 5

Hybrid Search
Combine full-text, vector search, reranking, and pruning for better relevance:
# Find articles about Einstein with semantic understanding
antfly query --table wikipedia \
--full-text-search 'body:Einstein' \
--semantic-search "theory of relativity and physics" \
--indexes "title_body" \
--fields "title,url" \
--limit 10 \
--reranker '{
"provider": "termite",
"model": "mixedbread-ai/mxbai-rerank-base-v1",
"field": "body"
}' \
--pruner '{"min_score_ratio": 0.01}'Reranking vs Pruning:
- Reranking uses a cross-encoder model (like mixedbread-ai/mxbai-rerank-base-v1) to re-score results based on query-document relevance. It improves ordering but keeps the same number of results.
- Pruning filters out low-relevance results based on score quality. Use min_score_ratio to keep only results scoring at least N% of the top result, or max_score_gap_percent to detect "elbows" in the score distribution.
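The two pruning rules described above can be sketched directly. This is an illustration of the scoring logic, under the assumption that both rules stop at the first violation in descending score order; it is not Antfly's internal implementation:

```python
def prune(scores, min_score_ratio=None, max_score_gap_percent=None):
    """Keep a prefix of descending scores according to the two pruning rules."""
    scores = sorted(scores, reverse=True)
    top = scores[0]
    kept = []
    for prev, s in zip([None] + scores, scores):
        # Rule 1: drop results scoring below a fraction of the top result
        if min_score_ratio is not None and s < top * min_score_ratio:
            break
        # Rule 2: stop at a large relative gap ("elbow") between neighbors
        if (max_score_gap_percent is not None and prev is not None
                and (prev - s) / prev * 100 > max_score_gap_percent):
            break
        kept.append(s)
    return kept

scores = [0.92, 0.90, 0.88, 0.41, 0.40, 0.02]
print(prune(scores, min_score_ratio=0.6))       # drops everything below 0.92 * 0.6 = 0.552
print(prune(scores, max_score_gap_percent=40))  # stops at the 0.88 → 0.41 gap
```

With these example scores, both rules happen to keep the same top three results, but they cut for different reasons: one against the best score, the other against the preceding one.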
Image Search
Search for articles by visual content using the CLIP thumbnail index. CLIP understands visual concepts, so you can describe what you're looking for:
# Find articles with images of maps or geography
antfly query --table wikipedia \
--semantic-search "map of a country" \
--indexes "thumbnail" \
--fields "title,url,thumbnail_url" \
--limit 5

CLIP responds best to concrete visual descriptions ("red sports car", "mountain landscape") rather than abstract concepts. Articles without a thumbnail_url are skipped by the CLIP index. For a deeper dive, see the Image Search with CLIP example.
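The thumbnail index compares 512-dimensional CLIP vectors using cosine similarity, as configured when the table was created. The metric itself is simple; a sketch with toy 3-dimensional vectors in place of real CLIP embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

q = [1.0, 0.0, 1.0]    # toy "query" embedding
img = [0.9, 0.1, 0.8]  # toy "thumbnail" embedding
print(round(cosine_similarity(q, img), 3))  # → 0.995
```

Because cosine similarity ignores vector magnitude, it measures only the direction of the embeddings, which is the standard choice for CLIP-style retrieval.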
Multi-Index Hybrid Search
Combine text, image, and full-text search across multiple indexes:
# Search with both text and image indexes
antfly query --table wikipedia \
--full-text-search 'title:History' \
--semantic-search "ancient civilizations and archaeology" \
--indexes "title_body,thumbnail" \
--fields "title,url,thumbnail_url" \
--limit 10

RAG (Retrieval-Augmented Generation)
Combine search with LLM-powered classification and generation using the retrieval agent.
Use the small gemma-3-270m-it-ONNX model you pulled earlier to classify whether a query needs retrieval:
antfly agents retrieval --table wikipedia \
--semantic-search "What are the major events in Korean history?" \
--indexes "title_body" \
--fields "title,body" \
--limit 5 \
--reranker '{
"provider": "termite",
"model": "mixedbread-ai/mxbai-rerank-base-v1",
"field": "body"
}' \
--pruner '{"min_score_ratio": 0.6, "max_score_gap_percent": 40}' \
--generator '{
"provider": "termite",
"model": "onnx-community/gemma-3-270m-it-ONNX"
}' \
--max-context-tokens 512 \
--classify

This command:
- Searches for semantically similar articles using the generated embeddings
- Reranks results using the built-in cross-encoder reranker for better relevance
- Prunes low-quality results (keeps only those scoring ≥60% of top result, stops at large score gaps)
- Fits the remaining documents into a 512-token context budget, dropping lowest-ranked documents that don't fit
- Classifies the query intent using the small Gemma 3 model
Pruning for RAG: Pruning is especially valuable for RAG pipelines. Score-based pruning (--pruner) filters out marginally relevant documents, while token-based pruning (--max-context-tokens) ensures the context fits within the model's limits.
For full generation features (--generate, --reasoning, --followup), you need a larger model. Choose a provider below:
Pull the larger Gemma 3 ONNX model (~5.7 GB) first:
antfly termite pull hf:onnxruntime/Gemma-3-ONNX

Then run the retrieval agent with the larger model:

antfly agents retrieval --table wikipedia \
--semantic-search "What are the major events in Korean history?" \
--indexes "title_body" \
--fields "title,body" \
--limit 5 \
--generator '{
"provider": "termite",
"model": "onnxruntime/Gemma-3-ONNX"
}' \
--max-context-tokens 512 \
--classify --reasoning --generate --followup

Choosing a generator model:
- onnx-community/gemma-3-270m-it-ONNX (~270M params, 1.1 GB): fast, runs anywhere, good for --classify only
- onnxruntime/Gemma-3-ONNX (~4B params, 5.7 GB): local ONNX inference via Termite, supports all RAG features
- Ollama gemma3:4b-it-qat: local inference via Ollama, quantized for fast CPU/GPU performance
- Gemini gemini-2.5-flash: cloud API, fast and high quality, requires API key
- OpenAI gpt-4.1-mini: cloud API, fast and high quality, requires API key
Add --streaming=false to get structured JSON output with source references instead of streaming text. This works with any of the generator providers above:
antfly agents retrieval --table wikipedia \
--semantic-search "Explain the theory of relativity" \
--indexes "title_body" \
--fields "title,body,url" \
--limit 5 \
--generator '{"provider": "gemini", "model": "gemini-2.5-flash"}' \
--streaming=false --generate

Add --eval to run evaluators on the query results and include scores in the response. Evaluation requires a model capable of generation; use a larger model as the judge:
antfly agents retrieval --table wikipedia \
--semantic-search "What are the major events in Korean history?" \
--indexes "title_body" \
--fields "title,body" \
--limit 5 \
--generator '{"provider": "gemini", "model": "gemini-2.5-flash"}' \
--generate \
--eval '{
"evaluators": ["faithfulness", "relevance"],
"judge": {"provider": "gemini", "model": "gemini-3.0-flash"}
}'Available evaluators:
- Retrieval metrics (require ground_truth.relevant_ids): recall, precision, ndcg, mrr, map
- LLM-as-judge metrics (require judge config): relevance, faithfulness, completeness, coherence, safety, helpfulness, correctness, citation_quality
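Two of the retrieval metrics above, recall and MRR, can be computed directly from a list of retrieved IDs and the ground_truth.relevant_ids set. A sketch with made-up document IDs:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant IDs that appear in the top-k retrieved IDs."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant ID (0 if none is retrieved)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1 / rank
    return 0.0

retrieved = ["d3", "d7", "d1", "d9"]  # hypothetical ranked result IDs
relevant = {"d1", "d2"}               # hypothetical ground truth
print(recall_at_k(retrieved, relevant, 4), mrr(retrieved, relevant))
```

Here one of the two relevant documents is retrieved (recall 0.5), and the first relevant hit appears at rank 3 (MRR 1/3).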
Next Steps
- Image Search Example: Build a full image search app with CLIP — see the Image Search with CLIP example
- Explore Different Models: Browse available Termite models with antfly termite list --remote. Try mixedbread-ai/mxbai-embed-large-v1 for higher quality text embeddings or mixedbread-ai/mxbai-rerank-base-v1 for neural reranking
- Backup and Restore: Try backing up your data:
antfly backup --table wikipedia \
--backup-id wiki-backup-$(date +%Y%m%d) \
--location "file:///tmp/antfly_backups"- Production Deployment: For production, run metadata and storage nodes separately for better scalability
Additional Resources
- Antfly CLI Documentation
- Main Antfly README
- REST API Documentation
- Termite Models - Browse available embedding, chunking, and reranking models
- Configuration Reference - Configure Termite, embedding providers, and more