- How do I install Antfly?
- How do I start Antfly for local development?
- How do I create my first table and index?
- How do I load data into Antfly?
- How do I run a search query?
- How do I use RAG with Ollama for AI-powered answers?
AntflyDB is a document database built on Raft for distribution, Pebble for persistence, Bleve for full-text indexing, and SPANN for vector indexing. This guide will help you get started with setting up your AntflyDB instance!
Download Antfly and CLI
Visit the Downloads page and follow the instructions for your platform.
Install and Start Ollama
Install Ollama for your platform, then pull an embedding model and a generation model:
```shell
ollama pull all-minilm gemma3:4b-it-qat
```
- `all-minilm` is a small, fast embedding model (384 dimensions) for vector search
- `gemma3:4b-it-qat` is a small language model for RAG-powered answers
Make sure Ollama is running with `ollama serve` in a separate terminal, or as a background service with `brew services start ollama` on macOS.
Start Antfly in Swarm Mode
Open up a new terminal and run:
```shell
antfly swarm 2>&1 | tee "antfly.log"
```
You should see logs indicating the metadata server and storage nodes are starting. The API will be available at http://localhost:8080.
Swarm mode is a convenient way to run Antfly with both metadata and data nodes in a single process. For production deployments, consider running separate metadata and worker nodes.
You can also access the web dashboard at http://localhost:8080 to manage tables, run queries, and monitor your cluster.
Download Sample Data
In another terminal, download the sample data. We'll use a sample of 10,000 Wikipedia articles:
```shell
# Download the sample data (11MB)
curl -L -o wiki-articles-1000.json http://fulmicoton.com/tantivy-files/wiki-articles-1000.json

# Preview the data structure
head -n 1 wiki-articles-1000.json | jq '.'
```
Each article has:
- `url`: Wikipedia URL
- `title`: Article title
- `body`: Article content
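Before loading, you can sanity-check that the file's records match this shape. A minimal Python sketch (the sample record below is illustrative, not taken from the dataset):

```python
import json

REQUIRED_FIELDS = {"url", "title", "body"}

def validate_ndjson_line(line: str) -> dict:
    """Parse one newline-delimited JSON record and check its fields."""
    doc = json.loads(line)
    missing = REQUIRED_FIELDS - doc.keys()
    if missing:
        raise ValueError(f"record missing fields: {missing}")
    return doc

# Illustrative record in the same shape as the Wikipedia sample
sample = '{"url": "https://en.wikipedia.org/wiki/Example", "title": "Example", "body": "An example article."}'
doc = validate_ndjson_line(sample)
print(doc["title"])  # → Example
```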
Create a Table
Create a table and index to store the sample data. This index uses the Ollama all-minilm model for vector search and a fixed-size chunker to split long documents.
```shell
antflycli table create --table wikipedia \
  --index '{
    "name": "title_body",
    "type": "aknn_v0",
    "template": "{{title}} {{body}}",
    "embedder": {
      "provider": "ollama",
      "model": "all-minilm"
    },
    "chunker": {
      "provider": "antfly",
      "text": {
        "target_tokens": 200,
        "overlap_tokens": 25
      }
    }
  }'
```
The `all-minilm` model via Ollama generates 384-dimensional embeddings for semantic search. The chunker splits documents into chunks of approximately `target_tokens` size with `overlap_tokens` of overlap between consecutive chunks. For alternative embedding models, browse Ollama embedding models or see the Termite guide for cached model management.
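To build intuition for what the chunker does, here is a rough Python sketch of fixed-size chunking with overlap, using whitespace-split words as stand-in tokens (Antfly's real tokenizer and boundary handling will differ):

```python
def chunk_tokens(tokens: list[str], target: int = 200, overlap: int = 25) -> list[list[str]]:
    """Split a token list into windows of ~target tokens,
    each sharing `overlap` tokens with the previous window."""
    step = target - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + target])
        if start + target >= len(tokens):
            break
    return chunks

words = ("word " * 450).split()           # a 450-token document
chunks = chunk_tokens(words, target=200, overlap=25)
print([len(c) for c in chunks])           # → [200, 200, 100]
```

Each chunk is embedded separately, so a long article contributes several vectors to the index; the overlap keeps sentences that straddle a chunk boundary searchable.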
```shell
antflycli table list
antflycli index list --table wikipedia
```
Load the Sample Data
```shell
# Load the Wikipedia articles using the title as the document ID
antflycli load --table wikipedia \
  --file wiki-articles-1000.json \
  --id-field title
```
This will load 10,000 articles in 10 batches of 1,000 documents each.
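The batching pattern is simple to picture; a simplified Python illustration of splitting documents into fixed-size batches (the batch size of 1,000 matches the run above):

```python
def batches(docs: list[dict], batch_size: int = 1000):
    """Yield successive fixed-size batches of documents."""
    for i in range(0, len(docs), batch_size):
        yield docs[i:i + batch_size]

docs = [{"title": f"Article {i}", "body": "..."} for i in range(10_000)]
sizes = [len(b) for b in batches(docs)]
print(len(sizes))  # → 10
```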
Generating embeddings can take some time, especially for larger datasets.
You can monitor the progress in the logs or by checking active_vectors in the index:
```shell
antflycli index list --table wikipedia
```
Query the Data
You can run these queries using the CLI (shown below) or the web dashboard at http://localhost:8080.
Full-Text Search
Search for articles containing specific terms (a full specification of the query syntax is available in the Bleve Query String Query documentation):
```shell
# Search for articles about Korea
antflycli query --table wikipedia \
  --full-text-search 'body:"Korea"' \
  --fields "title,url" \
  --limit 5
```
Semantic Similarity Search
Find articles semantically similar to a query:
```shell
# Find articles about anatomy and physiology
antflycli query --table wikipedia \
  --semantic-search "anatomy and physiology" \
  --indexes "title_body" \
  --fields "title,url" \
  --limit 5
```
Hybrid Search
Combine full-text, vector search, reranking, and pruning for better relevance:
```shell
# Find articles about Einstein with semantic understanding
antflycli query --table wikipedia \
  --full-text-search 'body:Einstein' \
  --semantic-search "theory of relativity and physics" \
  --indexes "title_body" \
  --fields "title,url" \
  --limit 10 \
  --reranker '{
    "provider": "antfly",
    "field": "body"
  }' \
  --pruner '{"min_score_ratio": 0.5}'
```
Reranking vs Pruning:
- Reranking uses a cross-encoder model to re-score results based on query-document relevance. It improves ordering but keeps the same number of results.
- Pruning filters out low-relevance results based on score quality. Use `min_score_ratio` to keep only results scoring at least N% of the top result, or `max_score_gap_percent` to detect "elbows" in the score distribution.
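A sketch of what `min_score_ratio` pruning does conceptually (Antfly's actual scoring and cutoff logic may differ in detail):

```python
def prune_by_score_ratio(results: list[dict], min_score_ratio: float) -> list[dict]:
    """Keep only results scoring at least min_score_ratio × the top score."""
    if not results:
        return []
    top = max(r["score"] for r in results)
    return [r for r in results if r["score"] >= min_score_ratio * top]

hits = [
    {"id": "a", "score": 0.92},
    {"id": "b", "score": 0.60},
    {"id": "c", "score": 0.31},
]
kept = prune_by_score_ratio(hits, min_score_ratio=0.5)
print([r["id"] for r in kept])  # → ['a', 'b']
```

With `min_score_ratio: 0.5`, the threshold here is 0.46 (half of 0.92), so the 0.31-scoring result is dropped.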
Advanced Queries
Create an Additional Index
You can create multiple indexes on the same table with different embedding models or configurations:
```shell
# Create an index using only the body field with a different chunking config
antflycli index create --table wikipedia \
  --index body_small_chunks \
  --type aknn_v0 \
  --field "body" \
  --embedder '{
    "provider": "ollama",
    "model": "all-minilm"
  }' \
  --chunker '{
    "provider": "antfly",
    "text": {
      "target_tokens": 100
    }
  }'
```
Multi-Index Hybrid Search
Search across multiple indexes combining full-text and vector search:
```shell
# Search with multiple indexes and result ordering
antflycli query --table wikipedia \
  --full-text-search 'title:History' \
  --semantic-search "ancient civilizations and archaeology" \
  --indexes "body_small_chunks,title_body" \
  --fields "title,url,body" \
  --order-by "title:asc" \
  --limit 20
```
RAG (Retrieval-Augmented Generation)
Combine search with LLM-powered summarization using the gemma3:4b-it-qat model you pulled earlier.
Ask a question and get an AI-generated answer based on the Wikipedia articles:
```shell
antflycli agents retrieval --table wikipedia \
  --semantic-search "What are the major events in Korean history?" \
  --indexes "title_body" \
  --fields "title,body" \
  --limit 5 \
  --reranker '{
    "provider": "antfly",
    "field": "body"
  }' \
  --pruner '{"min_score_ratio": 0.6, "max_score_gap_percent": 40}' \
  --generator '{
    "provider": "ollama",
    "model": "gemma3:4b-it-qat"
  }' \
  --system-prompt "You are a helpful assistant. Answer the question based on the provided context."
```
This command:
- Searches for semantically similar articles using the generated embeddings
- Reranks results using the built-in cross-encoder reranker for better relevance
- Prunes low-quality results (keeps only those scoring ≥60% of top result, stops at large score gaps)
- Sends the filtered results to Gemma 3 to generate a coherent answer
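Conceptually, the retrieval agent assembles an LLM prompt from the system prompt, the pruned search results, and the question. A minimal sketch (the prompt format here is illustrative, not Antfly's actual template):

```python
def build_rag_prompt(system_prompt: str, question: str, docs: list[dict]) -> str:
    """Assemble an LLM prompt from a system prompt, retrieved context, and the question."""
    context = "\n\n".join(f"[{i + 1}] {d['title']}\n{d['body']}" for i, d in enumerate(docs))
    return f"{system_prompt}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"

# Illustrative retrieved documents, not real query output
docs = [
    {"title": "History of Korea", "body": "Korea's history spans thousands of years..."},
    {"title": "Joseon", "body": "The Joseon dynasty ruled from 1392 to 1897..."},
]
prompt = build_rag_prompt(
    "You are a helpful assistant. Answer the question based on the provided context.",
    "What are the major events in Korean history?",
    docs,
)
print(prompt.splitlines()[0])  # → You are a helpful assistant. Answer the question based on the provided context.
```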
Pruning for RAG: Pruning is especially valuable for RAG pipelines. By filtering out marginally relevant documents before sending to the LLM, you reduce noise in the context window and improve answer quality. The max_score_gap_percent option is useful for automatically detecting where relevance drops off sharply.
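One way `max_score_gap_percent` elbow detection can be pictured: cut the result list at the first place where the score drops by more than the given percentage relative to the previous result. This is illustrative logic; Antfly's exact rule may differ:

```python
def prune_by_score_gap(results: list[dict], max_gap_percent: float) -> list[dict]:
    """Truncate results at the first score drop larger than max_gap_percent
    of the previous score (a simple 'elbow' heuristic)."""
    kept = results[:1]
    for prev, cur in zip(results, results[1:]):
        gap = (prev["score"] - cur["score"]) / prev["score"] * 100
        if gap > max_gap_percent:
            break
        kept.append(cur)
    return kept

hits = [{"id": "a", "score": 0.90}, {"id": "b", "score": 0.85},
        {"id": "c", "score": 0.40}, {"id": "d", "score": 0.38}]
kept = prune_by_score_gap(hits, max_gap_percent=40)
print([r["id"] for r in kept])  # → ['a', 'b']
```

Here the 0.85 → 0.40 drop is a ~53% gap, so everything from that point on is discarded even though the later scores are close to each other.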
Add `--streaming=false` to get structured JSON output with source references instead of streaming text:
```shell
antflycli agents retrieval --table wikipedia \
  --semantic-search "Explain the theory of relativity" \
  --indexes "title_body" \
  --fields "title,body,url" \
  --limit 5 \
  --generator '{
    "provider": "ollama",
    "model": "gemma3:4b-it-qat"
  }' \
  --streaming=false
```
Pull a larger model to use as a judge, then add `--eval` to run evaluators on the query results and include scores in the response:
```shell
ollama pull gemma3:12b-it-qat
```
```shell
antflycli agents retrieval --table wikipedia \
  --semantic-search "What are the major events in Korean history?" \
  --indexes "title_body" \
  --fields "title,body" \
  --limit 5 \
  --generator '{
    "provider": "ollama",
    "model": "gemma3:4b-it-qat"
  }' \
  --eval '{
    "evaluators": ["faithfulness", "relevance"],
    "judge": {
      "provider": "ollama",
      "model": "gemma3:12b-it-qat"
    }
  }'
```
Available evaluators:
- Retrieval metrics (require `ground_truth.relevant_ids`): `recall`, `precision`, `ndcg`, `mrr`, `map`
- LLM-as-judge metrics (require `judge` config): `relevance`, `faithfulness`, `completeness`, `coherence`, `safety`, `helpfulness`, `correctness`, `citation_quality`
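To make the retrieval metrics concrete, here is a small sketch of `recall`, `precision`, and `mrr` given ground-truth relevant IDs. These are the standard information-retrieval definitions; Antfly's evaluators should behave equivalently up to implementation details:

```python
def recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of relevant documents that were retrieved."""
    return len(set(retrieved) & relevant) / len(relevant)

def precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved documents that are relevant."""
    return len(set(retrieved) & relevant) / len(retrieved)

def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant result, 0 if none."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1 / rank
    return 0.0

retrieved = ["d3", "d1", "d7", "d2"]   # ranked result IDs
relevant = {"d1", "d2", "d5"}          # ground_truth.relevant_ids
print(recall(retrieved, relevant), precision(retrieved, relevant), mrr(retrieved, relevant))
# recall 2/3, precision 2/4, MRR 1/2 (first relevant hit at rank 2)
```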
Next Steps
- Explore Different Models: Try larger Ollama embedders like `nomic-embed-text` (768d, 8192-token context) for better quality, or browse Termite models with `antfly termite list --remote` for options like `mxbai-embed-large-v1` or the multimodal `clip-vit-base-patch32`
- Schema Design: Experiment with different field types and schemas
- Backup and Restore: Try backing up your data:

  ```shell
  antflycli backup --table wikipedia \
    --backup-id wiki-backup-$(date +%Y%m%d) \
    --location "file:///tmp/antfly_backups"
  ```

- Production Deployment: For production, run metadata and storage nodes separately for better scalability
Additional Resources
- Antfly CLI README
- Main Antfly README
- REST API Documentation
- Termite Models - Browse available embedding, chunking, and reranking models
- Configuration Reference - Configure Termite, embedding providers, and more
- Ollama - Local LLM inference for RAG summarization
- Ollama Model Library - Browse available LLM models