Common questions about this section
  • How do I install Antfly?
  • How do I start Antfly for local development?
  • How do I create my first table and index?
  • How do I load data into Antfly?
  • How do I run a search query?
  • How do I use RAG with Ollama for AI-powered answers?

AntflyDB is a document database built on Raft for distribution, Pebble for persistence, Bleve for full-text indexing, and SPANN for vector indexing. This guide will help you get started with setting up your AntflyDB instance!

Download Antfly and CLI#

Visit the Downloads page and follow the instructions for your platform.

Install and Start Ollama#

Install Ollama for your platform, then pull an embedding model and a generation model:

ollama pull all-minilm
ollama pull gemma3:4b-it-qat
  • all-minilm is a small, fast embedding model (384 dimensions) for vector search
  • gemma3:4b-it-qat is a small language model for RAG-powered answers

Make sure Ollama is running with ollama serve in a separate terminal, or as a background service with brew services start ollama on macOS.

Start Antfly in Swarm Mode#

Open up a new terminal and run:

antfly swarm 2>&1 | tee "antfly.log"

You should see logs indicating the metadata server and storage nodes are starting. The API will be available at http://localhost:8080.

Swarm mode is a convenient way to run Antfly with both metadata and data nodes in a single process. For production deployments, consider running separate metadata and worker nodes.

You can also access the web dashboard at http://localhost:8080 to manage tables, run queries, and monitor your cluster.

Download Sample Data#

In another terminal, download the sample data. We'll use a sample of 1,000 Wikipedia articles:

# Download the sample data (11MB)
curl -L -o wiki-articles-1000.json http://fulmicoton.com/tantivy-files/wiki-articles-1000.json
# Preview the data structure
head -n 1 wiki-articles-1000.json | jq '.'

Each article has:

  • url: Wikipedia URL
  • title: Article title
  • body: Article content
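
The sample file is newline-delimited JSON, one article object per line, which is why the preview above pipes only the first line to jq. A minimal Python sketch of parsing that format, using a single made-up line with the same url/title/body fields:

```python
import json

# One toy NDJSON line mirroring the url/title/body structure above
# (the values here are made up for illustration).
sample_lines = '{"url": "https://en.wikipedia.org/wiki/Ant", "title": "Ant", "body": "Ants are eusocial insects."}\n'

# Parse each non-empty line as an independent JSON document.
articles = [json.loads(line) for line in sample_lines.splitlines() if line.strip()]
print(articles[0]["title"])
```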

Create a Table#

1. Create a table and index#

Create a table and index to store the sample data. This index uses the Ollama all-minilm model for vector search and a fixed-size chunker to split long documents.

antflycli table create --table wikipedia \
  --index '{
    "name": "title_body",
    "type": "aknn_v0",
    "template": "{{title}} {{body}}",
    "embedder": {
      "provider": "ollama",
      "model": "all-minilm"
    },
    "chunker": {
      "provider": "antfly",
      "text": {
        "target_tokens": 200,
        "overlap_tokens": 25
      }
    }
  }'

The all-minilm model via Ollama generates 384-dimensional embeddings for semantic search. The chunker splits documents into chunks of approximately target_tokens size with overlap_tokens overlap between consecutive chunks. For alternative embedding models, browse Ollama embedding models or see the Termite guide for cached model management.
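
The fixed-size chunking strategy can be sketched as follows. This is a conceptual illustration only, not Antfly's implementation: it treats list elements as tokens, whereas the real chunker counts model tokens. The default parameters mirror the config above.

```python
def chunk_tokens(tokens, target_tokens=200, overlap_tokens=25):
    """Split a token list into chunks of ~target_tokens, with
    overlap_tokens shared between consecutive chunks."""
    step = target_tokens - overlap_tokens  # how far each chunk advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + target_tokens])
        if start + target_tokens >= len(tokens):
            break  # the last chunk reached the end of the document
    return chunks

chunks = chunk_tokens(list(range(450)))
print([len(c) for c in chunks])  # chunk sizes for a 450-token document
```

Note how each chunk starts 175 tokens after the previous one, so adjacent chunks share 25 tokens of context.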

2. Verify the table and indexes were created#

antflycli table list
antflycli index list --table wikipedia

Load the Sample Data#

# Load the Wikipedia articles using the title as the document ID
antflycli load --table wikipedia \
  --file wiki-articles-1000.json \
  --id-field title

This loads the 1,000 articles in batches. Generating embeddings can take some time, especially for larger datasets. You can monitor the progress in the logs or by checking active_vectors in the index:

antflycli index list --table wikipedia

Query the Data#

You can run these queries using the CLI (shown below) or the web dashboard at http://localhost:8080.

Search for articles containing specific terms (a full specification of the query syntax is available in the Bleve Query String Query documentation):

# Search for articles about Korea
antflycli query --table wikipedia \
  --full-text-search 'body:"Korea"' \
  --fields "title,url" \
  --limit 5

Find articles semantically similar to a query:

# Find articles about anatomy and physiology
antflycli query --table wikipedia \
  --semantic-search "anatomy and physiology" \
  --indexes "title_body" \
  --fields "title,url" \
  --limit 5
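
Under the hood, semantic search compares the query's embedding against stored chunk embeddings, typically by cosine similarity. A toy sketch with 3-dimensional vectors (real all-minilm embeddings have 384 dimensions, and the vectors below are made up):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.1, 0.9, 0.2]  # pretend embedding of "anatomy and physiology"
docs = {
    "anatomy": [0.1, 0.8, 0.3],
    "history": [0.9, 0.1, 0.1],
}
# Rank documents by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)
```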

Combine full-text, vector search, reranking, and pruning for better relevance:

# Find articles about Einstein with semantic understanding
antflycli query --table wikipedia \
  --full-text-search 'body:Einstein' \
  --semantic-search "theory of relativity and physics" \
  --indexes "title_body" \
  --fields "title,url" \
  --limit 10 \
  --reranker '{
      "provider": "antfly",
      "field": "body"
    }' \
  --pruner '{"min_score_ratio": 0.5}'

Reranking vs Pruning:

  • Reranking uses a cross-encoder model to re-score results based on query-document relevance. It improves ordering but keeps the same number of results.
  • Pruning filters out low-relevance results based on score quality. Use min_score_ratio to keep only results scoring at least N% of the top result, or max_score_gap_percent to detect "elbows" in score distribution.
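
The two pruning criteria can be sketched like this (a conceptual illustration of min_score_ratio and max_score_gap_percent, not Antfly's implementation; scores are assumed sorted descending):

```python
def prune(scores, min_score_ratio=None, max_score_gap_percent=None):
    """Keep leading results that pass the ratio test, stopping at the
    first large score gap ("elbow")."""
    if not scores:
        return []
    top = scores[0]
    kept, prev = [], None
    for s in scores:
        if min_score_ratio is not None and s < top * min_score_ratio:
            break  # below N% of the top result's score
        if max_score_gap_percent is not None and prev is not None:
            if (prev - s) / prev * 100 > max_score_gap_percent:
                break  # sharp drop-off in the score distribution
        kept.append(s)
        prev = s
    return kept

print(prune([1.0, 0.9, 0.85, 0.4, 0.35], min_score_ratio=0.5))
print(prune([1.0, 0.9, 0.85, 0.4, 0.35], max_score_gap_percent=40))
```

Both calls keep the top three results here: the ratio test drops everything scoring below 0.5, and the gap test stops at the 53% drop from 0.85 to 0.4.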

Advanced Queries#

Create an Additional Index#

You can create multiple indexes on the same table with different embedding models or configurations:

# Create an index using only the body field with a different chunking config
antflycli index create --table wikipedia \
  --index body_small_chunks \
  --type aknn_v0 \
  --field "body" \
  --embedder '{
    "provider": "ollama",
    "model": "all-minilm"
  }' \
  --chunker '{
    "provider": "antfly",
    "text": {
      "target_tokens": 100
    }
  }'

Search across multiple indexes combining full-text and vector search:

# Search with multiple indexes and result ordering
antflycli query --table wikipedia \
  --full-text-search 'title:History' \
  --semantic-search "ancient civilizations and archaeology" \
  --indexes "body_small_chunks,title_body" \
  --fields "title,url,body" \
  --order-by "title:asc" \
  --limit 20

RAG (Retrieval-Augmented Generation)#

Combine search with LLM-powered summarization using the gemma3:4b-it-qat model you pulled earlier.

1. Run a RAG query#

Ask a question and get an AI-generated answer based on the Wikipedia articles:

antflycli agents retrieval --table wikipedia \
  --semantic-search "What are the major events in Korean history?" \
  --indexes "title_body" \
  --fields "title,body" \
  --limit 5 \
  --reranker '{
      "provider": "antfly",
      "field": "body"
    }' \
  --pruner '{"min_score_ratio": 0.6, "max_score_gap_percent": 40}' \
  --generator '{
    "provider": "ollama",
    "model": "gemma3:4b-it-qat"
  }' \
  --system-prompt "You are a helpful assistant. Answer the question based on the provided context."

This command:

  1. Searches for semantically similar articles using the generated embeddings
  2. Reranks results using the built-in cross-encoder reranker for better relevance
  3. Prunes low-quality results (keeps only those scoring ≥60% of top result, stops at large score gaps)
  4. Sends the filtered results to Gemma 3 to generate a coherent answer

Pruning for RAG: Pruning is especially valuable for RAG pipelines. By filtering out marginally relevant documents before sending to the LLM, you reduce noise in the context window and improve answer quality. The max_score_gap_percent option is useful for automatically detecting where relevance drops off sharply.
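
Conceptually, step 4 concatenates the retrieved fields into the model's context. A simplified Python sketch of that assembly (illustrative only; Antfly's actual prompt construction may differ, and the sample document below is made up):

```python
def build_rag_prompt(question, docs, system_prompt):
    """Join retrieved title/body fields into a context block, then
    append the user's question."""
    context = "\n\n".join(f"[{d['title']}]\n{d['body']}" for d in docs)
    return f"{system_prompt}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"

docs = [{"title": "History of Korea", "body": "Korea was unified under the Goryeo dynasty in 936."}]
prompt = build_rag_prompt(
    "What are the major events in Korean history?",
    docs,
    "You are a helpful assistant. Answer the question based on the provided context.",
)
print(prompt)
```

This is why pruning matters: every document kept after retrieval lands verbatim in the context string the LLM must read.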

2. RAG with structured output#

Add --streaming=false to get structured JSON output with source references instead of streaming text:

antflycli agents retrieval --table wikipedia \
  --semantic-search "Explain the theory of relativity" \
  --indexes "title_body" \
  --fields "title,body,url" \
  --limit 5 \
  --generator '{
    "provider": "ollama",
    "model": "gemma3:4b-it-qat"
  }' \
  --streaming=false

3. RAG with evaluation#

Pull a larger model to use as a judge, then add --eval to run evaluators on the query results and include scores in the response:

ollama pull gemma3:12b-it-qat
antflycli agents retrieval --table wikipedia \
  --semantic-search "What are the major events in Korean history?" \
  --indexes "title_body" \
  --fields "title,body" \
  --limit 5 \
  --generator '{
    "provider": "ollama",
    "model": "gemma3:4b-it-qat"
  }' \
  --eval '{
    "evaluators": ["faithfulness", "relevance"],
    "judge": {
      "provider": "ollama",
      "model": "gemma3:12b-it-qat"
    }
  }'

Available evaluators:

  • Retrieval metrics (require ground_truth.relevant_ids): recall, precision, ndcg, mrr, map
  • LLM-as-judge metrics (require judge config): relevance, faithfulness, completeness, coherence, safety, helpfulness, correctness, citation_quality
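
As a reminder of what the ground-truth retrieval metrics measure, recall and precision can be sketched in a few lines (standard IR definitions, not Antfly's implementation):

```python
def precision_recall(retrieved_ids, relevant_ids):
    """precision = fraction of retrieved docs that are relevant;
    recall = fraction of relevant docs that were retrieved."""
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in retrieved_ids if doc_id in relevant)
    precision = hits / len(retrieved_ids) if retrieved_ids else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 2 of the 4 retrieved docs are relevant; 2 of the 3 relevant docs were found.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
print(p, r)
```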

Next Steps#

  1. Explore Different Models: Try larger Ollama embedders like nomic-embed-text (768d, 8192 token context) for better quality, or browse Termite models with antfly termite list --remote for options like mxbai-embed-large-v1 or multimodal clip-vit-base-patch32
  2. Schema Design: Experiment with different field types and schemas
  3. Backup and Restore: Try backing up your data:
antflycli backup --table wikipedia \
  --backup-id wiki-backup-$(date +%Y%m%d) \
  --location "file:///tmp/antfly_backups"
  4. Production Deployment: For production, run metadata and storage nodes separately for better scalability

Additional Resources#