Configuration Reference
Configuration schema for the Antfly distributed key-value store and vector search engine.
Antfly is configured using a YAML configuration file. This reference documents all available configuration options.
Minimal Configuration
The minimum required configuration for running Antfly:
storage:
  local:
    base_dir: antflydb
metadata:
  orchestration_urls:
    "1": "http://localhost:5001"
    "2": "http://localhost:5002"
    "3": "http://localhost:5003"
replication_factor: 3
max_shard_size_bytes: 1073741824
max_shards_per_table: 100
default_shards_per_table: 4

For development, you can use antfly swarm mode, which runs a single-node cluster. For production deployments, use distributed mode with multiple metadata and storage nodes.
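Before starting a node, it can help to confirm that the required settings are present and that replication_factor falls within the documented range of 1 to 5. The following is a minimal Go sketch under stated assumptions. `MinimalConfig` and `validateMinimal` are hypothetical names that mirror the YAML keys by hand and are not part of Antfly's API. YAML decoding is left out so that the example only needs the standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// MinimalConfig is a hypothetical, hand-written mirror of the minimal YAML.
type MinimalConfig struct {
	BaseDir               string            // storage.local.base_dir
	OrchestrationURLs     map[string]string // metadata.orchestration_urls
	ReplicationFactor     uint64            // replication_factor
	MaxShardSizeBytes     uint64            // max_shard_size_bytes
	MaxShardsPerTable     uint64            // max_shards_per_table
	DefaultShardsPerTable uint64            // default_shards_per_table
}

// validateMinimal collects every missing or out-of-range required setting.
// A zero value is treated as "not set" because each of these fields has a
// documented minimum of at least 1.
func validateMinimal(c MinimalConfig) error {
	var problems []string
	if c.BaseDir == "" {
		problems = append(problems, "storage.local.base_dir is empty")
	}
	if len(c.OrchestrationURLs) == 0 {
		problems = append(problems, "metadata.orchestration_urls has no entries")
	}
	if c.ReplicationFactor < 1 || c.ReplicationFactor > 5 {
		problems = append(problems, fmt.Sprintf("replication_factor %d is outside [1, 5]", c.ReplicationFactor))
	}
	if c.MaxShardSizeBytes == 0 {
		problems = append(problems, "max_shard_size_bytes is not set")
	}
	if c.MaxShardsPerTable == 0 {
		problems = append(problems, "max_shards_per_table is not set")
	}
	if c.DefaultShardsPerTable == 0 {
		problems = append(problems, "default_shards_per_table is not set")
	}
	if len(problems) > 0 {
		return fmt.Errorf("invalid config: %s", strings.Join(problems, "; "))
	}
	return nil
}

func main() {
	cfg := MinimalConfig{
		BaseDir: "antflydb",
		OrchestrationURLs: map[string]string{
			"1": "http://localhost:5001",
			"2": "http://localhost:5002",
			"3": "http://localhost:5003",
		},
		ReplicationFactor:     3,
		MaxShardSizeBytes:     1073741824,
		MaxShardsPerTable:     100,
		DefaultShardsPerTable: 4,
	}
	fmt.Println(validateMinimal(cfg)) // <nil>
}
```

Collecting all problems before returning means a single run reports every missing field, rather than one field per attempt.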
Complete Example
A comprehensive configuration showing all available options:
# Logging configuration for Termite services
log:
  # Logging verbosity level
  level: info
  # Logging output format style. 'terminal' for colorized console, 'json' for structured JSON, 'logfmt' for token-efficient key=value pairs, 'noop' for silent.
  style: terminal
# Port for the health/metrics server. Defaults to 4200.
health_port: 4200
storage:
  local:
    # Root directory for all antfly data storage. Defaults to 'antflydb'.
    base_dir: antflydb
  # Storage backend type
  data: local
  # Storage backend type
  metadata: local
  s3: {}
metadata:
  # Mapping from Metadata Node ID (hex string) to its URL used by store nodes for enrolling into the cluster
  orchestration_urls:
    "1": "http://localhost:5001"
    "2": "http://localhost:5002"
    "3": "http://localhost:5003"
termite:
  # URL of the Termite embedding/chunking service
  api_url: "http://localhost:8080"
  # Base directory containing model subdirectories. Termite auto-discovers models from:
  # - `{models_dir}/embedders/` - Embedding models (ONNX)
  # - `{models_dir}/chunkers/` - Chunking models (ONNX)
  # - `{models_dir}/rerankers/` - Reranking models (ONNX)
  # - `{models_dir}/recognizers/` - Recognition models (ONNX)
  # - `{models_dir}/rewriters/` - Seq2Seq rewriter models (ONNX)
  #
  # Defaults to ~/.termite/models (set via viper). If not set, only built-in fixed chunking is available.
  #
  models_dir: ~/.termite/models
  content_security:
    # Whitelist of allowed hostnames/IPs for link downloads. If empty, all hosts are allowed (except private IPs if block_private_ips is true).
    allowed_hosts:
      - example.com
      - cdn.example.com
      - 192.0.2.1
    # Block requests to private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
    block_private_ips: true
    # Maximum size of downloaded content in bytes
    max_download_size_bytes: 104857600
    # Timeout for individual download operations in seconds
    download_timeout_seconds: 30
    # Maximum image width/height in pixels (images will be resized)
    max_image_dimension: 2048
    # Whitelist of allowed path prefixes for file:// and s3:// URLs. If empty, all paths are allowed. For file:// use absolute paths (e.g., /Users/data/). For s3:// use bucket/prefix (e.g., my-bucket/uploads/).
    allowed_paths:
      - /Users/data/
      - my-bucket/uploads/
  s3_credentials:
    # S3-compatible endpoint (e.g., 's3.amazonaws.com' or 'localhost:9000' for MinIO)
    endpoint: s3.amazonaws.com
    # Enable SSL/TLS for S3 connections (default: true for AWS, false for local MinIO)
    use_ssl: true
    # AWS access key ID. Supports keystore syntax for secret lookup. Falls back to AWS_ACCESS_KEY_ID environment variable if not set.
    access_key_id: your-access-key-id
    # AWS secret access key. Supports keystore syntax for secret lookup. Falls back to AWS_SECRET_ACCESS_KEY environment variable if not set.
    secret_access_key: your-secret-access-key
    # Optional AWS session token for temporary credentials. Supports keystore syntax for secret lookup.
    session_token: your-session-token
  # How long to keep models loaded in memory after last use (Ollama-compatible).
  # Models are automatically unloaded after this duration of inactivity.
  # Use Go duration format: "5m" (5 minutes), "1h" (1 hour), "0" (eager loading).
  # Defaults to "5m" (lazy loading) like Ollama. Set to "0" to explicitly enable eager loading
  # where all models are loaded at startup and never unloaded.
  #
  keep_alive: 5m
  # Maximum number of models to keep loaded in memory simultaneously.
  # When this limit is reached, the least recently used model is unloaded (LRU eviction).
  # Set to 0 for unlimited (default). Only effective when keep_alive is non-zero.
  #
  max_loaded_models: 3
  # Number of concurrent inference pipelines per model. Each pipeline loads
  # a copy of the model, so higher values use more memory but allow more
  # concurrent requests. Set to 0 to use the default (min(NumCPU, 4)).
  #
  pool_size: 1
  # Backend priority order for model loading with optional device specifiers.
  # Format: `backend` or `backend:device` where device defaults to `auto`.
  #
  # Termite tries entries in order and uses the first available backend+device
  # combination that supports the model.
  #
  # **Backends** (depend on build tags):
  # - `go` - Pure Go inference (always available, CPU only, slowest)
  # - `onnx` - ONNX Runtime (requires -tags="onnx,ORT", fastest)
  # - `xla` - GoMLX XLA (requires -tags="xla,XLA", TPU/CUDA/CPU)
  #
  # **Devices**:
  # - `auto` - Auto-detect best available (default)
  # - `cuda` - NVIDIA CUDA GPU
  # - `coreml` - Apple CoreML (macOS only, used by ONNX)
  # - `tpu` - Google TPU (used by XLA)
  # - `cpu` - Force CPU only
  #
  # **Examples**:
  # - `["onnx", "xla", "go"]` - Try backends with auto device detection
  # - `["onnx:cuda", "xla:tpu", "onnx:cpu", "go"]` - Prefer GPU, fall back to CPU
  # - `["onnx:coreml", "go"]` - macOS with CoreML acceleration
  #
  backend_priority:
    - "onnx:cuda"
    - "xla:tpu"
    - "onnx:cpu"
    - "xla:cpu"
    - go
  # Maximum number of concurrent inference requests allowed.
  # Additional requests will be queued up to max_queue_size.
  # Set to 0 for unlimited (default).
  #
  max_concurrent_requests: 4
  # Maximum number of requests to queue when max_concurrent_requests is reached.
  # When the queue is full, new requests receive 503 Service Unavailable with Retry-After header.
  # Set to 0 for unlimited queue (default). Only effective when max_concurrent_requests > 0.
  #
  max_queue_size: 100
  # Maximum time to wait for a request to complete, including queue wait time.
  # Use Go duration format: "30s", "1m", "0" (no timeout, default).
  # Requests exceeding this timeout receive 504 Gateway Timeout.
  #
  request_timeout: 30s
  # List of model names to preload at startup (Ollama-compatible).
  # These models are loaded immediately when Termite starts, avoiding first-request latency.
  # Model names should match those in models_dir/embedders/ (e.g., "BAAI/bge-small-en-v1.5").
  # Only effective when keep_alive is non-zero (lazy loading mode).
  #
  preload:
    - BAAI/bge-small-en-v1.5
    - openai/clip-vit-base-patch32
  # Maximum memory (in MB) to use for loaded models.
  # When this limit is approached, least recently used models are unloaded.
  # Set to 0 for unlimited (default). This is an advisory limit - actual memory
  # usage depends on model sizes and may temporarily exceed this value.
  # Works alongside max_loaded_models for fine-grained control.
  #
  max_memory_mb: 4096
  # Per-model loading strategy overrides. Maps model names to their loading strategy.
  # Models not in this map use the default strategy based on keep_alive:
  # - If keep_alive>0 (default "5m"): lazy loading (load on demand, unload after idle)
  # - If keep_alive="0": eager loading (load at startup, never unload)
  #
  # When a model has strategy "eager" in this map:
  # - It is loaded at startup (as part of preload)
  # - It is never unloaded, even when keep_alive>0 (pinned in memory)
  #
  # This allows mixing eager and lazy models in the same pool.
  #
  model_strategies:
    "BAAI/bge-small-en-v1.5": "eager"
    "mirth/chonky-mmbert-small-multilingual-1": "lazy"
  # Whether the dashboard should show model download commands.
  # Defaults to true for standalone/swarm mode. Set to false in managed
  # deployments (e.g., Kubernetes operator) where models are managed externally.
  #
  allow_downloads: true
  # Logging configuration for Termite services
  log:
    # Logging verbosity level
    level: info
    # Logging output format style. 'terminal' for colorized console, 'json' for structured JSON, 'logfmt' for token-efficient key=value pairs, 'noop' for silent.
    style: terminal
tls:
  # Path to TLS certificate file
  cert: /path/to/cert.pem
  # Path to TLS key file
  key: /path/to/key.pem
# Configuration for remote content fetching (remotePDF, remoteMedia, remoteText templates).
# Consolidates S3 credentials and security settings separate from backup storage.
#
# **Credential Resolution Order:**
# 1. Explicit `credentials="name"` parameter in template
# 2. First credential where `buckets` glob pattern matches URL's bucket
# 3. `default_s3` credential
# 4. Legacy fallback: `storage.s3` credentials (backward compatibility)
#
remote_content:
  security:
    # Whitelist of allowed hostnames/IPs for link downloads. If empty, all hosts are allowed (except private IPs if block_private_ips is true).
    allowed_hosts:
      - example.com
      - cdn.example.com
      - 192.0.2.1
    # Block requests to private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
    block_private_ips: true
    # Maximum size of downloaded content in bytes
    max_download_size_bytes: 104857600
    # Timeout for individual download operations in seconds
    download_timeout_seconds: 30
    # Maximum image width/height in pixels (images will be resized)
    max_image_dimension: 2048
    # Whitelist of allowed path prefixes for file:// and s3:// URLs. If empty, all paths are allowed. For file:// use absolute paths (e.g., /Users/data/). For s3:// use bucket/prefix (e.g., my-bucket/uploads/).
    allowed_paths:
      - /Users/data/
      - my-bucket/uploads/
  # Default S3 credential name when no bucket pattern matches.
  default_s3: primary
  # Named S3 credentials for remote content fetching.
  s3:
    "primary": "[object Object]"
    "untrusted": "[object Object]"
  # Named HTTP credentials for authenticated endpoints.
  http:
    "internal-api": "[object Object]"
# Named speech-to-text provider configurations.
#
# Define named STT providers that can be referenced by templates and API calls.
# The first provider defined becomes the default when no provider name is specified.
#
# **Example:**
# ```yaml
# speech_to_text:
#   whisper-local:
#     provider: termite
#     api_url: "http://localhost:8080"
#     model: openai/whisper-base
#   openai-whisper:
#     provider: openai
#     model: whisper-1
# ```
#
# Then in templates: `{{transcribeAudio url="..." provider="whisper-local"}}`
#
speech_to_text:
  whisper-local:
    provider: termite
    api_url: "http://localhost:8080"
    model: openai/whisper-base
  openai-whisper:
    provider: openai
    model: whisper-1
cors:
  # Controls whether CORS is enabled
  enabled: true
  # List of allowed origins for CORS requests. Use ['*'] to allow all origins. Defaults to ['*'] if empty and enabled is true.
  allowed_origins:
    - "https://example.com"
    - "https://app.example.com"
  # HTTP methods allowed for CORS requests
  allowed_methods:
    - GET
    - POST
    - PUT
    - DELETE
  # Headers that can be used in CORS requests
  allowed_headers:
    - Content-Type
    - Authorization
  # Headers exposed to the client. Useful if your API returns custom headers that the frontend needs to read.
  exposed_headers:
    - X-Total-Count
  # Indicates whether credentials (cookies, auth headers) are allowed. Note: If true, allowed_origins cannot be ['*'].
  allow_credentials: false
  # How long (in seconds) the results of a preflight request can be cached
  max_age: 3600
# How many replicas of each shard should be maintained.
replication_factor: 3
# Enables authentication and authorization (RBAC) for the API.
enable_auth: false
# Disables automatic shard reallocation (splitting/merging).
disable_shard_alloc: false
# Cooldown period after shard operations (start/stop/split). Format: duration string like '1m', '30s'. Default: '1m' (one minute).
shard_cooldown_period: 1m
# Maximum duration for a shard split operation before triggering rollback. Format: duration string like '5m', '30s'. Default: '5m' (five minutes).
split_timeout: 5m
# Maximum size of a shard in bytes. Used to determine when to split shards.
max_shard_size_bytes: 1073741824
# Maximum number of shards that can be created for a single table.
max_shards_per_table: 100
# Default number of shards to create for a new table.
default_shards_per_table: 4
# URL of the model registry for the Antfarm dashboard. Defaults to https://registry.antfly.io/v1
registry_url: "https://registry.antfly.io/v1"
# Named embedder configurations for embedding operations.
#
# Define named embedders that can be referenced by indexes, templates, and API calls.
# The first embedder defined becomes the default when no embedder name is specified.
#
# **API Key Configuration:**
#
# API keys can be provided via the encrypted keystore (recommended) or environment variables:
#
# 1. **Keystore** (recommended for production):
#    ```bash
#    antfly keystore create
#    antfly keystore add openai.api_key
#    ```
#    Then reference in config: `api_key: ${secret:openai.api_key}`
#
# 2. **Environment variable** (simpler for development):
#    Omit `api_key` from config and set the appropriate env var:
#    - OpenAI: `OPENAI_API_KEY`
#    - Gemini: `GEMINI_API_KEY`
#    - Anthropic: `ANTHROPIC_API_KEY`
#    - Cohere: `COHERE_API_KEY`
#
# See [Secrets Management](/docs/secrets) for complete documentation.
#
# **Example:**
# ```yaml
# embedders:
#   openai-small:
#     provider: openai
#     model: text-embedding-3-small
#   termite-local:
#     provider: termite
#     model: bge-base-en-v1.5
#     api_url: "http://localhost:8082"
# ```
#
embedders:
  openai-small:
    provider: openai
    model: text-embedding-3-small
  termite-local:
    provider: termite
    model: bge-base-en-v1.5
    api_url: "http://localhost:8082"
# Named generator configurations for AI operations.
#
# Define named generators that can be referenced by chains, templates, and API calls.
# The first generator defined becomes the default when no generator name is specified.
#
# **API Key Configuration:**
#
# API keys can be provided via the encrypted keystore (recommended) or environment variables:
#
# 1. **Keystore** (recommended for production):
#    ```bash
#    antfly keystore create
#    antfly keystore add gemini.api_key
#    ```
#    Then reference in config: `api_key: ${secret:gemini.api_key}`
#
# 2. **Environment variable** (simpler for development):
#    Omit `api_key` from config and set the appropriate env var:
#    - Gemini: `GEMINI_API_KEY`
#    - OpenAI: `OPENAI_API_KEY`
#    - Anthropic: `ANTHROPIC_API_KEY`
#
# See [Secrets Management](/docs/secrets) for complete documentation.
#
# **Example:**
# ```yaml
# generators:
#   gemini-flash:
#     provider: gemini
#     model: gemini-2.5-flash
#   ollama-local:
#     provider: ollama
#     model: llama3
#   openai-gpt4:
#     provider: openai
#     model: gpt-4.1
# ```
#
generators:
  gemini-flash:
    provider: gemini
    model: gemini-2.5-flash
  ollama-local:
    provider: ollama
    model: llama3
  openai-gpt4:
    provider: openai
    model: gpt-4.1
# Named chain configurations for fallback/retry logic.
#
# Chains are ordered lists of generators with retry and fallback logic.
# Each link references a generator by name from the `generators` map.
# The first chain defined becomes the default when no chain name is specified.
#
# **Chain Conditions:**
# - `on_error`: Try next generator on any error (default)
# - `on_rate_limit`: Try next only on rate limit (429) errors
# - `on_timeout`: Try next only on timeout errors
# - `always`: Always try the next generator
#
# **Example:**
# ```yaml
# chains:
#   default:
#     - generator: gemini-flash  # Reference by name
#       retry:
#         max_attempts: 3
#       condition: on_rate_limit
#     - generator: ollama-local  # Reference by name
#   with-inline:
#     - generator: gemini-flash
#     - generator_config:  # Inline config
#         provider: anthropic
#         model: claude-sonnet-4-5-20250929
# ```
#
# Then in API calls: `chain: "default"` or `chain: "with-inline"`
#
chains:
  default:
    - generator: gemini-flash
      retry:
        max_attempts: 3
      condition: on_rate_limit
    - generator: ollama-local
  with-inline:
    - generator: gemini-flash
    - generator_config:
        provider: anthropic
        model: claude-sonnet-4-5-20250929
# Named reranker configurations for search result reranking.
#
# Define named rerankers that can be referenced by indexes, search queries, and API calls.
# The first reranker defined becomes the default when no reranker name is specified.
#
# **Example:**
# ```yaml
# rerankers:
#   cohere-english:
#     provider: cohere
#     model: rerank-english-v3.0
#   termite-local:
#     provider: termite
#     model: mxbai-rerank-base-v1
#     url: "http://localhost:8080"
# ```
#
rerankers:
  cohere-english:
    provider: cohere
    model: rerank-english-v3.0
  termite-local:
    provider: termite
    model: mxbai-rerank-base-v1
    url: "http://localhost:8080"
# Named chunker configurations for text chunking.
#
# Define named chunkers that can be referenced by indexes and API calls.
# The first chunker defined becomes the default when no chunker name is specified.
#
# **Example:**
# ```yaml
# chunkers:
#   fixed-500:
#     provider: termite
#     model: fixed
#     target_tokens: 500
#     overlap_tokens: 50
#   semantic:
#     provider: termite
#     model: semantic-chunker
#     api_url: "http://localhost:8080"
# ```
#
chunkers:
  fixed-500:
    provider: termite
    model: fixed
    target_tokens: 500
    overlap_tokens: 50
  semantic:
    provider: termite
    model: semantic-chunker
    api_url: "http://localhost:8080"

Configuration Properties
Core Settings
Essential configuration for running Antfly
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| log | object | | | Logging configuration for Termite services |
| health_port | integer | | 4200 | Port for the health/metrics server. Defaults to 4200. |
| replication_factor | uint64 | ✓ | | How many replicas of each shard should be maintained. (min: 1, max: 5) |
| enable_auth | boolean | | false | Enables authentication and authorization (RBAC) for the API. |
Storage Configuration
Configure local and remote storage backends
S3 credentials should never be stored directly in config files. Use the keystore system with ${secret:aws.access_key_id} or environment variables. See the secrets documentation for details.
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| storage | object | ✓ | | |
StorageConfig Properties:
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| storage.local | object | ✓ | | |
| storage.data | enum: local, s3 | | local | Storage backend type |
| storage.metadata | enum: local, s3 | | local | Storage backend type |
| storage.s3 | object | | | |
LocalStorageConfig Properties:
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| base_dir | string | ✓ | antflydb | Root directory for all antfly data storage. Defaults to 'antflydb'. (minLength: 1) |
S3Info Properties:
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
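The `storage.s3` credentials are also the last step in how remote content templates (remotePDF, remoteMedia, remoteText) choose S3 credentials. The resolution order documented for `remote_content` in the complete example is:

1. An explicit `credentials="name"`.
2. The first `remote_content.s3` entry whose `buckets` glob matches.
3. `default_s3`.
4. The legacy `storage.s3` fallback.

The following Go sketch of that order rests on two assumptions: that bucket globs behave like Go's `path.Match`, and that an unknown explicit name is an error rather than falling through. `namedS3Cred` and `resolveS3Credential` are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
)

// namedS3Cred is a hypothetical view of one remote_content.s3 entry.
type namedS3Cred struct {
	Name    string
	Buckets []string // glob patterns matched against the URL's bucket
}

// resolveS3Credential returns the name of the credential to use for bucket.
// creds must preserve the order in which entries appear in the config file.
func resolveS3Credential(explicit, bucket string, creds []namedS3Cred, defaultS3 string, hasLegacy bool) (string, error) {
	// 1. Explicit credentials="name" parameter in the template.
	if explicit != "" {
		for _, c := range creds {
			if c.Name == explicit {
				return c.Name, nil
			}
		}
		return "", fmt.Errorf("unknown credential %q", explicit) // assumption: no fall-through
	}
	// 2. First credential whose bucket pattern matches.
	for _, c := range creds {
		for _, pattern := range c.Buckets {
			if ok, err := path.Match(pattern, bucket); err == nil && ok {
				return c.Name, nil
			}
		}
	}
	// 3. default_s3.
	if defaultS3 != "" {
		return defaultS3, nil
	}
	// 4. Legacy storage.s3 credentials.
	if hasLegacy {
		return "storage.s3", nil
	}
	return "", fmt.Errorf("no S3 credential applies to bucket %q", bucket)
}

func main() {
	creds := []namedS3Cred{
		{Name: "primary", Buckets: []string{"my-bucket*"}},
		{Name: "untrusted", Buckets: []string{"public-*"}},
	}
	name, _ := resolveS3Credential("", "public-assets", creds, "primary", true)
	fmt.Println(name) // untrusted
}
```

A slice is used instead of a map because step 2 depends on declaration order, and Go map iteration order is randomized.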
Metadata Configuration
Metadata orchestration cluster settings
The orchestration_urls map keys are metadata node IDs written as hex strings ("1", "2", "3", and so on). Each storage node needs to know all metadata node URLs to enroll in the cluster.
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| metadata | object | ✓ | | |
MetadataInfo Properties:
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| metadata.orchestration_urls | map[string → string] | ✓ | | Mapping from Metadata Node ID (hex string) to its URL used by store nodes for enrolling into the cluster |
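Because the keys are hex strings, a key such as "a" denotes node 10 and "10" denotes node 16. The Go sketch below decodes the map under that reading. `parseOrchestrationURLs` is a hypothetical helper, and requiring a host in each URL is an extra check added for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strconv"
)

// metadataNode is a hypothetical parsed form of one orchestration_urls entry.
type metadataNode struct {
	ID  uint64
	URL string
}

// parseOrchestrationURLs decodes hex-string node IDs, checks each URL, and
// returns the nodes sorted by ID (Go map iteration order is random).
func parseOrchestrationURLs(m map[string]string) ([]metadataNode, error) {
	nodes := make([]metadataNode, 0, len(m))
	for key, raw := range m {
		id, err := strconv.ParseUint(key, 16, 64)
		if err != nil {
			return nil, fmt.Errorf("node ID %q is not a hex string: %w", key, err)
		}
		u, err := url.Parse(raw)
		if err != nil || u.Host == "" {
			return nil, fmt.Errorf("node %q: invalid URL %q", key, raw)
		}
		nodes = append(nodes, metadataNode{ID: id, URL: raw})
	}
	sort.Slice(nodes, func(i, j int) bool { return nodes[i].ID < nodes[j].ID })
	return nodes, nil
}

func main() {
	nodes, err := parseOrchestrationURLs(map[string]string{
		"1": "http://localhost:5001",
		"2": "http://localhost:5002",
		"a": "http://localhost:5010", // hex "a" is node 10
	})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes {
		fmt.Println(n.ID, n.URL)
	}
}
```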
Shard Management
Control automatic shard allocation and sizing
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| disable_shard_alloc | boolean | | false | Disables automatic shard reallocation (splitting/merging). |
| max_shard_size_bytes | uint64 | ✓ | | Maximum size of a shard in bytes. Used to determine when to split shards. (min: 1048576, max: 46170898227200) |
| max_shards_per_table | uint64 | ✓ | | Maximum number of shards that can be created for a single table. (min: 1) |
| default_shards_per_table | uint64 | ✓ | | Default number of shards to create for a new table. (min: 1) |
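The bounds in this table can be checked mechanically. The example value 1073741824 is 2^30 bytes (1 GiB), and the minimum 1048576 is 2^20 bytes (1 MiB). The following Go sketch validates those bounds and shows a hypothetical split trigger. The exact comparison Antfly uses (strictly greater than the limit) is assumed here, as is the rule that a table already at max_shards_per_table is not split further.

```go
package main

import (
	"errors"
	"fmt"
)

// Documented bounds for max_shard_size_bytes.
const (
	minShardSizeBytes uint64 = 1 << 20 // 1,048,576 bytes (1 MiB)
	maxShardSizeBytes uint64 = 46170898227200
)

// validateShardSettings checks the documented min/max constraints.
func validateShardSettings(maxSize, maxPerTable, defaultPerTable uint64) error {
	if maxSize < minShardSizeBytes || maxSize > maxShardSizeBytes {
		return fmt.Errorf("max_shard_size_bytes %d outside [%d, %d]", maxSize, minShardSizeBytes, maxShardSizeBytes)
	}
	if maxPerTable < 1 {
		return errors.New("max_shards_per_table must be at least 1")
	}
	if defaultPerTable < 1 {
		return errors.New("default_shards_per_table must be at least 1")
	}
	return nil
}

// shouldSplit is a hypothetical trigger: a shard becomes a split candidate
// once it grows past max_shard_size_bytes, unless disable_shard_alloc is
// set or the table already holds max_shards_per_table shards.
func shouldSplit(shardBytes, maxSize, shardsInTable, maxPerTable uint64, allocDisabled bool) bool {
	if allocDisabled || shardsInTable >= maxPerTable {
		return false
	}
	return shardBytes > maxSize
}

func main() {
	const oneGiB = 1 << 30 // 1073741824, the value used in the examples above
	fmt.Println(validateShardSettings(oneGiB, 100, 4))     // <nil>
	fmt.Println(shouldSplit(3<<29, oneGiB, 4, 100, false)) // true: 1.5 GiB > 1 GiB
}
```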
Security & CORS
TLS, content security, and cross-origin settings
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| tls | object | | | |
| cors | object | | | |
TLSInfo Properties:
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| tls.cert | string | | | Path to TLS certificate file |
| tls.key | string | | | Path to TLS key file |
ContentSecurityConfig Properties:
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| allowed_hosts | array[string] | | | Whitelist of allowed hostnames/IPs for link downloads. If empty, all hosts are allowed (except private IPs if block_private_ips is true). |
| block_private_ips | boolean | | true | Block requests to private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16) |
| max_download_size_bytes | int64 | | 104857600 | Maximum size of downloaded content in bytes (min: 0) |
| download_timeout_seconds | integer | | 30 | Timeout for individual download operations in seconds (min: 1) |
| max_image_dimension | integer | | 2048 | Maximum image width/height in pixels (images will be resized) (min: 1) |
| allowed_paths | array[string] | | | Whitelist of allowed path prefixes for file:// and s3:// URLs. If empty, all paths are allowed. For file:// use absolute paths (e.g., /Users/data/). For s3:// use bucket/prefix (e.g., my-bucket/uploads/). |
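The three checks in this table work together:

- When block_private_ips is true, a literal IP in one of the listed ranges is rejected.
- When allowed_hosts or allowed_paths is empty, every other host or path is permitted.
- When either list is non-empty, the host or path must match an entry.

The following is a Go sketch of those checks. It assumes exact, case-insensitive hostname matching and plain prefix matching for paths, and it assumes the private-range block takes precedence even over an explicit allowed_hosts entry. `hostAllowed` and `pathAllowed` are hypothetical helpers, not Termite or Antfly functions.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// blockedRanges are the private ranges named for block_private_ips.
var blockedRanges = mustParseCIDRs(
	"127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16",
)

func mustParseCIDRs(cidrs ...string) []*net.IPNet {
	nets := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		nets = append(nets, n)
	}
	return nets
}

// hostAllowed applies block_private_ips and allowed_hosts to a literal host.
// Hostnames are not resolved here, so a name that resolves to a private IP
// would still pass; a production check must also inspect resolved addresses.
func hostAllowed(host string, allowedHosts []string, blockPrivate bool) bool {
	if ip := net.ParseIP(host); ip != nil && blockPrivate {
		for _, n := range blockedRanges {
			if n.Contains(ip) {
				return false
			}
		}
	}
	if len(allowedHosts) == 0 {
		return true // empty whitelist: every non-blocked host is allowed
	}
	for _, h := range allowedHosts {
		if strings.EqualFold(h, host) {
			return true
		}
	}
	return false
}

// pathAllowed applies allowed_paths to file:// and s3:// URLs by prefix.
func pathAllowed(rawURL string, allowedPaths []string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	var p string
	switch u.Scheme {
	case "file":
		p = u.Path // absolute path, e.g. /Users/data/report.pdf
	case "s3":
		p = u.Host + u.Path // bucket/prefix, e.g. my-bucket/uploads/a.png
	default:
		return false, fmt.Errorf("allowed_paths does not apply to scheme %q", u.Scheme)
	}
	if len(allowedPaths) == 0 {
		return true, nil
	}
	for _, prefix := range allowedPaths {
		if strings.HasPrefix(p, prefix) {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	fmt.Println(hostAllowed("10.0.0.5", nil, true)) // false
	ok, _ := pathAllowed("s3://my-bucket/uploads/a.png", []string{"my-bucket/uploads/"})
	fmt.Println(ok) // true
}
```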
CORSConfig Properties:
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| cors.enabled | boolean | | true | Controls whether CORS is enabled |
| cors.allowed_origins | array[string] | | | List of allowed origins for CORS requests. Use ['*'] to allow all origins. Defaults to ['*'] if empty and enabled is true. |
| cors.allowed_methods | array[string] | | ["GET","POST","PUT","DELETE","OPTIONS","PATCH"] | HTTP methods allowed for CORS requests |
| cors.allowed_headers | array[string] | | ["Content-Type","Authorization","X-Requested-With","Accept","Origin"] | Headers that can be used in CORS requests |
| cors.exposed_headers | array[string] | | | Headers exposed to the client. Useful if your API returns custom headers that the frontend needs to read. |
| cors.allow_credentials | boolean | | false | Indicates whether credentials (cookies, auth headers) are allowed. Note: If true, allowed_origins cannot be ['*']. |
| cors.max_age | integer | | 3600 | How long (in seconds) the results of a preflight request can be cached (min: 0) |
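The allow_credentials note reflects a browser rule: a credentialed response whose Access-Control-Allow-Origin is `*` is rejected. Because allowed_origins defaults to ['*'] when left empty, enabling allow_credentials also requires listing origins explicitly. The Go sketch below shows that interaction under one stated assumption: `CORSConfig`, `applyCORSDefaults`, `validateCORS`, and `allowOriginHeader` are hypothetical names that mirror the table, not Antfly's code. Go's zero value for `Enabled` is false, so the caller must set it explicitly.

```go
package main

import (
	"errors"
	"fmt"
)

// CORSConfig is a hypothetical mirror of the cors block.
type CORSConfig struct {
	Enabled          bool
	AllowedOrigins   []string
	AllowedMethods   []string
	AllowedHeaders   []string
	AllowCredentials bool
	MaxAge           int
}

// applyCORSDefaults fills the documented defaults for empty lists.
func applyCORSDefaults(c *CORSConfig) {
	if c.Enabled && len(c.AllowedOrigins) == 0 {
		c.AllowedOrigins = []string{"*"}
	}
	if len(c.AllowedMethods) == 0 {
		c.AllowedMethods = []string{"GET", "POST", "PUT", "DELETE", "OPTIONS", "PATCH"}
	}
	if len(c.AllowedHeaders) == 0 {
		c.AllowedHeaders = []string{"Content-Type", "Authorization", "X-Requested-With", "Accept", "Origin"}
	}
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// validateCORS enforces the credentials/wildcard rule and max_age >= 0.
func validateCORS(c CORSConfig) error {
	if c.AllowCredentials && contains(c.AllowedOrigins, "*") {
		return errors.New("cors.allow_credentials requires explicit origins, not ['*']")
	}
	if c.MaxAge < 0 {
		return errors.New("cors.max_age must be >= 0")
	}
	return nil
}

// allowOriginHeader returns the Access-Control-Allow-Origin value for a
// request origin, or "" when the header should be omitted.
func allowOriginHeader(c CORSConfig, origin string) string {
	if !c.Enabled {
		return ""
	}
	if contains(c.AllowedOrigins, "*") {
		return "*"
	}
	if contains(c.AllowedOrigins, origin) {
		return origin
	}
	return ""
}

func main() {
	c := CORSConfig{Enabled: true, AllowCredentials: true} // origins left empty
	applyCORSDefaults(&c)
	fmt.Println(validateCORS(c)) // rejected: the empty list became ['*']
}
```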
External Services
Optional Termite and MCP service integration
Termite provides centralized embedding generation and document chunking with caching. Enable it with the --termite flag when starting Antfly.
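Termite's backend_priority list, shown in the complete example above, takes entries of the form `backend` or `backend:device`, with the device defaulting to `auto`. Termite uses the first entry whose backend and device combination is available and supports the model. The following Go sketch shows that selection. The `available` probe is hypothetical and stands in for build-tag and hardware detection. Rejecting unknown backend or device names is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// backendSpec is one parsed backend_priority entry.
type backendSpec struct {
	Backend string
	Device  string
}

var (
	knownBackends = map[string]bool{"go": true, "onnx": true, "xla": true}
	knownDevices  = map[string]bool{"auto": true, "cuda": true, "coreml": true, "tpu": true, "cpu": true}
)

// parseBackendPriority splits "backend" or "backend:device" entries;
// the device defaults to "auto" when omitted.
func parseBackendPriority(entries []string) ([]backendSpec, error) {
	specs := make([]backendSpec, 0, len(entries))
	for _, e := range entries {
		backend, device, hasDevice := strings.Cut(e, ":")
		if !hasDevice {
			device = "auto"
		}
		if !knownBackends[backend] || !knownDevices[device] {
			return nil, fmt.Errorf("unsupported backend_priority entry %q", e)
		}
		specs = append(specs, backendSpec{Backend: backend, Device: device})
	}
	return specs, nil
}

// pickBackend returns the first spec the probe reports as usable.
func pickBackend(specs []backendSpec, available func(backendSpec) bool) (backendSpec, bool) {
	for _, s := range specs {
		if available(s) {
			return s, true
		}
	}
	return backendSpec{}, false
}

func main() {
	specs, err := parseBackendPriority([]string{"onnx:cuda", "xla:tpu", "onnx:cpu", "xla:cpu", "go"})
	if err != nil {
		panic(err)
	}
	// Hypothetical CPU-only machine built with the ONNX tags but not XLA.
	cpuOnly := func(s backendSpec) bool {
		return s.Backend != "xla" && (s.Device == "cpu" || s.Device == "auto")
	}
	chosen, _ := pickBackend(specs, cpuOnly)
	fmt.Printf("%s:%s\n", chosen.Backend, chosen.Device) // onnx:cpu
}
```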
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| termite | object | | | |