Configuration Reference

Configuration schema for the Antfly distributed key-value store and vector search engine.

Antfly is configured using a YAML configuration file. This reference documents all available configuration options.

Minimal Configuration

The minimum required configuration for running Antfly:

storage:
  local:
    base_dir: antflydb
metadata:
  orchestration_urls:
    "1": "http://localhost:5001"
    "2": "http://localhost:5002"
    "3": "http://localhost:5003"
replication_factor: 3
max_shard_size_bytes: 1073741824
max_shards_per_table: 100
default_shards_per_table: 4

For development, use antfly swarm mode, which runs a single-node cluster. For production deployments, use distributed mode with multiple metadata and storage nodes.

Complete Example

A comprehensive configuration showing all available options:

# Logging configuration for Termite services
log:
  # Logging verbosity level
  level: info
  # Logging output format style. 'terminal' for colorized console, 'json' for structured JSON, 'logfmt' for token-efficient key=value pairs, 'noop' for silent.
  style: terminal
# Port for the health/metrics server. Defaults to 4200.
health_port: 4200
storage:
  local:
    # Root directory for all antfly data storage. Defaults to 'antflydb'.
    base_dir: antflydb
  # Storage backend type
  data: local
  # Storage backend type
  metadata: local
  s3: {}
metadata:
  # Mapping from Metadata Node ID (hex string) to its URL used by store nodes for enrolling into the cluster
  orchestration_urls:
    "1": "http://localhost:5001"
    "2": "http://localhost:5002"
    "3": "http://localhost:5003"
termite:
  # URL of the Termite embedding/chunking service
  api_url: "http://localhost:8080"
  # Base directory containing model subdirectories. Termite auto-discovers models from:
  # - `{models_dir}/embedders/` - Embedding models (ONNX)
  # - `{models_dir}/chunkers/` - Chunking models (ONNX)
  # - `{models_dir}/rerankers/` - Reranking models (ONNX)
  # - `{models_dir}/recognizers/` - Recognition models (ONNX)
  # - `{models_dir}/rewriters/` - Seq2Seq rewriter models (ONNX)
  # 
  # Defaults to ~/.termite/models (set via viper). If not set, only built-in fixed chunking is available.
  # 
  models_dir: ~/.termite/models
  content_security:
    # Whitelist of allowed hostnames/IPs for link downloads. If empty, all hosts are allowed (except private IPs if block_private_ips is true).
    allowed_hosts:
      - example.com
      - cdn.example.com
      - 192.0.2.1
    # Block requests to private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
    block_private_ips: true
    # Maximum size of downloaded content in bytes
    max_download_size_bytes: 104857600
    # Timeout for individual download operations in seconds
    download_timeout_seconds: 30
    # Maximum image width/height in pixels (images will be resized)
    max_image_dimension: 2048
    # Whitelist of allowed path prefixes for file:// and s3:// URLs. If empty, all paths are allowed. For file:// use absolute paths (e.g., /Users/data/). For s3:// use bucket/prefix (e.g., my-bucket/uploads/).
    allowed_paths:
      - /Users/data/
      - my-bucket/uploads/
    # User-Agent header for HTTP downloads. Defaults to 'AntflyDB/1.0' if not set. Some servers (e.g., Wikipedia) reject requests without a User-Agent.
    user_agent: AntflyDB/1.0
  s3_credentials:
    # S3-compatible endpoint (e.g., 's3.amazonaws.com' or 'localhost:9000' for MinIO)
    endpoint: s3.amazonaws.com
    # Enable SSL/TLS for S3 connections (default: true for AWS, false for local MinIO)
    use_ssl: true
    # AWS access key ID. Supports keystore syntax for secret lookup. Falls back to AWS_ACCESS_KEY_ID environment variable if not set.
    access_key_id: your-access-key-id
    # AWS secret access key. Supports keystore syntax for secret lookup. Falls back to AWS_SECRET_ACCESS_KEY environment variable if not set.
    secret_access_key: your-secret-access-key
    # Optional AWS session token for temporary credentials. Supports keystore syntax for secret lookup.
    session_token: your-session-token
  # How long to keep models loaded in memory after last use (Ollama-compatible).
  # Models are automatically unloaded after this duration of inactivity.
  # Use Go duration format: "5m" (5 minutes), "1h" (1 hour), "0" (eager loading).
  # Defaults to "5m" (lazy loading) like Ollama. Set to "0" to explicitly enable eager loading
  # where all models are loaded at startup and never unloaded.
  # 
  keep_alive: 5m
  # Maximum number of models to keep loaded in memory simultaneously.
  # When this limit is reached, the least recently used model is unloaded (LRU eviction).
  # Set to 0 for unlimited (default). Only effective when keep_alive is non-zero.
  # 
  max_loaded_models: 3
  # Number of concurrent inference pipelines per model. Each pipeline loads
  # a copy of the model, so higher values use more memory but allow more
  # concurrent requests. Set to 0 to use the default (min(NumCPU, 4)).
  # 
  pool_size: 1
  # Backend priority order for model loading with optional device specifiers.
  # Format: `backend` or `backend:device` where device defaults to `auto`.
  # 
  # Termite tries entries in order and uses the first available backend+device
  # combination that supports the model.
  # 
  # **Backends** (depend on build tags):
  # - `go` - Pure Go inference (always available, CPU only, slowest)
  # - `onnx` - ONNX Runtime (requires -tags="onnx,ORT", fastest)
  # - `xla` - GoMLX XLA (requires -tags="xla,XLA", TPU/CUDA/CPU)
  # 
  # **Devices**:
  # - `auto` - Auto-detect best available (default)
  # - `cuda` - NVIDIA CUDA GPU
  # - `coreml` - Apple CoreML (macOS only, used by ONNX)
  # - `tpu` - Google TPU (used by XLA)
  # - `cpu` - Force CPU only
  # 
  # **Examples**:
  # - `["onnx", "xla", "go"]` - Try backends with auto device detection
  # - `["onnx:cuda", "xla:tpu", "onnx:cpu", "go"]` - Prefer GPU, fall back to CPU
  # - `["onnx:coreml", "go"]` - macOS with CoreML acceleration
  # 
  backend_priority:
    - "onnx:cuda"
    - "xla:tpu"
    - "onnx:cpu"
    - "xla:cpu"
    - go
  # Maximum number of concurrent inference requests allowed.
  # Additional requests will be queued up to max_queue_size.
  # Set to 0 for unlimited (default).
  # 
  max_concurrent_requests: 4
  # Maximum number of requests to queue when max_concurrent_requests is reached.
  # When the queue is full, new requests receive 503 Service Unavailable with Retry-After header.
  # Set to 0 for unlimited queue (default). Only effective when max_concurrent_requests > 0.
  # 
  max_queue_size: 100
  # Maximum time to wait for a request to complete, including queue wait time.
  # Use Go duration format: "30s", "1m", "0" (no timeout, default).
  # Requests exceeding this timeout receive 504 Gateway Timeout.
  # 
  request_timeout: 30s
  # List of model names to preload at startup (Ollama-compatible).
  # These models are loaded immediately when Termite starts, avoiding first-request latency.
  # Model names should match those in models_dir/embedders/ (e.g., "BAAI/bge-small-en-v1.5").
  # Only effective when keep_alive is non-zero (lazy loading mode).
  # 
  preload:
    - BAAI/bge-small-en-v1.5
    - openai/clip-vit-base-patch32
  # Maximum memory (in MB) to use for loaded models.
  # When this limit is approached, least recently used models are unloaded.
  # Set to 0 for unlimited (default). This is an advisory limit - actual memory
  # usage depends on model sizes and may temporarily exceed this value.
  # Works alongside max_loaded_models for fine-grained control.
  # 
  max_memory_mb: 4096
  # Per-model loading strategy overrides. Maps model names to their loading strategy.
  # Models not in this map use the default strategy based on keep_alive:
  # - If keep_alive>0 (default "5m"): lazy loading (load on demand, unload after idle)
  # - If keep_alive="0": eager loading (load at startup, never unload)
  # 
  # When a model has strategy "eager" in this map:
  # - It is loaded at startup (as part of preload)
  # - It is never unloaded, even when keep_alive>0 (pinned in memory)
  # 
  # This allows mixing eager and lazy models in the same pool.
  # 
  model_strategies:
    "BAAI/bge-small-en-v1.5": "eager"
    "mirth/chonky-mmbert-small-multilingual-1": "lazy"
  # Whether the dashboard should show model download commands.
  # Defaults to true for standalone/swarm mode. Set to false in managed
  # deployments (e.g., Kubernetes operator) where models are managed externally.
  # 
  allow_downloads: true
  # Logging configuration for Termite services
  log:
    # Logging verbosity level
    level: info
    # Logging output format style. 'terminal' for colorized console, 'json' for structured JSON, 'logfmt' for token-efficient key=value pairs, 'noop' for silent.
    style: terminal
tls:
  # Path to TLS certificate file
  cert: /path/to/cert.pem
  # Path to TLS key file
  key: /path/to/key.pem
# Configuration for remote content fetching (remotePDF, remoteMedia, remoteText templates).
# Consolidates S3 credentials and security settings separate from backup storage.
# 
# **Credential Resolution Order:**
# 1. Explicit `credentials="name"` parameter in template
# 2. First credential where `buckets` glob pattern matches URL's bucket
# 3. `default_s3` credential
# 4. Legacy fallback: `storage.s3` credentials (backward compatibility)
# 
remote_content:
  security:
    # Whitelist of allowed hostnames/IPs for link downloads. If empty, all hosts are allowed (except private IPs if block_private_ips is true).
    allowed_hosts:
      - example.com
      - cdn.example.com
      - 192.0.2.1
    # Block requests to private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
    block_private_ips: true
    # Maximum size of downloaded content in bytes
    max_download_size_bytes: 104857600
    # Timeout for individual download operations in seconds
    download_timeout_seconds: 30
    # Maximum image width/height in pixels (images will be resized)
    max_image_dimension: 2048
    # Whitelist of allowed path prefixes for file:// and s3:// URLs. If empty, all paths are allowed. For file:// use absolute paths (e.g., /Users/data/). For s3:// use bucket/prefix (e.g., my-bucket/uploads/).
    allowed_paths:
      - /Users/data/
      - my-bucket/uploads/
    # User-Agent header for HTTP downloads. Defaults to 'AntflyDB/1.0' if not set. Some servers (e.g., Wikipedia) reject requests without a User-Agent.
    user_agent: AntflyDB/1.0
  # Default S3 credential name when no bucket pattern matches.
  default_s3: primary
  # Named S3 credentials for remote content fetching.
  s3:
    "primary": "[object Object]"
    "untrusted": "[object Object]"
  # Named HTTP credentials for authenticated endpoints.
  http:
    "internal-api": "[object Object]"
# Named speech-to-text provider configurations.
# 
# Define named STT providers that can be referenced by templates and API calls.
# The first provider defined becomes the default when no provider name is specified.
# 
# **Example:**
# ```yaml
# speech_to_text:
#   whisper-local:
#     provider: termite
#     api_url: "http://localhost:8080"
#     model: openai/whisper-base
#   openai-whisper:
#     provider: openai
#     model: whisper-1
# ```
# 
# Then in templates: `{{transcribeAudio url="..." provider="whisper-local"}}`
# 
speech_to_text:
  "whisper-local":
    provider: termite
    api_url: "http://localhost:8080"
    model: openai/whisper-base
  "openai-whisper":
    provider: openai
    model: whisper-1
cors:
  # Controls whether CORS is enabled
  enabled: true
  # List of allowed origins for CORS requests. Use ['*'] to allow all origins. Defaults to ['*'] if empty and enabled is true.
  allowed_origins:
    - "https://example.com"
    - "https://app.example.com"
  # HTTP methods allowed for CORS requests
  allowed_methods:
    - GET
    - POST
    - PUT
    - DELETE
  # Headers that can be used in CORS requests
  allowed_headers:
    - Content-Type
    - Authorization
  # Headers exposed to the client. Useful if your API returns custom headers that the frontend needs to read.
  exposed_headers:
    - X-Total-Count
  # Indicates whether credentials (cookies, auth headers) are allowed. Note: If true, allowed_origins cannot be ['*'].
  allow_credentials: false
  # How long (in seconds) the results of a preflight request can be cached
  max_age: 3600
# How many replicas of each shard should be maintained.
replication_factor: 3
# Enables authentication and authorization (RBAC) for the API.
enable_auth: false
# Disables automatic shard reallocation (splitting/merging).
disable_shard_alloc: false
# Cooldown period after shard operations (start/stop/split). Format: duration string like '1m', '30s'. Default: '1m' (one minute).
shard_cooldown_period: 1m
# Maximum duration for a shard split operation before triggering rollback. Format: duration string like '5m', '30s'. Default: '5m' (five minutes).
split_timeout: 5m
# Maximum size of a shard in bytes. Used to determine when to split shards.
max_shard_size_bytes: 1073741824
# Maximum number of shards that can be created for a single table.
max_shards_per_table: 100
# Default number of shards to create for a new table.
default_shards_per_table: 4
# URL of the model registry for the Antfarm dashboard. Defaults to https://registry.antfly.io/v1
registry_url: "https://registry.antfly.io/v1"
# Named embedder configurations for embedding operations.
# 
# Define named embedders that can be referenced by indexes, templates, and API calls.
# The first embedder defined becomes the default when no embedder name is specified.
# 
# **API Key Configuration:**
# 
# API keys can be provided via the encrypted keystore (recommended) or environment variables:
# 
# 1. **Keystore** (recommended for production):
#    ```bash
#    antfly keystore create
#    antfly keystore add openai.api_key
#    ```
#    Then reference in config: `api_key: ${secret:openai.api_key}`
# 
# 2. **Environment variable** (simpler for development):
#    Omit `api_key` from config and set the appropriate env var:
#    - OpenAI: `OPENAI_API_KEY`
#    - Gemini: `GEMINI_API_KEY`
#    - Anthropic: `ANTHROPIC_API_KEY`
#    - Cohere: `COHERE_API_KEY`
# 
# See [Secrets Management](/docs/secrets) for complete documentation.
# 
# **Example:**
# ```yaml
# embedders:
#   openai-small:
#     provider: openai
#     model: text-embedding-3-small
#   termite-local:
#     provider: termite
#     model: bge-base-en-v1.5
#     api_url: "http://localhost:8082"
# ```
# 
embedders:
  "openai-small":
    provider: openai
    model: text-embedding-3-small
  "termite-local":
    provider: termite
    model: bge-base-en-v1.5
    api_url: "http://localhost:8082"
# Named generator configurations for AI operations.
# 
# Define named generators that can be referenced by chains, templates, and API calls.
# The first generator defined becomes the default when no generator name is specified.
# 
# **API Key Configuration:**
# 
# API keys can be provided via the encrypted keystore (recommended) or environment variables:
# 
# 1. **Keystore** (recommended for production):
#    ```bash
#    antfly keystore create
#    antfly keystore add gemini.api_key
#    ```
#    Then reference in config: `api_key: ${secret:gemini.api_key}`
# 
# 2. **Environment variable** (simpler for development):
#    Omit `api_key` from config and set the appropriate env var:
#    - Gemini: `GEMINI_API_KEY`
#    - OpenAI: `OPENAI_API_KEY`
#    - Anthropic: `ANTHROPIC_API_KEY`
# 
# See [Secrets Management](/docs/secrets) for complete documentation.
# 
# **Example:**
# ```yaml
# generators:
#   gemini-flash:
#     provider: gemini
#     model: gemini-2.5-flash
#   ollama-local:
#     provider: ollama
#     model: llama3
#   openai-gpt4:
#     provider: openai
#     model: gpt-4.1
# ```
# 
generators:
  "gemini-flash":
    provider: gemini
    model: gemini-2.5-flash
  "ollama-local":
    provider: ollama
    model: llama3
  "openai-gpt4":
    provider: openai
    model: gpt-4.1
# Named chain configurations for fallback/retry logic.
# 
# Chains are ordered lists of generators with retry and fallback logic.
# Each link references a generator by name from the `generators` map.
# The first chain defined becomes the default when no chain name is specified.
# 
# **Chain Conditions:**
# - `on_error`: Try next generator on any error (default)
# - `on_rate_limit`: Try next only on rate limit (429) errors
# - `on_timeout`: Try next only on timeout errors
# - `always`: Always try the next generator
# 
# **Example:**
# ```yaml
# chains:
#   default:
#     - generator: gemini-flash  # Reference by name
#       retry:
#         max_attempts: 3
#       condition: on_rate_limit
#     - generator: ollama-local  # Reference by name
#   with-inline:
#     - generator: gemini-flash
#     - generator_config:  # Inline config
#         provider: anthropic
#         model: claude-sonnet-4-5-20250929
# ```
# 
# Then in API calls: `chain: "default"` or `chain: "with-inline"`
# 
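chains:
  "default":
    - generator: gemini-flash
      retry:
        max_attempts: 3
      condition: on_rate_limit
    - generator: ollama-local
  "with-inline":
    - generator: gemini-flash
    - generator_config:
        provider: anthropic
        model: claude-sonnet-4-5-20250929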
# Named reranker configurations for search result reranking.
# 
# Define named rerankers that can be referenced by indexes, search queries, and API calls.
# The first reranker defined becomes the default when no reranker name is specified.
# 
# **Example:**
# ```yaml
# rerankers:
#   cohere-english:
#     provider: cohere
#     model: rerank-english-v3.0
#   termite-local:
#     provider: termite
#     model: mxbai-rerank-base-v1
#     url: "http://localhost:8080"
# ```
# 
rerankers:
  "cohere-english":
    provider: cohere
    model: rerank-english-v3.0
  "termite-local":
    provider: termite
    model: mxbai-rerank-base-v1
    url: "http://localhost:8080"
# Named chunker configurations for text chunking.
# 
# Define named chunkers that can be referenced by indexes and API calls.
# The first chunker defined becomes the default when no chunker name is specified.
# 
# **Example:**
# ```yaml
# chunkers:
#   fixed-500:
#     provider: termite
#     model: fixed
#     target_tokens: 500
#     overlap_tokens: 50
#   semantic:
#     provider: termite
#     model: semantic-chunker
#     api_url: "http://localhost:8080"
# ```
# 
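chunkers:
  "fixed-500":
    provider: termite
    model: fixed
    target_tokens: 500
    overlap_tokens: 50
  "semantic":
    provider: termite
    model: semantic-chunker
    api_url: "http://localhost:8080"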

Configuration Properties

Core Settings

Essential configuration for running Antfly

Property	Type	Required	Default	Description
`log`	object			Logging configuration for Termite services
`health_port`	integer		`4200`	Port for the health/metrics server. Defaults to 4200.
`replication_factor`	uint64	✓		How many replicas of each shard should be maintained. (min: 1, max: 5)
`enable_auth`	boolean		`false`	Enables authentication and authorization (RBAC) for the API.
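
To illustrate the core settings in the table above, the fragment below is a sketch of a production-leaning setup. It assumes that structured JSON logs and API authentication are wanted; the values are examples rather than recommendations, and the other required keys (`storage`, `metadata`, and the shard settings) are omitted.

```yaml
# Core settings sketch (values are illustrative assumptions).
log:
  level: info
  style: json            # structured JSON instead of colorized terminal output
health_port: 4200        # the documented default, stated explicitly
replication_factor: 3    # required; must be between 1 and 5
enable_auth: true        # enables authentication and RBAC for the API
```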

Storage Configuration

Configure local and remote storage backends

S3 credentials should never be stored directly in config files. Use the keystore system with ${secret:aws.access_key_id} or environment variables. See the secrets documentation for details.
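
A minimal sketch of keystore references in a credentials block follows. The field names under `storage.s3` are not listed in this reference, so the fragment uses the `termite.s3_credentials` fields from the complete example, which are described there as supporting keystore syntax. The secret name `aws.secret_access_key` is an assumption chosen to mirror `aws.access_key_id`.

```yaml
termite:
  s3_credentials:
    endpoint: s3.amazonaws.com
    use_ssl: true
    # Looked up in the encrypted keystore instead of being written in plain text.
    access_key_id: "${secret:aws.access_key_id}"
    # Hypothetical secret name; use the name the secret was actually stored under.
    secret_access_key: "${secret:aws.secret_access_key}"
```

Alternatively, both keys can be omitted, in which case these fields fall back to the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables.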

Property	Type	Required	Default	Description
`storage`	object	✓

StorageConfig Properties:

Property	Type	Required	Default	Description
`storage.local`	object	✓
`storage.data`	enum: `local`, `s3`		`local`	Storage backend type
`storage.metadata`	enum: `local`, `s3`		`local`	Storage backend type
`storage.s3`	object

LocalStorageConfig Properties:

Property	Type	Required	Default	Description
`base_dir`	string	✓	`antflydb`	Root directory for all antfly data storage. Defaults to 'antflydb'. (minLength: 1)

S3Info Properties:

Property	Type	Required	Default	Description

Metadata Configuration

Metadata orchestration cluster settings

The keys of the orchestration_urls map are metadata node IDs, written as hex strings ("1", "2", "3", and so on). Each storage node needs to know all metadata node URLs to enroll in the cluster.

Property	Type	Required	Default	Description
`metadata`	object	✓

MetadataInfo Properties:

Property	Type	Required	Default	Description
`metadata.orchestration_urls`	map[string → string]	✓		Mapping from Metadata Node ID (hex string) to its URL used by store nodes for enrolling into the cluster

Shard Management

Control automatic shard allocation and sizing

Property	Type	Required	Default	Description
`disable_shard_alloc`	boolean		`false`	Disables automatic shard reallocation (splitting/merging).
`max_shard_size_bytes`	uint64	✓		Maximum size of a shard in bytes. Used to determine when to split shards. (min: 1048576, max: 46170898227200)
`max_shards_per_table`	uint64	✓		Maximum number of shards that can be created for a single table. (min: 1)
`default_shards_per_table`	uint64	✓		Default number of shards to create for a new table. (min: 1)
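
As a sketch of how the shard settings combine, the fragment below lowers the split threshold to 512 MiB and lengthens the cooldown and split timeout. All keys come from this reference; the specific values are assumptions for illustration and should be tuned against real workloads.

```yaml
# Shard management sketch (values are illustrative assumptions).
max_shard_size_bytes: 536870912   # 512 MiB = 512 * 1024 * 1024; allowed minimum is 1048576
max_shards_per_table: 64          # upper bound on shards for any single table
default_shards_per_table: 8       # new tables start with 8 shards
disable_shard_alloc: false        # keep automatic splitting/merging enabled
shard_cooldown_period: 2m         # duration string; documented default is 1m
split_timeout: 10m                # duration string; documented default is 5m
```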

Security & CORS

TLS, content security, and cross-origin settings

Property	Type	Required	Default	Description
`tls`	object
`cors`	object

TLSInfo Properties:

Property	Type	Required	Default	Description
`tls.cert`	string			Path to TLS certificate file
`tls.key`	string			Path to TLS key file

ContentSecurityConfig Properties:

Property	Type	Default	Description
`allowed_hosts`	array[string]		Whitelist of allowed hostnames/IPs for link downloads. If empty, all hosts are allowed (except private IPs if block_private_ips is true).
`block_private_ips`	boolean	`true`	Block requests to private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
`max_download_size_bytes`	int64	`104857600`	Maximum size of downloaded content in bytes (min: 0)
`download_timeout_seconds`	integer	`30`	Timeout for individual download operations in seconds (min: 1)
`max_image_dimension`	integer	`2048`	Maximum image width/height in pixels (images will be resized) (min: 1)
`allowed_paths`	array[string]		Whitelist of allowed path prefixes for file:// and s3:// URLs. If empty, all paths are allowed. For file:// use absolute paths (e.g., /Users/data/). For s3:// use bucket/prefix (e.g., my-bucket/uploads/).
`user_agent`	string		User-Agent header for HTTP downloads. Defaults to 'AntflyDB/1.0' if not set. Some servers (e.g., Wikipedia) reject requests without a User-Agent.
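
The following fragment is a sketch of a restrictive content security policy, placed under `remote_content.security` as in the complete example. The host, bucket prefix, and limits are placeholder assumptions.

```yaml
remote_content:
  security:
    # Only this host may be fetched; private IP ranges stay blocked as well.
    allowed_hosts:
      - cdn.example.com
    block_private_ips: true
    max_download_size_bytes: 10485760   # 10 MiB = 10 * 1024 * 1024
    download_timeout_seconds: 10
    # Restricts s3:// URLs to this bucket/prefix.
    allowed_paths:
      - my-bucket/uploads/
```

Leaving `allowed_hosts` or `allowed_paths` empty allows all hosts or all paths respectively, so a restrictive policy needs at least one entry in each list.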

CORSConfig Properties:

Property	Type	Default	Description
`cors.enabled`	boolean	`true`	Controls whether CORS is enabled
`cors.allowed_origins`	array[string]		List of allowed origins for CORS requests. Use ['*'] to allow all origins. Defaults to ['*'] if empty and enabled is true.
`cors.allowed_methods`	array[string]	`["GET","POST","PUT","DELETE","OPTIONS","PATCH"]`	HTTP methods allowed for CORS requests
`cors.allowed_headers`	array[string]	`["Content-Type","Authorization","X-Requested-With","Accept","Origin"]`	Headers that can be used in CORS requests
`cors.exposed_headers`	array[string]		Headers exposed to the client. Useful if your API returns custom headers that the frontend needs to read.
`cors.allow_credentials`	boolean	`false`	Indicates whether credentials (cookies, auth headers) are allowed. Note: If true, allowed_origins cannot be ['*'].
`cors.max_age`	integer	`3600`	How long (in seconds) the results of a preflight request can be cached (min: 0)
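
Because `allow_credentials: true` cannot be combined with a wildcard origin, a credentialed setup has to list its origins explicitly. The fragment below is a sketch of that case; the origin and `max_age` value are placeholder assumptions.

```yaml
cors:
  enabled: true
  # Explicit origin list; ['*'] is not allowed when credentials are enabled.
  allowed_origins:
    - "https://app.example.com"
  allow_credentials: true
  max_age: 600   # cache preflight results for 10 minutes
```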

External Services

Optional Termite and MCP service integration

Termite provides centralized embedding generation and document chunking with caching. Enable it with the --termite flag when starting Antfly.

Property	Type	Required	Default	Description
`termite`	object
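
As a sketch of how the Termite options from the complete example fit together, the fragment below targets a memory-constrained host. Every key appears earlier in this reference; the values are assumptions chosen for illustration.

```yaml
termite:
  api_url: "http://localhost:8080"
  keep_alive: 10m             # unload a model after 10 minutes without use
  max_loaded_models: 2        # beyond this, the least recently used model is unloaded
  max_memory_mb: 2048         # advisory limit; usage may temporarily exceed it
  model_strategies:
    "BAAI/bge-small-en-v1.5": "eager"   # loaded at startup and never unloaded
  max_concurrent_requests: 4
  max_queue_size: 32          # when the queue is full, requests receive 503 with Retry-After
  request_timeout: 30s        # includes queue wait; exceeding it returns 504
  backend_priority:           # tried in order; first available backend+device wins
    - "onnx:cuda"
    - "onnx:cpu"
    - go
```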