Configuration Reference

Configuration schema for the Antfly distributed key-value store and vector search engine.

Antfly is configured using a YAML configuration file. This reference documents all available configuration options.

Minimal Configuration

The minimum required configuration for running Antfly:

storage:
  local:
    base_dir: antflydb
metadata:
  orchestration_urls:
    "1": "http://localhost:5001"
    "2": "http://localhost:5002"
    "3": "http://localhost:5003"
replication_factor: 3
max_shard_size_bytes: 1073741824
max_shards_per_table: 100
default_shards_per_table: 4

For development, you can use antfly swarm mode, which runs a single-node cluster. For production deployments, use distributed mode with multiple metadata and storage nodes.

Complete Example

A comprehensive configuration showing all available options:

# Logging configuration for Termite services
log:
  # Logging verbosity level
  level: info
  # Logging output format style. 'terminal' for colorized console, 'json' for structured JSON, 'logfmt' for token-efficient key=value pairs, 'noop' for silent.
  style: terminal
# Port for the health/metrics server. Defaults to 4200.
health_port: 4200
storage:
  local:
    # Root directory for all antfly data storage. Defaults to 'antflydb'.
    base_dir: antflydb
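  # Storage backend type for data: 'local' or 's3'. Defaults to 'local'.
  data: local
  # Storage backend type for metadata: 'local' or 's3'. Defaults to 'local'.
  metadata: local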
  s3: {}
metadata:
  # Mapping from Metadata Node ID (hex string) to its URL used by store nodes for enrolling into the cluster
  orchestration_urls:
    "1": "http://localhost:5001"
    "2": "http://localhost:5002"
    "3": "http://localhost:5003"
termite:
  # URL of the Termite embedding/chunking service
  api_url: "http://localhost:8080"
  # Base directory containing model subdirectories. Termite auto-discovers models from:
  # - `{models_dir}/embedders/` - Embedding models (ONNX)
  # - `{models_dir}/chunkers/` - Chunking models (ONNX)
  # - `{models_dir}/rerankers/` - Reranking models (ONNX)
  # - `{models_dir}/recognizers/` - Recognition models (ONNX)
  # - `{models_dir}/rewriters/` - Seq2Seq rewriter models (ONNX)
  # 
  # Defaults to ~/.termite/models (set via viper). If not set, only built-in fixed chunking is available.
  # 
  models_dir: ~/.termite/models
  content_security:
    # Whitelist of allowed hostnames/IPs for link downloads. If empty, all hosts are allowed (except private IPs if block_private_ips is true).
    allowed_hosts:
      - example.com
      - cdn.example.com
      - 192.0.2.1
    # Block requests to private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
    block_private_ips: true
    # Maximum size of downloaded content in bytes
    max_download_size_bytes: 104857600
    # Timeout for individual download operations in seconds
    download_timeout_seconds: 30
    # Maximum image width/height in pixels (images will be resized)
    max_image_dimension: 2048
    # Whitelist of allowed path prefixes for file:// and s3:// URLs. If empty, all paths are allowed. For file:// use absolute paths (e.g., /Users/data/). For s3:// use bucket/prefix (e.g., my-bucket/uploads/).
    allowed_paths:
      - /Users/data/
      - my-bucket/uploads/
  s3_credentials:
    # S3-compatible endpoint (e.g., 's3.amazonaws.com' or 'localhost:9000' for MinIO)
    endpoint: s3.amazonaws.com
    # Enable SSL/TLS for S3 connections (default: true for AWS, false for local MinIO)
    use_ssl: true
    # AWS access key ID. Supports keystore syntax for secret lookup. Falls back to AWS_ACCESS_KEY_ID environment variable if not set.
    access_key_id: your-access-key-id
    # AWS secret access key. Supports keystore syntax for secret lookup. Falls back to AWS_SECRET_ACCESS_KEY environment variable if not set.
    secret_access_key: your-secret-access-key
    # Optional AWS session token for temporary credentials. Supports keystore syntax for secret lookup.
    session_token: your-session-token
  # How long to keep models loaded in memory after last use (Ollama-compatible).
  # Models are automatically unloaded after this duration of inactivity.
  # Use Go duration format: "5m" (5 minutes), "1h" (1 hour), "0" (eager loading).
  # Defaults to "5m" (lazy loading) like Ollama. Set to "0" to explicitly enable eager loading
  # where all models are loaded at startup and never unloaded.
  # 
  keep_alive: 5m
  # Maximum number of models to keep loaded in memory simultaneously.
  # When this limit is reached, the least recently used model is unloaded (LRU eviction).
  # Set to 0 for unlimited (default). Only effective when keep_alive is non-zero.
  # 
  max_loaded_models: 3
  # Number of concurrent inference pipelines per model. Each pipeline loads
  # a copy of the model, so higher values use more memory but allow more
  # concurrent requests. Set to 0 to use the default (min(NumCPU, 4)).
  # 
  pool_size: 1
  # Backend priority order for model loading with optional device specifiers.
  # Format: `backend` or `backend:device` where device defaults to `auto`.
  # 
  # Termite tries entries in order and uses the first available backend+device
  # combination that supports the model.
  # 
  # **Backends** (depend on build tags):
  # - `go` - Pure Go inference (always available, CPU only, slowest)
  # - `onnx` - ONNX Runtime (requires -tags="onnx,ORT", fastest)
  # - `xla` - GoMLX XLA (requires -tags="xla,XLA", TPU/CUDA/CPU)
  # 
  # **Devices**:
  # - `auto` - Auto-detect best available (default)
  # - `cuda` - NVIDIA CUDA GPU
  # - `coreml` - Apple CoreML (macOS only, used by ONNX)
  # - `tpu` - Google TPU (used by XLA)
  # - `cpu` - Force CPU only
  # 
  # **Examples**:
  # - `["onnx", "xla", "go"]` - Try backends with auto device detection
  # - `["onnx:cuda", "xla:tpu", "onnx:cpu", "go"]` - Prefer GPU, fall back to CPU
  # - `["onnx:coreml", "go"]` - macOS with CoreML acceleration
  # 
  backend_priority:
    - "onnx:cuda"
    - "xla:tpu"
    - "onnx:cpu"
    - "xla:cpu"
    - go
  # Maximum number of concurrent inference requests allowed.
  # Additional requests will be queued up to max_queue_size.
  # Set to 0 for unlimited (default).
  # 
  max_concurrent_requests: 4
  # Maximum number of requests to queue when max_concurrent_requests is reached.
  # When the queue is full, new requests receive 503 Service Unavailable with Retry-After header.
  # Set to 0 for unlimited queue (default). Only effective when max_concurrent_requests > 0.
  # 
  max_queue_size: 100
  # Maximum time to wait for a request to complete, including queue wait time.
  # Use Go duration format: "30s", "1m", "0" (no timeout, default).
  # Requests exceeding this timeout receive 504 Gateway Timeout.
  # 
  request_timeout: 30s
  # List of model names to preload at startup (Ollama-compatible).
  # These models are loaded immediately when Termite starts, avoiding first-request latency.
  # Model names should match those in models_dir/embedders/ (e.g., "BAAI/bge-small-en-v1.5").
  # Only effective when keep_alive is non-zero (lazy loading mode).
  # 
  preload:
    - BAAI/bge-small-en-v1.5
    - openai/clip-vit-base-patch32
  # Maximum memory (in MB) to use for loaded models.
  # When this limit is approached, least recently used models are unloaded.
  # Set to 0 for unlimited (default). This is an advisory limit - actual memory
  # usage depends on model sizes and may temporarily exceed this value.
  # Works alongside max_loaded_models for fine-grained control.
  # 
  max_memory_mb: 4096
  # Per-model loading strategy overrides. Maps model names to their loading strategy.
  # Models not in this map use the default strategy based on keep_alive:
  # - If keep_alive>0 (default "5m"): lazy loading (load on demand, unload after idle)
  # - If keep_alive="0": eager loading (load at startup, never unload)
  # 
  # When a model has strategy "eager" in this map:
  # - It is loaded at startup (as part of preload)
  # - It is never unloaded, even when keep_alive>0 (pinned in memory)
  # 
  # This allows mixing eager and lazy models in the same pool.
  # 
  model_strategies:
    "BAAI/bge-small-en-v1.5": "eager"
    "mirth/chonky-mmbert-small-multilingual-1": "lazy"
  # Whether the dashboard should show model download commands.
  # Defaults to true for standalone/swarm mode. Set to false in managed
  # deployments (e.g., Kubernetes operator) where models are managed externally.
  # 
  allow_downloads: true
  # Logging configuration for Termite services
  log:
    # Logging verbosity level
    level: info
    # Logging output format style. 'terminal' for colorized console, 'json' for structured JSON, 'logfmt' for token-efficient key=value pairs, 'noop' for silent.
    style: terminal
tls:
  # Path to TLS certificate file
  cert: /path/to/cert.pem
  # Path to TLS key file
  key: /path/to/key.pem
# Configuration for remote content fetching (remotePDF, remoteMedia, remoteText templates).
# Consolidates S3 credentials and security settings separate from backup storage.
# 
# **Credential Resolution Order:**
# 1. Explicit `credentials="name"` parameter in template
# 2. First credential where `buckets` glob pattern matches URL's bucket
# 3. `default_s3` credential
# 4. Legacy fallback: `storage.s3` credentials (backward compatibility)
# 
remote_content:
  security:
    # Whitelist of allowed hostnames/IPs for link downloads. If empty, all hosts are allowed (except private IPs if block_private_ips is true).
    allowed_hosts:
      - example.com
      - cdn.example.com
      - 192.0.2.1
    # Block requests to private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
    block_private_ips: true
    # Maximum size of downloaded content in bytes
    max_download_size_bytes: 104857600
    # Timeout for individual download operations in seconds
    download_timeout_seconds: 30
    # Maximum image width/height in pixels (images will be resized)
    max_image_dimension: 2048
    # Whitelist of allowed path prefixes for file:// and s3:// URLs. If empty, all paths are allowed. For file:// use absolute paths (e.g., /Users/data/). For s3:// use bucket/prefix (e.g., my-bucket/uploads/).
    allowed_paths:
      - /Users/data/
      - my-bucket/uploads/
  # Default S3 credential name when no bucket pattern matches.
  default_s3: primary
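  # Named S3 credentials for remote content fetching. Each entry maps a credential name to its settings (fields not shown here).
  s3:
    "primary": {}
    "untrusted": {}
  # Named HTTP credentials for authenticated endpoints. Each entry maps a credential name to its settings (fields not shown here).
  http:
    "internal-api": {}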
# Named speech-to-text provider configurations.
# 
# Define named STT providers that can be referenced by templates and API calls.
# The first provider defined becomes the default when no provider name is specified.
# 
# **Example:**
# ```yaml
# speech_to_text:
#   whisper-local:
#     provider: termite
#     api_url: "http://localhost:8080"
#     model: openai/whisper-base
#   openai-whisper:
#     provider: openai
#     model: whisper-1
# ```
# 
# Then in templates: `{{transcribeAudio url="..." provider="whisper-local"}}`
# 
speech_to_text:
  "whisper-local":
    provider: termite
    api_url: "http://localhost:8080"
    model: openai/whisper-base
  "openai-whisper":
    provider: openai
    model: whisper-1
cors:
  # Controls whether CORS is enabled
  enabled: true
  # List of allowed origins for CORS requests. Use ['*'] to allow all origins. Defaults to ['*'] if empty and enabled is true.
  allowed_origins:
    - "https://example.com"
    - "https://app.example.com"
  # HTTP methods allowed for CORS requests
  allowed_methods:
    - GET
    - POST
    - PUT
    - DELETE
  # Headers that can be used in CORS requests
  allowed_headers:
    - Content-Type
    - Authorization
  # Headers exposed to the client. Useful if your API returns custom headers that the frontend needs to read.
  exposed_headers:
    - X-Total-Count
  # Indicates whether credentials (cookies, auth headers) are allowed. Note: If true, allowed_origins cannot be ['*'].
  allow_credentials: false
  # How long (in seconds) the results of a preflight request can be cached
  max_age: 3600
# How many replicas of each shard should be maintained.
replication_factor: 3
# Enables authentication and authorization (RBAC) for the API.
enable_auth: false
# Disables automatic shard reallocation (splitting/merging).
disable_shard_alloc: false
# Cooldown period after shard operations (start/stop/split). Format: duration string like '1m', '30s'. Default: '1m' (one minute).
shard_cooldown_period: 1m
# Maximum duration for a shard split operation before triggering rollback. Format: duration string like '5m', '30s'. Default: '5m' (five minutes).
split_timeout: 5m
# Maximum size of a shard in bytes. Used to determine when to split shards.
max_shard_size_bytes: 1073741824
# Maximum number of shards that can be created for a single table.
max_shards_per_table: 100
# Default number of shards to create for a new table.
default_shards_per_table: 4
# URL of the model registry for the Antfarm dashboard. Defaults to https://registry.antfly.io/v1
registry_url: "https://registry.antfly.io/v1"
# Named embedder configurations for embedding operations.
# 
# Define named embedders that can be referenced by indexes, templates, and API calls.
# The first embedder defined becomes the default when no embedder name is specified.
# 
# **API Key Configuration:**
# 
# API keys can be provided via the encrypted keystore (recommended) or environment variables:
# 
# 1. **Keystore** (recommended for production):
#    ```bash
#    antfly keystore create
#    antfly keystore add openai.api_key
#    ```
#    Then reference in config: `api_key: ${secret:openai.api_key}`
# 
# 2. **Environment variable** (simpler for development):
#    Omit `api_key` from config and set the appropriate env var:
#    - OpenAI: `OPENAI_API_KEY`
#    - Gemini: `GEMINI_API_KEY`
#    - Anthropic: `ANTHROPIC_API_KEY`
#    - Cohere: `COHERE_API_KEY`
# 
# See [Secrets Management](/docs/secrets) for complete documentation.
# 
# **Example:**
# ```yaml
# embedders:
#   openai-small:
#     provider: openai
#     model: text-embedding-3-small
#   termite-local:
#     provider: termite
#     model: bge-base-en-v1.5
#     api_url: "http://localhost:8082"
# ```
# 
embedders:
  "openai-small":
    provider: openai
    model: text-embedding-3-small
  "termite-local":
    provider: termite
    model: bge-base-en-v1.5
    api_url: "http://localhost:8082"
# Named generator configurations for AI operations.
# 
# Define named generators that can be referenced by chains, templates, and API calls.
# The first generator defined becomes the default when no generator name is specified.
# 
# **API Key Configuration:**
# 
# API keys can be provided via the encrypted keystore (recommended) or environment variables:
# 
# 1. **Keystore** (recommended for production):
#    ```bash
#    antfly keystore create
#    antfly keystore add gemini.api_key
#    ```
#    Then reference in config: `api_key: ${secret:gemini.api_key}`
# 
# 2. **Environment variable** (simpler for development):
#    Omit `api_key` from config and set the appropriate env var:
#    - Gemini: `GEMINI_API_KEY`
#    - OpenAI: `OPENAI_API_KEY`
#    - Anthropic: `ANTHROPIC_API_KEY`
# 
# See [Secrets Management](/docs/secrets) for complete documentation.
# 
# **Example:**
# ```yaml
# generators:
#   gemini-flash:
#     provider: gemini
#     model: gemini-2.5-flash
#   ollama-local:
#     provider: ollama
#     model: llama3
#   openai-gpt4:
#     provider: openai
#     model: gpt-4.1
# ```
# 
generators:
  "gemini-flash":
    provider: gemini
    model: gemini-2.5-flash
  "ollama-local":
    provider: ollama
    model: llama3
  "openai-gpt4":
    provider: openai
    model: gpt-4.1
# Named chain configurations for fallback/retry logic.
# 
# Chains are ordered lists of generators with retry and fallback logic.
# Each link references a generator by name from the `generators` map.
# The first chain defined becomes the default when no chain name is specified.
# 
# **Chain Conditions:**
# - `on_error`: Try next generator on any error (default)
# - `on_rate_limit`: Try next only on rate limit (429) errors
# - `on_timeout`: Try next only on timeout errors
# - `always`: Always try the next generator
# 
# **Example:**
# ```yaml
# chains:
#   default:
#     - generator: gemini-flash  # Reference by name
#       retry:
#         max_attempts: 3
#       condition: on_rate_limit
#     - generator: ollama-local  # Reference by name
#   with-inline:
#     - generator: gemini-flash
#     - generator_config:  # Inline config
#         provider: anthropic
#         model: claude-sonnet-4-5-20250929
# ```
# 
# Then in API calls: `chain: "default"` or `chain: "with-inline"`
# 
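chains:
  "default":
    - generator: gemini-flash
      retry:
        max_attempts: 3
      condition: on_rate_limit
    - generator: ollama-local
  "with-inline":
    - generator: gemini-flash
    - generator_config:
        provider: anthropic
        model: claude-sonnet-4-5-20250929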
# Named reranker configurations for search result reranking.
# 
# Define named rerankers that can be referenced by indexes, search queries, and API calls.
# The first reranker defined becomes the default when no reranker name is specified.
# 
# **Example:**
# ```yaml
# rerankers:
#   cohere-english:
#     provider: cohere
#     model: rerank-english-v3.0
#   termite-local:
#     provider: termite
#     model: mxbai-rerank-base-v1
#     url: "http://localhost:8080"
# ```
# 
rerankers:
  "cohere-english":
    provider: cohere
    model: rerank-english-v3.0
  "termite-local":
    provider: termite
    model: mxbai-rerank-base-v1
    url: "http://localhost:8080"
# Named chunker configurations for text chunking.
# 
# Define named chunkers that can be referenced by indexes and API calls.
# The first chunker defined becomes the default when no chunker name is specified.
# 
# **Example:**
# ```yaml
# chunkers:
#   fixed-500:
#     provider: termite
#     model: fixed
#     target_tokens: 500
#     overlap_tokens: 50
#   semantic:
#     provider: termite
#     model: semantic-chunker
#     api_url: "http://localhost:8080"
# ```
# 
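chunkers:
  "fixed-500":
    provider: termite
    model: fixed
    target_tokens: 500
    overlap_tokens: 50
  "semantic":
    provider: termite
    model: semantic-chunker
    api_url: "http://localhost:8080"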

Configuration Properties

Core Settings

Essential configuration for running Antfly

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `log` | object | | | Logging configuration for Termite services |
| `health_port` | integer | | 4200 | Port for the health/metrics server. Defaults to 4200. |
| `replication_factor` | uint64 | | | How many replicas of each shard should be maintained. (min: 1, max: 5) |
| `enable_auth` | boolean | | false | Enables authentication and authorization (RBAC) for the API. |

Storage Configuration

Configure local and remote storage backends

S3 credentials should never be stored directly in config files. Use the keystore system with `${secret:aws.access_key_id}` or environment variables. See the secrets documentation for details.
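
A minimal sketch of the keystore approach, assuming the `${secret:...}` reference syntax shown for API keys applies to S3 credentials as well. The S3Info fields are not listed in this reference, so the sketch uses the `termite.s3_credentials` fields from the complete example, which are described as supporting keystore lookup. The secret name `aws.secret_access_key` is hypothetical, chosen to mirror `aws.access_key_id`.

```yaml
termite:
  s3_credentials:
    endpoint: s3.amazonaws.com
    use_ssl: true
    # Looked up in the keystore instead of being stored in plain text.
    access_key_id: ${secret:aws.access_key_id}
    # Hypothetical secret name; use whichever name you added to the keystore.
    secret_access_key: ${secret:aws.secret_access_key}
```

Alternatively, both keys can be omitted, in which case the documented fallback is the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables.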

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `storage` | object | | | |

StorageConfig Properties:

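| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `storage.local` | object | | | |
| `storage.data` | enum: local, s3 | | local | Storage backend type |
| `storage.metadata` | enum: local, s3 | | local | Storage backend type |
| `storage.s3` | object | | | |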

LocalStorageConfig Properties:

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `base_dir` | string | | antflydb | Root directory for all antfly data storage. Defaults to 'antflydb'. (minLength: 1) |

S3Info Properties:

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |

Metadata Configuration

Metadata orchestration cluster settings

The orchestration_urls map keys are metadata node IDs (1, 2, 3, etc.). Each storage node needs to know all metadata node URLs to enroll in the cluster.
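
As an illustration, a three-node metadata cluster spread across separate hosts might look like the sketch below. The hostnames are hypothetical; the port follows the pattern used in the minimal configuration.

```yaml
metadata:
  orchestration_urls:
    # Keys are metadata node IDs; values are the URLs store nodes use to enroll.
    "1": "http://metadata-1.internal:5001"
    "2": "http://metadata-2.internal:5001"
    "3": "http://metadata-3.internal:5001"
```

Because each storage node needs every metadata URL, the same map would normally appear in the configuration of all storage nodes.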

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `metadata` | object | | | |

MetadataInfo Properties:

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `metadata.orchestration_urls` | map[string → string] | | | Mapping from Metadata Node ID (hex string) to its URL used by store nodes for enrolling into the cluster |

Shard Management

Control automatic shard allocation and sizing

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `disable_shard_alloc` | boolean | | false | Disables automatic shard reallocation (splitting/merging). |
| `max_shard_size_bytes` | uint64 | | | Maximum size of a shard in bytes. Used to determine when to split shards. (min: 1048576, max: 46170898227200) |
| `max_shards_per_table` | uint64 | | | Maximum number of shards that can be created for a single table. (min: 1) |
| `default_shards_per_table` | uint64 | | | Default number of shards to create for a new table. (min: 1) |
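
To make the byte values easier to read, the sketch below annotates the shard settings from the minimal configuration with their binary-unit equivalents. The per-table estimate in the last comment is plain multiplication and assumes every shard grows to its size limit, which this reference does not claim happens in practice.

```yaml
# 1073741824 bytes = 1024 * 1024 * 1024 = 1 GiB per shard before a split is considered.
# The documented minimum, 1048576 bytes, is 1 MiB.
max_shard_size_bytes: 1073741824
# New tables start with 4 shards and can grow to at most 100.
default_shards_per_table: 4
max_shards_per_table: 100
# Rough ceiling per table under these settings: 100 shards * 1 GiB = 100 GiB.
```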

Security & CORS

TLS, content security, and cross-origin settings

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `tls` | object | | | |
| `cors` | object | | | |

TLSInfo Properties:

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `tls.cert` | string | | | Path to TLS certificate file |
| `tls.key` | string | | | Path to TLS key file |

ContentSecurityConfig Properties:

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `allowed_hosts` | array[string] | | | Whitelist of allowed hostnames/IPs for link downloads. If empty, all hosts are allowed (except private IPs if block_private_ips is true). |
| `block_private_ips` | boolean | | true | Block requests to private IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16) |
| `max_download_size_bytes` | int64 | | 104857600 | Maximum size of downloaded content in bytes (min: 0) |
| `download_timeout_seconds` | integer | | 30 | Timeout for individual download operations in seconds (min: 1) |
| `max_image_dimension` | integer | | 2048 | Maximum image width/height in pixels (images will be resized) (min: 1) |
| `allowed_paths` | array[string] | | | Whitelist of allowed path prefixes for file:// and s3:// URLs. If empty, all paths are allowed. For file:// use absolute paths (e.g., /Users/data/). For s3:// use bucket/prefix (e.g., my-bucket/uploads/). |
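
A restrictive setup might combine a host allowlist with private-IP blocking, as in the sketch below. It places the settings under `remote_content.security`, as in the complete example. The hostname, bucket prefix, and lowered limits are illustrative values, not recommendations from this reference.

```yaml
remote_content:
  security:
    # Only this host may be fetched; an empty list would allow all hosts.
    allowed_hosts:
      - cdn.example.com
    # Keep private ranges such as 10.0.0.0/8 and 127.0.0.0/8 unreachable.
    block_private_ips: true
    # 10 MiB cap (10 * 1024 * 1024 bytes) instead of the 100 MiB default.
    max_download_size_bytes: 10485760
    download_timeout_seconds: 10
    # Limit file:// and s3:// URLs to this single prefix.
    allowed_paths:
      - my-bucket/uploads/
```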

CORSConfig Properties:

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `cors.enabled` | boolean | | true | Controls whether CORS is enabled |
| `cors.allowed_origins` | array[string] | | | List of allowed origins for CORS requests. Use `['*']` to allow all origins. Defaults to `['*']` if empty and enabled is true. |
| `cors.allowed_methods` | array[string] | | ["GET","POST","PUT","DELETE","OPTIONS","PATCH"] | HTTP methods allowed for CORS requests |
| `cors.allowed_headers` | array[string] | | ["Content-Type","Authorization","X-Requested-With","Accept","Origin"] | Headers that can be used in CORS requests |
| `cors.exposed_headers` | array[string] | | | Headers exposed to the client. Useful if your API returns custom headers that the frontend needs to read. |
| `cors.allow_credentials` | boolean | | false | Indicates whether credentials (cookies, auth headers) are allowed. Note: If true, allowed_origins cannot be `['*']`. |
| `cors.max_age` | integer | | 3600 | How long (in seconds) the results of a preflight request can be cached (min: 0) |
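
The interaction between `allow_credentials` and `allowed_origins` is easy to miss, so here is a sketch of a credentialed setup. The origin is a placeholder; the point is that origins are listed explicitly because the wildcard is not permitted once credentials are enabled.

```yaml
cors:
  enabled: true
  # With allow_credentials: true, list origins explicitly; ['*'] is not allowed.
  allowed_origins:
    - "https://app.example.com"
  allowed_methods:
    - GET
    - POST
  allowed_headers:
    - Content-Type
    - Authorization
  allow_credentials: true
  # Cache preflight responses for 10 minutes (600 seconds).
  max_age: 600
```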

External Services

Optional Termite and MCP service integration

Termite provides centralized embedding generation and document chunking with caching. Enable it with the --termite flag when starting Antfly.

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `termite` | object | | | |
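
As a sketch of how the `termite` object might be filled in for a memory-constrained host, the fragment below uses only fields from the complete example. The specific values are illustrative assumptions.

```yaml
termite:
  api_url: "http://localhost:8080"
  models_dir: ~/.termite/models
  # Unload idle models after 10 minutes; keep at most 2 loaded (LRU eviction).
  keep_alive: 10m
  max_loaded_models: 2
  # Run up to 4 inference requests at once and queue up to 50 more.
  # When the queue is full, new requests receive 503 Service Unavailable.
  max_concurrent_requests: 4
  max_queue_size: 50
  # Requests that exceed 30s, including queue wait, receive 504 Gateway Timeout.
  request_timeout: 30s
```

Leaving `max_concurrent_requests` at 0 removes the concurrency limit, in which case `max_queue_size` has no effect.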