Table Management#

Manage tables in your Antfly cluster. Tables store your documents and support multiple indexes.

Schema#

Schemas define the structure and types of data in your tables using JSON Schema with custom Antfly extensions. While optional, defining a schema provides several benefits:

  • Type Safety: Ensures data consistency across documents
  • Optimized Indexing: The full-text index uses type information to choose appropriate Bleve mappings
  • Better Search: Type-aware indexing improves search relevance and performance
  • Query Optimization: Enables type-specific query operations

JSON Schema Structure#

Antfly uses standard JSON Schema with custom extensions:

{
  "document_schemas": {
    "article": {
      "schema": {
        "type": "object",
        "properties": {
          "title": {
            "type": "string",
            "x-antfly-types": ["text", "keyword"]
          },
          "body": {
            "type": "string",
            "x-antfly-types": ["html"]
          },
          "tags": {
            "type": "array",
            "items": {"type": "string"},
            "x-antfly-types": ["keyword"]
          },
          "metadata": {
            "type": "object",
            "properties": {
              "author": {"type": "string"}
            }
          }
        },
        "additionalProperties": false,
        "x-antfly-include-in-all": ["title", "body"]
      }
    }
  }
}

Dynamic Indexing (additionalProperties):

  • "additionalProperties": false - Only index fields defined in properties (recommended for performance)
  • "additionalProperties": true - Index all fields dynamically (flexible but slower)
  • If not specified, dynamic indexing is disabled
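
As a rough illustration of the three rules above, the sketch below decides which top-level fields of a document would be indexed. The helper name `indexed_fields` is hypothetical and not part of Antfly; it only mirrors the documented behavior and ignores nested objects and `x-antfly-index`.

```python
def indexed_fields(schema: dict, document: dict) -> list[str]:
    """Return the top-level document fields that would be indexed.

    Illustrative only: mirrors the documented additionalProperties rules,
    not Antfly's actual implementation.
    """
    declared = schema.get("properties", {})
    # Dynamic indexing is off unless additionalProperties is explicitly true.
    dynamic = schema.get("additionalProperties", False) is True
    return [name for name in document if name in declared or dynamic]


schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}},
    "additionalProperties": False,
}
doc = {"title": "Hello", "views": 10}

print(indexed_fields(schema, doc))  # ['title']
print(indexed_fields({**schema, "additionalProperties": True}, doc))  # ['title', 'views']
```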

Nested Objects:

  • Use "type": "object" with nested properties for structured data
  • Nested objects don't inherit x-antfly-include-in-all from the parent schema
  • Arrays of objects: Use "type": "array" with "items": {"type": "object", ...}

Antfly Extensions#

x-antfly-types#

Array of Antfly type strings specifying how to index a field. Multiple types create multiple field mappings:

  • Single type: "x-antfly-types": ["text"] - Creates standard text field
  • Multiple types: "x-antfly-types": ["text", "keyword"] - Creates fieldName (text) and fieldName__keyword
  • With search-as-you-type: "x-antfly-types": ["text", "search_as_you_type"] - Creates fieldName (text) and fieldName__2gram

Type Combination Rules:

  • text and html are mutually exclusive (choose one as primary type)
  • Primary types (text, html) can be combined with variants (keyword, search_as_you_type)
  • search_as_you_type + keyword allowed without primary type (auto-adds text)
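
One possible reading of these rules, written as a small client-side check. The function `normalize_types` is a hypothetical helper, and the exact condition under which Antfly auto-adds text is an assumption taken literally from the last bullet (both variants present, no primary type).

```python
PRIMARY_TYPES = {"text", "html"}


def normalize_types(types: list[str]) -> list[str]:
    """Validate an x-antfly-types list and apply the auto-add rule.

    Sketch only; Antfly's own validation may differ in detail.
    """
    primaries = [t for t in types if t in PRIMARY_TYPES]
    if len(primaries) > 1:
        raise ValueError("text and html are mutually exclusive")
    # Assumption: text is auto-added only when both variants appear
    # without a primary type, as the rule above states.
    if not primaries and {"search_as_you_type", "keyword"} <= set(types):
        return ["text", *types]
    return list(types)


print(normalize_types(["text", "keyword"]))                # ['text', 'keyword']
print(normalize_types(["keyword", "search_as_you_type"]))  # ['text', 'keyword', 'search_as_you_type']
```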

x-antfly-index#

Boolean to disable indexing for a field:

{"type": "string", "x-antfly-index": false}

x-antfly-include-in-all#

Schema-level array of field names to include in the special _all field for cross-field search:

{
  "x-antfly-include-in-all": ["title", "body", "description"]
}

Only text-based types (text, html, keyword, search_as_you_type, link) are included, and only the primary field (not __keyword or __2gram variants).
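
The filtering rule can be sketched as follows. `all_field_sources` is a hypothetical helper, and the fallback that treats an untyped string property as text relies on the type inference rules described later in this section.

```python
TEXT_BASED = {"text", "html", "keyword", "search_as_you_type", "link"}


def all_field_sources(schema: dict) -> list[str]:
    """Return the fields listed in x-antfly-include-in-all that are text-based."""
    props = schema.get("properties", {})
    selected = []
    for name in schema.get("x-antfly-include-in-all", []):
        prop = props.get(name, {})
        # Untyped string properties are assumed to be inferred as text.
        types = prop.get("x-antfly-types") or (["text"] if prop.get("type") == "string" else [])
        if any(t in TEXT_BASED for t in types):
            selected.append(name)  # primary field only, no __keyword or __2gram variants
    return selected


schema = {
    "properties": {
        "title": {"type": "string", "x-antfly-types": ["text", "keyword"]},
        "price": {"type": "number", "x-antfly-types": ["numeric"]},
        "body": {"type": "string"},
    },
    "x-antfly-include-in-all": ["title", "price", "body"],
}
print(all_field_sources(schema))  # ['title', 'body']
```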

Antfly Types#

Text Types#

  • text - Full-text searchable with tokenization and analysis. Use for articles, descriptions, searchable content.
  • html - Full-text searchable with HTML tag stripping. Use for HTML content.
  • keyword - Exact-match strings (not analyzed). Use for IDs, tags, categories, filters.
  • search_as_you_type - Edge n-gram analyzer for autocomplete. Creates fieldName__2gram mapping.
  • link - URL/link field (keyword analyzer).

Numeric and Boolean Types#

  • numeric - Numbers (integers and floats). Use for counts, prices, scores.
  • boolean - True/false values.

Date/Time Types#

  • datetime - Timestamp fields. Use for created/updated dates, events.

Geospatial Types#

  • geopoint - Latitude/longitude point. Use for locations, coordinates.
  • geoshape - Complex geographic shape. Use for boundaries, regions.

Special Types#

  • embedding - Vector embeddings (not indexed in Bleve). Use for similarity search.
  • blob - Binary data (not indexed). Use for images, files, binary content.

Type Inference#

If x-antfly-types is not specified, types are inferred from JSON Schema type:

  • "type": "string" → ["text"]
  • "type": "number" or "integer" → ["numeric"]
  • "type": "boolean" → ["boolean"]
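
The inference rules amount to a small lookup table. The sketch below uses a hypothetical helper, `effective_types`; returning an empty list for JSON Schema types with no documented rule (object, array) is an assumption.

```python
INFERRED_TYPES = {
    "string": ["text"],
    "number": ["numeric"],
    "integer": ["numeric"],
    "boolean": ["boolean"],
}


def effective_types(prop: dict) -> list[str]:
    """Return explicit x-antfly-types if present, else the inferred types."""
    explicit = prop.get("x-antfly-types")
    if explicit:
        return list(explicit)
    # No inference rule is documented for other JSON Schema types.
    return list(INFERRED_TYPES.get(prop.get("type"), []))


print(effective_types({"type": "string"}))                                # ['text']
print(effective_types({"type": "integer"}))                               # ['numeric']
print(effective_types({"type": "string", "x-antfly-types": ["keyword"]})) # ['keyword']
```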

Automatic Fields#

Antfly automatically adds these fields to all document schemas:

  • _timestamp: Datetime field for document timestamps
  • _summaries: Object field (map of index names to summary text) for AI-generated summaries

Field Naming Conventions#

When multiple types are specified, Antfly creates additional field mappings with suffixes:

  • Primary field (text or html): Uses original field name
  • Keyword variant: fieldName__keyword
  • Search-as-you-type: fieldName__2gram

Example: "x-antfly-types": ["text", "keyword", "search_as_you_type"] creates:

  • title (text analyzer)
  • title__keyword (keyword analyzer)
  • title__2gram (edge n-gram analyzer)
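
When building queries it can help to compute these names programmatically. The sketch below uses a hypothetical helper, `mapping_names`, and covers only the documented case where a primary type (text or html) is present.

```python
VARIANT_SUFFIXES = {"keyword": "__keyword", "search_as_you_type": "__2gram"}


def mapping_names(field: str, types: list[str]) -> dict[str, str]:
    """Map each generated index field name to the Antfly type behind it."""
    names = {}
    for t in types:
        if t in ("text", "html"):
            names[field] = t  # primary type keeps the original field name
        elif t in VARIANT_SUFFIXES:
            names[field + VARIANT_SUFFIXES[t]] = t
    return names


print(mapping_names("title", ["text", "keyword", "search_as_you_type"]))
# {'title': 'text', 'title__keyword': 'keyword', 'title__2gram': 'search_as_you_type'}
```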

Best Practices#

  • Use text for searchable content, keyword for exact matching
  • Combine types for flexible querying: ["text", "keyword"] enables both full-text and exact search
  • Add search_as_you_type for autocomplete functionality
  • Use html type for HTML content to strip tags during indexing
  • Specify x-antfly-include-in-all for important searchable fields

Performance Considerations#

  • text and html fields have higher indexing cost due to text analysis
  • keyword fields are faster to index but don't support partial matches
  • search_as_you_type adds edge n-gram indexing overhead
  • embedding and blob fields are automatically excluded from text indexing
  • Multiple types per field increase index size and indexing time

Document TTL (Time-To-Live)#

Antfly can automatically expire and delete documents after a specified duration. A background cleanup job running on the Raft leader removes expired documents.

Basic Configuration#

Configure TTL when creating a table using ttl_duration:

{
  "name": "sessions",
  "ttl_duration": "24h",
  "document_schemas": {
    "session": {
      "schema": {
        "type": "object",
        "properties": {
          "user_id": {"type": "string"},
          "data": {"type": "object"}
        }
      }
    }
  }
}

How it works:

  • Documents expire after the specified duration from their timestamp
  • Uses _timestamp field by default (automatically added at insertion)
  • Background cleanup runs every 30 seconds on the Raft leader
  • Expired documents are filtered from queries immediately (before cleanup)
  • Deletions are batched (1000 documents at a time) and go through Raft consensus
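
The expiration rule itself is a simple comparison. The sketch below is a minimal illustration with a hypothetical helper, `is_expired`; treating the exact boundary instant as expired is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_expired(timestamp: datetime, ttl: timedelta, now: Optional[datetime] = None) -> bool:
    """Return True once `ttl` has elapsed since the document's timestamp."""
    now = now or datetime.now(timezone.utc)
    # Assumption: a document is treated as expired at the boundary instant.
    return now >= timestamp + ttl


created = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
ttl = timedelta(hours=24)
print(is_expired(created, ttl, now=created + timedelta(hours=23)))  # False
print(is_expired(created, ttl, now=created + timedelta(hours=25)))  # True
```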

Custom TTL Reference Field#

Use a custom timestamp field instead of _timestamp:

{
  "name": "events",
  "ttl_duration": "7d",
  "ttl_field": "created_at",
  "document_schemas": {
    "event": {
      "schema": {
        "type": "object",
        "properties": {
          "created_at": {"type": "string", "format": "date-time"},
          "event_type": {"type": "string"}
        },
        "required": ["created_at"]
      }
    }
  }
}

Requirements:

  • Custom TTL field must be present in all documents
  • Must be in RFC3339 format (e.g., 2025-01-01T12:00:00Z)
  • Documents without the field will fail validation
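
A client can check these requirements before inserting a document. The sketch below uses a hypothetical helper, `parse_ttl_timestamp`, and Python's `datetime.fromisoformat`, which is a looser parser than strict RFC3339, so this is an approximation of the server-side validation rather than a copy of it.

```python
from datetime import datetime, timezone


def parse_ttl_timestamp(document: dict, ttl_field: str) -> datetime:
    """Return the TTL reference time, or raise ValueError if it is missing or malformed."""
    raw = document.get(ttl_field)
    if not isinstance(raw, str):
        raise ValueError(f"missing or non-string TTL field {ttl_field!r}")
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it.
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        raise ValueError("RFC3339 timestamps must include a UTC offset")
    return parsed


ts = parse_ttl_timestamp({"created_at": "2025-01-01T12:00:00Z"}, "created_at")
print(ts == datetime(2025, 1, 1, 12, tzinfo=timezone.utc))  # True
```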

Duration Format#

TTL durations use Go duration format, plus a d (day) suffix that standard Go duration parsing does not accept:

  • 30s - 30 seconds
  • 5m - 5 minutes
  • 24h - 24 hours
  • 7d - 7 days (treated as 168h)
  • 168h - 1 week
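
To reason about these values on the client side, the strings can be converted to a `timedelta`. The parser below is a sketch under the assumption that durations follow Go's unit suffixes with d added as 24 hours; `parse_ttl` is a hypothetical helper and not Antfly's own parser.

```python
import re
from datetime import timedelta

# Seconds per unit; "d" is the extension noted above (1d = 24h).
_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
                 "s": 1, "m": 60, "h": 3600, "d": 86400}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h|d)")


def parse_ttl(text: str) -> timedelta:
    """Parse strings such as '30s', '24h', '7d', or '1h30m' into a timedelta."""
    pos, seconds = 0, 0.0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0:
        raise ValueError("empty duration")
    return timedelta(seconds=seconds)


print(parse_ttl("7d") == parse_ttl("168h"))  # True
print(parse_ttl("1h30m"))                    # 1:30:00
```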

Use Cases#

Session Management:

{
  "ttl_duration": "1h",
  "ttl_field": "last_accessed"
}

Sessions expire 1 hour after last_accessed. Update the timestamp on each access to extend the session.

Temporary Caching:

{
  "ttl_duration": "24h"
}

Cache entries expire 24 hours after insertion.

Log Rotation:

{
  "ttl_duration": "30d"
}

Logs are automatically deleted after 30 days.

Performance Characteristics#

Optimized Storage:

  • TTL timestamps stored separately (:t suffix keys) for O(1) lookups
  • No JSON deserialization needed during cleanup scans
  • ~100-1000x faster than scanning full document bodies
  • Minimal storage overhead (~30 bytes per document)

Query Filtering:

  • Single key lookup to check expiration (~microseconds)
  • Negligible impact on query latency
  • Scales to millions of documents

Cleanup Behavior:

  • Runs every 30 seconds (configurable)
  • Leader-only operation (prevents duplicate work)
  • 5-second grace period ensures writes are fully replicated
  • Batched deletions (1000 docs) prevent overwhelming the system
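
The batching idea can be illustrated with a short generator. This is a generic sketch of splitting expired keys into groups of 1000, not Antfly's cleanup code; the function name `batches` and the key format are made up for the example.

```python
from itertools import islice
from typing import Iterable, Iterator


def batches(keys: Iterable[str], size: int = 1000) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` keys."""
    iterator = iter(keys)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


expired = (f"doc-{i}" for i in range(2500))
print([len(batch) for batch in batches(expired)])  # [1000, 1000, 500]
```

Each batch would correspond to one delete proposal, which keeps any single Raft entry bounded in size.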

Monitoring#

TTL operations are logged with cleanup metrics:

INFO  Starting TTL cleanup job ttl_duration=24h cleanup_interval=30s
INFO  Cleaned up expired documents count=42 duration=245ms total_expired=1337
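
If these lines are collected by a log pipeline, the key=value pairs can be pulled out with a regular expression. The sketch assumes the line layout shown above; `parse_metrics` is a hypothetical helper.

```python
import re


def parse_metrics(line: str) -> dict[str, str]:
    """Extract key=value pairs from a TTL cleanup log line."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


line = "INFO  Cleaned up expired documents count=42 duration=245ms total_expired=1337"
print(parse_metrics(line))
# {'count': '42', 'duration': '245ms', 'total_expired': '1337'}
```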

Modifying TTL Configuration#

Adding TTL to existing table:

  • Update schema with ttl_duration field
  • Applies retroactively to all documents
  • Documents already expired are marked for immediate deletion

Removing TTL:

  • Set ttl_duration to empty string
  • All expiration processing stops
  • Previously expired documents remain

Changing duration:

  • Update schema with new ttl_duration
  • New duration applies immediately to all documents
  • Expiration recalculated using existing timestamps

Limitations#

  • Clock synchronization required: Use NTP to sync clocks across cluster nodes
  • Cleanup latency: Expired documents deleted within ~30 seconds (cleanup interval)
  • Table-level configuration: TTL applies to all documents in the table
  • Timestamp format: Must be RFC3339 or RFC3339Nano

List all tables#

GET /tables

Security#

Provide your bearer token in the Authorization header when making requests to protected resources.

Example: Authorization: Bearer YOUR_API_KEY

Parameters#

Name     Type    Location  Required  Description
prefix   string  query     No        Filter tables by name prefix (e.g., "prod_")
pattern  string  query     No        Filter tables by regex pattern (e.g., "^prod_.*_v[0-9]+$")

Code Examples#

curl -X GET "/api/v1/tables?prefix=prod_&pattern=^user_.*" \
    -H "Authorization: Bearer YOUR_API_KEY"
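
Regex patterns contain characters such as ^, [, + and $ that should be percent-encoded in a query string. The sketch below builds the request URL with Python's standard library; the base URL http://localhost:8080 is a placeholder assumption, and `list_tables_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode


def list_tables_url(base_url: str, prefix: Optional[str] = None,
                    pattern: Optional[str] = None) -> str:
    """Build the list-tables URL with safely encoded query parameters."""
    params = {k: v for k, v in {"prefix": prefix, "pattern": pattern}.items() if v is not None}
    query = urlencode(params)  # percent-encodes regex metacharacters
    return f"{base_url}/api/v1/tables" + (f"?{query}" if query else "")


# Placeholder host; substitute the address of your own cluster.
print(list_tables_url("http://localhost:8080", pattern="^prod_.*_v[0-9]+$"))
```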

Responses#

[
  {
    "name": "string",
    "description": "Table for user data",
    "indexes": {},
    "shards": {},
    "schema": {
      "version": 0,
      "default_type": "string",
      "enforce_types": true,
      "document_schemas": {},
      "ttl_field": "string",
      "ttl_duration": "string",
      "dynamic_templates": [
        {
          "name": "string",
          "match": "string",
          "unmatch": "string",
          "path_match": "string",
          "path_unmatch": "string",
          "match_mapping_type": "string",
          "mapping": {
            "type": "text",
            "analyzer": "string",
            "index": true,
            "store": true,
            "include_in_all": true,
            "doc_values": true
          }
        }
      ]
    },
    "replication_sources": [
      {
        "type": "postgres",
        "dsn": "${secret:pg_dsn}",
        "postgres_table": "users",
        "key_template": "id",
        "slot_name": "string",
        "publication_name": "string",
        "on_update": [
          {
            "op": "$set",
            "path": "email",
            "value": "{{user_email}}"
          },
          {
            "op": "$set",
            "path": "score",
            "value": "{{score}}"
          },
          {
            "op": "$merge",
            "value": "{{metadata}}"
          },
          {
            "op": "$set",
            "path": "active",
            "value": true
          }
        ],
        "on_delete": [
          {
            "op": "$set",
            "path": "active",
            "value": false
          }
        ],
        "publication_filter": {
          "term": "string",
          "field": "string",
          "boost": 0
        },
        "routes": [
          {
            "target_table": "premium_users",
            "where": {
              "term": "premium",
              "field": "tier"
            }
          },
          {
            "target_table": "free_users",
            "where": {
              "term": "free",
              "field": "tier"
            }
          }
        ]
      }
    ]
  }
]

Create a new table#

POST /tables/{tableName}

Creates a new table with optional schema definition, indexes, and configuration.

Use Cases#

Simple table for unstructured data:

{
  "num_shards": 1
}

Table with full-text search:

{
  "num_shards": 3,
  "schema": {
    "document_schemas": {
      "article": {
        "schema": {
          "type": "object",
          "properties": {
            "id": {
              "type": "string",
              "x-antfly-types": ["keyword"]
            },
            "title": {
              "type": "string",
              "x-antfly-types": ["text", "keyword"]
            },
            "body": {
              "type": "string",
              "x-antfly-types": ["text"]
            }
          },
          "x-antfly-include-in-all": ["title", "body"]
        }
      }
    },
    "default_type": "article"
  },
  "indexes": {
    "search_idx": {
      "type": "full_text_v0"
    }
  }
}

Table with vector similarity search:

{
  "num_shards": 5,
  "description": "Product catalog with semantic search",
  "schema": {
    "document_schemas": {
      "product": {
        "schema": {
          "type": "object",
          "properties": {
            "product_id": {
              "type": "string",
              "x-antfly-types": ["keyword"]
            },
            "name": {
              "type": "string",
              "x-antfly-types": ["text", "keyword"]
            },
            "description": {
              "type": "string",
              "x-antfly-types": ["text"]
            },
            "price": {
              "type": "number",
              "x-antfly-types": ["numeric"]
            }
          },
          "x-antfly-include-in-all": ["name", "description"]
        }
      }
    },
    "default_type": "product"
  },
  "indexes": {
    "semantic_idx": {
      "type": "aknn_v0",
      "field": "description",
      "embedder": {
        "provider": "ollama",
        "model": "all-minilm",
        "url": "http://localhost:11434"
      }
    }
  }
}

Best Practices#

  • Define schema for core fields to improve performance
  • Start with fewer shards for small datasets (1-3)
  • Use meaningful table names (e.g., "products", "users", "articles")
  • Consider adding both full-text and vector indexes for hybrid search

Security#

Provide your bearer token in the Authorization header when making requests to protected resources.

Example: Authorization: Bearer YOUR_API_KEY

Request Body#

Example:

{
    "num_shards": 3,
    "description": "User profiles with embeddings for semantic search",
    "indexes": {
        "search_index": {
            "type": "full_text_v0"
        },
        "embedding_index": {
            "type": "aknn_v0",
            "dimension": 384,
            "embedder": {
                "provider": "ollama",
                "model": "all-minilm"
            }
        }
    },
    "schema": {
        "version": 0,
        "default_type": "string",
        "enforce_types": true,
        "document_schemas": {},
        "ttl_field": "string",
        "ttl_duration": "string",
        "dynamic_templates": [
            {
                "name": "string",
                "match": "string",
                "unmatch": "string",
                "path_match": "string",
                "path_unmatch": "string",
                "match_mapping_type": "string",
                "mapping": {
                    "type": "text",
                    "analyzer": "string",
                    "index": true,
                    "store": true,
                    "include_in_all": true,
                    "doc_values": true
                }
            }
        ]
    },
    "replication_sources": [
        {
            "type": "postgres",
            "dsn": "${secret:pg_dsn}",
            "postgres_table": "users",
            "key_template": "id",
            "slot_name": "string",
            "publication_name": "string",
            "on_update": [
                {
                    "op": "$set",
                    "path": "email",
                    "value": "{{user_email}}"
                },
                {
                    "op": "$set",
                    "path": "score",
                    "value": "{{score}}"
                },
                {
                    "op": "$merge",
                    "value": "{{metadata}}"
                },
                {
                    "op": "$set",
                    "path": "active",
                    "value": true
                }
            ],
            "on_delete": [
                {
                    "op": "$set",
                    "path": "active",
                    "value": false
                }
            ],
            "publication_filter": {
                "term": "string",
                "field": "string",
                "boost": 0
            },
            "routes": [
                {
                    "target_table": "premium_users",
                    "where": {
                        "term": "premium",
                        "field": "tier"
                    }
                },
                {
                    "target_table": "free_users",
                    "where": {
                        "term": "free",
                        "field": "tier"
                    }
                }
            ]
        }
    ]
}

Code Examples#

curl -X POST "/api/v1/tables/{tableName}" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
    "num_shards": 3,
    "description": "User profiles with embeddings for semantic search",
    "indexes": {
        "search_index": {
            "type": "full_text_v0"
        },
        "embedding_index": {
            "type": "aknn_v0",
            "dimension": 384,
            "embedder": {
                "provider": "ollama",
                "model": "all-minilm"
            }
        }
    },
    "schema": {
        "version": 0,
        "default_type": "string",
        "enforce_types": true,
        "document_schemas": {},
        "ttl_field": "string",
        "ttl_duration": "string",
        "dynamic_templates": [
            {
                "name": "string",
                "match": "string",
                "unmatch": "string",
                "path_match": "string",
                "path_unmatch": "string",
                "match_mapping_type": "string",
                "mapping": {
                    "type": "text",
                    "analyzer": "string",
                    "index": true,
                    "store": true,
                    "include_in_all": true,
                    "doc_values": true
                }
            }
        ]
    },
    "replication_sources": [
        {
            "type": "postgres",
            "dsn": "${secret:pg_dsn}",
            "postgres_table": "users",
            "key_template": "id",
            "slot_name": "string",
            "publication_name": "string",
            "on_update": [
                {
                    "op": "$set",
                    "path": "email",
                    "value": "{{user_email}}"
                },
                {
                    "op": "$set",
                    "path": "score",
                    "value": "{{score}}"
                },
                {
                    "op": "$merge",
                    "value": "{{metadata}}"
                },
                {
                    "op": "$set",
                    "path": "active",
                    "value": true
                }
            ],
            "on_delete": [
                {
                    "op": "$set",
                    "path": "active",
                    "value": false
                }
            ],
            "publication_filter": {
                "term": "string",
                "field": "string",
                "boost": 0
            },
            "routes": [
                {
                    "target_table": "premium_users",
                    "where": {
                        "term": "premium",
                        "field": "tier"
                    }
                },
                {
                    "target_table": "free_users",
                    "where": {
                        "term": "free",
                        "field": "tier"
                    }
                }
            ]
        }
    ]
}'
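
The same call can be prepared from Python using only the standard library. This is a sketch: the base URL and the table name articles are placeholders, `create_table_request` is a hypothetical helper, and the body reuses the full-text search use case from this section. The request is built but not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def create_table_request(base_url: str, table_name: str, body: dict, api_key: str) -> Request:
    """Build (but do not send) a POST request that creates a table."""
    return Request(
        url=f"{base_url}/api/v1/tables/{quote(table_name, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


body = {
    "num_shards": 3,
    "indexes": {"search_idx": {"type": "full_text_v0"}},
}
# Placeholder host and key; replace with your cluster address and token.
req = create_table_request("http://localhost:8080", "articles", body, "YOUR_API_KEY")
print(req.get_method(), req.full_url)  # POST http://localhost:8080/api/v1/tables/articles
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out so the example runs without a cluster.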

Responses#

{
  "name": "string",
  "description": "Table for user data",
  "indexes": {},
  "shards": {},
  "schema": {
    "version": 0,
    "default_type": "string",
    "enforce_types": true,
    "document_schemas": {},
    "ttl_field": "string",
    "ttl_duration": "string",
    "dynamic_templates": [
      {
        "name": "string",
        "match": "string",
        "unmatch": "string",
        "path_match": "string",
        "path_unmatch": "string",
        "match_mapping_type": "string",
        "mapping": {
          "type": "text",
          "analyzer": "string",
          "index": true,
          "store": true,
          "include_in_all": true,
          "doc_values": true
        }
      }
    ]
  },
  "replication_sources": [
    {
      "type": "postgres",
      "dsn": "${secret:pg_dsn}",
      "postgres_table": "users",
      "key_template": "id",
      "slot_name": "string",
      "publication_name": "string",
      "on_update": [
        {
          "op": "$set",
          "path": "email",
          "value": "{{user_email}}"
        },
        {
          "op": "$set",
          "path": "score",
          "value": "{{score}}"
        },
        {
          "op": "$merge",
          "value": "{{metadata}}"
        },
        {
          "op": "$set",
          "path": "active",
          "value": true
        }
      ],
      "on_delete": [
        {
          "op": "$set",
          "path": "active",
          "value": false
        }
      ],
      "publication_filter": {
        "term": "string",
        "field": "string",
        "boost": 0
      },
      "routes": [
        {
          "target_table": "premium_users",
          "where": {
            "term": "premium",
            "field": "tier"
          }
        },
        {
          "target_table": "free_users",
          "where": {
            "term": "free",
            "field": "tier"
          }
        }
      ]
    }
  ]
}

Drop a table#

DELETE /tables/{tableName}

Security#

Provide your bearer token in the Authorization header when making requests to protected resources.

Example: Authorization: Bearer YOUR_API_KEY

Code Examples#

curl -X DELETE "/api/v1/tables/{tableName}" \
    -H "Authorization: Bearer YOUR_API_KEY"

Responses#

No response body

Get table details#

GET /tables/{tableName}

Security#

Provide your bearer token in the Authorization header when making requests to protected resources.

Example: Authorization: Bearer YOUR_API_KEY

Code Examples#

curl -X GET "/api/v1/tables/{tableName}" \
    -H "Authorization: Bearer YOUR_API_KEY"

Responses#

{
  "name": "string",
  "description": "Table for user data",
  "indexes": {},
  "shards": {},
  "schema": {
    "version": 0,
    "default_type": "string",
    "enforce_types": true,
    "document_schemas": {},
    "ttl_field": "string",
    "ttl_duration": "string",
    "dynamic_templates": [
      {
        "name": "string",
        "match": "string",
        "unmatch": "string",
        "path_match": "string",
        "path_unmatch": "string",
        "match_mapping_type": "string",
        "mapping": {
          "type": "text",
          "analyzer": "string",
          "index": true,
          "store": true,
          "include_in_all": true,
          "doc_values": true
        }
      }
    ]
  },
  "replication_sources": [
    {
      "type": "postgres",
      "dsn": "${secret:pg_dsn}",
      "postgres_table": "users",
      "key_template": "id",
      "slot_name": "string",
      "publication_name": "string",
      "on_update": [
        {
          "op": "$set",
          "path": "email",
          "value": "{{user_email}}"
        },
        {
          "op": "$set",
          "path": "score",
          "value": "{{score}}"
        },
        {
          "op": "$merge",
          "value": "{{metadata}}"
        },
        {
          "op": "$set",
          "path": "active",
          "value": true
        }
      ],
      "on_delete": [
        {
          "op": "$set",
          "path": "active",
          "value": false
        }
      ],
      "publication_filter": {
        "term": "string",
        "field": "string",
        "boost": 0
      },
      "routes": [
        {
          "target_table": "premium_users",
          "where": {
            "term": "premium",
            "field": "tier"
          }
        },
        {
          "target_table": "free_users",
          "where": {
            "term": "free",
            "field": "tier"
          }
        }
      ]
    }
  ]
}

Update a table's schema#

PUT /tables/{tableName}/schema

Security#

Provide your bearer token in the Authorization header when making requests to protected resources.

Example: Authorization: Bearer YOUR_API_KEY

Request Body#

Example:

{
    "version": 0,
    "default_type": "string",
    "enforce_types": true,
    "document_schemas": {},
    "ttl_field": "string",
    "ttl_duration": "string",
    "dynamic_templates": [
        {
            "name": "string",
            "match": "string",
            "unmatch": "string",
            "path_match": "string",
            "path_unmatch": "string",
            "match_mapping_type": "string",
            "mapping": {
                "type": "text",
                "analyzer": "string",
                "index": true,
                "store": true,
                "include_in_all": true,
                "doc_values": true
            }
        }
    ]
}

Code Examples#

curl -X PUT "/api/v1/tables/{tableName}/schema" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
    "version": 0,
    "default_type": "string",
    "enforce_types": true,
    "document_schemas": {},
    "ttl_field": "string",
    "ttl_duration": "string",
    "dynamic_templates": [
        {
            "name": "string",
            "match": "string",
            "unmatch": "string",
            "path_match": "string",
            "path_unmatch": "string",
            "match_mapping_type": "string",
            "mapping": {
                "type": "text",
                "analyzer": "string",
                "index": true,
                "store": true,
                "include_in_all": true,
                "doc_values": true
            }
        }
    ]
}'
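
A common reason to update a schema is to add or change TTL settings. The sketch below prepares such a request in Python, on the assumption that TTL changes are submitted through this endpoint. The helpers `with_ttl` and `update_schema_request`, the base URL, and the table name sessions are all hypothetical; the request is built but not sent.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def with_ttl(schema: dict, ttl_duration: str, ttl_field: Optional[str] = None) -> dict:
    """Return a copy of the schema with TTL settings applied ("" clears the TTL)."""
    updated = dict(schema)
    updated["ttl_duration"] = ttl_duration
    if ttl_field is not None:
        updated["ttl_field"] = ttl_field
    return updated


def update_schema_request(base_url: str, table_name: str, schema: dict, api_key: str) -> Request:
    """Build (but do not send) a PUT request that replaces the table schema."""
    return Request(
        url=f"{base_url}/api/v1/tables/{quote(table_name, safe='')}/schema",
        data=json.dumps(schema).encode("utf-8"),
        method="PUT",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )


current = {"document_schemas": {}}
# Placeholder host and key; replace with your cluster address and token.
req = update_schema_request("http://localhost:8080", "sessions",
                            with_ttl(current, "24h"), "YOUR_API_KEY")
print(req.get_method(), req.full_url)  # PUT http://localhost:8080/api/v1/tables/sessions/schema
```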

Responses#

{
  "name": "string",
  "description": "Table for user data",
  "indexes": {},
  "shards": {},
  "schema": {
    "version": 0,
    "default_type": "string",
    "enforce_types": true,
    "document_schemas": {},
    "ttl_field": "string",
    "ttl_duration": "string",
    "dynamic_templates": [
      {
        "name": "string",
        "match": "string",
        "unmatch": "string",
        "path_match": "string",
        "path_unmatch": "string",
        "match_mapping_type": "string",
        "mapping": {
          "type": "text",
          "analyzer": "string",
          "index": true,
          "store": true,
          "include_in_all": true,
          "doc_values": true
        }
      }
    ]
  },
  "replication_sources": [
    {
      "type": "postgres",
      "dsn": "${secret:pg_dsn}",
      "postgres_table": "users",
      "key_template": "id",
      "slot_name": "string",
      "publication_name": "string",
      "on_update": [
        {
          "op": "$set",
          "path": "email",
          "value": "{{user_email}}"
        },
        {
          "op": "$set",
          "path": "score",
          "value": "{{score}}"
        },
        {
          "op": "$merge",
          "value": "{{metadata}}"
        },
        {
          "op": "$set",
          "path": "active",
          "value": true
        }
      ],
      "on_delete": [
        {
          "op": "$set",
          "path": "active",
          "value": false
        }
      ],
      "publication_filter": {
        "term": "string",
        "field": "string",
        "boost": 0
      },
      "routes": [
        {
          "target_table": "premium_users",
          "where": {
            "term": "premium",
            "field": "tier"
          }
        },
        {
          "target_table": "free_users",
          "where": {
            "term": "free",
            "field": "tier"
          }
        }
      ]
    }
  ]
}