Data Operations#

Perform data operations including batch inserts/deletes, linear merge sync, backup/restore, and individual key lookups.

Batch Operations#

Efficiently insert or delete multiple documents in a single request. The batch endpoint accepts both inserts (map of key-value pairs) and deletes (array of keys).

Distributed Transactions#

Antfly provides atomic cross-shard write transactions using a coordinator-based 2-phase commit (2PC) protocol. When a batch operation spans multiple shards, Antfly automatically executes it as a distributed transaction.

How it works:

  1. Metadata server allocates HLC timestamp and selects coordinator shard
  2. Coordinator writes transaction record, participants write intents
  3. After all intents succeed, coordinator commits transaction
  4. Participants are notified asynchronously to resolve intents
  5. Recovery loop ensures notifications complete even after coordinator failure
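
The five steps above can be pictured with a toy, in-memory simulation. This is a sketch only, not Antfly's implementation: the `Shard` class and `run_transaction` function are hypothetical names, HLC timestamp allocation is left out, and intent resolution runs synchronously here although the real notifications are asynchronous.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical in-memory shard: committed data plus pending write intents."""
    name: str
    data: dict = field(default_factory=dict)
    intents: dict = field(default_factory=dict)   # txn_id -> {key: value}
    records: dict = field(default_factory=dict)   # txn_id -> status (coordinator only)
    fail_writes: bool = False                     # hook to simulate a failed intent write

    def write_intent(self, txn_id, writes):
        if self.fail_writes:
            return False
        self.intents[txn_id] = dict(writes)
        return True

    def resolve(self, txn_id, committed):
        # Idempotent: a duplicate notification finds no intent and does nothing.
        writes = self.intents.pop(txn_id, None)
        if writes is not None and committed:
            self.data.update(writes)


def run_transaction(txn_id, writes_by_shard):
    """Toy coordinator-based 2PC over a list of (Shard, writes) pairs."""
    shards = [shard for shard, _ in writes_by_shard]
    coordinator = shards[0]                       # step 1: select a coordinator shard
    coordinator.records[txn_id] = "pending"       # step 2: write the transaction record
    ok = all(shard.write_intent(txn_id, writes)   # step 2: participants write intents
             for shard, writes in writes_by_shard)
    status = "committed" if ok else "aborted"
    coordinator.records[txn_id] = status          # step 3: the commit (or abort) point
    for shard in shards:                          # step 4: resolve intents on every shard
        shard.resolve(txn_id, committed=ok)
    return status
```

Because the outcome is recorded on the coordinator before any intent is resolved, a recovery loop (step 5) could re-send the same `resolve` calls after a failure without changing the result.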

Features:

  • Automatic: No special API required - just use the batch endpoint
  • Atomic: All writes across all shards commit or abort together
  • Recoverable: Coordinator failures are handled via recovery loops
  • Efficient: ~20ms latency for cross-shard transactions

Performance:

  • Single-shard batches: < 5ms latency
  • Cross-shard transactions: ~20ms latency
  • Intent resolution: < 30 seconds worst-case (via recovery loop)

Guarantees:

  • All writes succeed or all fail (atomicity)
  • Coordinator failure is recoverable (new leader resumes notifications)
  • Idempotent resolution (duplicate notifications are safe)

Use Cases:

  • Updating related records across shards (e.g., user profile + preferences)
  • Multi-table inserts that must succeed together
  • Bulk imports requiring all-or-nothing semantics

Linear Merge (Data Sync)#

Keep Antfly in sync with external data sources such as Shopify, Postgres, S3, or any other source that returns records sorted by key. Also known as: data synchronization, database sync, incremental sync, e-commerce sync.

Both the source and Antfly must be sorted by the same key. Each merge request performs a three-way merge: it inserts new records, updates changed records, and deletes Antfly records that are absent from the source.

How it works:

  1. Query external source with pagination (sorted by key)
  2. Send sorted page to linear merge endpoint
  3. Antfly merges: upserts present records, deletes Antfly records absent from page
  4. Repeat for next page - no sync state required between pages
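
The merge in step 3 can be illustrated with a two-pointer walk over two sorted sequences. This is a minimal sketch under stated assumptions, not Antfly's code: `merge_page` and `lower_bound` are hypothetical names, the page is assumed to be authoritative for keys in the range (`lower_bound`, last key of the page], and "skip" is assumed to mean an unchanged record.

```python
def merge_page(stored, page, lower_bound=""):
    """Compare one sorted source page against the sorted stored records.

    stored and page are lists of (key, doc) tuples sorted by key.
    Returns a list of (operation, key) tuples in key order.
    """
    upper_bound = page[-1][0] if page else lower_bound
    # Only stored keys inside the page's range may be deleted (assumption).
    in_range = [(k, d) for k, d in stored if lower_bound < k <= upper_bound]
    ops = []
    i = j = 0
    while i < len(page) or j < len(in_range):
        if j == len(in_range) or (i < len(page) and page[i][0] < in_range[j][0]):
            ops.append(("upsert", page[i][0]))        # new record in the source
            i += 1
        elif i == len(page) or in_range[j][0] < page[i][0]:
            ops.append(("delete", in_range[j][0]))    # stored record absent from the page
            j += 1
        else:
            unchanged = page[i][1] == in_range[j][1]  # same key on both sides
            ops.append(("skip" if unchanged else "upsert", page[i][0]))
            i += 1
            j += 1
    return ops
```

Each sequence is read once, front to back, which is why both sides must share the same sort order.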

Use Cases:

  • Postgres/MySQL: SELECT * FROM products ORDER BY id LIMIT 1000 OFFSET 0

    • Sync production DB to Antfly for hybrid search
    • Run periodically (hourly/daily) to stay in sync
  • Shopify API: GET /admin/api/2024-01/products.json?order=id&limit=250

    • Sync e-commerce catalog with cursor pagination
    • Antfly becomes searchable product index
  • S3 Data Lake: Sorted JSON files (data/0001.json, data/0002.json, ...)

    • Batch import from data warehouse exports
    • Process files in order, send contents page by page
  • Databricks: SELECT * FROM delta_table ORDER BY key LIMIT 10000

    • Sync data warehouse tables to Antfly
    • Enable low-latency search over warehouse data

Benefits:

  • Stateless: No cursors or checkpoints - restart from any page
  • Idempotent: Safe to re-run if interrupted
  • Efficient: Stream comparison, no random access needed

WARNING: Concurrent merges over overlapping key ranges are not safe. Linear merge is a single-client sync API; run it from one client at a time.

Backup and Restore#

Create backups to various storage backends:

  • file:///path/to/backup - Local filesystem
  • s3://bucket/path - Amazon S3

Restore operations rebuild tables from backup snapshots.

Key Lookups#

Direct key-value lookups for retrieving individual documents by their unique key.

Cross-table batch operations#

POST /batch

Perform batch inserts, deletes, and transforms across multiple tables in a single atomic transaction.

All operations across all tables are committed atomically using distributed 2-phase commit (2PC). Either all operations succeed, or none do.

Use cases:

  • Transfer records between tables (insert in one, delete from another)
  • Maintain referential integrity across tables
  • Atomic multi-table updates

Security#

Provide your bearer token in the Authorization header when making requests to protected resources.

Example: Authorization: Bearer YOUR_API_KEY

Request Body#

Example:

{
    "tables": {
        "users": {
            "inserts": {
                "user:123": {
                    "name": "John Doe",
                    "email": "john@example.com"
                }
            }
        },
        "orders": {
            "inserts": {
                "order:456": {
                    "user_id": "user:123",
                    "total": 99.99
                }
            }
        }
    },
    "sync_level": "propose"
}

Code Examples#

curl -X POST "/api/v1/batch" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
    "tables": {
        "users": {
            "inserts": {
                "user:123": {
                    "name": "John Doe",
                    "email": "john@example.com"
                }
            }
        },
        "orders": {
            "inserts": {
                "order:456": {
                    "user_id": "user:123",
                    "total": 99.99
                }
            }
        }
    },
    "sync_level": "propose"
}'
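
The "transfer records between tables" use case could look like this in Python. It is a sketch under assumptions: `BASE_URL` is a placeholder address, `transfer_request` is a hypothetical helper, and the per-table `deletes` array is assumed to have the same shape as in the single-table batch endpoint. The request is built but not sent, so the snippet runs offline.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/api/v1"  # assumption: replace with your Antfly address


def transfer_request(key, doc, src_table, dst_table, api_key):
    """Build (but do not send) a POST /batch request that moves one record."""
    if src_table == dst_table:
        raise ValueError("source and destination tables must differ")
    body = {
        "tables": {
            dst_table: {"inserts": {key: doc}},   # insert into the destination table
            src_table: {"deletes": [key]},        # delete from the source table
        }
    }
    return urllib.request.Request(
        f"{BASE_URL}/batch",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = transfer_request("user:123", {"name": "John Doe"},
                       "users", "archived_users", "YOUR_API_KEY")
# urllib.request.urlopen(req) would send the request.
```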

Responses#

{
  "tables": {}
}

Commit an OCC transaction#

POST /transactions/commit

Commit a stateless OCC (Optimistic Concurrency Control) transaction.

Workflow:

  1. Read documents using regular lookup endpoints, capturing the X-Antfly-Version response header for each read
  2. Compute writes locally based on the read values
  3. Submit this commit request with the read set (keys + versions) and the write set (batch operations per table)

The server validates that all read versions still match current state. If any version has changed, the transaction is aborted with a 409 Conflict response containing details about which key conflicted.

If all versions match, writes are executed atomically via 2PC.

No server-side state: There is no "begin transaction" endpoint. The client manages its own read set.

Security#

Provide your bearer token in the Authorization header when making requests to protected resources.

Example: Authorization: Bearer YOUR_API_KEY

Request Body#

Example:

{
    "read_set": [
        {
            "table": "string",
            "key": "string",
            "version": "string"
        }
    ],
    "tables": {},
    "sync_level": "propose"
}

Code Examples#

curl -X POST "/api/v1/transactions/commit" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
    "read_set": [
        {
            "table": "string",
            "key": "string",
            "version": "string"
        }
    ],
    "tables": {},
    "sync_level": "propose"
}'
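
The read, compute, commit workflow lends itself to a small retry loop. The sketch below assumes nothing about the transport: `build_commit_body` and `commit_with_retry` are hypothetical helpers, and the caller supplies `read_and_compute` (which re-reads documents and returns the versions taken from `X-Antfly-Version`) and `send_commit` (which posts the body and returns the HTTP status code).

```python
def build_commit_body(read_versions, writes):
    """read_versions maps (table, key) -> version string; writes maps table -> batch ops."""
    return {
        "read_set": [
            {"table": table, "key": key, "version": version}
            for (table, key), version in read_versions.items()
        ],
        "tables": writes,
    }


def commit_with_retry(read_and_compute, send_commit, max_attempts=3):
    """Retry the whole read/compute/commit cycle while the server answers 409 Conflict.

    Returns (last_status, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        read_versions, writes = read_and_compute()   # fresh reads on every attempt
        status = send_commit(build_commit_body(read_versions, writes))
        if status != 409:
            return status, attempt
    return 409, max_attempts
```

Re-reading on every attempt matters: after a conflict the old versions are stale, so resubmitting the same read set would fail again.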

Responses#

{
  "status": "committed",
  "conflict": {
    "table": "string",
    "key": "string",
    "message": "string"
  },
  "tables": {}
}

Perform batch inserts and deletes on a table#

POST /tables/{tableName}/batch

Security#

Provide your bearer token in the Authorization header when making requests to protected resources.

Example: Authorization: Bearer YOUR_API_KEY

Request Body#

Example:

{
    "inserts": {
        "user:123": {
            "name": "John Doe",
            "email": "john@example.com",
            "age": 30,
            "tags": [
                "customer",
                "premium"
            ]
        },
        "user:456": {
            "name": "Jane Smith",
            "email": "jane@example.com",
            "age": 25,
            "tags": [
                "customer"
            ]
        }
    },
    "deletes": [
        "user:789",
        "user:old_account"
    ]
}

Code Examples#

curl -X POST "/api/v1/tables/{tableName}/batch" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
    "inserts": {
        "user:123": {
            "name": "John Doe",
            "email": "john@example.com",
            "age": 30,
            "tags": [
                "customer",
                "premium"
            ]
        },
        "user:456": {
            "name": "Jane Smith",
            "email": "jane@example.com",
            "age": 25,
            "tags": [
                "customer"
            ]
        }
    },
    "deletes": [
        "user:789",
        "user:old_account"
    ]
}'
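
When the request body is assembled programmatically, a small guard can catch contradictory input before it is sent. This page does not say how the server treats a key that appears in both `inserts` and `deletes`, so the hypothetical `batch_body` helper below simply rejects that case on the client side.

```python
def batch_body(inserts, deletes):
    """Build a single-table batch body from a dict of inserts and a list of keys to delete."""
    overlap = sorted(set(inserts) & set(deletes))
    if overlap:
        # Ambiguous intent: refuse rather than guess the server's behavior.
        raise ValueError(f"keys present in both inserts and deletes: {overlap}")
    return {"inserts": dict(inserts), "deletes": list(deletes)}
```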

Responses#

{
  "inserted": 0,
  "deleted": 0,
  "transformed": 0
}

Synchronize data from external sources (Shopify, Postgres, S3) using a linear merge#

POST /tables/{tableName}/merge

Keep Antfly in sync with external data sources such as Shopify, Postgres, S3, or any other source that returns records sorted by key. Also known as: data synchronization, database sync, incremental sync, e-commerce sync.

Both the source and the destination must be sorted by the same key. The endpoint performs a three-way merge:

  • Inserts new records from source
  • Updates changed records
  • Deletes Antfly records absent from source page

Stateless & Idempotent: No sync state between pages. Safe to restart from any page if interrupted.

Use Cases: Sync production databases, e-commerce APIs (Shopify, WooCommerce), data lake exports, or warehouse tables to Antfly for low-latency hybrid search.

WARNING: Concurrent merges over overlapping key ranges are not safe. Linear merge is a single-client sync API; run it from one client at a time.

Security#

Provide your bearer token in the Authorization header when making requests to protected resources.

Example: Authorization: Bearer YOUR_API_KEY

Request Body#

Example:

{
    "records": {
        "product:001": {
            "name": "Laptop",
            "price": 999.99
        },
        "product:002": {
            "name": "Mouse",
            "price": 29.99
        },
        "product:003": {
            "name": "Keyboard",
            "price": 79.99
        }
    },
    "last_merged_id": "product:003",
    "dry_run": false,
    "sync_level": "propose"
}

Code Examples#

curl -X POST "/api/v1/tables/{tableName}/merge" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
    "records": {
        "product:001": {
            "name": "Laptop",
            "price": 999.99
        },
        "product:002": {
            "name": "Mouse",
            "price": 29.99
        },
        "product:003": {
            "name": "Keyboard",
            "price": 79.99
        }
    },
    "last_merged_id": "product:003",
    "dry_run": false,
    "sync_level": "propose"
}'
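
A sync job typically cuts a sorted export into pages and sends one merge request per page. The sketch below only builds the request bodies. `merge_bodies` is a hypothetical helper, it defaults to `dry_run` so a first pass changes nothing, and it omits `last_merged_id` and `sync_level` because this page does not define them. Sorting uses Python string comparison, which is assumed to match the key order Antfly uses.

```python
def merge_bodies(sorted_rows, page_size, dry_run=True):
    """Yield one merge request body per page of (key, doc) rows sorted by key."""
    keys = [key for key, _ in sorted_rows]
    if any(a >= b for a, b in zip(keys, keys[1:])):
        raise ValueError("rows must be strictly increasing by key")
    for start in range(0, len(sorted_rows), page_size):
        page = sorted_rows[start:start + page_size]
        # dict() keeps insertion order, so records stay sorted when serialized.
        yield {"records": dict(page), "dry_run": dry_run}
```

Checking the order up front is cheap and fails fast, since an unsorted page would violate the endpoint's main precondition.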

Responses#

{
  "status": "success",
  "upserted": 0,
  "skipped": 0,
  "deleted": 0,
  "deleted_ids": [
    "string"
  ],
  "failed": [
    {
      "id": "string",
      "operation": "upsert",
      "error": "string"
    }
  ],
  "next_cursor": "string",
  "key_range": {
    "from": "string",
    "to": "string"
  },
  "keys_scanned": 0,
  "message": "string",
  "took": 0
}

Backup a table#

POST /tables/{tableName}/backup

Security#

Provide your bearer token in the Authorization header when making requests to protected resources.

Example: Authorization: Bearer YOUR_API_KEY

Request Body#

Example:

{
    "backup_id": "backup-2025-01-15-v2",
    "location": "s3://mybucket/antfly-backups/users-table/2025-01-15"
}

Code Examples#

curl -X POST "/api/v1/tables/{tableName}/backup" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
    "backup_id": "backup-2025-01-15-v2",
    "location": "s3://mybucket/antfly-backups/users-table/2025-01-15"
}'
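
A client can check the backup location before sending the request. The hypothetical `backup_body` helper below accepts only the two URL schemes listed on this page (`file://` and `s3://`); Antfly may support other backends, so treat the allow-list as an assumption.

```python
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"file", "s3"}  # assumption: only the backends listed on this page


def backup_body(backup_id, location):
    """Build the backup request body after validating the location's URL scheme."""
    scheme = urlparse(location).scheme
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported backup location scheme: {scheme!r}")
    return {"backup_id": backup_id, "location": location}
```

The restore endpoint takes a body of the same shape, so the same helper could be reused there.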

Responses#

{
  "backup": "successful"
}

Restore a table from backup#

POST /tables/{tableName}/restore

Security#

Provide your bearer token in the Authorization header when making requests to protected resources.

Example: Authorization: Bearer YOUR_API_KEY

Request Body#

Example:

{
    "backup_id": "backup-2025-01-15-v2",
    "location": "s3://mybucket/antfly-backups/users-table/2025-01-15"
}

Code Examples#

curl -X POST "/api/v1/tables/{tableName}/restore" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
    "backup_id": "backup-2025-01-15-v2",
    "location": "s3://mybucket/antfly-backups/users-table/2025-01-15"
}'

Responses#

{
  "restore": "triggered"
}

Scan keys in a table within a key range#

POST /tables/{tableName}/lookup

Scans keys in a table within an optional key range and returns them as newline-delimited JSON (NDJSON). Each line contains a JSON object with the key and optionally projected document fields. This is useful for iterating through all keys in a table or a subset of keys within a range.

Security#

Provide your bearer token in the Authorization header when making requests to protected resources.

Example: Authorization: Bearer YOUR_API_KEY

Request Body#

Example:

{
    "from": "user:100",
    "to": "user:200",
    "inclusive_from": true,
    "exclusive_to": true,
    "fields": [
        "title",
        "author",
        "metadata.tags"
    ],
    "filter_query": {
        "term": "string",
        "field": "string",
        "boost": 0
    },
    "limit": 100
}

Code Examples#

curl -X POST "/api/v1/tables/{tableName}/lookup" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
    "from": "user:100",
    "to": "user:200",
    "inclusive_from": true,
    "exclusive_to": true,
    "fields": [
        "title",
        "author",
        "metadata.tags"
    ],
    "filter_query": {
        "term": "string",
        "field": "string",
        "boost": 0
    },
    "limit": 100
}'
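
Since the scan returns newline-delimited JSON, a client can decode it one line at a time instead of buffering the whole response. The `parse_ndjson` helper is a hypothetical name, and the `key` field in the sample lines is an assumption about the record shape, which this page does not spell out.

```python
import json


def parse_ndjson(lines):
    """Yield one decoded object per non-empty line (str or bytes)."""
    for line in lines:
        line = line.strip()
        if line:                      # tolerate blank lines between records
            yield json.loads(line)


# Sample lines standing in for a streamed response body.
sample = ['{"key": "user:100", "title": "A"}\n', "\n", b'{"key": "user:101"}\n']
records = list(parse_ndjson(sample))
```

The same generator accepts the response object returned by `urllib.request.urlopen`, because iterating over it yields one line of bytes at a time.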

Responses#

No response body

Lookup a key in a table#

GET /tables/{tableName}/lookup/{key}

Security#

Provide your bearer token in the Authorization header when making requests to protected resources.

Example: Authorization: Bearer YOUR_API_KEY

Parameters#

Name: fields
Type: string
Location: query
Required: No
Description: Comma-separated list of fields to include in the response. If not specified, returns the full document. Supports:
  • Simple fields: "title,author"
  • Nested paths: "user.address.city"
  • Wildcards: "_chunks.*"
  • Exclusions: "-_chunks.*._embedding"
  • Special fields: "_embeddings,_summaries,_chunks"

Code Examples#

curl -X GET "/api/v1/tables/{tableName}/lookup/{key}?fields=title,author,metadata.tags" \
    -H "Authorization: Bearer YOUR_API_KEY"
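
Keys such as `user:123` and field patterns such as `_chunks.*` contain characters that are safer percent-encoded in a URL. The sketch below builds the lookup URL with the standard library. `BASE_URL` and `lookup_url` are hypothetical, and it assumes the server decodes percent-encoded path segments and query values in the usual way.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8080/api/v1"  # assumption: replace with your Antfly address


def lookup_url(table, key, fields=None):
    """Build the GET lookup URL, percent-encoding the table, key, and fields list."""
    url = f"{BASE_URL}/tables/{quote(table, safe='')}/lookup/{quote(key, safe='')}"
    if fields:
        # The API expects a single comma-separated value for `fields`.
        url += "?" + urlencode({"fields": ",".join(fields)})
    return url


url = lookup_url("users", "user:123", ["title", "-_chunks.*._embedding"])
```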

Responses#

{}