Launch Announcement · March 2026

Unlock Your Dark Data with Antfly Swarm & Termite

Over 80% of enterprise data sits untapped -- too sensitive to send to cloud AI services, too complex for traditional databases. Antfly Swarm gives you a fully local AI data platform that runs on your laptop or VPS: your data never leaves your infrastructure, and you get production-grade search, RAG, and AI enrichment out of the box.

Antfly is the retrieval layer for unstructured and dark data. It indexes any data source and builds the retrieval system that allows humans or AI to find the right context.

The Dark Data Problem

Every organization has a massive blind spot: dark data. These are the PDFs buried in shared drives, the call recordings gathering dust, the internal wikis that nobody searches, the customer support transcripts that never get analyzed. Industry analysts estimate that over 80% of enterprise data is unstructured and untapped -- invisible to the systems that could extract value from it.

Dark data is not a storage problem — it's a representation problem. The data is already stored. What's missing is the transformation layer that converts multimodal artifacts — PDFs, slide decks, call recordings, scanned documents — into structured, semantically coherent content that machines can reason over. Without that transformation, AI systems can't find the right context, and RAG pipelines fail.

Traditional systems can index filenames, timestamps, and basic metadata. They cannot extract slide structure, interpret diagrams, segment meeting transcripts by topic, understand speaker roles, or represent semantic relationships across modalities. This is why most enterprise knowledge remains dark.

The AI revolution was supposed to fix this. Large language models can now read documents, transcribe audio, and understand images. But there's a catch: to use most AI services, you have to send your data to someone else's servers. For healthcare records, financial documents, legal contracts, and proprietary research, that's a non-starter.

The fundamental tradeoff: You can have powerful AI capabilities, or you can keep your sensitive data private. With cloud-only AI, you can't have both. Antfly Swarm eliminates this tradeoff entirely.

Why Local-First Matters

When you send data to a cloud embedding API like OpenAI's, your documents travel across the internet, get processed on third-party hardware, and -- even if the provider promises not to retain them -- you've lost cryptographic control. For many organizations, this violates compliance requirements (HIPAA, GDPR, SOC 2) and internal security policies.

A local-first approach means the entire AI pipeline runs on infrastructure you control. No API keys to manage, no usage-based billing surprises, no vendor lock-in, and most importantly -- zero data leaving your perimeter. This is what tools like Ollama pioneered for LLM inference. Antfly extends this philosophy to the entire data stack: storage, indexing, embedding, chunking, re-ranking, and search.

Solving Dark Data Requires a Retrieval Stack

Unlocking dark data isn't a single feature — it's a layered architecture. Each layer handles a different part of the problem:

Layer 1: Raw Ingestion
Connectors to S3, Google Drive, SharePoint, Slack, CRM exports, and call platforms. Support for PDFs, PPTX, Google Slides, audio, video, images, and scanned documents.

Layer 2: Modality-Specific Processing
Each content type needs different treatment: ASR and speaker diarization for audio, slide-level segmentation and diagram summarization for presentations, structural parsing and table detection for PDFs, OCR and caption generation for images. The goal isn't raw text extraction — it's structured, semantically coherent content.

Layer 3: Semantic Indexing
Processed content gets chunked strategically (context-aware, not fixed-size), embedded (text, image, and multimodal joint embeddings), enriched with metadata (source, author, department, topic clusters), and stored in a hybrid retrieval system (BM25 + vector with RRF).

Layer 4: Knowledge Layer
The unlock: semantic search across all modalities, cross-document reasoning, question answering with citations, historical decision tracing, and AI agents with enterprise memory. Instead of querying files, users query meaning. Instead of retrieving documents, AI retrieves context.

Antfly Swarm handles layers 1 through 3 out of the box — and provides the API surface to build layer 4. This is what makes it a retrieval layer, not just a database.

What is Antfly Swarm?

At its core, Antfly is a retrieval database — it indexes any data source and builds the retrieval system that allows humans or AI to find the right context. Swarm is its single-binary deployment mode, packaging the entire platform — database, search engine, model runner, and API server — into one process you can run on your laptop, a Raspberry Pi, or a small VPS. Think of it as docker compose for your AI data stack, except it's a single Go binary with zero dependencies.

Minimum Requirements

| Resource | Minimum  | Recommended     |
|----------|----------|-----------------|
| CPU      | 2 cores  | 4+ cores        |
| RAM      | 4 GB     | 8+ GB           |
| Storage  | 20 GB    | SSD recommended |
| OS       | Linux, macOS, or Windows | any |

Swarm mode handles automatic node discovery, data sharding, and all the distributed consensus machinery under the hood. For development, you get the exact same API surface as a production multi-node cluster -- so code you write locally works identically in production.

What is Termite?

If Ollama is "Docker for LLMs," then Termite is "Docker for RAG pipeline models." Termite is Antfly's built-in model garden and local ONNX runtime for the smaller, specialized models that power retrieval -- embedding models, re-rankers, chunkers, and classifiers. These models run locally using hardware-accelerated SIMD instructions (AVX-512, NEON, SME), so your data never leaves your machine.

What Termite Runs Locally

| Capability     | What It Does                    | Example Model                    |
|----------------|---------------------------------|----------------------------------|
| Embeddings     | Convert text/images to vectors  | bge-small-en-v1.5                |
| Chunking       | Semantic document splitting     | Built-in with multi-tier caching |
| Re-ranking     | Re-score results for relevance  | Cross-encoder models             |
| Classification | Categorize and tag documents    | ONNX classifier models           |
Why not just use Ollama? Ollama is great for running LLMs (chat, reasoning, generation). But a RAG pipeline needs more than just an LLM. You need embedding models, re-rankers, chunkers -- and these need to be tightly integrated with your database's indexing engine. Termite handles all of this inside AntflyDB, with automatic embedding lifecycle management, multi-tier caching, and hardware-accelerated inference. You can still connect to Ollama (or any LLM) for the generation step.
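The "automatic embedding lifecycle management" and "multi-tier caching" mentioned above can be illustrated with a minimal sketch: cache vectors by content hash so unchanged text is never re-embedded. The `EmbeddingCache` class and stub embedder here are illustrative assumptions, not Termite's actual internals:

```python
import hashlib
from collections import OrderedDict

class EmbeddingCache:
    """Tiny content-addressed cache: unchanged text is never re-embedded.

    Illustrative sketch only -- Termite's real multi-tier cache is internal.
    """

    def __init__(self, embed_fn, max_entries=10_000):
        self.embed_fn = embed_fn      # e.g. a call into a local ONNX model
        self.cache = OrderedDict()    # in-memory tier, evicted LRU-style
        self.max_entries = max_entries
        self.misses = 0

    def get(self, text: str):
        key = hashlib.sha256(text.encode()).hexdigest()
        if key in self.cache:
            self.cache.move_to_end(key)   # refresh LRU position
            return self.cache[key]
        self.misses += 1
        vec = self.embed_fn(text)         # only runs the model on a miss
        self.cache[key] = vec
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)
        return vec

# Re-inserting the same document costs zero model calls:
cache = EmbeddingCache(embed_fn=lambda t: [float(len(t))])  # stub embedder
cache.get("Q4 Financial Review")
cache.get("Q4 Financial Review")
assert cache.misses == 1
```

The same idea extends to a disk tier for embeddings that outlive the process, which is what makes re-indexing large, mostly unchanged corpora cheap.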

How It All Fits Together


Everything runs on your infrastructure, inside Antfly Swarm on your laptop or VPS: Your Data (PDFs, audio, DBs, S3) → Termite (chunk, embed, re-rank) → AntflyDB (store, index, search) → Your App (search, agents, chat).

Getting Started

Getting Antfly Swarm running takes about 60 seconds. Download the binary for your platform and start it:

# Download the latest release
curl -fsSL https://get.antfly.io | sh

# Start Antfly in Swarm mode (single binary, zero config)
antfly swarm start

# Verify it's running
curl http://localhost:8080/health

That's it. You now have AntflyDB with Termite running locally. The REST API is available at http://localhost:8080 and is fully OpenAPI-compatible, so you can use the auto-generated SDKs for Go, TypeScript, or Python.

Install the Python SDK

pip install antfly

Connecting Your Dark Data

Antfly processes each modality differently — because a PDF is not a slide deck is not a call recording. Here's what happens under the hood when you load content:

PDFs
Structural parsing, header/section detection, table extraction, footnote linking, image-caption association
Slides (PPTX)
Slide-level segmentation, OCR on embedded images, layout parsing, table extraction, diagram summarization
Audio / Video
Automatic speech recognition, speaker diarization, topic segmentation, timestamp indexing, entity extraction
Images
OCR, caption generation, diagram interpretation, multimodal embedding for visual content
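The per-modality routing above can be sketched as a simple dispatch table. The step names mirror the list, but `pipeline_for` and the `PIPELINES` mapping are hypothetical illustrations, not Antfly's actual pipeline configuration:

```python
from pathlib import Path

# Hypothetical modality router: maps a file type to the processing steps
# described above. Antfly does this internally; this mapping is a sketch.
PIPELINES = {
    ".pdf":  ["structural_parse", "table_extract", "footnote_link"],
    ".pptx": ["slide_segment", "ocr_images", "diagram_summarize"],
    ".mp3":  ["asr", "speaker_diarize", "topic_segment"],
    ".png":  ["ocr", "caption", "multimodal_embed"],
}

def pipeline_for(filename: str) -> list[str]:
    """Pick the processing steps for a file based on its extension."""
    suffix = Path(filename).suffix.lower()
    return PIPELINES.get(suffix, ["plain_text_extract"])

print(pipeline_for("board_deck.pptx"))
# ['slide_segment', 'ocr_images', 'diagram_summarize']
```

The point of the dispatch is that a slide deck and a call recording never share a code path until both have been reduced to structured, semantically coherent text.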

The first step is creating a table and telling AntflyDB what kind of data you'll be storing and how to index it. Here we'll create a table for internal documents with hybrid search (BM25 + vector) and local embeddings via Termite:

# Create a table with hybrid search + local embeddings
curl -X POST http://localhost:8080/tables \
  -H "Content-Type: application/json" \
  -d '{
    "name": "internal_docs",
    "indexes": [{
      "type": "aknn_v0",
      "fields": ["content"],
      "embedder": {
        "provider": "termite",
        "model": "bge-small-en-v1.5"
      }
    }, {
      "type": "full_text_v0",
      "fields": ["content", "title"]
    }]
  }'

Notice the "provider": "termite" -- this tells AntflyDB to use the local ONNX model runner for embeddings instead of calling an external API. Your documents are embedded on your machine, stored on your machine, and indexed on your machine. Nothing leaves.

Insert Documents

Now insert your sensitive documents. Antfly handles chunking, embedding, and indexing automatically in the background:

from antfly import AntflyClient

client = AntflyClient("http://localhost:8080")

# Insert documents — Antfly handles embedding automatically
client.table("internal_docs").insert([
    {
        "title": "Q4 Financial Review",
        "content": "Revenue grew 23% year-over-year...",
        "department": "finance",
        "classification": "confidential"
    },
    {
        "title": "Patient Case Study #4821",
        "content": "Patient presented with symptoms...",
        "department": "medical",
        "classification": "hipaa_protected"
    }
])

# Antfly automatically:
# 1. Chunks large documents semantically (via Termite)
# 2. Generates embeddings locally (via Termite + ONNX)
# 3. Indexes for both BM25 keyword + vector similarity search
# 4. All processing happens on YOUR machine

Querying Your Dark Data

Antfly's hybrid search combines keyword matching (BM25) with semantic understanding (vector similarity) using Reciprocal Rank Fusion. This means you get exact matches when the user types precise terms, and conceptual matches when they describe what they're looking for:

# Hybrid search — combines keyword + semantic matching
results = client.table("internal_docs").search(
    query="what was our revenue growth last quarter",
    limit=5
)

for doc in results:
    print(f"{doc['title']} (score: {doc['_score']:.3f})")
    print(f"  {doc['content'][:100]}...")

# Output:
# Q4 Financial Review (score: 0.847)
#   Revenue grew 23% year-over-year...

This query works even though we typed "revenue growth" and the document says "Revenue grew" -- the semantic embedding understands they mean the same thing. The BM25 component also boosts the result because "revenue" appears as an exact keyword match. Both signals get fused together via RRF.
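Reciprocal Rank Fusion itself is simple enough to sketch in a few lines: each document's fused score is the sum of 1 / (k + rank) across the ranked lists it appears in, with k conventionally set to 60. The document IDs below are illustrative:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse multiple ranked lists with Reciprocal Rank Fusion.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    so agreement between rankers outweighs a single high placement.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: -kv[1])

# BM25 ranks "q4-review" first (exact keyword "revenue");
# the vector index ranks it second (semantic match).
bm25   = ["q4-review", "budget-memo", "all-hands-notes"]
vector = ["growth-forecast", "q4-review", "budget-memo"]

fused = rrf([bm25, vector])
print(fused[0][0])  # "q4-review" -- strong in both signals, so it wins
```

Because RRF only uses ranks, not raw scores, it sidesteps the problem of BM25 scores and cosine similarities living on incompatible scales.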

Why Basic RAG Isn't Enough for Dark Data

Most RAG systems assume clean text, pre-chunked documents, and known structure. Dark data violates all of those assumptions. When you point a naive RAG pipeline at real enterprise content, it breaks in predictable ways:

Poor chunk boundaries
Fixed-size chunking splits tables mid-row, cuts sentences, and destroys context.
Loss of slide context
Individual slide text without the deck narrative is nearly meaningless.
No temporal awareness
Meeting transcripts need topic segmentation, not flat text splitting.
Diagram blindness
Architecture diagrams and flowcharts are invisible to text-only pipelines.
Broken table extraction
Tables become garbled text that embeds poorly and retrieves worse.
No cross-document linking
Related content across files is never connected or co-indexed.

A production-grade system needs hierarchical indexing, multimodal processing, cross-document linking, source-grounded citations, and continuous re-indexing. This is infrastructure, not a feature. It's why Antfly builds these capabilities into the database layer rather than leaving them to application code.

Building a Local RAG Pipeline

The real power comes when you combine Antfly's retrieval with an LLM for answering questions. Antfly has a built-in RAG endpoint that streams answers via Server-Sent Events. You can connect any LLM -- local (via Ollama) or remote:

# Fully local RAG: Antfly retrieval + Ollama generation
# Zero data leaves your machine

import requests

response = requests.post("http://localhost:8080/rag", json={
    "table": "internal_docs",
    "query": "Summarize our financial performance last quarter",
    "generator": {
        "provider": "ollama",
        "model": "llama3.2"
    },
    "retriever": {
        "limit": 5,
        "reranker": True  # Termite re-ranks for better context
    }
}, stream=True)

# Stream the answer as it's generated
for chunk in response.iter_lines():
    if chunk:
        print(chunk.decode(), end="", flush=True)

The entire pipeline is local: your documents are stored in AntflyDB (local), embeddings are generated by Termite (local ONNX), the re-ranker runs in Termite (local ONNX), and the LLM runs in Ollama (local). At no point does any data leave your machine.

Cloud vs. Local: Side by Side

Here's how the typical cloud AI stack compares to Antfly Swarm for sensitive data workloads:

Typical Cloud AI Approach: data leaves your control.
Your Data → sent over the internet → OpenAI / Cloud (embed + process) → Vector DB (Pinecone) → Your App. Three or more vendors; data exposed.

Antfly Swarm (Local): everything stays on your machine.
Your Data → Termite (local models) → AntflyDB (hybrid search) → Your App, all on your machine or VPS. One binary; zero data leakage.
|                     | Cloud AI Stack                    | Antfly Swarm                        |
|---------------------|-----------------------------------|-------------------------------------|
| Data residency      | Third-party servers               | Your machine only                   |
| Vendors needed      | 3+ (DB, embeddings, vector store) | 1 binary                            |
| Compliance          | Depends on vendor DPAs            | Full control -- you own the infra   |
| Cost model          | Per-token / per-query billing     | Fixed infra cost                    |
| Latency             | Network round-trips               | Local -- sub-50ms p99               |
| Offline capable     | No                                | Yes                                 |
| Embedding lifecycle | Manual management                 | Automatic (Termite)                 |
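The cost-model row is worth making concrete. With illustrative numbers -- not quoted vendor prices -- the break-even point between usage-based billing and a fixed-cost VPS is simple arithmetic:

```python
# Back-of-envelope cost comparison with ILLUSTRATIVE numbers --
# substitute your real vendor pricing and infrastructure costs.
CLOUD_PRICE_PER_M_TOKENS = 0.10   # $ per 1M embedding tokens (assumed)
VPS_MONTHLY_COST = 20.00          # $ fixed cost for a small VPS (assumed)

def cloud_cost(tokens_per_month: float) -> float:
    """Usage-based billing scales linearly with volume."""
    return tokens_per_month / 1_000_000 * CLOUD_PRICE_PER_M_TOKENS

break_even = VPS_MONTHLY_COST / CLOUD_PRICE_PER_M_TOKENS * 1_000_000
print(f"Break-even: {break_even / 1e6:.0f}M tokens/month")  # 200M tokens/month

# Below break-even the cloud can be cheaper in raw dollars; the local
# stack still wins on data residency, latency, and offline capability.
```

The dollar comparison is only one axis: for the regulated workloads this post describes, the compliance and residency rows usually decide the question before cost does.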

Use Cases: Where Dark Data Lives

🏥 Healthcare: Search patient records, clinical notes, and medical literature without sending PHI to cloud services.
⚖️ Legal: Index contracts, case law, and privileged communications. Full attorney-client privilege preserved.
🏦 Finance: Analyze financial statements, transaction records, and compliance docs locally.
🔬 Research: Search proprietary research, lab notebooks, and pre-publication papers without IP leakage.
🏗️ Manufacturing: Index technical specifications, maintenance logs, and quality reports on-premise.
🏛️ Government: Process classified or sensitive government documents in air-gapped environments.

Tradeoffs to Consider

Being transparent: a local-first approach isn't the right choice for every workload. Here's what to consider:

Model size vs. accuracy

Local embedding models (like bge-small) are smaller than cloud models (like OpenAI's text-embedding-3-large). For most retrieval tasks, the difference is negligible -- but for highly specialized domains, you may want to benchmark. Termite supports swapping models easily.

Hardware requirements

Running models locally requires CPU/RAM. Swarm mode needs at least 4GB RAM. For larger datasets with re-ranking, 8-16GB is recommended. No GPU required -- Termite uses SIMD (AVX-512, NEON) for hardware acceleration.

LLM generation

For the generation step (chat, answering), local LLMs via Ollama are capable but smaller than GPT-4 or Claude. You can always use Antfly's local retrieval with a cloud LLM -- only the final query context goes to the API, not your raw documents.

Scaling beyond one machine

Swarm mode is designed for development and small-to-medium deployments. For production scale, AntflyDB supports multi-node clusters with the same API surface. Code you write in Swarm works in production without changes.

What's Next

Antfly Swarm and Termite are available now; download the binary above and you can be indexing your own dark data in under a minute.

Query meaning, not files. Retrieve context, not documents.

Your data. Your infrastructure. Your AI.

Stop sending sensitive data to cloud APIs. Antfly is the retrieval layer for your unstructured data — running locally in a single binary.

Get Started Free