Local ML inference for AI search
Like Ollama, but for all AI models. Embeddings. Chunking. Reranking.
Why Termite
Running local AI today means stitching together Ollama, Python pipelines, API keys, and custom glue code, and each tool brings its own runtime, config, and deployment pipeline. Termite replaces the entire ML toolchain with a single binary.
Fast, flexible ML inference with production-grade reliability
83x cheaper at scale
Fixed infrastructure cost vs. per-token API charges. Cheap for prototypes, and even cheaper at scale.
Data stays on-premises
Your data never leaves your infrastructure. No API keys or external dependencies required.
+28% search relevance
Semantic chunking and cross-encoder reranking improve retrieval accuracy beyond hybrid search alone.
Sub-millisecond cache hits
Local inference plus intelligent caching eliminates network round-trips, and singleflight deduplication collapses concurrent identical requests into a single computation.
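The singleflight pattern above is simple to picture: whichever caller arrives first with a given key runs the computation, and every concurrent caller with the same key waits and shares that result. Here is a minimal illustrative sketch in Python — a sketch of the general technique, not Termite's actual implementation (the `SingleFlight` name and embedding stand-in are assumptions):

```python
import threading
import time

class SingleFlight:
    """Collapse concurrent calls that share a key: the first caller runs
    fn, later concurrent callers block and receive the same result.
    (Illustrative sketch of the pattern, not Termite's implementation.)"""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> in-flight call state

    def do(self, key, fn):
        with self._lock:
            call = self._calls.get(key)
            if call is not None:
                leader = False
            else:
                call = {"done": threading.Event(), "result": None}
                self._calls[key] = call
                leader = True
        if not leader:
            call["done"].wait()        # piggyback on the in-flight call
            return call["result"]
        try:
            call["result"] = fn()      # only the leader pays for the work
        finally:
            call["done"].set()
            with self._lock:
                self._calls.pop(key, None)
        return call["result"]

# Demo: eight threads request the same (hypothetical) embedding at once.
if __name__ == "__main__":
    sf = SingleFlight()
    runs = []
    run_lock = threading.Lock()

    def fake_embed():
        with run_lock:
            runs.append(1)
        time.sleep(0.05)               # simulate model inference latency
        return [0.1, 0.2, 0.3]

    results = []
    threads = [threading.Thread(
                   target=lambda: results.append(sf.do("embed:hello", fake_embed)))
               for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{len(results)} callers served, fn executed {len(runs)} time(s)")
```

With overlapping callers, `fake_embed` typically executes once while all eight threads receive the same vector; once the call finishes, the key is released so a later request recomputes fresh.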
No rate limits or outages
Self-hosted means no API quotas, no external service dependencies, no surprise downtime.
Mix providers per operation
Use Termite for chunking, OpenAI for embeddings, and a local model for reranking. Swap models without code changes.
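A per-operation split like that might look like the following config fragment. This is a hypothetical sketch only — the file layout and field names are assumptions for illustration, not Termite's documented schema (the model names reuse ones mentioned on this page plus OpenAI's text-embedding-3-small):

```yaml
# Hypothetical pipeline config; keys and structure are illustrative.
chunking:
  provider: termite          # semantic chunking runs locally
embeddings:
  provider: openai           # embeddings via a hosted API
  model: text-embedding-3-small
reranking:
  provider: termite          # cross-encoder reranking stays local
  model: mxbai-rerank-base-v1
```

The point of such a split is that each stage is an independent provider/model pair, so swapping, say, the embedding provider is a config edit rather than a code change.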
Up and running in under a minute
# Install Termite
brew install antflydb/antfly/termite

# Pull embedding and reranker models
termite pull bge-small-en-v1.5
termite pull mxbai-rerank-base-v1

# Start serving
termite serve

# Test with curl
curl http://localhost:11435/api/embed \
  -d '{"model": "bge-small-en-v1.5", "input": "Hello, world!"}'