Termite

Local ML inference for AI search

Like Ollama, but for all AI models. Embeddings. Chunking. Reranking.


Why Termite

Replace the toolchain

Running local AI means stitching together Ollama, Python pipelines, API keys, and custom code. Termite replaces the entire ML toolchain with a single binary.

| Tool | Function | Integration |
|------|----------|-------------|
| Ollama | LLM generation | Self-hosted |
| OpenAI API | Embedding generation | API key |
| Cohere API | Reranking service | API key |
| Unstructured | Document chunking | Self-hosted |
| spaCy | Named entity recognition | Custom code |
| HuggingFace | Zero-shot classification | Custom code |
| Tesseract OCR | Document reading | Self-hosted |
| Custom T5 | Text rewriting | Custom code |
| vLLM / TGI | Production serving | Self-hosted |

Each tool requires a separate runtime, config, and deployment pipeline.

9 services · 3+ languages · 2+ API keys

Why Termite?

Fast, flexible ML inference with production-grade reliability

Cost

83x cheaper at scale

Fixed infrastructure cost vs. per-token API charges. Cheap for prototypes, and even cheaper at scale.

Privacy

Data stays on-premises

Your data never leaves your infrastructure. No API keys or external dependencies required.

Quality

+28% search relevance

Semantic chunking and cross-encoder reranking improve retrieval accuracy beyond hybrid search alone.
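The reranking step is simple to picture: first-stage retrieval returns candidate chunks, then a scorer looks at each (query, chunk) pair and re-orders them. A minimal sketch of that flow in Python, using a toy word-overlap scorer as a stand-in (a real cross-encoder like mxbai-rerank scores the pair jointly with a model):

```python
def rerank(query, candidates, score):
    """Re-order first-stage candidates by a (query, text) relevance score."""
    return sorted(candidates, key=lambda text: score(query, text), reverse=True)

def overlap(query, text):
    """Toy scorer: fraction of query words present in the text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q)

docs = [
    "shipping rates for europe",
    "termite inspection report",
    "rates for ai inference",
]
ranked = rerank("ai inference rates", docs, overlap)
# ranked[0] is "rates for ai inference"
```

Swapping `overlap` for a cross-encoder changes only the scorer; the pipeline shape stays the same.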

Latency

Sub-millisecond cache hits

Local inference plus intelligent caching eliminates network round-trips. Singleflight deduplication for concurrent requests.
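Singleflight deduplication means that when many requests for the same input arrive at once, only one inference runs and every caller shares its result. A minimal sketch of the pattern in Python (illustrative only, not Termite's implementation):

```python
import threading
import time

class SingleFlight:
    """Collapse concurrent calls for the same key into one execution;
    every waiting caller receives the single shared result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> {"event": Event, "result": ...}

    def do(self, key, fn):
        with self._lock:
            call = self._inflight.get(key)
            if call is None:  # first caller becomes the leader
                call = {"event": threading.Event(), "result": None}
                self._inflight[key] = call
                leader = True
            else:             # later callers just wait for the leader
                leader = False
        if leader:
            try:
                call["result"] = fn()
            finally:
                call["event"].set()
                with self._lock:
                    del self._inflight[key]
        else:
            call["event"].wait()
        return call["result"]

# Eight concurrent requests for the same input trigger one computation.
computations = 0

def fake_embed():
    global computations
    computations += 1
    time.sleep(0.2)  # simulate model inference
    return [0.1, 0.2, 0.3]

sf = SingleFlight()
results = []
threads = [threading.Thread(target=lambda: results.append(sf.do("hello", fake_embed)))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```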

Reliability

No rate limits or outages

Self-hosted means no API quotas, no external service dependencies, no surprise downtime.

Flexibility

Mix providers per operation

Use Termite for chunking, OpenAI for embeddings, local reranking. Swap models without code changes.

Quickstart

Up and running in under a minute

```bash
# Install Termite
brew install antflydb/antfly/termite

# Pull embedding and reranker models
termite pull bge-small-en-v1.5
termite pull mxbai-rerank-base-v1

# Start serving
termite serve

# Test with curl
curl http://localhost:11435/api/embed \
  -d '{"model": "bge-small-en-v1.5", "input": "Hello, world!"}'
```
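The embed endpoint returns one vector per input; downstream search compares those vectors, typically by cosine similarity. A minimal sketch of that comparison (the three-dimensional vectors below are placeholders, not real model output):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Placeholder vectors standing in for /api/embed responses
v_hello = [0.12, 0.98, 0.05]   # "Hello, world!"
v_hi    = [0.10, 0.97, 0.09]   # "Hi there!"
v_cat   = [0.91, 0.02, 0.40]   # "Cats sleep a lot."

# Similar texts score higher than unrelated ones
assert cosine(v_hello, v_hi) > cosine(v_hello, v_cat)
```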