Local ML inference for AI search
Like Ollama, but for all AI models. Embeddings. Chunking. Reranking.
Why Termite
Running local AI today means stitching together Ollama, Python pipelines, API keys, and custom glue code, and each tool brings its own runtime, config, and deployment pipeline. Termite replaces the entire ML toolchain with a single binary.
Fast, flexible ML inference with production-grade reliability
83x cheaper at scale
Fixed infrastructure cost vs. per-token API charges. Cheap for prototypes, and even cheaper at scale.
Data stays on-premises
Your data never leaves your infrastructure. No API keys or external dependencies required.
+28% search relevance
Semantic chunking and cross-encoder reranking improve retrieval accuracy beyond hybrid search alone.
Sub-millisecond cache hits
Local inference plus intelligent caching eliminates network round-trips, and singleflight deduplication collapses concurrent identical requests into a single computation.
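The singleflight pattern above is simple to picture: whichever caller arrives first with a given key runs the computation, and every concurrent caller with the same key waits and shares that result. Here is a minimal illustrative sketch in Python — a sketch of the general technique, not Termite's actual implementation (the `SingleFlight` name and embedding stand-in are assumptions):

```python
import threading
import time

class SingleFlight:
    """Collapse concurrent calls that share a key: the first caller runs
    fn, later concurrent callers block and receive the same result.
    (Illustrative sketch of the pattern, not Termite's implementation.)"""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> in-flight call state

    def do(self, key, fn):
        with self._lock:
            call = self._calls.get(key)
            if call is not None:
                leader = False
            else:
                call = {"done": threading.Event(), "result": None}
                self._calls[key] = call
                leader = True
        if not leader:
            call["done"].wait()        # piggyback on the in-flight call
            return call["result"]
        try:
            call["result"] = fn()      # only the leader pays for the work
        finally:
            call["done"].set()
            with self._lock:
                self._calls.pop(key, None)
        return call["result"]

# Demo: eight threads request the same (hypothetical) embedding at once.
if __name__ == "__main__":
    sf = SingleFlight()
    runs = []
    run_lock = threading.Lock()

    def fake_embed():
        with run_lock:
            runs.append(1)
        time.sleep(0.05)               # simulate model inference latency
        return [0.1, 0.2, 0.3]

    results = []
    threads = [threading.Thread(
                   target=lambda: results.append(sf.do("embed:hello", fake_embed)))
               for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{len(results)} callers served, fn executed {len(runs)} time(s)")
```

With overlapping callers, `fake_embed` typically executes once while all eight threads receive the same vector; once the call finishes, the key is released so a later request recomputes fresh.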
No rate limits or outages
Self-hosted means no API quotas, no external service dependencies, no surprise downtime.
Mix providers per operation
Use Termite for chunking, OpenAI for embeddings, and a local model for reranking. Swap models without code changes.
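A per-operation split like that might look like the following config fragment. This is a hypothetical sketch only — the file layout and field names are assumptions for illustration, not Termite's documented schema (the model names reuse ones mentioned on this page plus OpenAI's text-embedding-3-small):

```yaml
# Hypothetical pipeline config; keys and structure are illustrative.
chunking:
  provider: termite          # semantic chunking runs locally
embeddings:
  provider: openai           # embeddings via a hosted API
  model: text-embedding-3-small
reranking:
  provider: termite          # cross-encoder reranking stays local
  model: mxbai-rerank-base-v1
```

The point of such a split is that each stage is an independent provider/model pair, so swapping, say, the embedding provider is a config edit rather than a code change.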
Up and running in under a minute
# Install Termite
brew install antflydb/antfly/termite

# Pull embedding and reranker models
termite pull bge-small-en-v1.5
termite pull mxbai-rerank-base-v1

# Start serving
termite serve

# Test with curl
curl http://localhost:11435/api/embed \
  -d '{"model": "bge-small-en-v1.5", "input": "Hello, world!"}'