Termite Documentation
Termite is a local ML inference server for ONNX-based models. It provides an Ollama-compatible API for embeddings, chunking, reranking, and more.
What is Termite?
Termite provides local ML inference with an Ollama-compatible API:
- Embedding Generation - Text and multimodal (CLIP) embedding models
- Text Chunking - Semantic chunking with ONNX models or fixed-size fallback
- Reranking - Relevance re-scoring for search results
- Named Entity Recognition - Extract persons, organizations, and locations from text
- Text Rewriting - Transform text using Seq2Seq models
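To make the chunking feature concrete: when no ONNX chunking model is available, a fixed-size fallback splits text into overlapping windows. The sketch below is a minimal illustration of that idea; the `size` and `overlap` values are assumptions, not Termite's documented defaults.

```python
def chunk_fixed(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size chunks with overlap.

    Illustrative sketch of a fixed-size fallback chunker; `size` and
    `overlap` are assumed parameters, not Termite's actual defaults.
    """
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    step = size - overlap  # advance by size minus overlap each step
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
        if start + size >= len(text):
            break
    return chunks
```

The overlap keeps a little shared context between adjacent chunks, which helps retrieval quality in RAG pipelines at the cost of some duplicated tokens.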
When to Use Termite
Termite can run standalone or as part of an Antfly cluster. Use it when you need:
- Local ONNX model inference without external API dependencies
- Ollama-compatible `/api/embed` endpoint for embeddings
- Semantic text chunking for RAG pipelines
- Relevance reranking for improved search quality
- Centralized model serving across distributed nodes
- Privacy-preserving ML inference (data never leaves your infrastructure)
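Because the embeddings endpoint is Ollama-compatible, a client can reuse the Ollama `/api/embed` request shape. The sketch below builds such a request, sends it with the standard library, and compares returned vectors with cosine similarity; the base URL, port, and model name in the usage example are assumptions for illustration, not values Termite guarantees.

```python
import json
import math
import urllib.request

def build_embed_request(model: str, inputs: list[str]) -> bytes:
    # Ollama-compatible /api/embed body: {"model": ..., "input": [...]}
    return json.dumps({"model": model, "input": inputs}).encode()

def embed(base_url: str, model: str, inputs: list[str]) -> list[list[float]]:
    # POST to the Ollama-compatible endpoint; base_url and model are
    # caller-supplied assumptions, not fixed by Termite.
    req = urllib.request.Request(
        f"{base_url}/api/embed",
        data=build_embed_request(model, inputs),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))
```

Usage might look like `vecs = embed("http://localhost:11434", "some-embed-model", ["hello", "world"])`, where both the port and the model name are placeholders.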
Quick Links
- Getting Started - Install and run your first models
- API Reference - Complete API documentation
- Models - Browse available models
- Downloads - Download Termite
Guides
Learn how to use specific Termite features:
- Embedding Models - Generate vector embeddings
- Reranking - Improve search relevance
- Chunking - Split text into optimal segments
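As a taste of the reranking guide: a reranker scores each (query, document) pair for relevance, and the caller reorders results by score. The response shape assumed below (`index` plus `relevance_score` per document) is illustrative only, not Termite's documented format.

```python
def apply_rerank(documents: list[str], results: list[dict]) -> list[str]:
    """Reorder documents by reranker scores.

    `results` is assumed to look like
    [{"index": 0, "relevance_score": 0.9}, ...], one entry per input
    document; this shape is a hypothetical example, not Termite's
    documented response format.
    """
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in ranked]
```

A typical pipeline retrieves a broad candidate set with embeddings first, then reranks only the top few dozen hits, since reranking is more expensive per document.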
Deployment
- Kubernetes Operator - Deploy Termite on Kubernetes with autoscaling
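For orientation, a bare-bones cluster deployment could look like the plain Deployment sketched below. The image reference, port, and replica count are placeholders, and this is not the operator's actual custom resource; see the Kubernetes Operator guide linked above for the real manifests and autoscaling options.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: termite
spec:
  replicas: 2
  selector:
    matchLabels:
      app: termite
  template:
    metadata:
      labels:
        app: termite
    spec:
      containers:
        - name: termite
          image: termite:latest   # placeholder image reference
          ports:
            - containerPort: 11434   # assumed Ollama-compatible port
```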