Termite Documentation

Termite is a local ML inference server for ONNX-based models. It provides an Ollama-compatible API for embeddings, chunking, reranking, and more.

What is Termite?

Termite provides local ML inference with an Ollama-compatible API:

  • Embedding Generation - Text and multimodal (CLIP) embedding models
  • Text Chunking - Semantic chunking with ONNX models or fixed-size fallback
  • Reranking - Relevance re-scoring for search results
  • Named Entity Recognition - Extract persons, organizations, and locations from text
  • Text Rewriting - Transform text using Seq2Seq models

When to Use Termite

Termite can run standalone or as part of an Antfly cluster. Use it when you need:

  • Local ONNX model inference without external API dependencies
  • Ollama-compatible /api/embed endpoint for embeddings
  • Semantic text chunking for RAG pipelines
  • Relevance reranking for improved search quality
  • Centralized model serving across distributed nodes
  • Privacy-preserving ML inference (data never leaves your infrastructure)
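As a minimal sketch, the Ollama-compatible /api/embed endpoint mentioned above can be called like this. The base URL, port, and model name here are assumptions; substitute the values for your deployment:

```python
import json
import urllib.request

# Assumed default address; adjust to where your Termite server is listening.
TERMITE_URL = "http://localhost:11434"

def build_embed_request(model: str, texts):
    """Build an Ollama-compatible /api/embed payload.

    "input" may be a single string or a list of strings, matching the
    Ollama embed API shape.
    """
    return {"model": model, "input": texts}

def embed(model: str, texts):
    """POST the payload to /api/embed and return the embeddings list."""
    body = json.dumps(build_embed_request(model, texts)).encode()
    req = urllib.request.Request(
        f"{TERMITE_URL}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # An Ollama-compatible response carries {"embeddings": [[...], ...]}
        return json.loads(resp.read())["embeddings"]
```

Because the request/response shapes mirror Ollama, existing Ollama client libraries pointed at the Termite URL should also work for embeddings.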

Guides

Learn how to use specific Termite features:

Deployment