Getting Started

Learn how to install Termite, pull models, and start serving ML inference for embeddings, chunking, and reranking.

Ollama-compatible /api/embed endpoint
Multi-model concurrent serving
Automatic model discovery from ./models/
INT8 and FP16 quantization support
CPU, GPU (CUDA), and TPU acceleration
Built-in request caching and deduplication

1. Install Termite

Download and install Termite on your system.

bash
# macOS (Homebrew)
brew install antflydb/antfly/termite

# Linux (download binary)
curl -fsSL https://antfly.io/install-termite.sh | sh

# Docker
docker pull antflydb/termite:latest

2. Pull Models

Download the ML models you need for your pipeline.

bash
# Pull an embedding model
termite pull bge-small-en-v1.5

# Pull a reranker model
termite pull mxbai-rerank-base-v1

# Pull a chunker model
termite pull jina-segmenter-v1

# List installed models
termite list

3. Start the Server

Run Termite to start serving inference requests.

bash
# Start Termite server
termite serve

# Or specify a port
termite serve --port 11435

# Run in background
termite serve --daemon
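
When the server is started in the background, a script may send requests before it is ready. The following is a minimal sketch of a retry loop for that case. The helper names `wait_until` and `termite_up` are hypothetical, and the probe assumes only that the server answers some HTTP response on the port used in this guide's examples (11435); no dedicated health endpoint is assumed.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the Termite CLI.

# Retry a command until it succeeds: wait_until ATTEMPTS DELAY_SECONDS COMMAND...
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Probe: succeeds once anything answers HTTP on the given base URL.
# Without -f, curl exits 0 for any HTTP status and non-zero if the connection fails.
# The default URL is an assumption taken from this guide's examples.
termite_up() {
  curl -s -o /dev/null "${1:-http://localhost:11435}/"
}
```

A typical use would be `termite serve --port 11435 --daemon` followed by `wait_until 30 1 termite_up`, which polls once per second for up to 30 attempts.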

4. Use with Antfly

Connect Termite to Antfly for automatic embeddings and reranking.

bash
# When running Antfly, Termite is embedded by default
antfly swarm

# Or connect to a standalone Termite instance
antfly swarm --termite-addr localhost:11435

5. Use the API

Alternatively, call the REST API directly for standalone inference.

bash
# Generate embeddings
curl http://localhost:11435/api/embed \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-small-en-v1.5",
    "input": ["Hello, world!", "How are you?"]
  }'

# Rerank documents
curl http://localhost:11435/api/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mxbai-rerank-base-v1",
    "query": "What is machine learning?",
    "documents": [
      "Machine learning is a subset of AI.",
      "The weather is nice today.",
      "Deep learning uses neural networks."
    ]
  }'
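
Hand-written JSON bodies become error-prone once the input text comes from variables or contains quotes. The sketch below wraps the embedding call above in small shell functions. The function names and the `TERMITE_URL` variable are hypothetical; only the endpoint path and the `model` and `input` fields come from the example above. The escaping covers backslashes and double quotes only, so inputs containing newlines or other control characters would need a real JSON tool.

```shell
#!/bin/sh
# Hypothetical helpers; only the endpoint and JSON field names come from the guide.
TERMITE_URL="${TERMITE_URL:-http://localhost:11435}"

# Escape backslashes and double quotes so a string is safe inside JSON quotes.
# Control characters such as newlines are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the /api/embed request body.
# First argument is the model name; each remaining argument is one input string.
embed_payload() {
  model=$1
  shift
  items=""
  for text in "$@"; do
    items="${items:+$items, }\"$(json_escape "$text")\""
  done
  printf '{"model": "%s", "input": [%s]}\n' "$(json_escape "$model")" "$items"
}

# Send the request. Requires a running Termite server at $TERMITE_URL.
termite_embed() {
  curl -fsS "$TERMITE_URL/api/embed" \
    -H "Content-Type: application/json" \
    -d "$(embed_payload "$@")"
}
```

For example, `termite_embed bge-small-en-v1.5 "Hello, world!" "How are you?"` would send the same request as the first curl command above, and `embed_payload` can be run on its own to inspect the body before sending it.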