# Termite Kubernetes Operator
The Termite Operator provides Kubernetes-native management for Termite inference pools with autoscaling, GPU/TPU support, and intelligent traffic routing.
## Overview
The Termite Operator automates the deployment and management of Termite inference servers on Kubernetes:
- **Autoscaling** - Scale inference pools based on request load
- **GPU/TPU Support** - Schedule workloads on accelerated hardware
- **Traffic Routing** - Route requests to optimal model instances
- **Health Monitoring** - Automatic health checks and recovery
- **Model Versioning** - Manage multiple model versions
## Installation

### Prerequisites
- Kubernetes 1.24+
- kubectl configured with cluster access
- Helm 3.0+ (optional, for Helm installation)
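You can sanity-check the Kubernetes version requirement from the CLI before installing. This is a minimal sketch: the version comparison is plain POSIX shell, but the `kubectl version -o json` parsing with `sed` is an assumption about your kubectl release's output format, so adapt it if your output differs.

```shell
#!/bin/sh
# Returns success if a "major.minor" version meets the 1.24 minimum.
meets_minimum() {
  version=$1                      # e.g. "1.27"
  major=${version%%.*}
  minor=${version#*.}; minor=${minor%%.*}
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 24 ]; }
}

# Grab the server's gitVersion (it appears after clientVersion in the JSON,
# hence tail -n1). Parsing JSON with sed is a sketch, not robust tooling.
server=$(kubectl version -o json 2>/dev/null |
  sed -n 's/.*"gitVersion": *"v\([0-9]*\.[0-9]*\).*/\1/p' | tail -n1)

if meets_minimum "${server:-0.0}"; then
  echo "Kubernetes $server: OK"
else
  echo "Kubernetes ${server:-unknown}: need 1.24+"
fi
```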
### Install with kubectl

```bash
kubectl apply -f https://github.com/antflydb/termite/releases/latest/download/termite-operator.yaml
```

### Install with Helm

```bash
helm repo add termite https://charts.termite.dev
helm install termite-operator termite/termite-operator
```

## Quick Start
### Create an Inference Pool
```yaml
apiVersion: termite.antfly.io/v1
kind: InferencePool
metadata:
  name: embeddings
  namespace: default
spec:
  model: BAAI/bge-small-en-v1.5
  replicas: 2
  resources:
    limits:
      memory: 4Gi
      cpu: 2
```

Apply the configuration:

```bash
kubectl apply -f inference-pool.yaml
```

### Check Status

```bash
kubectl get inferencepool embeddings
```

## Configuration
### InferencePool Spec
| Field | Type | Description |
|---|---|---|
| `model` | string | Model name to serve (e.g., `BAAI/bge-small-en-v1.5`) |
| `replicas` | int | Number of replicas |
| `resources` | ResourceRequirements | CPU/memory/GPU limits |
| `autoscaling` | AutoscalingSpec | Autoscaling configuration |
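Putting the fields together, a pool that combines fixed resource limits with autoscaling looks like this. The field names come from the table above and the snippets on this page; the resource values are illustrative, not recommendations.

```yaml
apiVersion: termite.antfly.io/v1
kind: InferencePool
metadata:
  name: embeddings
spec:
  model: BAAI/bge-small-en-v1.5
  replicas: 2
  resources:
    limits:
      memory: 4Gi
      cpu: 2
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 10
    targetQueueDepth: 5
```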
### Autoscaling
Enable autoscaling based on request queue depth:
```yaml
spec:
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 10
    targetQueueDepth: 5
```

### GPU Support
Schedule on GPU nodes:
```yaml
spec:
  resources:
    limits:
      nvidia.com/gpu: 1
  nodeSelector:
    accelerator: nvidia-tesla-t4
```

## Architecture
The Termite Operator consists of:
- **Controller** - Watches InferencePool resources and reconciles state
- **Proxy** - Routes requests to available model instances
- **Metrics Server** - Collects metrics for autoscaling decisions
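A reasonable mental model for the metrics-driven scaling loop is HPA-style proportional scaling on queue depth: desired replicas is roughly the total queue depth divided by `targetQueueDepth`, rounded up and clamped to the configured bounds. This sketch is an illustration of that idea, not the operator's documented algorithm.

```shell
#!/bin/sh
# Sketch of proportional scaling on queue depth (an assumption for
# illustration; the controller's actual logic may differ).
desired_replicas() {
  queue=$1 target=$2 min=$3 max=$4
  # ceil(queue / target) using integer arithmetic
  n=$(( (queue + target - 1) / target ))
  [ "$n" -lt "$min" ] && n=$min
  [ "$n" -gt "$max" ] && n=$max
  echo "$n"
}

# 23 queued requests across the pool, target 5 per replica, bounds 1..10
desired_replicas 23 5 1 10
```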
## Troubleshooting

### Common Issues
**Pods stuck in Pending**

- Check node resources with `kubectl describe node`
- Verify GPU drivers are installed if using GPU resources
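If the pool requests GPUs, Pending pods are also commonly caused by taints on GPU nodes. If your GPU node pool is tainted, add a matching toleration to the pool spec. This assumes the operator passes `tolerations` through to its pods, and uses the `nvidia.com/gpu` taint key that NVIDIA's device plugin conventionally applies; check your cluster's actual taints with `kubectl describe node`.

```yaml
spec:
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
```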
**Model not loading**

- Check pod logs: `kubectl logs -l app=termite`
- Verify the model name is correct
**High latency**
- Enable autoscaling to handle load spikes
- Consider using quantized models for faster inference
## Next Steps
- API Reference - Use the Termite API
- Models - Browse available models
- Downloads - Install Termite locally