Welcome to the Antfly Operator documentation. The Antfly Operator is a Kubernetes operator for deploying and managing Antfly database clusters with built-in high availability, autoscaling, and operational simplicity.
Container Image: ghcr.io/antflydb/antfly-operator:latest
Quick Links
| Section | Description |
|---|---|
| Installation | Install the operator in your cluster |
| Quickstart | Deploy your first cluster in 5 minutes |
| Concepts | Understand the architecture |
Cloud Platform Guides
| Platform | Description |
|---|---|
| AWS EKS | Deploy on Amazon EKS with Spot Instances |
| GCP GKE | Deploy on GKE Autopilot with Spot Pods |
| Generic Kubernetes | Deploy on any Kubernetes cluster |
Operations
| Topic | Description |
|---|---|
| Backup & Restore | Schedule backups and restore data |
| Autoscaling | Configure automatic scaling |
| Monitoring | Health checks and observability |
| Pod Scheduling | Taints, tolerations, affinities, and workload placement |
| Storage | PVC retention, volume expansion, and storage lifecycle |
Security
| Topic | Description |
|---|---|
| RBAC | Role-based access control |
| Secrets Management | Manage credentials securely |
| Service Mesh | Istio, Linkerd integration |
Reference
| Resource | Description |
|---|---|
| AntflyCluster API | Complete CRD reference |
| AntflyBackup API | Backup CRD reference |
| AntflyRestore API | Restore CRD reference |
| Examples | Example configurations |
Troubleshooting
See the Troubleshooting Guide for common issues and solutions.
Key Features
- High Availability: Raft-based consensus for metadata nodes ensures data consistency
- Autoscaling: Automatic scaling of data nodes based on CPU and memory metrics
- Cloud-Native: Native support for GKE Autopilot and AWS EKS
- Cost Optimization: Spot Pod/Instance support for up to 90% cost savings
- Backup & Restore: Scheduled backups to S3/GCS with point-in-time recovery
- Service Mesh: Optional Istio/Linkerd integration for mTLS
- Observability: Built-in health checks and Prometheus metrics
Architecture Overview
The Antfly Operator manages two types of nodes:
┌─────────────────────────────────────────┐
│ AntflyCluster │
└─────────────────────────────────────────┘
│
┌──────────────────┴──────────────────┐
▼ ▼
┌───────────────────┐ ┌───────────────────┐
│ Metadata Nodes │ │ Data Nodes │
│ (StatefulSet) │ │ (StatefulSet) │
├───────────────────┤ ├───────────────────┤
│ • Raft consensus │ │ • Data storage │
│ • Cluster coord. │◄────────────────►│ • Replication │
│ • Public API │ │ • Autoscalable │
│ • Fixed replicas │ │ • Spot-compatible │
└───────────────────┘ └───────────────────┘Metadata Nodes: Handle cluster coordination via Raft consensus. Fixed replica count (typically 3 or 5) for quorum stability.
Data Nodes: Store and replicate data. Support autoscaling and Spot Pods/Instances for cost optimization.
Requirements
- Kubernetes 1.20+
- kubectl configured for your cluster
- Storage class with dynamic provisioning
- (Optional) metrics-server for autoscaling
- (Optional) Service mesh for mTLS
Getting Help
- GitHub Issues: antflydb/antfly/issues
- Documentation: You're reading it!
Contributing
For development setup and contribution guidelines, see DEVELOPMENT.md.