Welcome to the Antfly Operator documentation. The Antfly Operator is a Kubernetes operator for deploying and managing Antfly database clusters with built-in high availability, autoscaling, and operational simplicity.

Container Image: `ghcr.io/antflydb/antfly-operator:latest`

| Section | Description |
| --- | --- |
| Installation | Install the operator in your cluster |
| Quickstart | Deploy your first cluster in 5 minutes |
| Concepts | Understand the architecture |

## Cloud Platform Guides

| Platform | Description |
| --- | --- |
| AWS EKS | Deploy on Amazon EKS with Spot Instances |
| GCP GKE | Deploy on GKE Autopilot with Spot Pods |
| Generic Kubernetes | Deploy on any Kubernetes cluster |

## Operations

| Topic | Description |
| --- | --- |
| Backup & Restore | Schedule backups and restore data |
| Autoscaling | Configure automatic scaling |
| Monitoring | Health checks and observability |
| Pod Scheduling | Taints, tolerations, affinities, and workload placement |
| Storage | PVC retention, volume expansion, and storage lifecycle |

## Security

| Topic | Description |
| --- | --- |
| RBAC | Role-based access control |
| Secrets Management | Manage credentials securely |
| Service Mesh | Istio and Linkerd integration |

## Reference

| Resource | Description |
| --- | --- |
| AntflyCluster API | Complete CRD reference |
| AntflyBackup API | Backup CRD reference |
| AntflyRestore API | Restore CRD reference |
| Examples | Example configurations |

## Troubleshooting

See the Troubleshooting Guide for common issues and solutions.

## Key Features

- **High Availability**: Raft-based consensus for metadata nodes ensures data consistency
- **Autoscaling**: Automatic scaling of data nodes based on CPU and memory metrics
- **Cloud-Native**: Native support for GKE Autopilot and AWS EKS
- **Cost Optimization**: Spot Pod/Instance support for up to 90% cost savings
- **Backup & Restore**: Scheduled backups to S3/GCS with point-in-time recovery
- **Service Mesh**: Optional Istio/Linkerd integration for mTLS
- **Observability**: Built-in health checks and Prometheus metrics
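As a sketch of how a scheduled backup might be declared: the manifest below is illustrative only, and the field names (`clusterRef`, `schedule`, `destination`) and the API group/version are assumptions, not the verified schema. Consult the AntflyBackup API reference for the authoritative definition.

```yaml
# Hypothetical AntflyBackup manifest -- field names are assumptions;
# see the AntflyBackup API reference for the real schema.
apiVersion: antfly.io/v1alpha1   # assumed API group/version
kind: AntflyBackup
metadata:
  name: nightly-backup
spec:
  clusterRef: my-cluster         # cluster to back up (assumed field)
  schedule: "0 2 * * *"          # standard cron syntax: daily at 02:00
  destination:
    s3:
      bucket: my-antfly-backups  # target S3 bucket (assumed field)
      region: us-east-1
```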

## Architecture Overview

The Antfly Operator manages two types of nodes:

```
                    ┌─────────────────────────────────────────┐
                    │           AntflyCluster                 │
                    └─────────────────────────────────────────┘
                    ┌──────────────────┴──────────────────┐
                    ▼                                      ▼
        ┌───────────────────┐                  ┌───────────────────┐
        │  Metadata Nodes   │                  │    Data Nodes     │
        │   (StatefulSet)   │                  │   (StatefulSet)   │
        ├───────────────────┤                  ├───────────────────┤
        │ • Raft consensus  │                  │ • Data storage    │
        │ • Cluster coord.  │◄────────────────►│ • Replication     │
        │ • Public API      │                  │ • Autoscalable    │
        │ • Fixed replicas  │                  │ • Spot-compatible │
        └───────────────────┘                  └───────────────────┘
```

**Metadata Nodes**: Handle cluster coordination via Raft consensus. They use a fixed, odd replica count (typically 3 or 5) so the Raft quorum stays stable.

**Data Nodes**: Store and replicate data. They support autoscaling and can run on Spot Pods/Instances for cost optimization.
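Putting the two node types together, a minimal `AntflyCluster` manifest might look like the sketch below. The field names (`metadataNodes`, `dataNodes`, `autoscaling`, `storage`) and the API group/version are illustrative assumptions; the authoritative schema is in the AntflyCluster API reference.

```yaml
# Hypothetical AntflyCluster manifest -- field names are assumptions;
# see the AntflyCluster API reference for the real schema.
apiVersion: antfly.io/v1alpha1     # assumed API group/version
kind: AntflyCluster
metadata:
  name: my-cluster
spec:
  metadataNodes:
    replicas: 3                    # fixed, odd count for Raft quorum stability
  dataNodes:
    autoscaling:
      minReplicas: 2
      maxReplicas: 10
      targetCPUUtilization: 70     # scale on CPU, per the Autoscaling guide
    storage:
      storageClassName: standard   # requires a dynamically provisioned class
      size: 100Gi
```

Note the asymmetry the architecture diagram implies: the metadata tier is pinned to a fixed replica count, while only the data tier carries autoscaling bounds.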

## Requirements

- Kubernetes 1.20+
- kubectl configured for your cluster
- Storage class with dynamic provisioning
- (Optional) metrics-server for autoscaling
- (Optional) Service mesh for mTLS

## Getting Help

## Contributing

For development setup and contribution guidelines, see DEVELOPMENT.md.