Complete API reference for the AntflyCluster custom resource.

Overview

apiVersion: antfly.io/v1
kind: AntflyCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  # ... spec fields
status:
  # ... status fields (read-only)

Spec

Top-Level Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| image | string | Yes | Container image for Antfly (e.g., ghcr.io/antflydb/antfly:latest) |
| imagePullPolicy | string | No | Image pull policy (Always, IfNotPresent, Never) |
| metadataNodes | MetadataNodesSpec | Yes | Metadata node configuration |
| dataNodes | DataNodesSpec | Yes | Data node configuration |
| storage | StorageSpec | Yes | Storage configuration |
| config | string | Yes | Antfly configuration (JSON) |
| gke | GKESpec | No | GKE-specific configuration |
| eks | EKSSpec | No | AWS EKS-specific configuration |
| serviceMesh | ServiceMeshSpec | No | Service mesh configuration |
| publicAPI | PublicAPIConfig | No | Public API service configuration |
| serviceAccountName | string | No | Kubernetes ServiceAccount for pods |
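
As a rough sketch of how the required fields fit together, a minimal manifest might look like the following. The cluster name, resource limits, and storage sizes are placeholders; the ports are the documented defaults, and the config body reuses a key from the full example at the end of this page rather than reflecting any known minimum configuration.

```yaml
apiVersion: antfly.io/v1
kind: AntflyCluster
metadata:
  name: minimal-cluster          # placeholder name
spec:
  image: ghcr.io/antflydb/antfly:latest
  metadataNodes:
    resources:
      limits:                    # limits is the only required ResourceSpec field
        cpu: "1000m"
        memory: "1Gi"
    metadataAPI:
      port: 12377                # documented default
    metadataRaft:
      port: 9017                 # documented default
  dataNodes:
    resources:
      limits:
        cpu: "2000m"
        memory: "4Gi"
    api:
      port: 12380                # documented default
    raft:
      port: 9021                 # documented default
  storage:
    metadataStorage: "1Gi"
    dataStorage: "10Gi"
  config: |
    {
      "log": { "level": "info" }
    }
```

Optional blocks such as gke, eks, serviceMesh, and publicAPI can be layered on top of this skeleton.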

MetadataNodesSpec

Configuration for metadata nodes (Raft consensus, API coordination).

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| replicas | int32 | No | 3 | Number of metadata nodes |
| resources | ResourceSpec | Yes | - | Resource requirements |
| metadataAPI | APISpec | Yes | - | Metadata API configuration |
| metadataRaft | APISpec | Yes | - | Metadata Raft configuration |
| health | APISpec | No | port: 4200 | Health check endpoint |
| useSpotPods | bool | No | false | Use GKE Spot Pods (standard GKE only) |
| envFrom | []EnvFromSource | No | - | Environment variables from secrets/configmaps |
| tolerations | []Toleration | No | - | Pod scheduling tolerations |
| nodeSelector | map[string]string | No | - | Node selector labels for scheduling |
| affinity | *Affinity | No | - | Pod affinity/anti-affinity rules |
| topologySpreadConstraints | []TopologySpreadConstraint | No | - | Pod topology spread constraints |

Notes:

  • replicas should be an odd number (3 or 5) for Raft quorum
  • useSpotPods must be false when spec.gke.autopilot=true
  • nodeSelector must not be set when spec.gke.autopilot=true (Autopilot uses compute classes)
  • Scheduling fields (tolerations, nodeSelector, affinity, topologySpreadConstraints) are merged with cloud-provider-specific values (e.g., EKS Spot tolerations)
  • Metadata nodes are never autoscaled
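
To illustrate the scheduling fields, the fragment below sketches one way to spread three metadata nodes across hosts. It assumes that affinity accepts the standard Kubernetes Affinity structure and that metadata pods carry the label app.kubernetes.io/component: metadata; the source only shows the component: data label, so treat the metadata value as hypothetical. Required fields such as resources, metadataAPI, and metadataRaft are omitted for brevity.

```yaml
spec:
  metadataNodes:
    replicas: 3                  # odd count so Raft can form a majority
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  app.kubernetes.io/name: antfly-database
                  app.kubernetes.io/component: metadata   # assumed label value
```

A preferred (soft) rule is used here so that pods can still schedule on small clusters; a required rule would be stricter.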

DataNodesSpec

Configuration for data nodes (data storage, replication).

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| replicas | int32 | No | 3 | Initial number of data nodes |
| autoScaling | AutoScalingSpec | No | - | Autoscaling configuration |
| resources | ResourceSpec | Yes | - | Resource requirements |
| api | APISpec | Yes | - | Data API configuration |
| raft | APISpec | Yes | - | Data Raft configuration |
| health | APISpec | No | port: 4200 | Health check endpoint |
| useSpotPods | bool | No | false | Use GKE Spot Pods (standard GKE only) |
| envFrom | []EnvFromSource | No | - | Environment variables from secrets/configmaps |
| tolerations | []Toleration | No | - | Pod scheduling tolerations |
| nodeSelector | map[string]string | No | - | Node selector labels for scheduling |
| affinity | *Affinity | No | - | Pod affinity/anti-affinity rules |
| topologySpreadConstraints | []TopologySpreadConstraint | No | - | Pod topology spread constraints |
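
As an illustration, data nodes on a standard (non-Autopilot) GKE cluster might opt into Spot Pods as sketched below. Setting gke.autopilot explicitly is an assumption made for clarity, since false is already the documented default; required fields such as resources, api, and raft are omitted.

```yaml
spec:
  gke:
    autopilot: false             # Spot Pods via useSpotPods are for standard GKE only
  dataNodes:
    replicas: 3                  # initial count
    useSpotPods: true
    topologySpreadConstraints:   # spread replicas across zones to soften Spot preemptions
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: antfly-database
            app.kubernetes.io/component: data
```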

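APISpec

Port and host configuration.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| port | int32 | No | varies (see Default Ports below) | Port number |
| host | string | No | 0.0.0.0 | Host to bind to |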

Default Ports:

  • Metadata API: 12377
  • Metadata Raft: 9017
  • Data API: 12380
  • Data Raft: 9021
  • Health: 4200
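
The fragment below sketches how an APISpec might be filled in for metadata nodes. The first two endpoints restate the documented defaults; the health port of 4201 is an arbitrary override chosen for illustration only.

```yaml
spec:
  metadataNodes:
    metadataAPI:
      port: 12377                # documented default, stated explicitly
      host: 0.0.0.0              # documented default bind address
    metadataRaft:
      port: 9017                 # documented default
    health:
      port: 4201                 # illustrative override of the default 4200
```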

ResourceSpec

Resource requirements and limits.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| cpu | string | No | CPU request (e.g., "500m") |
| memory | string | No | Memory request (e.g., "512Mi") |
| limits | ResourceLimits | Yes | Resource limits |

ResourceLimits

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| cpu | string | No | CPU limit (e.g., "1000m") |
| memory | string | No | Memory limit (e.g., "1Gi") |

AutoScalingSpec

Autoscaling configuration for data nodes.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | bool | Yes | - | Enable autoscaling |
| minReplicas | int32 | Yes | - | Minimum replicas |
| maxReplicas | int32 | Yes | - | Maximum replicas |
| targetCPUUtilizationPercentage | *int32 | No | - | Target CPU utilization |
| targetMemoryUtilizationPercentage | *int32 | No | - | Target memory utilization |
| scaleUpCooldown | *duration | No | 60s | Cooldown between scale-up operations |
| scaleDownCooldown | *duration | No | 300s | Cooldown between scale-down operations |

Notes:

  • At least one of targetCPUUtilizationPercentage or targetMemoryUtilizationPercentage must be set
  • Requires metrics-server and resource requests on pods
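
A CPU-only autoscaling setup could be sketched as follows. The replica bounds, target, and cooldown are illustrative values, not recommendations. Resource requests are included because utilization targets depend on them, as noted above.

```yaml
spec:
  dataNodes:
    replicas: 3                  # initial count
    resources:
      cpu: "1000m"               # requests are needed for utilization metrics
      memory: "2Gi"
      limits:
        cpu: "2000m"
        memory: "4Gi"
    autoScaling:
      enabled: true
      minReplicas: 3
      maxReplicas: 10
      targetCPUUtilizationPercentage: 70   # at least one target must be set
      scaleDownCooldown: 600s              # overrides the 300s default
```

Leaving scaleUpCooldown unset keeps its documented 60s default.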

StorageSpec

Storage configuration.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| storageClass | string | No | Storage class name |
| metadataStorage | string | No | Storage size for metadata nodes (e.g., "1Gi") |
| dataStorage | string | No | Storage size for data nodes (e.g., "10Gi") |

GKESpec

GKE-specific configuration.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| autopilot | bool | No | false | Enable GKE Autopilot optimizations |
| autopilotComputeClass | string | No | "Balanced" | Autopilot compute class |
| podDisruptionBudget | PodDisruptionBudgetSpec | No | - | PDB configuration |

Valid autopilotComputeClass values:

  • Accelerator - GPU/TPU workloads
  • Balanced - General-purpose (default)
  • Performance - CPU/memory intensive
  • Scale-Out - Distributed workloads
  • autopilot - Default Autopilot behavior
  • autopilot-spot - Spot Pods

Immutable fields: autopilot and autopilotComputeClass cannot be changed after creation.
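
For illustration, an Autopilot configuration consistent with the constraints described in this reference might look like the sketch below. Only the Autopilot-relevant fields are shown; the rest of the spec is omitted.

```yaml
spec:
  gke:
    autopilot: true                    # immutable after creation
    autopilotComputeClass: "Balanced"  # immutable after creation
  metadataNodes:
    useSpotPods: false                 # must stay false under Autopilot
    # nodeSelector is intentionally absent: it must not be set with autopilot: true
  dataNodes:
    useSpotPods: false
    # tolerations, affinity, and topologySpreadConstraints remain allowed
```

Because both gke fields are immutable, switching compute class later would mean recreating the resource.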

EKSSpec

AWS EKS-specific configuration.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | bool | No | false | Enable EKS optimizations |
| useSpotInstances | bool | No | false | Use Spot Instances for data nodes |
| instanceTypes | []string | No | - | Preferred EC2 instance types |
| irsaRoleARN | string | No | - | IAM role ARN for IRSA |
| ebsVolumeType | string | No | gp3 | EBS volume type |
| ebsEncrypted | bool | No | false | Enable EBS encryption |
| ebsKmsKeyId | string | No | - | KMS key for encryption |
| ebsIOPs | *int32 | No | - | Provisioned IOPS (io1/io2 only) |
| ebsThroughput | *int32 | No | - | Throughput in MiB/s (gp3 only) |
| podDisruptionBudget | PodDisruptionBudgetSpec | No | - | PDB configuration |

Valid ebsVolumeType values: gp3, gp2, io1, io2, st1, sc1

IRSA ARN format: arn:aws(-cn|-us-gov)?:iam::\d{12}:role/.+

Immutable fields: enabled cannot be changed after creation.
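
The following fragment sketches an EKS configuration using encrypted gp3 volumes. The account ID, role name, KMS key, and instance types are placeholders; the role ARN is written to match the documented pattern, and the KMS value assumes a key ARN is accepted (the source does not specify the expected form).

```yaml
spec:
  eks:
    enabled: true                      # immutable after creation
    useSpotInstances: true             # applies to data nodes
    instanceTypes:
      - m6i.large                      # illustrative instance types
      - m6i.xlarge
    irsaRoleARN: arn:aws:iam::123456789012:role/antfly-workload   # placeholder
    ebsVolumeType: gp3
    ebsThroughput: 250                 # MiB/s, gp3 only
    ebsEncrypted: true                 # needed for ebsKmsKeyId to be accepted
    ebsKmsKeyId: arn:aws:kms:us-east-1:123456789012:key/00000000-0000-0000-0000-000000000000   # placeholder
```

ebsIOPs is left out because it applies only to io1 and io2 volumes.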

PodDisruptionBudgetSpec

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | bool | Yes | - | Enable PDB creation |
| maxUnavailable | *int32 | No | 1 | Max unavailable pods |
| minAvailable | *int32 | No | - | Min available pods |

Specify either maxUnavailable or minAvailable, not both.
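
As a sketch, a budget expressed with minAvailable could look like this. It is shown under gke, but the same block is accepted under eks. The example assumes the default of 1 for maxUnavailable is not applied when minAvailable is set, which the source does not state explicitly.

```yaml
spec:
  gke:
    podDisruptionBudget:
      enabled: true
      minAvailable: 2            # maxUnavailable is deliberately left unset
```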

ServiceMeshSpec

Service mesh configuration.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | bool | No | false | Enable service mesh integration |
| annotations | map[string]string | No | - | Mesh-specific annotations |

PublicAPIConfig

Public API service configuration.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| enabled | *bool | No | false | Create public API service |
| serviceType | *ServiceType | No | LoadBalancer | Service type |
| port | int32 | No | 80 | Service port |
| nodePort | *int32 | No | - | Node port (NodePort type only) |

Valid serviceType values: ClusterIP, NodePort, LoadBalancer
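
To show how nodePort interacts with serviceType, here is a minimal NodePort sketch. The value 30080 is an arbitrary choice within Kubernetes' default NodePort range (30000-32767).

```yaml
spec:
  publicAPI:
    enabled: true
    serviceType: NodePort        # nodePort is only valid with this type
    port: 80
    nodePort: 30080              # illustrative value
```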

Status

The status section is read-only and managed by the operator.

Top-Level Status Fields

| Field | Type | Description |
| --- | --- | --- |
| phase | string | Cluster phase (Pending, Running, Degraded, Failed) |
| conditions | []Condition | Current conditions |
| metadataNodesReady | int32 | Ready metadata node count |
| dataNodesReady | int32 | Ready data node count |
| autoScalingStatus | AutoScalingStatus | Autoscaling state |
| serviceMeshStatus | ServiceMeshStatus | Service mesh state |

Conditions

| Type | Description |
| --- | --- |
| ConfigurationValid | Configuration validation status |
| SecretsReady | Referenced secrets availability |
| ServiceMeshReady | Service mesh sidecar injection status |

AutoScalingStatus

| Field | Type | Description |
| --- | --- | --- |
| currentReplicas | int32 | Current replica count |
| desiredReplicas | int32 | Desired replica count |
| lastScaleTime | *Time | Last scaling timestamp |
| lastScaleDirection | string | "up", "down", or "" |
| currentCPUUtilizationPercentage | *int32 | Current CPU utilization |
| currentMemoryUtilizationPercentage | *int32 | Current memory utilization |

ServiceMeshStatus

| Field | Type | Description |
| --- | --- | --- |
| enabled | bool | Service mesh enabled |
| sidecarInjectionStatus | string | "Complete", "Partial", "None", "Unknown" |
| podsWithSidecars | int32 | Pods with sidecars |
| totalPods | int32 | Total expected pods |
| lastTransitionTime | *Time | Last status change |
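
For orientation, a status block assembled from the fields above might look like the sketch below. All values are invented for illustration and are not real operator output. The shape of each conditions entry (type plus a "True"/"False" status) assumes the usual Kubernetes condition convention, which the source does not confirm.

```yaml
status:
  phase: Running
  metadataNodesReady: 3
  dataNodesReady: 5
  conditions:
    - type: ConfigurationValid
      status: "True"             # assumed condition shape
    - type: SecretsReady
      status: "True"
    - type: ServiceMeshReady
      status: "False"
  autoScalingStatus:
    currentReplicas: 5
    desiredReplicas: 6
    lastScaleDirection: "up"
    currentCPUUtilizationPercentage: 82
  serviceMeshStatus:
    enabled: true
    sidecarInjectionStatus: "Partial"
    podsWithSidecars: 6
    totalPods: 8                 # 3 metadata + 5 data nodes
```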

Validation Rules

The operator validates configurations through an admission webhook. The rules are grouped by area below.

GKE Validation

  • autopilotComputeClass must be a valid enum value
  • useSpotPods=true conflicts with autopilot=true
  • autopilotComputeClass requires autopilot=true
  • The Accelerator compute class requires GPU resources
  • autopilot and autopilotComputeClass are immutable
EKS Validation

  • irsaRoleARN must match the AWS IAM role ARN format
  • ebsVolumeType must be a valid enum value
  • ebsIOPs is only valid for io1/io2
  • ebsThroughput is only valid for gp3 (125-1000 MiB/s)
  • ebsKmsKeyId requires ebsEncrypted=true
  • GKE and EKS cannot both be enabled
  • enabled is immutable

Scheduling Validation

  • nodeSelector conflicts with gke.autopilot=true (Autopilot manages scheduling via compute classes)
  • tolerations, affinity, and topologySpreadConstraints are allowed with all cloud providers

General Validation

  • metadataNodes.replicas and dataNodes.replicas must be greater than 0
  • publicAPI.nodePort is only valid when serviceType is NodePort

Example

Example covering most fields for a GKE deployment (eks is omitted because GKE and EKS cannot both be enabled):

apiVersion: antfly.io/v1
kind: AntflyCluster
metadata:
  name: production-cluster
  namespace: production
spec:
  image: ghcr.io/antflydb/antfly:latest
  imagePullPolicy: IfNotPresent
  serviceAccountName: antfly-workload-sa

  metadataNodes:
    replicas: 3
    metadataAPI:
      port: 12377
    metadataRaft:
      port: 9017
    health:
      port: 4200
    resources:
      cpu: "500m"
      memory: "512Mi"
      limits:
        cpu: "1000m"
        memory: "1Gi"
    envFrom:
      - secretRef:
          name: backup-credentials
    tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "antfly"
        effect: "NoSchedule"
    nodeSelector:
      node-pool: "antfly-metadata"

  dataNodes:
    replicas: 5
    api:
      port: 12380
    raft:
      port: 9021
    health:
      port: 4200
    resources:
      cpu: "1000m"
      memory: "2Gi"
      limits:
        cpu: "2000m"
        memory: "4Gi"
    autoScaling:
      enabled: true
      minReplicas: 5
      maxReplicas: 20
      targetCPUUtilizationPercentage: 70
      targetMemoryUtilizationPercentage: 80
      scaleUpCooldown: 60s
      scaleDownCooldown: 300s
    envFrom:
      - secretRef:
          name: backup-credentials
    tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "antfly"
        effect: "NoSchedule"
    nodeSelector:
      node-pool: "antfly-data"
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: antfly-database
            app.kubernetes.io/component: data

  storage:
    storageClass: "premium-rwo"
    metadataStorage: "5Gi"
    dataStorage: "50Gi"

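  gke:
    # Standard (non-Autopilot) GKE: the nodeSelector entries above must not be
    # set when autopilot is true, and autopilotComputeClass requires
    # autopilot: true, so it is omitted here.
    autopilot: false
    podDisruptionBudget:
      enabled: true
      maxUnavailable: 1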

  serviceMesh:
    enabled: true
    annotations:
      sidecar.istio.io/inject: "true"
      traffic.sidecar.istio.io/excludeOutboundPorts: "9017,9021"

  publicAPI:
    enabled: true
    serviceType: LoadBalancer
    port: 80

  config: |
    {
      "log": {
        "level": "info",
        "style": "json"
      },
      "enable_metrics": true,
      "replication_factor": 3
    }
