Configure pod placement for Antfly clusters using node selectors, affinities, tolerations, and topology spread constraints.
Overview
Proper pod scheduling is critical for running Antfly in production:
- High Availability: Spread pods across failure domains (zones, nodes) to survive outages
- Performance Isolation: Dedicate nodes to Antfly to avoid noisy-neighbor issues
- Cost Optimization: Run data nodes on Spot instances while keeping metadata nodes on stable, on-demand capacity
- Compliance: Pin workloads to specific regions or node types
Antfly's two node types have different scheduling needs. Metadata nodes run Raft consensus and require stability — even brief disruptions can cause leader elections and temporary unavailability. Data nodes are replicated and tolerate disruption better, making them candidates for Spot capacity and more aggressive spreading.
Scheduling Concepts
| Mechanism | What It Does | CRD Field |
|---|---|---|
| Node Selectors | Hard constraint: pods only run on nodes with matching labels | nodeSelector |
| Tolerations | Allow pods to schedule on tainted nodes | tolerations |
| Node Affinity | Soft or hard preference for nodes with specific labels | affinity.nodeAffinity |
| Pod Anti-Affinity | Spread pods away from each other | affinity.podAntiAffinity |
| Topology Spread | Even distribution across zones or nodes | topologySpreadConstraints |
| Taints | Node-side: repel pods that lack matching tolerations | Infrastructure-level (node pool config) |
CRD Scheduling Fields
Both metadataNodes and dataNodes support these scheduling fields:
spec:
  metadataNodes:
    tolerations: []                  # []corev1.Toleration
    nodeSelector: {}                 # map[string]string
    affinity: {}                     # corev1.Affinity
    topologySpreadConstraints: []    # []corev1.TopologySpreadConstraint
  dataNodes:
    tolerations: []
    nodeSelector: {}
    affinity: {}
    topologySpreadConstraints: []
All fields use standard Kubernetes types. The operator applies user-specified values first, then merges cloud-provider-specific scheduling on top (see How Cloud-Provider Scheduling Composes).
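Because these fields take standard Kubernetes types, node affinity can be written exactly as it would be in a pod spec. The following is an illustrative sketch only: it assumes nodes carry the workload=antfly label used elsewhere in this guide, and the instance types are placeholders. It requires data nodes to land on labeled nodes while only preferring particular instance types.

```yaml
spec:
  dataNodes:
    affinity:
      nodeAffinity:
        # Hard rule: only nodes labeled workload=antfly are eligible
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: workload
                  operator: In
                  values: ["antfly"]
        # Soft rule: prefer these instance types when available (placeholder values)
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 50
            preference:
              matchExpressions:
                - key: node.kubernetes.io/instance-type
                  operator: In
                  values: ["m6i.2xlarge", "m7i.2xlarge"]
```

The hard rule behaves like a nodeSelector with richer operators; the soft rule only influences scoring, so pods still schedule if no preferred instance type is available.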
nodeSelector is rejected by the validating webhook when spec.gke.autopilot: true. GKE Autopilot manages scheduling via compute classes — use spec.gke.autopilotComputeClass instead.
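A minimal Autopilot configuration might look like the sketch below. The two spec.gke fields are the ones named above; the value Balanced is a GKE built-in compute class name used here as an assumption, and other GKE-related fields the operator may require are omitted. Confirm accepted values in the AntflyCluster API Reference.

```yaml
apiVersion: antfly.io/v1
kind: AntflyCluster
metadata:
  name: my-cluster
spec:
  gke:
    autopilot: true
    autopilotComputeClass: Balanced   # assumed value; verify against the API reference
  metadataNodes:
    replicas: 3
    # no nodeSelector here: the validating webhook rejects it under Autopilot
  dataNodes:
    replicas: 3
```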
Common Scenarios
Dedicated Node Pools with Taints
Isolate Antfly on dedicated nodes by tainting a node pool and adding matching tolerations and selectors in the CRD.
Step 1: Create and taint the node pool
EKS (eksctl):
eksctl create nodegroup \
  --cluster my-cluster \
  --name antfly-pool \
  --node-type m6i.2xlarge \
  --nodes 3 \
  --node-labels workload=antfly \
  --node-taints dedicated=antfly:NoSchedule
GKE Standard:
gcloud container node-pools create antfly-pool \
  --cluster my-cluster \
  --machine-type e2-standard-8 \
  --num-nodes 3 \
  --node-labels workload=antfly \
  --node-taints dedicated=antfly:NoSchedule
Generic Kubernetes:
kubectl label nodes node-1 node-2 node-3 workload=antfly
kubectl taint nodes node-1 node-2 node-3 dedicated=antfly:NoSchedule
Step 2: Configure the CRD
apiVersion: antfly.io/v1
kind: AntflyCluster
metadata:
  name: my-cluster
spec:
  metadataNodes:
    replicas: 3
    nodeSelector:
      workload: antfly
    tolerations:
      - key: dedicated
        operator: Equal
        value: antfly
        effect: NoSchedule
  dataNodes:
    replicas: 3
    nodeSelector:
      workload: antfly
    tolerations:
      - key: dedicated
        operator: Equal
        value: antfly
        effect: NoSchedule
Zone-Aware Scheduling
Default behavior: New AntflyCluster deployments automatically get a soft zone topology spread constraint (whenUnsatisfiable: ScheduleAnyway, maxSkew: 1, topologyKey: topology.kubernetes.io/zone) applied to both metadata and data StatefulSets. This ensures pods are distributed across AZs when possible, without blocking scheduling on single-zone or imbalanced clusters.
The default spread is skipped when:
- You specify explicit topologySpreadConstraints in the CRD (your constraints take precedence)
- GKE Autopilot is enabled (Autopilot manages topology internally)
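Written out as a constraint, the default described above would look roughly like the following. The three scheduling values come from the description; the labelSelector is an assumption modeled on the examples in this guide, and the operator's actual selector may differ.

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:                # assumed; check the generated StatefulSet
      matchLabels:
        app.kubernetes.io/component: data
```

With maxSkew: 1 and six pods in three zones, the scheduler aims for a 2/2/2 layout. Because the constraint is soft, a 3/2/1 layout is still accepted when one zone is short on capacity.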
To override the default with a hard zone spread, specify explicit constraints:
apiVersion: antfly.io/v1
kind: AntflyCluster
metadata:
  name: my-cluster
spec:
  metadataNodes:
    replicas: 3
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: antfly-database
            app.kubernetes.io/component: metadata
  dataNodes:
    replicas: 6
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: antfly-database
            app.kubernetes.io/component: data
StorageClass requirement: For zone-aware scheduling with persistent volumes, use a StorageClass with volumeBindingMode: WaitForFirstConsumer so that PVs are provisioned in the same zone as the pod. Using Immediate binding can cause PVs to be provisioned in the wrong AZ.
Cross-cloud StorageClass reference:
| Provider | Recommended StorageClass | volumeBindingMode | Notes |
|---|---|---|---|
| EKS < 1.30 | gp3 (custom) or default gp2 | WaitForFirstConsumer | Must use ebs.csi.aws.com provisioner for gp3 |
| EKS >= 1.30 | gp3 (custom, must create) | WaitForFirstConsumer | No default StorageClass on EKS 1.30+ |
| GKE Standard | standard-rwo or premium-rwo | WaitForFirstConsumer | Default standard uses Immediate |
| GKE Autopilot | standard-rwo (default) | WaitForFirstConsumer | Autopilot handles topology internally |
| AKS < 1.29 | managed-csi | WaitForFirstConsumer | LRS disks are AZ-bound |
| AKS >= 1.29 | managed-csi (default) | WaitForFirstConsumer | Multi-zone auto-uses ZRS |
| Generic | Must verify | Must be WaitForFirstConsumer | Check with kubectl get sc <name> -o yaml |
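As one concrete case, the custom gp3 class from the EKS rows could be defined as follows. This is a minimal sketch assuming the EBS CSI driver is installed; the class name and the default-class annotation are illustrative choices, not requirements.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # optional
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer   # provision the volume only after the pod is scheduled
allowVolumeExpansion: true
```

With WaitForFirstConsumer, the PersistentVolumeClaim stays Pending until a pod that uses it is scheduled, which is expected and not an error.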
Pod Anti-Affinity (One Pod per Node)
Prevent multiple pods of the same type from landing on the same node:
apiVersion: antfly.io/v1
kind: AntflyCluster
metadata:
  name: my-cluster
spec:
  metadataNodes:
    replicas: 3
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/name: antfly
                app.kubernetes.io/component: metadata
            topologyKey: kubernetes.io/hostname
  dataNodes:
    replicas: 3
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/name: antfly
                  app.kubernetes.io/component: data
              topologyKey: kubernetes.io/hostname
This example uses a hard requirement for metadata nodes (Raft consensus requires distinct failure domains) and a soft preference for data nodes (so pods can still schedule when node count is limited).
Combining with EKS Spot Instances
When EKS Spot is enabled (spec.eks.enabled: true, spec.eks.spot.dataNodes: true), the operator automatically adds Spot tolerations and node selectors to data node pods. Your user-specified scheduling fields compose with these.
CRD input:
apiVersion: antfly.io/v1
kind: AntflyCluster
metadata:
  name: my-cluster
spec:
  eks:
    enabled: true
    spot:
      dataNodes: true
  dataNodes:
    replicas: 5
    nodeSelector:
      workload: antfly
    tolerations:
      - key: dedicated
        operator: Equal
        value: antfly
        effect: NoSchedule
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: antfly
            app.kubernetes.io/component: data
Resulting pod template (after operator merges):
# nodeSelector (user + EKS Spot merged)
nodeSelector:
  workload: antfly                          # from CRD
  eks.amazonaws.com/capacityType: "SPOT"    # added by operator
# tolerations (user + EKS Spot appended)
tolerations:
  - key: dedicated                          # from CRD
    operator: Equal
    value: antfly
    effect: NoSchedule
  - key: eks.amazonaws.com/spot             # added by operator
    operator: Exists
    effect: NoSchedule
# topologySpreadConstraints (passed through)
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    ...
Karpenter (EKS)
If you use Karpenter for node provisioning on EKS, create a NodePool that provisions nodes matching your CRD scheduling constraints:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: antfly
spec:
  template:
    metadata:
      labels:
        workload: antfly
    spec:
      # nodeClassRef is required by karpenter.sh/v1; the EC2NodeClass name is a placeholder
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      taints:
        - key: dedicated
          value: antfly
          effect: NoSchedule
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m6i.2xlarge", "m6i.4xlarge", "m7i.2xlarge", "m7i.4xlarge"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
  limits:
    cpu: "128"
    memory: 512Gi
For data nodes on Spot, create a separate NodePool:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: antfly-spot
spec:
  template:
    metadata:
      labels:
        workload: antfly-data
    spec:
      # nodeClassRef is required by karpenter.sh/v1; the EC2NodeClass name is a placeholder
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      taints:
        - key: dedicated
          value: antfly
          effect: NoSchedule
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m6i.2xlarge", "m6i.4xlarge", "m7i.2xlarge", "m7i.4xlarge", "c6i.2xlarge"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
  limits:
    cpu: "256"
    memory: 1Ti
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 60s
How Cloud-Provider Scheduling Composes
The operator applies scheduling in two phases:
- User constraints from the CRD are applied via applySchedulingConstraints
- Cloud-provider values are merged on top by applyEKSPodSpec or applyGKEPodSpec
| Field | Merge Strategy | Detail |
|---|---|---|
| tolerations | Appended | Cloud-provider tolerations are appended to user list (duplicates avoided) |
| nodeSelector | Key-merged | Cloud-provider keys are added to user map; duplicate keys are overwritten by cloud-provider |
| affinity (preferred terms) | Appended | Cloud-provider preferred scheduling terms are appended to user list |
| affinity (required terms) | User wins | If user specifies required node affinity, it takes precedence |
| affinity (pod affinity/anti-affinity) | User wins | User-specified pod affinity/anti-affinity is preserved as-is |
| topologySpreadConstraints | Appended | Cloud-provider constraints are appended to user list |
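To illustrate the key-merge rule for nodeSelector, suppose a user sets the capacity-type key on data nodes while EKS Spot is enabled for them. Per the table, the operator's value replaces the user's. The snippet is a sketch derived from the merge rules above, not captured operator output.

```yaml
# CRD input (with spec.eks.enabled: true and spec.eks.spot.dataNodes: true)
dataNodes:
  nodeSelector:
    workload: antfly
    eks.amazonaws.com/capacityType: ON_DEMAND   # user value
---
# Resulting pod template nodeSelector
nodeSelector:
  workload: antfly                              # user key kept
  eks.amazonaws.com/capacityType: "SPOT"        # duplicate key overwritten by operator
```

If pods land on unexpected capacity, a conflicting user key like this is worth checking first.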
What each cloud provider adds automatically:
| Provider Mode | nodeSelector | Tolerations | Affinity | Other |
|---|---|---|---|---|
| EKS + Spot | eks.amazonaws.com/capacityType: SPOT | Spot toleration | Instance type preference (weight 100) | terminationGracePeriod: 25s |
| GKE Standard + Spot | cloud.google.com/gke-spot: "true" | — | — | terminationGracePeriod: 15s |
| GKE Autopilot | nodeSelector cleared | — | — | Compute class annotation; terminationGracePeriod: 15s |
Best Practices
Metadata Nodes vs. Data Nodes
| Consideration | Metadata Nodes | Data Nodes |
|---|---|---|
| Spot/Preemptible | Never — Raft leader election disrupts the cluster | Safe with 3+ replicas and PDBs |
| Dedicated nodes | Recommended for production | Optional; depends on workload isolation needs |
| Instance types | Memory-optimized (Raft state is in-memory) | Storage-optimized or general-purpose |
| Anti-affinity | Hard (required) — one per node | Soft (preferred) — allows co-location when needed |
| Zone spread | Required for HA (3 zones for 3 replicas) | Recommended; use ScheduleAnyway to avoid unschedulable pods |
General Recommendations
- Always enable PodDisruptionBudgets for production clusters to protect against voluntary disruptions during upgrades and scaling
- Use soft topology spread (whenUnsatisfiable: ScheduleAnyway) for data nodes to avoid pods stuck in Pending when zone capacity is limited
- Set resource requests and limits on all pods to prevent noisy-neighbor effects and enable accurate bin-packing
- Use a WaitForFirstConsumer StorageClass when combining zone-aware scheduling with persistent volumes
- Test scheduling in staging before production; use kubectl describe pod to verify that placement matches expectations
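If PodDisruptionBudgets are managed by hand rather than through the operator (check the AntflyCluster API Reference for a built-in setting), a budget for the metadata pods might look like the sketch below. The resource name and label selector are assumptions based on the labels used in this guide's examples; confirm them against the labels on your pods.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-cluster-metadata-pdb     # hypothetical name
spec:
  maxUnavailable: 1                 # keeps 2 of 3 Raft members up during voluntary disruptions
  selector:
    matchLabels:
      app.kubernetes.io/component: metadata   # assumed label; verify on your pods
```

With three metadata replicas, maxUnavailable: 1 preserves a Raft majority during node drains and rolling upgrades.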
Troubleshooting
Pods Stuck in Pending
Check the pod events for scheduling failure reasons:
kubectl describe pod my-cluster-metadata-0
Common messages and solutions:
| Event Message | Cause | Solution |
|---|---|---|
| 0/N nodes are available: N node(s) had untolerated taint | Missing toleration | Add matching toleration to CRD |
| 0/N nodes are available: N node(s) didn't match Pod's node affinity/selector | No nodes match nodeSelector or affinity | Verify node labels match CRD selectors |
| 0/N nodes are available: N node(s) didn't match pod topology spread constraints | Cannot satisfy maxSkew with DoNotSchedule | Switch to ScheduleAnyway or add nodes in under-represented zones |
| 0/N nodes are available: N Too many pods | Node has reached its maximum pod count | Add more nodes or raise the per-node pod limit |
Verifying Applied Scheduling
Inspect the StatefulSet to confirm the operator applied your scheduling fields:
# Check nodeSelector
kubectl get statefulset my-cluster-metadata -o jsonpath='{.spec.template.spec.nodeSelector}' | jq
# Check tolerations
kubectl get statefulset my-cluster-metadata -o jsonpath='{.spec.template.spec.tolerations}' | jq
# Check affinity
kubectl get statefulset my-cluster-metadata -o jsonpath='{.spec.template.spec.affinity}' | jq
# Check topology spread constraints
kubectl get statefulset my-cluster-metadata -o jsonpath='{.spec.template.spec.topologySpreadConstraints}' | jq
GKE Autopilot nodeSelector Error
If you see a webhook validation error like:
spec.metadataNodes.nodeSelector conflicts with spec.gke.autopilot=true
Remove nodeSelector from your CRD and use spec.gke.autopilotComputeClass to control scheduling on GKE Autopilot.
See Also
- AWS EKS: EKS deployment guide with Spot Instances and IRSA
- GCP GKE: GKE deployment guide with Autopilot and Spot Pods
- Generic Kubernetes: Deployment on any Kubernetes distribution
- Autoscaling: Automatic scaling of data nodes
- AntflyCluster API Reference: Complete CRD field reference