Warning: Service mesh integration is experimental. APIs and behavior may change in future releases.
The Antfly Operator provides native support for service mesh integration, enabling automatic mTLS encryption and traffic management for your database clusters.
Overview
Service mesh integration provides:
- Automatic mTLS encryption between all Antfly pods
- Traffic observability through service mesh telemetry
- Advanced traffic management (circuit breaking, retries, timeouts)
- Zero-trust security with automatic certificate rotation
- Network policy enforcement at the sidecar level
The operator automatically detects sidecar injection and updates cluster status accordingly.
Supported Service Meshes
The Antfly Operator is designed to work with any Kubernetes service mesh that uses sidecar injection:
| Mesh | Status | Notes |
|---|---|---|
| Istio | Recommended | Best tested integration |
| Linkerd | Supported | Lightweight option |
| Consul Connect | Supported | HashiCorp ecosystem |
Quick Start
Prerequisites
- Antfly Operator installed in your cluster
- Service mesh control plane installed (e.g., Istio, Linkerd)
- Service mesh sidecar injection configured (namespace-level or pod-level)
Enable Service Mesh on a New Cluster
apiVersion: antfly.io/v1
kind: AntflyCluster
metadata:
  name: my-cluster
  namespace: production
spec:
  image: ghcr.io/antflydb/antfly:latest
  serviceMesh:
    enabled: true
    annotations:
      sidecar.istio.io/inject: "true"
  metadataNodes:
    replicas: 3
    resources:
      cpu: "500m"
      memory: "512Mi"
  dataNodes:
    replicas: 3
    resources:
      cpu: "1000m"
      memory: "2Gi"
  storage:
    storageClass: "standard"
    metadataStorage: "1Gi"
    dataStorage: "10Gi"
Enable Service Mesh on Existing Cluster
Patch an existing cluster to enable service mesh:
kubectl patch antflycluster my-cluster -n production --type='merge' -p='
{
  "spec": {
    "serviceMesh": {
      "enabled": true,
      "annotations": {
        "sidecar.istio.io/inject": "true"
      }
    }
  }
}'
The operator will perform a rolling restart, injecting sidecars into each pod while maintaining cluster availability.
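To script around the rolling restart, one option is to poll the cluster status until injection is reported as complete. The helper below is a minimal sketch, not part of the operator or kubectl: `wait_for_sidecars` and `POLL_INTERVAL` are hypothetical names, and the status path in the commented-out invocation is inferred from the status fields documented on this page.

```shell
# Hypothetical helper (an assumption for illustration, not an operator feature).
# Usage: wait_for_sidecars MAX_ATTEMPTS COMMAND [ARGS...]
# COMMAND must print the current sidecarInjectionStatus on stdout.
wait_for_sidecars() {
  max_attempts="$1"
  shift
  attempt=0
  status="Unknown"
  while [ "$attempt" -lt "$max_attempts" ]; do
    # A failing command is treated as "Unknown" instead of aborting the wait.
    status="$("$@" 2>/dev/null)" || status="Unknown"
    if [ "$status" = "Complete" ]; then
      echo "sidecar injection complete"
      return 0
    fi
    attempt=$((attempt + 1))
    # Seconds between polls; defaults to 5 when POLL_INTERVAL is unset.
    sleep "${POLL_INTERVAL:-5}"
  done
  echo "gave up after $max_attempts attempts (last status: $status)"
  return 1
}

# Example invocation (commented out so the snippet can be sourced without a
# live cluster):
# wait_for_sidecars 60 kubectl get antflycluster my-cluster -n production \
#   -o jsonpath='{.status.serviceMeshStatus.sidecarInjectionStatus}'
```

Passing the status command as arguments keeps the loop independent of kubectl, so it can be exercised with a stub before being pointed at a real cluster.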
Configuration
Spec Fields
spec:
  serviceMesh:
    enabled: true          # Enable service mesh integration
    annotations:           # Mesh-specific annotations
      key: value
enabled (boolean, optional, default: false)
Controls whether service mesh sidecar injection is enabled for the cluster.
annotations (map[string]string, optional)
Mesh-specific annotations to apply to pod templates. These annotations trigger sidecar injection and configure mesh behavior.
Status Fields
The operator automatically populates the following status fields:
status:
  serviceMeshStatus:
    enabled: true                       # Reflects spec.serviceMesh.enabled
    sidecarInjectionStatus: "Complete"  # Complete | Partial | None | Unknown
    podsWithSidecars: 6                 # Number of pods with sidecars
    totalPods: 6                        # Total number of pods
    lastTransitionTime: "2025-10-04T..."
  conditions:
    - type: ServiceMeshReady
      status: "True"
      reason: SidecarInjectionComplete
      message: "All 6 pods have sidecars injected"
Sidecar Injection Status Values
| Status | Description |
|---|---|
| Complete | All pods have sidecars injected |
| Partial | Some pods have sidecars, others don't (blocks reconciliation) |
| None | No pods have sidecars |
| Unknown | Pod count is zero or status cannot be determined |
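The table can be read as a function of the two counters in the status block. The sketch below shows that mapping. It is an illustration consistent with the table, not the operator's actual implementation, and `injection_status` is a hypothetical name.

```shell
# Hypothetical helper mirroring the status table above.
# Usage: injection_status PODS_WITH_SIDECARS TOTAL_PODS
injection_status() {
  with_sidecars="$1"
  total="$2"
  # Missing or non-numeric input cannot be classified.
  for n in "$with_sidecars" "$total"; do
    case "$n" in
      ''|*[!0-9]*) echo "Unknown"; return 0 ;;
    esac
  done
  if [ "$total" -eq 0 ]; then
    echo "Unknown"      # pod count is zero
  elif [ "$with_sidecars" -eq 0 ]; then
    echo "None"         # no pod has a sidecar
  elif [ "$with_sidecars" -lt "$total" ]; then
    echo "Partial"      # some pods have sidecars, others do not
  else
    echo "Complete"     # every pod has a sidecar
  fi
}
```

For the example status shown earlier, `injection_status 6 6` prints `Complete`, while `injection_status 3 6` prints `Partial`.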
Mesh-Specific Configuration
Istio
spec:
  serviceMesh:
    enabled: true
    annotations:
      sidecar.istio.io/inject: "true"
      # Exclude Raft ports from proxy (recommended for performance)
      traffic.sidecar.istio.io/excludeOutboundPorts: "9017,9021"
      # Resource limits for sidecar (optional)
      sidecar.istio.io/proxyCPU: "100m"
      sidecar.istio.io/proxyMemory: "128Mi"
Important Ports:
| Port | Service | Recommendation |
|---|---|---|
| 12377 | Metadata API | Include in mesh |
| 9017 | Metadata Raft | Exclude from mesh |
| 12380 | Data API | Include in mesh |
| 9021 | Data Raft | Exclude from mesh |
Consider excluding Raft ports (9017, 9021) from the service mesh to reduce latency for consensus traffic.
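When editing the exclusion list by hand, it is easy to exclude an API port by mistake, which would take client traffic out of the mesh. The following sketch checks a comma-separated exclusion value against the API ports from the table above. `check_excluded_ports` and `API_PORTS` are hypothetical names introduced for illustration only.

```shell
# API ports from the table above; these should stay in the mesh.
API_PORTS="12377 12380"

# Hypothetical lint helper.
# Usage: check_excluded_ports "9017,9021"
# Prints any API port found in the exclusion list and returns 1 in that case.
check_excluded_ports() {
  bad=""
  for port in $(echo "$1" | tr ',' ' '); do
    for api in $API_PORTS; do
      if [ "$port" = "$api" ]; then
        bad="$bad $port"
      fi
    done
  done
  if [ -n "$bad" ]; then
    echo "API ports excluded from mesh:$bad"
    return 1
  fi
  return 0
}
```

The recommended value `"9017,9021"` passes silently, while a value that includes 12377 or 12380 is reported.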
Linkerd
spec:
  serviceMesh:
    enabled: true
    annotations:
      linkerd.io/inject: enabled
      # Skip Raft ports (recommended)
      config.linkerd.io/skip-outbound-ports: "9017,9021"
      config.linkerd.io/skip-inbound-ports: "9017,9021"
Consul Connect
spec:
  serviceMesh:
    enabled: true
    annotations:
      consul.hashicorp.com/connect-inject: "true"
      consul.hashicorp.com/connect-service-upstreams: "antfly-metadata:12377,antfly-data:12380"
Observability
Check Service Mesh Status
View the current service mesh status:
kubectl get antflycluster my-cluster -o jsonpath='{.status.serviceMeshStatus}' | jq
Check ServiceMeshReady Condition
kubectl get antflycluster my-cluster -o jsonpath='{.status.conditions[?(@.type=="ServiceMeshReady")]}' | jq
View Operator Logs
Monitor service mesh integration events:
kubectl logs -n antfly-operator-namespace deployment/antfly-operator -f | grep -i "service mesh"
View Cluster Events
Check for service mesh-related events:
kubectl get events --field-selector involvedObject.name=my-cluster -n production
Performance Optimization
Exclude Raft Ports
Raft consensus traffic is latency-sensitive. Exclude Raft ports from the mesh:
# Istio
annotations:
  traffic.sidecar.istio.io/excludeOutboundPorts: "9017,9021"
# Linkerd
annotations:
  config.linkerd.io/skip-outbound-ports: "9017,9021"
  config.linkerd.io/skip-inbound-ports: "9017,9021"
Tune Sidecar Resources
Set appropriate resource limits for sidecars:
annotations:
  sidecar.istio.io/proxyCPU: "100m"
  sidecar.istio.io/proxyMemory: "128Mi"
  sidecar.istio.io/proxyCPULimit: "500m"
  sidecar.istio.io/proxyMemoryLimit: "512Mi"
Sidecar Concurrency
Tune proxy concurrency based on workload:
annotations:
  sidecar.istio.io/concurrency: "2"
Security Configuration
Strict mTLS
For maximum security, use strict mTLS mode:
# Istio PeerAuthentication
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: antfly-mtls
  namespace: production
spec:
  selector:
    matchLabels:
      app: antfly
  mtls:
    mode: STRICT
Network Policies
Combine service mesh with Kubernetes NetworkPolicies for defense in depth:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: antfly-mesh-only
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: antfly
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: antfly
Troubleshooting
Partial Sidecar Injection
Problem: The operator detects partial sidecar injection and blocks reconciliation.
Symptoms:
- ServiceMeshReady condition is False with reason PartialInjection
- Operator logs show: "Blocking reconciliation" ... "partial sidecar injection"
- Kubernetes events show: Warning PartialSidecarInjection
Solutions:
- Check mesh control plane:
  # Istio
  istioctl analyze -n production
  # Linkerd
  linkerd check
- Verify pod annotations:
  kubectl get pods -n production -l app.kubernetes.io/name=antfly-database \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations}{"\n"}{end}'
- Check admission webhooks:
  kubectl get mutatingwebhookconfigurations | grep -i istio
- Force pod recreation:
  kubectl delete pod <pod-name> -n production
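To find which pods to recreate, it helps to list the pods whose container set lacks the proxy. The filter below is a sketch under two assumptions: the sidecar runs as a regular container named `istio-proxy` (Istio) or `linkerd-proxy` (Linkerd), and the input is one line per pod in the form pod name, tab, space-separated container names. `pods_missing_sidecar` is a hypothetical name.

```shell
# Hypothetical filter: reads "pod<TAB>container names" lines on stdin and
# prints the pods that do not contain the given sidecar container.
# Usage: ... | pods_missing_sidecar [SIDECAR_CONTAINER_NAME]
pods_missing_sidecar() {
  sidecar="${1:-istio-proxy}"
  tab="$(printf '\t')"
  while IFS="$tab" read -r pod containers; do
    # Skip blank lines.
    [ -n "$pod" ] || continue
    case " $containers " in
      *" $sidecar "*) ;;        # sidecar present, nothing to report
      *) echo "$pod" ;;         # sidecar missing
    esac
  done
}

# Example invocation (commented out so the snippet can be sourced without a
# live cluster):
# kubectl get pods -n production -l app.kubernetes.io/name=antfly-database \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}' \
#   | pods_missing_sidecar istio-proxy
```

Each printed pod name is a candidate for the forced recreation step above. If the mesh injects the proxy as an init container instead, the jsonpath expression would need to read `.spec.initContainers` as well.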
Sidecars Not Injected
Problem: Service mesh is enabled but sidecars are not being injected.
Solutions:
- Verify annotations are correct:
  kubectl get antflycluster my-cluster -o yaml | grep -A 5 serviceMesh
- Check namespace labels (if using namespace-level injection):
  kubectl get namespace production --show-labels
- Verify StatefulSet pod template:
  kubectl get statefulset my-cluster-metadata -o jsonpath='{.spec.template.metadata.annotations}' | jq
- Test manual injection (debugging):
  # Istio
  istioctl kube-inject -f examples/service-mesh-istio-cluster.yaml
  # Linkerd
  linkerd inject examples/service-mesh-linkerd-cluster.yaml
High Latency After Enabling Mesh
Problem: Database latency increases significantly after enabling service mesh.
Solutions:
- Exclude Raft ports from mesh (see Performance Optimization above)
- Tune sidecar resource limits:
  annotations:
    sidecar.istio.io/proxyCPU: "200m"
    sidecar.istio.io/proxyMemory: "256Mi"
- Check mTLS overhead:
  # Istio - inspect the endpoints known to the pod's proxy
  istioctl proxy-config endpoint <pod-name> -n production
Rolling Restart Failures
Problem: Pods fail to restart with sidecars during rolling update.
Solutions:
- Check resource quotas:
  kubectl describe resourcequota -n production
- Verify PodDisruptionBudget (if using GKE):
  kubectl get pdb -n production
- Check StatefulSet events:
  kubectl describe statefulset my-cluster-metadata -n production
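A common reason for quota failures is that each pod now carries an extra proxy container, so namespace-wide requests grow by the pod count times the proxy request. The sketch below does that arithmetic for the values used on this page (6 pods, `proxyCPU: "100m"`, `proxyMemory: "128Mi"`). `sidecar_overhead` is a hypothetical helper, and it assumes CPU is given in millicores and memory in MiB.

```shell
# Hypothetical helper: extra requests added to the namespace by sidecars.
# Usage: sidecar_overhead PODS CPU_MILLICORES_PER_PROXY MEMORY_MIB_PER_PROXY
sidecar_overhead() {
  pods="$1"
  cpu_m="$2"
  mem_mi="$3"
  # Total overhead is per-proxy request multiplied by the number of pods.
  echo "cpu=$((pods * cpu_m))m memory=$((pods * mem_mi))Mi"
}

sidecar_overhead 6 100 128   # prints: cpu=600m memory=768Mi
```

Comparing this figure with the remaining headroom reported by `kubectl describe resourcequota` shows whether the quota needs to be raised before the rollout.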
Best Practices
Production Deployments
- Start with data nodes: Enable service mesh on data nodes first, verify stability, then enable for metadata nodes
- Use resource limits: Set appropriate sidecar resource limits to prevent OOM
  annotations:
    sidecar.istio.io/proxyCPU: "100m"
    sidecar.istio.io/proxyMemory: "128Mi"
    sidecar.istio.io/proxyCPULimit: "500m"
    sidecar.istio.io/proxyMemoryLimit: "512Mi"
- Exclude Raft ports: Reduce latency by excluding consensus traffic from mesh
  annotations:
    traffic.sidecar.istio.io/excludeOutboundPorts: "9017,9021"
- Monitor during rollout: Watch cluster status during rolling restart
  watch kubectl get antflycluster my-cluster -o jsonpath='{.status.serviceMeshStatus}'
Security Considerations
- mTLS mode: Use STRICT mode for maximum security
- Network policies: Combine service mesh with Kubernetes NetworkPolicies
- Certificate rotation: Service mesh handles automatic rotation - no operator action needed
Connection Pooling
Configure at service mesh level:
# Istio DestinationRule
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: antfly-connection-pool
  namespace: production
spec:
  host: my-cluster-metadata
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
Limitations
- Metadata nodes: Service mesh adds latency to Raft consensus. Consider excluding Raft ports or disabling mesh on metadata nodes for latency-sensitive workloads.
- Partial injection: The operator blocks reconciliation when partial injection is detected to prevent split-brain scenarios. Resolve the injection issue before proceeding.
- Mesh upgrades: Upgrade the service mesh control plane independently. The operator will detect sidecar version changes but does not manage mesh upgrades.
Examples
See the examples/ directory for complete configuration examples:
- examples/service-mesh-istio-cluster.yaml - Istio integration
- examples/service-mesh-linkerd-cluster.yaml - Linkerd integration
See Also
- Monitoring: Observability setup
- GKE Autopilot: GKE-specific considerations
- Troubleshooting: Common issues