Complete API reference for the AntflyRestore custom resource.

## Overview

AntflyRestore defines on-demand restore operations for Antfly clusters. The operator creates a Kubernetes Job to execute the restore.

```yaml
apiVersion: antfly.io/v1
kind: AntflyRestore
metadata:
  name: restore-operation
  namespace: default
spec:
  # ... spec fields
status:
  # ... status fields (read-only)
```

## Spec

### Top-Level Fields

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `clusterRef` | `ClusterReference` | Yes | - | Target cluster for restore |
| `source` | `RestoreSource` | Yes | - | Backup source location |
| `tables` | `[]string` | No | all | Specific tables to restore |
| `restoreMode` | `RestoreMode` | No | `fail_if_exists` | Behavior when tables exist |
| `restoreTimeout` | `*duration` | No | `2h` | Maximum restore duration |
| `backoffLimit` | `*int32` | No | `3` | Retry attempts before failure |
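
Putting these together, a spec that sets every top-level field explicitly might look like the following; the cluster, bucket, and table names are placeholders:

```yaml
apiVersion: antfly.io/v1
kind: AntflyRestore
metadata:
  name: restore-full-spec
spec:
  clusterRef:
    name: my-cluster            # target AntflyCluster
  source:
    backupId: "backup-20250115-020000"
    location: s3://my-bucket/antfly-backups
    credentialsSecret:
      name: backup-credentials
  tables:                       # omit to restore all tables
    - users
    - orders
  restoreMode: fail_if_exists   # the default; shown for completeness
  restoreTimeout: 2h            # the default maximum duration
  backoffLimit: 3               # the default retry budget
```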

### ClusterReference

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `name` | `string` | Yes | - | Name of the target AntflyCluster |
| `namespace` | `string` | No | Same namespace as the AntflyRestore | Namespace of the AntflyCluster |

### RestoreSource

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `backupId` | `string` | Yes | Backup identifier to restore |
| `location` | `string` | Yes | Backup URL (`s3://` or `file://`) |
| `credentialsSecret` | `*SecretReference` | No | Secret with storage credentials |

Location formats:

- S3: `s3://bucket-name/path/to/backups`
- GCS (via the S3-compatible API): `s3://bucket-name/path`, with the endpoint set in the credentials Secret
- Local: `file:///path/to/backups` (testing only)
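
For local testing, a `file://` source points the restore at a local path (the path below is a placeholder):

```yaml
source:
  backupId: "backup-20250115-020000"
  location: file:///var/antfly/backups  # testing only
```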

### SecretReference

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `name` | `string` | Yes | Name of the Secret |

Expected Secret keys:

| Key | Required | Description |
|-----|----------|-------------|
| `AWS_ACCESS_KEY_ID` | Yes | Access key |
| `AWS_SECRET_ACCESS_KEY` | Yes | Secret key |
| `AWS_REGION` | No | AWS region |
| `AWS_ENDPOINT_URL` | No | Custom S3 endpoint |
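
For AWS S3, a credentials Secret with these keys might look like the following (all values are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: backup-credentials
  namespace: production
stringData:
  AWS_ACCESS_KEY_ID: "AKIAEXAMPLE"
  AWS_SECRET_ACCESS_KEY: "example-secret-key"
  AWS_REGION: "us-east-1"   # optional
```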

### RestoreMode

| Mode | Description |
|------|-------------|
| `fail_if_exists` | Abort if any target table exists (default, safest) |
| `skip_if_exists` | Skip existing tables, restore the others |
| `overwrite` | Drop and recreate existing tables (destructive) |

## Status

### Top-Level Status Fields

| Field | Type | Description |
|-------|------|-------------|
| `phase` | `RestorePhase` | Current phase |
| `startTime` | `*Time` | When the restore started |
| `completionTime` | `*Time` | When the restore finished |
| `tables` | `[]TableRestoreStatus` | Per-table status |
| `message` | `string` | Status message |
| `conditions` | `[]Condition` | Current conditions |
| `jobName` | `string` | Name of the executing Job |

### RestorePhase

| Phase | Description |
|-------|-------------|
| `Pending` | Restore has not started |
| `Running` | Restore is in progress |
| `Completed` | Restore completed successfully |
| `Failed` | Restore failed |

### TableRestoreStatus

| Field | Type | Description |
|-------|------|-------------|
| `name` | `string` | Table name |
| `status` | `string` | One of `Pending`, `Restoring`, `Completed`, `Failed`, `Skipped` |
| `error` | `string` | Error message if failed |
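
Putting the status fields together, a restore in progress might report something like this (all values are illustrative):

```yaml
status:
  phase: Running
  startTime: "2025-01-15T02:10:00Z"
  jobName: restore-latest-job   # illustrative Job name
  message: "Restoring table users (1 of 2)"
  tables:
    - name: users
      status: Restoring
    - name: orders
      status: Pending
```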

### Conditions

| Type | Description |
|------|-------------|
| `JobReady` | Restore Job is created and ready |
| `ClusterReady` | Target cluster exists and is ready |

Condition reasons:

- `JobCreated` - Job created successfully
- `JobRunning` - Job is running
- `JobCompleted` - Job completed successfully
- `JobFailed` - Job failed
- `ClusterNotFound` - Target cluster not found
- `InvalidSource` - Backup source is invalid

## Validation Rules

- `source.location` must start with `s3://` or `file://`
- `source.backupId` must be non-empty
- `clusterRef.name` must reference an existing AntflyCluster
- `restoreMode` must be a valid enum value
- `backoffLimit` must be `>= 0`
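
The operator enforces these rules, but the location scheme check is easy to replicate client-side before applying a manifest. The `valid_location` helper below is a sketch written for this page, not part of the project:

```shell
# Return 0 if the backup location uses a supported scheme (s3:// or file://).
valid_location() {
  case "$1" in
    s3://*|file://*) return 0 ;;
    *) echo "unsupported location scheme: $1" >&2; return 1 ;;
  esac
}
```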

## Examples

### Basic Restore

```yaml
apiVersion: antfly.io/v1
kind: AntflyRestore
metadata:
  name: restore-latest
  namespace: production
spec:
  clusterRef:
    name: my-cluster
  source:
    backupId: "backup-20250115-020000"
    location: s3://my-bucket/antfly-backups
    credentialsSecret:
      name: backup-credentials
```

### Restore Specific Tables

```yaml
apiVersion: antfly.io/v1
kind: AntflyRestore
metadata:
  name: restore-users-table
  namespace: production
spec:
  clusterRef:
    name: my-cluster
  source:
    backupId: "backup-20250115-020000"
    location: s3://my-bucket/antfly-backups
    credentialsSecret:
      name: backup-credentials
  tables:
    - users
  restoreMode: overwrite  # Replace existing table
```

### Restore with Skip Mode

```yaml
apiVersion: antfly.io/v1
kind: AntflyRestore
metadata:
  name: restore-missing-tables
  namespace: production
spec:
  clusterRef:
    name: my-cluster
  source:
    backupId: "backup-20250115-020000"
    location: s3://my-bucket/antfly-backups
    credentialsSecret:
      name: backup-credentials
  restoreMode: skip_if_exists  # Only restore tables that don't exist
```

### Cross-Namespace Restore

Restore into a cluster in a different namespace (e.g., restoring a production backup into staging):

```yaml
apiVersion: antfly.io/v1
kind: AntflyRestore
metadata:
  name: restore-to-staging
  namespace: staging
spec:
  clusterRef:
    name: staging-cluster
    namespace: staging
  source:
    backupId: "backup-20250115-020000"
    location: s3://my-bucket/production-backups
    credentialsSecret:
      name: backup-credentials
  restoreMode: overwrite
```

### GCS Restore

```yaml
apiVersion: antfly.io/v1
kind: AntflyRestore
metadata:
  name: restore-from-gcs
  namespace: production
spec:
  clusterRef:
    name: my-cluster
  source:
    backupId: "backup-20250115-020000"
    location: s3://my-gcs-bucket/antfly-backups
    credentialsSecret:
      name: gcs-hmac-credentials
---
apiVersion: v1
kind: Secret
metadata:
  name: gcs-hmac-credentials
  namespace: production
stringData:
  AWS_ACCESS_KEY_ID: "GOOGABC123DEF456"
  AWS_SECRET_ACCESS_KEY: "your-hmac-secret"
  AWS_ENDPOINT_URL: "https://storage.googleapis.com"
  AWS_REGION: "auto"
```

### Long-Running Restore

For large datasets, raise the timeout and retry budget:

```yaml
apiVersion: antfly.io/v1
kind: AntflyRestore
metadata:
  name: restore-large-dataset
  namespace: production
spec:
  clusterRef:
    name: my-cluster
  source:
    backupId: "backup-20250115-020000"
    location: s3://my-bucket/large-backups
    credentialsSecret:
      name: backup-credentials
  restoreTimeout: 8h  # Extended timeout
  backoffLimit: 5     # More retries
```

## Managing Restores

### List Restores

```shell
kubectl get antflyrestore -n production
```

### View Restore Status

```shell
kubectl get antflyrestore restore-latest -n production -o yaml
```

### Check Restore Progress

```shell
# View phase
kubectl get antflyrestore restore-latest -n production \
  -o jsonpath='{.status.phase}'

# View per-table status
kubectl get antflyrestore restore-latest -n production \
  -o jsonpath='{.status.tables}' | jq
```
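
The phase check above can be wrapped in a small polling helper that blocks until the restore reaches a terminal phase. `wait_for_restore` is a name invented here, not part of the operator:

```shell
# Poll an AntflyRestore until it reaches Completed or Failed.
# Usage: wait_for_restore <name> <namespace> [interval-seconds]
wait_for_restore() {
  local name="$1" ns="$2" interval="${3:-10}"
  while true; do
    phase="$(kubectl get antflyrestore "$name" -n "$ns" \
      -o jsonpath='{.status.phase}')"
    case "$phase" in
      Completed) echo "restore $name completed"; return 0 ;;
      Failed)    echo "restore $name failed" >&2; return 1 ;;
      *)         sleep "$interval" ;;  # Pending/Running: keep polling
    esac
  done
}
```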

### View Restore Job

```shell
# Get job name
kubectl get antflyrestore restore-latest -n production \
  -o jsonpath='{.status.jobName}'

# View job
kubectl get job <job-name> -n production

# View job logs
kubectl logs -l job-name=<job-name> -n production
```

### Cancel Restore

Delete the AntflyRestore resource to cancel:

```shell
kubectl delete antflyrestore restore-latest -n production
```

This will also delete the associated Job.

### Re-run Failed Restore

Delete and recreate the restore:

```shell
# Delete failed restore
kubectl delete antflyrestore restore-latest -n production

# Recreate
kubectl apply -f restore.yaml
```

## Troubleshooting

### Restore Stuck in Pending

```shell
# Check conditions
kubectl get antflyrestore restore-latest -o jsonpath='{.status.conditions}'

# Check if cluster exists
kubectl get antflycluster my-cluster -n production
```

### Restore Failed

```shell
# View error message
kubectl get antflyrestore restore-latest -o jsonpath='{.status.message}'

# Check job logs
kubectl logs -l job-name=<job-name> -n production

# Check per-table errors
kubectl get antflyrestore restore-latest -o jsonpath='{.status.tables}' \
  | jq '.[] | select(.status=="Failed")'
```

### Table Already Exists

If using `fail_if_exists` mode and tables exist:

1. Use `skip_if_exists` to restore other tables
2. Use `overwrite` to replace existing tables (destructive)
3. Manually drop tables before restore

## See Also