StatefulSets Overview
StatefulSets manage stateful applications that need stable identities.
Unlike Deployments where pods are interchangeable, StatefulSets give each pod a persistent hostname, ordered deployment, and its own storage.
- Stable Identity: Each pod gets a persistent hostname that survives rescheduling.
- Ordered Operations: Pods are created, scaled, and deleted in a predictable order.
- Persistent Storage: Each pod can have its own persistent volume that survives pod restarts.
When to Use StatefulSets
| Use Case | Example | Key Requirement |
|---|---|---|
| Databases | MySQL, PostgreSQL, MongoDB | Data persistence, ordered startup |
| Message Queues | Kafka, RabbitMQ | Stable network identity for brokers |
| Distributed Systems | Elasticsearch, Cassandra | Cluster coordination, data sharding |
| Stateful Services | ZooKeeper, etcd | Leader election, consensus |
Creating a StatefulSet
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-db
spec:
  serviceName: postgres-service
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:14
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_DB
          value: myapp
        - name: POSTGRES_USER
          value: admin
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: postgres-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 10Gi
```
Headless Service Required
StatefulSets require a headless Service (`clusterIP: None`) to manage network identities. Each pod gets a stable DNS name: `<pod-name>.<service-name>.<namespace>.svc.cluster.local`.
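The manifest above references `serviceName: postgres-service` but does not define it; a minimal sketch of the matching headless Service, with the port and labels assumed to mirror the StatefulSet:

```yaml
# Headless Service assumed by the postgres-db StatefulSet above
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
spec:
  clusterIP: None   # headless: per-pod DNS records, no virtual IP
  selector:
    app: postgres
  ports:
  - port: 5432
```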
Ordered Deployment and Scaling
StatefulSet pods are created sequentially: postgres-0, then postgres-1, then postgres-2. Each pod must be Running and Ready before the next one is created.
```yaml
spec:
  podManagementPolicy: OrderedReady  # Default: sequential
  # OR
  podManagementPolicy: Parallel      # All pods start simultaneously
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2  # Only update pods with ordinal >= 2
```
Watch Out for Ordering Dependencies
With OrderedReady policy, if pod-1 fails to start, pods 2, 3, 4 and beyond will not be created until pod-1 is healthy.
Database Workloads
```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
  - port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql-headless
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
      - name: init-mysql
        image: mysql:8.0
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Generate mysql server-id from pod ordinal
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          echo [mysqld] > /mnt/conf.d/server-id.cnf
          echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          cp /mnt/config-map/master.cnf /mnt/conf.d/
          cp /mnt/config-map/slave.cnf /mnt/conf.d/
        volumeMounts:
        - name: conf
          mountPath: /mnt/conf.d
        - name: config-map
          mountPath: /mnt/config-map
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: password
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            # assumes passwordless local access; with a root password set, supply credentials (e.g. via MYSQL_PWD)
            command: ["mysql", "-h", "127.0.0.1", "-e", "SELECT 1"]
          initialDelaySeconds: 5
          periodSeconds: 2
      volumes:
      - name: conf
        emptyDir: {}           # scratch space shared between the init and main containers
      - name: config-map
        configMap:
          name: mysql-config   # assumed name; ConfigMap holding master.cnf and slave.cnf (sketched below)
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
```
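The init container copies `master.cnf` and `slave.cnf` out of a ConfigMap volume that the manifest references but never defines. A minimal sketch, with the name `mysql-config` and the file contents assumed (modeled on a typical primary/replica split):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config   # assumed name; must match the configMap volume above
data:
  master.cnf: |
    # Applied on the primary (ordinal 0)
    [mysqld]
    log-bin
  slave.cnf: |
    # Applied on replicas
    [mysqld]
    super-read-only
```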
Database Best Practices
- Use Init Containers: Configure database replicas, set server IDs, and prepare configuration files before the main container starts.
- Implement Health Checks: Use readiness and liveness probes specific to your database to ensure proper health monitoring.
- Backup Strategy: Implement regular backups using CronJobs or dedicated backup operators.
DaemonSets Overview
DaemonSets ensure that all (or some) nodes run a copy of a pod. Perfect for node-level services like log collectors, monitoring agents, and network plugins.
- Node Coverage: Automatically deploys to all nodes in the cluster.
- Auto-Scaling: Adds pods when new nodes join the cluster.
- Node Selection: Target specific nodes with selectors and tolerations.
Common DaemonSet Use Cases
- Log Collection: Fluentd, Logstash, or Filebeat running on every node to collect logs.
- Monitoring: Node exporters, Datadog agents, or New Relic agents for metrics.
- Network: CNI plugins like Calico, Weave, and Flannel for pod networking.
- Storage: Storage drivers and CSI plugins for volume provisioning.
Creating a DaemonSet
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
      # Allow this pod to be scheduled on control-plane nodes
      # (the legacy node-role.kubernetes.io/master taint was removed in Kubernetes 1.24)
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.logging"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers  # Docker-specific; containerd writes pod logs under /var/log/pods
```
How It Works
This Fluentd DaemonSet collects logs from all nodes and forwards them to Elasticsearch. It mounts host directories to access container logs.
Node Selection
```yaml
# Deploy only to nodes with SSD storage
spec:
  template:
    spec:
      nodeSelector:
        disktype: ssd
      # OR use node affinity for more complex rules
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values:
                - m5.large
                - m5.xlarge
```
Update Strategy
```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # Update one node at a time
      maxSurge: 0        # Don't create extra pods during update

  # For immediate updates (not recommended for production)
  updateStrategy:
    type: OnDelete  # Pods updated only when manually deleted
```
Jobs Overview
Jobs create one or more pods and ensure they run to successful completion. Perfect for batch processing, data migrations, and one-time tasks.
- Single Task: Run once and complete successfully.
- Parallel Processing: Multiple pods working together on a workload.
- Completion Tracking: Guaranteed execution to success with retry logic.
Job Patterns
| Pattern | Completions | Parallelism | Use Case |
|---|---|---|---|
| Single Job | 1 | 1 | Database migration |
| Fixed Completion Count | N | 1 to N | Process N items |
| Work Queue | null | N | Process until queue empty |
| Indexed Job | N | N | Parallel array processing |
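The work-queue row is the one pattern not shown below: completions stays unset and workers exit once the queue drains. A minimal sketch, with the image name and queue address assumed:

```yaml
# Work-queue pattern: completions unset, several workers drain a shared queue
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-drain
spec:
  parallelism: 5   # 5 workers pull items until the queue is empty
  # completions omitted: workers exit 0 when the queue is empty, and the
  # Job completes once a pod has succeeded and all pods have terminated
  template:
    spec:
      containers:
      - name: worker
        image: queue-worker:latest        # hypothetical image
        env:
        - name: QUEUE_URL
          value: "amqp://rabbitmq:5672"   # hypothetical queue address
      restartPolicy: OnFailure
```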
Basic Job Example
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  template:
    spec:
      containers:
      - name: migrate
        image: myapp:latest
        command: ["python", "migrate.py"]
        env:
        - name: SOURCE_DB
          value: "postgresql://old-db:5432/myapp"
        - name: TARGET_DB
          value: "postgresql://new-db:5432/myapp"
      restartPolicy: Never
  backoffLimit: 4                 # Retry 4 times before marking as failed
  activeDeadlineSeconds: 600      # Timeout after 10 minutes
  ttlSecondsAfterFinished: 86400  # Clean up after 24 hours
```
Parallel Processing Job
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-processing
spec:
  parallelism: 5           # Run 5 pods in parallel
  completions: 20          # Complete 20 successful runs total
  completionMode: Indexed  # Each pod gets a unique index (0-19)
  template:
    spec:
      containers:
      - name: worker
        image: batch-processor:latest
        env:
        - name: JOB_COMPLETION_INDEX
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
        command:
        - sh
        - -c
        - |
          echo "Processing batch $JOB_COMPLETION_INDEX"
          python process.py --partition=$JOB_COMPLETION_INDEX --total=20
      restartPolicy: Never
```
Pro Tip
Use completionMode: Indexed for embarrassingly parallel workloads where each pod processes a different subset of data.
Handling Failures
```yaml
spec:
  backoffLimit: 6  # Maximum retries (default: 6)
  # Retries use exponential backoff: 10s, 20s, 40s, ... capped at 6 minutes

  # Pod failure policies (Kubernetes 1.25+)
  podFailurePolicy:
    rules:
    - action: Ignore            # Don't count toward backoffLimit
      onExitCodes:
        operator: In
        values: [1, 2, 3]       # Ignore these exit codes
    - action: FailJob           # Immediately fail the job
      onExitCodes:
        operator: In
        values: [42]            # Fatal error code
    - action: Count             # Normal counting (default)
      onPodConditions:
      - type: DisruptionTarget  # Pod was evicted
```
CronJobs Overview
CronJobs create Jobs on a schedule using cron syntax. Perfect for backups, reports, maintenance tasks, and periodic data processing.
Cron Schedule Format
```
┌───────────── minute (0-59)
│ ┌───────────── hour (0-23)
│ │ ┌───────────── day of month (1-31)
│ │ │ ┌───────────── month (1-12)
│ │ │ │ ┌───────────── day of week (0-6, Sunday = 0)
│ │ │ │ │
* * * * *
```
Common Cron Patterns
| Expression | Description |
|---|---|
| `0 * * * *` | Every hour at minute 0 |
| `*/15 * * * *` | Every 15 minutes |
| `0 2 * * *` | Daily at 2:00 AM |
| `0 0 * * 0` | Weekly on Sunday at midnight |
| `0 0 1 * *` | Monthly on the 1st at midnight |
Creating a CronJob
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-backup
spec:
  schedule: "0 2 * * *"         # Daily at 2 AM
  timeZone: "America/New_York"  # Stable in Kubernetes 1.27 (beta since 1.25)
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:14  # assumes an image that also ships the AWS CLI
            command:
            - /bin/bash
            - -c
            - |
              DATE=$(date +%Y%m%d_%H%M%S)
              mkdir -p /backup
              pg_dump $DATABASE_URL > /backup/db_$DATE.sql
              aws s3 cp /backup/db_$DATE.sql s3://my-backups/postgres/
            env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: aws-credentials
                  key: access-key
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: aws-credentials
                  key: secret-key
          restartPolicy: OnFailure
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  concurrencyPolicy: Forbid      # Don't run if previous job still running
  startingDeadlineSeconds: 300   # Skip if can't start within 5 minutes
```
Concurrency Policies
- Allow (default): Multiple jobs can run concurrently. Use when jobs are independent.
- Forbid: Skip the new job if the previous one is still running. Prevents overlap.
- Replace: Cancel the current job and start a new one. The latest run always wins.
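As a sketch of where Replace fits: a high-frequency job where only the newest run matters, with the name and image assumed:

```yaml
# Replace policy: a stale refresh run is cancelled in favor of the new one
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cache-refresh         # hypothetical job name
spec:
  schedule: "*/5 * * * *"     # every 5 minutes
  concurrencyPolicy: Replace  # kill a still-running refresh and start fresh
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: refresh
            image: cache-refresher:latest  # hypothetical image
          restartPolicy: OnFailure
```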
Monitoring CronJobs
```bash
# List all cronjobs
kubectl get cronjobs

# View cronjob details
kubectl describe cronjob database-backup

# View job history (Jobs created by a CronJob are prefixed with its name)
kubectl get jobs | grep database-backup

# Check last schedule time
kubectl get cronjob database-backup -o jsonpath='{.status.lastScheduleTime}'

# Manually trigger a cronjob
kubectl create job --from=cronjob/database-backup manual-backup-$(date +%s)
```
Patterns and Best Practices
Choosing the Right Workload Type
| Workload Type | Use When | Don't Use When | Example |
|---|---|---|---|
| Deployment | Stateless apps, web servers, APIs | Need stable network identity or storage | Nginx, Node.js app |
| StatefulSet | Databases, distributed systems | Stateless applications | MongoDB, Kafka |
| DaemonSet | Node-level services | Application workloads | Log collectors, monitoring |
| Job | One-time tasks, batch processing | Long-running services | Data migration, backup |
| CronJob | Scheduled recurring tasks | Event-driven tasks | Reports, cleanup |
Combined Patterns
Pattern: Backup System
Combine StatefulSet (database) + CronJob (backups) + Job (restore):
- StatefulSet runs PostgreSQL with persistent storage
- CronJob performs daily backups to S3
- Job restores from backup when needed
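A minimal sketch of the restore Job from the last bullet, with the bucket path, object key, and credentials assumed, and an image that ships both psql and the AWS CLI:

```yaml
# One-off restore Job: fetch a dump and load it into the database
apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-restore
spec:
  template:
    spec:
      containers:
      - name: restore
        image: postgres:14   # assumes the AWS CLI is also present in the image
        command: ["/bin/bash", "-c"]
        args:
        - |
          aws s3 cp s3://my-backups/postgres/latest.sql /tmp/restore.sql  # assumed object key
          psql $DATABASE_URL < /tmp/restore.sql
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: url
      restartPolicy: Never
  backoffLimit: 1   # a failed restore usually needs human attention, not retries
```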
Pattern: Log Pipeline
Combine DaemonSet (collection) + Deployment (processing) + StatefulSet (storage):
- DaemonSet runs Fluentd on all nodes
- Deployment runs Logstash for processing (sketched after this list)
- StatefulSet runs Elasticsearch cluster
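The DaemonSet and StatefulSet tiers are shown elsewhere in this section; the stateless processing tier might look like this minimal sketch, with the image tag and port assumed:

```yaml
# Stateless tier between the DaemonSet collectors and the Elasticsearch StatefulSet
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logstash
spec:
  replicas: 2
  selector:
    matchLabels:
      app: logstash
  template:
    metadata:
      labels:
        app: logstash
    spec:
      containers:
      - name: logstash
        image: docker.elastic.co/logstash/logstash:8.12.0  # assumed tag
        ports:
        - containerPort: 5044  # Beats/forwarder input, assumed
```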
Migration Patterns
```yaml
# Pattern: Blue-Green Database Migration
---
# Step 1: Deploy new database version as a StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-v2
spec:
  # ... new version configuration
---
# Step 2: Run migration job
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  template:
    spec:
      initContainers:
      - name: wait-for-new-db
        image: busybox
        # pod DNS requires the headless Service name (assumed postgres-v2-headless)
        command: ['sh', '-c', 'until nc -z postgres-v2-0.postgres-v2-headless 5432; do sleep 1; done']
      containers:
      - name: migrate
        image: migrate-tool:latest
        command: ["./migrate.sh"]
      restartPolicy: Never
---
# Step 3: Switch the Service to the new version
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  selector:
    app: postgres
    version: v2  # Update selector
  ports:
  - port: 5432
```
Resource Management Considerations
- StatefulSets: Reserve enough resources for all replicas
- DaemonSets: Account for one pod per node in resource planning
- Jobs: Set resource limits to prevent runaway consumption (see the sketch after this list)
- CronJobs: Consider overlap when setting resource requests
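A minimal sketch of the Job guidance above, with all names and values as placeholders:

```yaml
# Bounding a batch worker so a runaway task cannot starve the node
apiVersion: batch/v1
kind: Job
metadata:
  name: bounded-task   # hypothetical name
spec:
  activeDeadlineSeconds: 900   # hard wall-clock cap on the whole Job
  template:
    spec:
      containers:
      - name: task
        image: batch-task:latest   # hypothetical image
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: "1"
            memory: 512Mi   # the container is OOM-killed instead of exhausting the node
      restartPolicy: Never
```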
Monitoring and Observability
```yaml
# Add Prometheus annotations for metrics
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"

# Common metrics to track:
# - StatefulSet: Ready replicas, persistent volume usage
# - DaemonSet: Node coverage, resource usage per node
# - Job: Success/failure rate, duration
# - CronJob: Schedule adherence, missed runs
```
Troubleshooting Guide
- StatefulSet Stuck. Symptom: pods not creating in order. Solution: check PVC binding and the previous pod's health.
- DaemonSet Not Scheduling. Symptom: pods missing on some nodes. Solution: check taints, tolerations, and node selectors.
- Job Failing Repeatedly. Symptom: backoff limit exceeded. Solution: check logs, increase backoffLimit, fix the script.
- CronJob Not Running. Symptom: missed schedules. Solution: check startingDeadlineSeconds and the timezone.
Debugging Commands
```bash
# Check StatefulSet rollout status
kubectl rollout status statefulset/mysql

# View DaemonSet logs from all nodes (namespace matches the manifest above)
kubectl logs -n kube-system -l name=fluentd-elasticsearch --all-containers

# Check Job events and status
kubectl describe job data-migration

# View recent cluster events
kubectl get events --sort-by='.lastTimestamp'
```
Practice Problems
Easy: Create a Basic StatefulSet
Write a StatefulSet manifest for a Redis cluster with 3 replicas, each with its own 5Gi persistent volume.
Hint: You need a headless Service, a StatefulSet with volumeClaimTemplates, and Redis container configuration.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-headless
spec:
  clusterIP: None
  selector:
    app: redis
  ports:
  - port: 6379
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis-headless
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 5Gi
```
Easy: Write a DaemonSet for Node Monitoring
Create a DaemonSet that runs a Prometheus Node Exporter on every node, including control-plane nodes.
Hint: Use tolerations to allow scheduling on control-plane nodes. The Node Exporter image is prom/node-exporter:latest and listens on port 9100.
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      containers:
      - name: node-exporter
        image: prom/node-exporter:latest
        ports:
        - containerPort: 9100
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 100m
            memory: 128Mi
```
Medium: Create a Parallel Batch Processing Job
Write a Job manifest that processes 100 items in parallel using 10 worker pods. Each pod should process a different partition of the data.
Hint: Use completionMode: Indexed with completions: 100 and parallelism: 10. The JOB_COMPLETION_INDEX env var tells each pod which partition to process.
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processor
spec:
  completions: 100
  parallelism: 10
  completionMode: Indexed
  template:
    spec:
      containers:
      - name: worker
        image: processor:latest
        command:
        - python
        - process.py
        - --partition=$(JOB_COMPLETION_INDEX)
        - --total=100
        env:
        - name: JOB_COMPLETION_INDEX
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
      restartPolicy: Never
  backoffLimit: 10
  activeDeadlineSeconds: 3600
```
Medium: Design a CronJob with Failure Handling
Create a CronJob for database backups that runs daily at 3 AM, prevents concurrent runs, keeps history of 5 successful and 2 failed jobs, and has a 30-minute timeout.
Hint: Use concurrencyPolicy: Forbid, successfulJobsHistoryLimit, failedJobsHistoryLimit, and activeDeadlineSeconds in the jobTemplate spec.
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 3 * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 2
  startingDeadlineSeconds: 600
  jobTemplate:
    spec:
      activeDeadlineSeconds: 1800  # 30-minute timeout
      backoffLimit: 3
      template:
        spec:
          containers:
          - name: backup
            image: postgres:14  # assumes the AWS CLI is also present in the image
            command: ["sh", "-c"]
            args:
            - |
              mkdir -p /backup
              pg_dump $DATABASE_URL | gzip > /backup/db_$(date +%Y%m%d).sql.gz
              aws s3 cp /backup/ s3://backups/db/ --recursive
            env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
          restartPolicy: OnFailure
```
Hard: Production Checklist Implementation
Design a complete StatefulSet for a PostgreSQL primary-replica cluster with: init containers for configuration, health probes, resource limits, a PodDisruptionBudget, and a companion CronJob for backups.
Hint: You need multiple resources: a headless Service, a StatefulSet with init containers and probes, a PodDisruptionBudget, and a CronJob. Use ordinal-based logic in init containers to configure primary vs replica.
```yaml
# Headless Service
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None
  selector: { app: postgres }
  ports: [{ port: 5432 }]
---
# PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels: { app: postgres }
---
# StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-headless
  replicas: 3
  selector:
    matchLabels: { app: postgres }
  template:
    metadata:
      labels: { app: postgres }
    spec:
      initContainers:
      - name: init-config
        image: postgres:14
        command: ["sh", "-c"]
        args:
        - |
          # Ordinal 0 becomes the primary; everything else is a replica
          ordinal=$(hostname | grep -o '[0-9]*$')
          if [ "$ordinal" = "0" ]; then
            echo "primary" > /config/role
          else
            echo "replica" > /config/role
          fi
        volumeMounts:
        - name: config
          mountPath: /config
      containers:
      - name: postgres
        image: postgres:14
        resources:
          requests: { cpu: 500m, memory: 1Gi }
          limits: { cpu: "2", memory: 4Gi }
        livenessProbe:
          exec:
            command: ["pg_isready"]
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command: ["pg_isready"]
          initialDelaySeconds: 5
          periodSeconds: 5
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
        - name: config
          mountPath: /config
      volumes:
      - name: config
        emptyDir: {}   # shares the role file between the init and main containers
  volumeClaimTemplates:
  - metadata: { name: data }
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests: { storage: 50Gi }
---
# Backup CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:14
            # $DB is assumed to be injected (e.g. from a Secret) with the target database name
            command: ["sh", "-c", "mkdir -p /backup && pg_dump -h postgres-0.postgres-headless $DB | gzip > /backup/$(date +%Y%m%d).sql.gz"]
          restartPolicy: OnFailure
```
Production Readiness Checklist
- Resource Limits: Set appropriate CPU/memory requests and limits
- Health Checks: Configure liveness and readiness probes
- Persistence: Test backup and restore procedures
- Monitoring: Set up alerts for critical metrics
- Security: Use secrets, RBAC, and network policies (a NetworkPolicy sketch follows this list)
- Documentation: Document runbooks and recovery procedures
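For the security item, a minimal NetworkPolicy sketch that locks the PostgreSQL StatefulSet above down to labeled clients; the role=db-client label is assumed:

```yaml
# Only pods labeled role=db-client may reach the postgres pods on 5432
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: postgres-allow-clients
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: db-client   # assumed client label
    ports:
    - protocol: TCP
      port: 5432
```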