Storage Architecture in Kubernetes
Kubernetes storage separates what applications need from how that storage is provided.
Think of it like renting an apartment: you request "2 bedrooms, 1 bathroom" (PVC) without caring whether the building uses wood or steel framing (PV). The landlord (StorageClass) handles provisioning!
Pod
Consumes storage by mounting volumes into containers at specific paths.
Volume
A directory accessible to containers in a pod. Can be ephemeral or persistent.
PersistentVolumeClaim (PVC)
A request for storage by a user. Specifies size, access mode, and storage class.
PersistentVolume (PV)
A piece of storage in the cluster provisioned by an admin or dynamically.
StorageClass
A template for dynamic PV provisioning. Defines the provisioner and parameters.
CSI
Container Storage Interface - standard API for storage plugins across orchestrators.
Storage Types at a Glance
| Storage Type | Lifecycle | Use Case | Example |
|---|---|---|---|
| Ephemeral Volume | Tied to Pod lifecycle | Temporary data, caches | emptyDir, configMap, secret |
| Persistent Volume | Independent of Pod | Databases, file storage | hostPath, NFS, AWS EBS |
| Dynamic Storage | Created on-demand | Cloud-native apps | StorageClass + PVC |
| StatefulSet Storage | Stable, unique per replica | Distributed databases | volumeClaimTemplates |
Working with Volumes
Volume Types
emptyDir
Temporary directory that shares a pod's lifetime. Created when Pod is assigned to a Node, deleted when Pod is removed. Good for scratch space and caches.
hostPath
Mounts a file or directory from the host node's filesystem. Has security risks - use carefully. Best for node-specific data.
NFS
Network File System mount. Shared across multiple pods, persistent across pod restarts. Good for shared data.
Cloud Volumes
Provider-specific storage: AWS EBS, Azure Disk, GCE Persistent Disk. Managed by the cloud provider.
ConfigMap / Secret
Mount configuration as files inside containers. Supports dynamic updates. Typically read-only access.
Downward API
Expose pod/container metadata as files: labels, annotations, resource limits and requests.
emptyDir Volume
apiVersion: v1
kind: Pod
metadata:
name: cache-pod
spec:
containers:
- name: app
image: nginx
volumeMounts:
- name: cache-volume
mountPath: /cache
- name: cache-warmer
image: busybox
command: ['sh', '-c', 'echo "Cache warmed" > /cache/ready && sleep 3600'] # keep the container alive so the Pod stays Running
volumeMounts:
- name: cache-volume
mountPath: /cache
volumes:
- name: cache-volume
emptyDir:
sizeLimit: 1Gi # Optional size limit
medium: Memory # Optional: use RAM instead of disk
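A quick way to confirm the two containers really share the volume (the manifest filename is assumed; run once the Pod above is Running):

```shell
# Both containers mount the same emptyDir, so the file written by
# cache-warmer is visible from the app container
kubectl apply -f cache-pod.yaml
kubectl exec cache-pod -c app -- cat /cache/ready
```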
hostPath Volume
apiVersion: v1
kind: Pod
metadata:
name: hostpath-pod
spec:
containers:
- name: app
image: nginx
volumeMounts:
- name: hostpath-volume
mountPath: /usr/share/nginx/html
volumes:
- name: hostpath-volume
hostPath:
path: /data/nginx-html
type: DirectoryOrCreate # Create if doesn't exist
# Other types: Directory, File, Socket, CharDevice, BlockDevice
Multi-Volume Pod
apiVersion: v1
kind: Pod
metadata:
name: multi-volume-pod
spec:
containers:
- name: app
image: myapp:latest
volumeMounts:
- name: config
mountPath: /etc/config
readOnly: true
- name: secrets
mountPath: /etc/secrets
readOnly: true
- name: data
mountPath: /data
- name: cache
mountPath: /cache
- name: podinfo
mountPath: /etc/podinfo
volumes:
- name: config
configMap:
name: app-config
- name: secrets
secret:
secretName: app-secrets
defaultMode: 0400
- name: data
persistentVolumeClaim:
claimName: data-pvc
- name: cache
emptyDir:
sizeLimit: 2Gi
- name: podinfo
downwardAPI:
items:
- path: "labels"
fieldRef:
fieldPath: metadata.labels
- path: "annotations"
fieldRef:
fieldPath: metadata.annotations
- path: "cpu_limit"
resourceFieldRef:
containerName: app
resource: limits.cpu
Volume Considerations
- emptyDir data is lost when pod is deleted
- hostPath poses security risks - avoid in production
- Cloud volumes may have zone restrictions
- All of a Pod's volumes must mount successfully before its containers start; ConfigMap/Secret file updates are applied atomically
Persistent Volumes & Claims
PV/PVC Lifecycle
Persistent storage follows four stages: Provisioning (creating the storage), Binding (matching PVC to PV), Using (mounting into pods), and Reclaiming (what happens when PVC is deleted).
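The current stage is visible in the STATUS column of `kubectl get`. A sketch of inspecting the lifecycle (resource names are examples from this section):

```shell
# Watch a PV move through its phases: Available -> Bound -> Released
kubectl get pv                      # STATUS: Available, Bound, Released, or Failed
kubectl get pvc                     # Pending until a PV binds or is provisioned, then Bound
kubectl describe pvc data-pvc       # Events show binding/provisioning errors

# After the claim is deleted, a Retain-policy PV shows Released:
# the data is intact, but an admin must clean it up before reuse
kubectl delete pvc data-pvc
kubectl get pv pv-nfs -o jsonpath='{.status.phase}'
```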
Creating Persistent Volumes
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-nfs
labels:
type: nfs
environment: production
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteMany # RWX - many nodes can mount for read/write
# - ReadWriteOnce # RWO - single node can mount for read/write
# - ReadOnlyMany # ROX - many nodes can mount for read-only
persistentVolumeReclaimPolicy: Retain
# Retain - manual reclamation; PV shows Released until an admin cleans it up
# Delete - delete the underlying volume (AWS EBS, GCE PD, Azure Disk)
# Recycle - basic scrub (rm -rf /volume/*); deprecated, prefer dynamic provisioning
storageClassName: nfs-storage
mountOptions:
- hard
- nfsvers=4.1
nfs:
server: nfs-server.example.com
path: /exported/path
Creating a PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: data-pvc
namespace: default
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: nfs-storage
selector: # Optional: match specific pre-provisioned PVs (a selector disables dynamic provisioning)
matchLabels:
environment: production
matchExpressions:
- key: type
operator: In
values: [nfs, local]
Using PVC in a Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-with-storage
spec:
replicas: 3
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: app
image: myapp:latest
volumeMounts:
- name: data-volume
mountPath: /var/lib/myapp
- name: shared-data
mountPath: /shared
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-pvc
- name: shared-data
persistentVolumeClaim:
claimName: shared-pvc
readOnly: true # Mount as read-only
Note: with 3 replicas, the ReadWriteOnce data-pvc can attach to only one node, so all replicas must schedule there. Use a ReadWriteMany-capable backend (NFS, EFS, Azure Files) for multi-node Deployments, or a StatefulSet with volumeClaimTemplates for per-replica volumes.
Volume Expansion
# StorageClass must have allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: expandable-storage
provisioner: kubernetes.io/aws-ebs
allowVolumeExpansion: true
parameters:
type: gp2
---
# Edit PVC to request more storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: expandable-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi # Increased from 50Gi
storageClassName: expandable-storage
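Instead of editing the manifest, the expansion can also be triggered and watched from the CLI (a sketch using the PVC above):

```shell
# Trigger expansion by patching the claim's request
kubectl patch pvc expandable-pvc \
  -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'

# Watch progress; a FileSystemResizePending condition means the
# filesystem grows the next time a Pod (re)mounts the volume
kubectl get pvc expandable-pvc -o jsonpath='{.status.conditions}'
kubectl get pvc expandable-pvc    # CAPACITY updates when the resize completes
```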
PV/PVC Best Practices
- Use dynamic provisioning with StorageClasses when possible
- Set appropriate reclaim policies based on data sensitivity
- Use labels and selectors for PV/PVC matching
- Monitor PV usage and set up alerts for capacity
- Test backup and restore procedures regularly
- Consider using CSI drivers for better portability
Storage Classes & Dynamic Provisioning
Why StorageClasses?
StorageClasses enable dynamic provisioning of PersistentVolumes, eliminating the need to pre-create PVs manually. Just create a PVC and the StorageClass handles the rest!
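Concretely, dynamic provisioning means no PersistentVolume is written by hand. A minimal sketch, using the fast-ssd class defined below:

```yaml
# This claim alone causes the StorageClass's provisioner to create
# a matching PV; compare with the manually authored pv-nfs earlier.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: auto-provisioned-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi
```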
AWS EBS StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd
annotations:
storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com # in-tree kubernetes.io/aws-ebs is deprecated
parameters:
type: gp3
iops: "3000" # gp3 takes absolute iops/throughput (iopsPerGB applies to io1)
throughput: "125"
encrypted: "true"
csi.storage.k8s.io/fstype: ext4
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer # Delay binding until Pod creation
mountOptions:
- debug
- noatime
GKE StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: ssd-regional
provisioner: pd.csi.storage.gke.io # in-tree kubernetes.io/gce-pd is deprecated
parameters:
type: pd-ssd
replication-type: regional-pd
# With the CSI driver, zone placement comes from WaitForFirstConsumer
# and allowedTopologies rather than a zones parameter
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
Local Storage Class
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
# Local PV must be created manually
apiVersion: v1
kind: PersistentVolume
metadata:
name: local-pv
spec:
capacity:
storage: 100Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Delete
storageClassName: local-storage
local:
path: /mnt/disks/ssd1
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- node-1
CSI Drivers
# Install AWS EFS CSI Driver
# kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master"
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: efs-sc
provisioner: efs.csi.aws.com
parameters:
provisioningMode: efs-ap
fileSystemId: fs-92107410
directoryPerms: "700"
gidRangeStart: "1000"
gidRangeEnd: "2000"
basePath: "/dynamic_provisioning"
---
# PVC using EFS
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: efs-claim
spec:
accessModes:
- ReadWriteMany # EFS supports RWX
storageClassName: efs-sc
resources:
requests:
storage: 5Gi # Required by the API; EFS is elastic and does not enforce this value
Cloud Storage Comparison
| Storage Type | Access Mode | Performance | Use Case |
|---|---|---|---|
| AWS EBS | RWO | High IOPS | Databases, single-node apps |
| AWS EFS | RWX | Variable | Shared storage, CMS |
| Azure Disk | RWO | Premium SSD | High-performance workloads |
| Azure Files | RWX | Standard | File shares, legacy apps |
| GCE PD | RWO/ROX | SSD/Standard | General purpose |
| Local SSD | RWO | Ultra-high | Caching, temp processing |
Volume Binding Modes
- Immediate: PV is bound to PVC immediately upon creation
- WaitForFirstConsumer: Binding delayed until Pod using PVC is scheduled
WaitForFirstConsumer is recommended for topology-constrained storage (zones, regions).
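A StorageClass can also restrict where volumes are provisioned with allowedTopologies. A sketch using the standard zone label (the exact topology key depends on the CSI driver; the class name here is illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zoned-storage
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values:
          - us-central1-a
          - us-central1-b
```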
StatefulSets & Stateful Applications
When to use StatefulSets
Use StatefulSets when your application needs stable network identities, persistent storage per replica, ordered deployment/scaling, or predictable DNS names. Common examples: databases, message queues, and distributed caches.
Stable Identity
Ordered, unique Pod names (mysql-0, mysql-1, mysql-2) that persist across rescheduling.
Stable Storage
Each replica gets its own PVC via volumeClaimTemplates. Storage survives pod restarts.
Stable Network
Predictable DNS names via headless services: pod-name.service-name.namespace.svc.cluster.local
Ordered Operations
Sequential deployment, scaling, and rolling updates. Pod N+1 waits for Pod N to be ready.
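Both the ordering and update behavior are tunable. A StatefulSet spec fragment sketching the two knobs (values are examples):

```yaml
spec:
  podManagementPolicy: Parallel   # default is OrderedReady (strict N -> N+1 ordering)
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2                # only pods with ordinal >= 2 are updated,
                                  # allowing a canary of the highest ordinals first
```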
MySQL StatefulSet
apiVersion: v1
kind: Service
metadata:
name: mysql-headless
spec:
clusterIP: None # Headless service for StatefulSet
selector:
app: mysql
ports:
- port: 3306
name: mysql
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql
spec:
serviceName: mysql-headless
replicas: 3
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
initContainers:
- name: init-mysql
image: mysql:8.0
command:
- bash
- "-c"
- |
set -ex
# Generate mysql server-id from pod ordinal index
[[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
ordinal=${BASH_REMATCH[1]}
echo [mysqld] > /mnt/conf.d/server-id.cnf
echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
if [[ $ordinal -eq 0 ]]; then
cp /mnt/config-map/primary.cnf /mnt/conf.d/
else
cp /mnt/config-map/replica.cnf /mnt/conf.d/
fi
volumeMounts:
- name: conf
mountPath: /mnt/conf.d
- name: config-map
mountPath: /mnt/config-map
containers:
- name: mysql
image: mysql:8.0
env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secret
key: password
ports:
- containerPort: 3306
name: mysql
volumeMounts:
- name: data
mountPath: /var/lib/mysql
- name: conf
mountPath: /etc/mysql/conf.d
# Probes must authenticate: unauthenticated mysql/mysqladmin calls fail
# once MYSQL_ROOT_PASSWORD is set
livenessProbe:
exec:
command: ['bash', '-c', 'mysqladmin ping -uroot -p"$MYSQL_ROOT_PASSWORD"']
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command: ['bash', '-c', 'mysql -h 127.0.0.1 -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT 1"']
initialDelaySeconds: 5
periodSeconds: 2
volumes:
- name: conf
emptyDir: {}
- name: config-map
configMap:
name: mysql-config
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: fast-ssd
resources:
requests:
storage: 100Gi
MongoDB ReplicaSet with StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mongodb
spec:
serviceName: mongodb-service
replicas: 3
selector:
matchLabels:
app: mongodb
template:
metadata:
labels:
app: mongodb
spec:
terminationGracePeriodSeconds: 10
containers:
- name: mongodb
image: mongo:5.0
command:
- mongod
- "--replSet"
- rs0
- "--bind_ip"
- "0.0.0.0"
ports:
- containerPort: 27017
volumeMounts:
- name: mongo-data
mountPath: /data/db
env:
- name: MONGO_INITDB_ROOT_USERNAME
value: admin
- name: MONGO_INITDB_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mongodb-secret
key: password
# Sidecar that auto-configures the replica set (community image below;
# verify it is maintained and compatible with your MongoDB version)
- name: mongo-sidecar
image: cvallance/mongo-k8s-sidecar
env:
- name: MONGO_SIDECAR_POD_LABELS
value: "app=mongodb"
- name: KUBERNETES_MONGO_SERVICE_NAME
value: "mongodb-service"
- name: MONGODB_USERNAME
value: admin
- name: MONGODB_PASSWORD
valueFrom:
secretKeyRef:
name: mongodb-secret
key: password
- name: MONGODB_DATABASE
value: admin
volumeClaimTemplates:
- metadata:
name: mongo-data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: fast-ssd
resources:
requests:
storage: 50Gi
Managing StatefulSets
# Scale StatefulSet
kubectl scale statefulset mysql --replicas=5
# Rolling update
kubectl set image statefulset/mysql mysql=mysql:8.0.30
# Delete StatefulSet (keeps PVCs)
kubectl delete statefulset mysql --cascade=orphan
# Delete PVCs
kubectl delete pvc data-mysql-0 data-mysql-1 data-mysql-2
# Get pod names (predictable)
kubectl get pods -l app=mysql
# mysql-0, mysql-1, mysql-2
# Access specific pod
kubectl exec mysql-1 -- mysql -u root -p
# DNS names for pods
# <pod-name>.<headless-service>.<namespace>.svc.cluster.local
# mysql-0.mysql-headless.default.svc.cluster.local
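The per-pod DNS names can be verified from inside the cluster with a throwaway pod (assuming the mysql StatefulSet above is running in the default namespace):

```shell
# One A record per pod via the headless service
kubectl run -it --rm dns-test --image=busybox --restart=Never -- \
  nslookup mysql-0.mysql-headless.default.svc.cluster.local

# The service name itself resolves to all ready pods
kubectl run -it --rm dns-test --image=busybox --restart=Never -- \
  nslookup mysql-headless.default.svc.cluster.local
```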
StatefulSet Best Practices
- Always use a headless service for network identity
- Use init containers for initialization logic
- Implement proper readiness/liveness probes
- Use podAntiAffinity for high availability
- Plan for backup and disaster recovery
- Test scaling operations thoroughly
- Monitor persistent volume usage
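The podAntiAffinity point from the list above can be sketched as a pod-template fragment that forces mysql replicas onto distinct nodes (swap in preferredDuringScheduling... if you would rather co-locate than stay Pending when nodes run out):

```yaml
# Goes under spec.template.spec of the StatefulSet
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: mysql
        topologyKey: kubernetes.io/hostname   # one replica per node
```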
Practice Problems
Medium WordPress with MySQL
Deploy WordPress with MySQL using persistent storage. Create a StorageClass, deploy MySQL with a PVC, deploy WordPress connected to MySQL with its own PVC, and verify data persists across pod restarts.
You need a StorageClass, two PVCs (one for MySQL data, one for WordPress uploads), a headless Service for MySQL, and Deployments for both apps. Use environment variables to connect WordPress to MySQL.
# storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: standard
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-standard
allowVolumeExpansion: true
---
# mysql-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: mysql-pvc
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 20Gi
storageClassName: standard
---
# wordpress-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: wordpress-pvc
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
storageClassName: standard
---
# mysql-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: mysql
spec:
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
containers:
- name: mysql
image: mysql:8.0
env:
- name: MYSQL_ROOT_PASSWORD
value: rootpass # demo only - use a Secret (secretKeyRef) outside of exercises
- name: MYSQL_DATABASE
value: wordpress
- name: MYSQL_USER
value: wordpress
- name: MYSQL_PASSWORD
value: wordpresspass
ports:
- containerPort: 3306
volumeMounts:
- name: mysql-storage
mountPath: /var/lib/mysql
volumes:
- name: mysql-storage
persistentVolumeClaim:
claimName: mysql-pvc
---
# mysql-service.yaml
apiVersion: v1
kind: Service
metadata:
name: mysql
spec:
selector:
app: mysql
ports:
- port: 3306
clusterIP: None
---
# wordpress-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: wordpress
spec:
selector:
matchLabels:
app: wordpress
template:
metadata:
labels:
app: wordpress
spec:
containers:
- name: wordpress
image: wordpress:latest
env:
- name: WORDPRESS_DB_HOST
value: mysql
- name: WORDPRESS_DB_USER
value: wordpress
- name: WORDPRESS_DB_PASSWORD
value: wordpresspass
- name: WORDPRESS_DB_NAME
value: wordpress
ports:
- containerPort: 80
volumeMounts:
- name: wordpress-storage
mountPath: /var/www/html
volumes:
- name: wordpress-storage
persistentVolumeClaim:
claimName: wordpress-pvc
---
# wordpress-service.yaml
apiVersion: v1
kind: Service
metadata:
name: wordpress
spec:
type: LoadBalancer
selector:
app: wordpress
ports:
- port: 80
targetPort: 80
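The exercise also asks you to verify persistence. A sketch using the credentials from the manifests above: write a row, delete the pod, and confirm the row survives on the PVC.

```shell
# Write data, kill the pod, confirm the data outlives it
kubectl exec deploy/mysql -- mysql -uroot -prootpass \
  -e "CREATE TABLE wordpress.t (v INT); INSERT INTO wordpress.t VALUES (42);"
kubectl delete pod -l app=mysql           # Deployment recreates the pod
kubectl wait --for=condition=ready pod -l app=mysql --timeout=120s
kubectl exec deploy/mysql -- mysql -uroot -prootpass \
  -e "SELECT v FROM wordpress.t;"         # row still present => PVC worked
```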
Hard Elasticsearch Cluster
Deploy a 3-node Elasticsearch cluster using StatefulSet with persistent storage, headless service for discovery, proper init containers, and cluster health checks.
Elasticsearch needs init containers to fix file permissions and increase vm.max_map_count. Use a headless service for cluster discovery. Configure seed hosts and initial master nodes via environment variables.
# elasticsearch-service.yaml
apiVersion: v1
kind: Service
metadata:
name: elasticsearch
spec:
clusterIP: None
selector:
app: elasticsearch
ports:
- name: rest
port: 9200
- name: transport
port: 9300
---
# elasticsearch-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: elasticsearch
spec:
serviceName: elasticsearch
replicas: 3
selector:
matchLabels:
app: elasticsearch
template:
metadata:
labels:
app: elasticsearch
spec:
initContainers:
- name: fix-permissions
image: busybox
command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
volumeMounts:
- name: data
mountPath: /usr/share/elasticsearch/data
- name: increase-vm-max-map
image: busybox
command: ["sysctl", "-w", "vm.max_map_count=262144"]
securityContext:
privileged: true
containers:
- name: elasticsearch
image: docker.elastic.co/elasticsearch/elasticsearch:7.15.0
env:
- name: cluster.name
value: es-cluster
- name: node.name
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: discovery.seed_hosts
value: "elasticsearch-0.elasticsearch,elasticsearch-1.elasticsearch,elasticsearch-2.elasticsearch"
- name: cluster.initial_master_nodes
value: "elasticsearch-0,elasticsearch-1,elasticsearch-2"
- name: ES_JAVA_OPTS
value: "-Xms512m -Xmx512m"
ports:
- containerPort: 9200
name: rest
- containerPort: 9300
name: transport
volumeMounts:
- name: data
mountPath: /usr/share/elasticsearch/data
readinessProbe:
httpGet:
path: /_cluster/health
port: 9200
initialDelaySeconds: 30
periodSeconds: 10
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: fast-ssd
resources:
requests:
storage: 30Gi
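Once all three pods report Ready, cluster formation can be checked through the REST port (a sketch; port-forward is run in the background here):

```shell
kubectl port-forward elasticsearch-0 9200:9200 &
sleep 2
curl -s http://localhost:9200/_cluster/health?pretty
# Expect "number_of_nodes": 3; status green, or yellow until
# replica shards are allocated
```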
Hard Multi-Tier Application Storage
Design and deploy a complete multi-tier application with proper storage architecture: React frontend (ephemeral cache), Node.js API (ConfigMap + Secret), PostgreSQL with replication (StatefulSet), Redis cluster (StatefulSet), and shared NFS for file uploads.
Break it into layers: frontend uses emptyDir for nginx cache, backend mounts ConfigMap and Secret volumes, PostgreSQL and Redis each use StatefulSets with volumeClaimTemplates, and the shared file storage uses a ReadWriteMany PVC backed by NFS or EFS.
This is an open-ended design challenge. Key considerations:
- Frontend Deployment: emptyDir for /var/cache/nginx
- Backend Deployment: ConfigMap volume at /etc/config, Secret volume at /etc/secrets
- PostgreSQL StatefulSet: volumeClaimTemplates for /var/lib/postgresql/data, headless service, init containers for replication setup
- Redis StatefulSet: volumeClaimTemplates for /data, sentinel sidecar for HA
- Shared PVC with ReadWriteMany access mode for file uploads, mounted in both backend and any file-processing workers
- CronJob for database backups writing to a separate PVC or cloud storage
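For the shared upload store named in the points above, a minimal sketch, assuming an RWX-capable class called efs-sc exists (any NFS/EFS/Azure Files-backed class works):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: uploads-shared
spec:
  accessModes:
    - ReadWriteMany          # backend and file-processing workers mount it together
  storageClassName: efs-sc   # assumed RWX-capable class
  resources:
    requests:
      storage: 20Gi
```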