Custom Resource Definitions (CRDs)
CRDs extend the Kubernetes API with your own resource types.
They allow you to create custom resources that behave like native Kubernetes objects, enabling declarative management of any application or infrastructure.
API Extension
Add new resource types to the Kubernetes API without modifying Kubernetes itself
Declarative Config
Define desired state using YAML manifests just like built-in resources
CRUD Operations
Support standard kubectl operations: create, get, update, delete
Creating a Custom Resource Definition
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              engine:
                type: string
                enum: ["postgres", "mysql", "mongodb"]
              version:
                type: string
              replicas:
                type: integer
                minimum: 1
                maximum: 10
              storage:
                type: object
                properties:
                  size:
                    type: string
                    pattern: "^[0-9]+Gi$"
                  class:
                    type: string
                required: ["size"]
              backup:
                type: object
                properties:
                  enabled:
                    type: boolean
                  schedule:
                    type: string
            required: ["engine", "version", "storage"]
          status:
            type: object
            properties:
              phase:
                type: string
                enum: ["Creating", "Running", "Failed", "Deleting"]
              replicas:
                type: integer
              endpoint:
                type: string
    additionalPrinterColumns:
    - name: Engine
      type: string
      jsonPath: .spec.engine
    - name: Version
      type: string
      jsonPath: .spec.version
    - name: Status
      type: string
      jsonPath: .status.phase
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames:
    - db
Using Custom Resources
apiVersion: example.com/v1
kind: Database
metadata:
  name: my-postgres
  namespace: production
spec:
  engine: postgres
  version: "14.5"
  replicas: 3
  storage:
    size: 100Gi
    class: fast-ssd
  backup:
    enabled: true
    schedule: "0 2 * * *"
Pro Tip
Use OpenAPI schema validation to ensure custom resources follow your specifications. This prevents invalid configurations from being created.
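The effect of this validation can be mirrored in plain code. Below is a minimal sketch in Go, with no Kubernetes dependencies, applying the same constraints the schema above declares (the DatabaseSpec struct here is a hypothetical stand-in for the generated API types, not the real ones):

```go
package main

import (
    "fmt"
    "regexp"
)

// DatabaseSpec mirrors the fields the CRD schema above validates.
type DatabaseSpec struct {
    Engine   string
    Version  string
    Replicas int
    Size     string
}

var (
    validEngines = map[string]bool{"postgres": true, "mysql": true, "mongodb": true}
    sizePattern  = regexp.MustCompile(`^[0-9]+Gi$`)
)

// Validate applies the same constraints the OpenAPI schema declares:
// the engine enum, the replica bounds, and the storage size pattern.
func Validate(s DatabaseSpec) error {
    if !validEngines[s.Engine] {
        return fmt.Errorf("engine %q not in enum", s.Engine)
    }
    if s.Replicas < 1 || s.Replicas > 10 {
        return fmt.Errorf("replicas %d outside [1,10]", s.Replicas)
    }
    if !sizePattern.MatchString(s.Size) {
        return fmt.Errorf("size %q does not match ^[0-9]+Gi$", s.Size)
    }
    return nil
}

func main() {
    ok := DatabaseSpec{Engine: "postgres", Version: "14.5", Replicas: 3, Size: "100Gi"}
    bad := DatabaseSpec{Engine: "oracle", Replicas: 3, Size: "100Gi"}
    fmt.Println(Validate(ok) == nil)  // true
    fmt.Println(Validate(bad) == nil) // false
}
```

The API server performs these checks at admission time, so invalid objects are rejected before any controller ever sees them.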
Advanced CRD Features
Versioning
spec:
  versions:
  - name: v1beta1
    served: true
    storage: false
    deprecated: true
    deprecationWarning: "v1beta1 is deprecated, use v1"
    schema: # ... v1beta1 schema
  - name: v1
    served: true
    storage: true
    schema: # ... v1 schema
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        service:
          name: crd-conversion-webhook
          namespace: system
          path: "/convert"
      conversionReviewVersions: ["v1", "v1beta1"]
Validation with CEL
schema:
  openAPIV3Schema:
    type: object
    properties:
      spec:
        type: object
        x-kubernetes-validations:
        - rule: "self.minReplicas <= self.replicas"
          message: "replicas must be >= minReplicas"
        - rule: "self.replicas <= self.maxReplicas"
          message: "replicas must be <= maxReplicas"
        properties:
          replicas:
            type: integer
          minReplicas:
            type: integer
          maxReplicas:
            type: integer
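To see exactly what these rules enforce, here is the same logic in plain Go, with no CEL runtime involved (the Spec struct is an illustrative stand-in):

```go
package main

import "fmt"

// Spec carries the three fields the CEL rules above constrain.
type Spec struct {
    Replicas, MinReplicas, MaxReplicas int
}

// CheckCELRules evaluates the same predicates the x-kubernetes-validations
// rules express: minReplicas <= replicas and replicas <= maxReplicas.
func CheckCELRules(s Spec) []string {
    var violations []string
    if !(s.MinReplicas <= s.Replicas) {
        violations = append(violations, "replicas must be >= minReplicas")
    }
    if !(s.Replicas <= s.MaxReplicas) {
        violations = append(violations, "replicas must be <= maxReplicas")
    }
    return violations
}

func main() {
    fmt.Println(CheckCELRules(Spec{Replicas: 3, MinReplicas: 1, MaxReplicas: 5})) // []
    fmt.Println(CheckCELRules(Spec{Replicas: 9, MinReplicas: 1, MaxReplicas: 5})) // [replicas must be <= maxReplicas]
}
```

Unlike a validating webhook, CEL rules run inside the API server itself, so cross-field invariants like this need no extra deployment.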
Subresources
spec:
  versions:
  - name: v1
    served: true
    storage: true
    subresources:
      status: {}  # Enable status subresource
      scale:      # Enable scale subresource
        specReplicasPath: .spec.replicas
        statusReplicasPath: .status.replicas
        labelSelectorPath: .status.labelSelector
Kubernetes Controllers
Controllers are control loops that watch the state of your cluster and make changes to move the current state toward the desired state.
The Reconciliation Loop
Every controller follows the same pattern: Watch for changes, compare desired vs actual state, take action, update status, and repeat. This is the core of the Kubernetes control plane.
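The loop can be sketched without any Kubernetes machinery at all. Here is a toy reconciler converging an integer replica count; names and structure are purely illustrative, not controller-runtime APIs:

```go
package main

import "fmt"

// Reconcile moves actual state one step toward desired state and reports
// whether another pass is needed -- the essence of a control loop.
func Reconcile(desired, actual int) (newActual int, requeue bool) {
    switch {
    case actual < desired:
        return actual + 1, true // e.g. create one more replica
    case actual > desired:
        return actual - 1, true // e.g. delete one surplus replica
    default:
        return actual, false // converged; nothing to do
    }
}

// RunLoop drives Reconcile until the state converges, like a controller
// repeatedly handling watch events for the same object.
func RunLoop(desired, actual int) int {
    for {
        next, requeue := Reconcile(desired, actual)
        actual = next
        if !requeue {
            return actual
        }
    }
}

func main() {
    fmt.Println(RunLoop(3, 0)) // 3
    fmt.Println(RunLoop(1, 4)) // 1
}
```

Note that the loop compares states, not events: if a pass is missed, the next one still converges, which is the property the real control plane relies on.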
Controller Implementation (Go)
package controllers

import (
    "context"
    "reflect"
    "time"

    "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/types"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/log"

    appsv1 "k8s.io/api/apps/v1"

    examplev1 "example.com/api/v1"
)

type DatabaseReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

// Reconcile is the main logic of the controller
func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)

    // Fetch the Database instance
    database := &examplev1.Database{}
    err := r.Get(ctx, req.NamespacedName, database)
    if err != nil {
        if errors.IsNotFound(err) {
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, err
    }

    // Define desired StatefulSet
    statefulSet := r.statefulSetForDatabase(database)

    // Check if StatefulSet exists
    found := &appsv1.StatefulSet{}
    err = r.Get(ctx, types.NamespacedName{
        Name:      statefulSet.Name,
        Namespace: statefulSet.Namespace,
    }, found)
    if err != nil && errors.IsNotFound(err) {
        // Create StatefulSet
        log.Info("Creating StatefulSet", "name", statefulSet.Name)
        err = r.Create(ctx, statefulSet)
        if err != nil {
            return ctrl.Result{}, err
        }
        database.Status.Phase = "Creating"
        if err := r.Status().Update(ctx, database); err != nil {
            return ctrl.Result{}, err
        }
        return ctrl.Result{Requeue: true}, nil
    } else if err != nil {
        return ctrl.Result{}, err
    }

    // Update StatefulSet if needed
    if !reflect.DeepEqual(statefulSet.Spec, found.Spec) {
        found.Spec = statefulSet.Spec
        err = r.Update(ctx, found)
        if err != nil {
            return ctrl.Result{}, err
        }
    }

    // Update Database status
    database.Status.Phase = "Running"
    database.Status.Replicas = found.Status.Replicas
    if err := r.Status().Update(ctx, database); err != nil {
        return ctrl.Result{}, err
    }
    return ctrl.Result{RequeueAfter: time.Minute}, nil
}

// SetupWithManager sets up the controller with the Manager
func (r *DatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&examplev1.Database{}).
        Owns(&appsv1.StatefulSet{}).
        Complete(r)
}
Key Controller Concepts
- Reconciliation: Core loop that ensures desired state matches actual state
- Owner References: Automatic cleanup when parent resource is deleted
- Status Updates: Keep users informed about resource state
- Requeue: Schedule future reconciliation for ongoing operations
Controller Patterns
Level-Based Triggering
React to current state, not events. Makes controllers resilient to restarts and missed events.
Idempotency
Multiple reconciliations produce the same result. Safe to retry operations.
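Idempotency can be demonstrated with a toy in-memory "cluster": running the same ensure function twice changes nothing the second time. Names here are illustrative, not controller-runtime APIs:

```go
package main

import "fmt"

// Cluster is a toy state store standing in for the API server.
type Cluster map[string]int

// EnsureReplicas is idempotent: applying it once or many times leaves the
// cluster in the same state, so retries after a crash or requeue are safe.
func EnsureReplicas(c Cluster, name string, want int) (changed bool) {
    if got, ok := c[name]; ok && got == want {
        return false // already at desired state: level-based, no-op
    }
    c[name] = want
    return true
}

func main() {
    c := Cluster{}
    fmt.Println(EnsureReplicas(c, "web", 3)) // true  (first pass acts)
    fmt.Println(EnsureReplicas(c, "web", 3)) // false (repeat is a no-op)
}
```

Real reconcilers follow the same shape: read current state, diff against desired, and act only on the difference.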
Finalizers
Clean up external resources before deletion. Prevent orphaned resources.
Implementing Finalizers
// controllerutil is "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
const databaseFinalizer = "database.example.com/finalizer"

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    database := &examplev1.Database{}
    err := r.Get(ctx, req.NamespacedName, database)
    if err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Check if the resource is marked for deletion
    if database.ObjectMeta.DeletionTimestamp != nil {
        if controllerutil.ContainsFinalizer(database, databaseFinalizer) {
            // Perform cleanup
            if err := r.deleteExternalResources(database); err != nil {
                return ctrl.Result{}, err
            }
            // Remove finalizer
            controllerutil.RemoveFinalizer(database, databaseFinalizer)
            if err := r.Update(ctx, database); err != nil {
                return ctrl.Result{}, err
            }
        }
        return ctrl.Result{}, nil
    }

    // Add finalizer if not present
    if !controllerutil.ContainsFinalizer(database, databaseFinalizer) {
        controllerutil.AddFinalizer(database, databaseFinalizer)
        if err := r.Update(ctx, database); err != nil {
            return ctrl.Result{}, err
        }
    }

    // Normal reconciliation logic...
    return ctrl.Result{}, nil
}
Operator Frameworks
Tools and SDKs that simplify building Kubernetes operators with best practices built-in.
| Framework | Language | Complexity | Best For |
|---|---|---|---|
| Operator SDK | Go, Ansible, Helm | Medium | Production operators with complex logic |
| Kubebuilder | Go | Medium-High | Advanced operators with custom APIs |
| KUDO | YAML | Low | Simple operators without coding |
| Kopf | Python | Low-Medium | Python developers, rapid prototyping |
| Metacontroller | Any (Webhooks) | Low | Simple controllers with webhooks |
Operator SDK Quick Start
# Initialize a new operator project
operator-sdk init --domain example.com --repo github.com/example/database-operator
# Create API and controller for Database resource
operator-sdk create api --group example --version v1 --kind Database --resource --controller
# Generate CRD manifests from Go types
make manifests
# Install CRDs into cluster
make install
# Run operator locally for development
make run
# Build and push operator image
make docker-build docker-push IMG=example/database-operator:v1.0.0
# Deploy operator to cluster
make deploy IMG=example/database-operator:v1.0.0
Operator Lifecycle Manager (OLM)
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: database-operator.v1.0.0
  namespace: operators
spec:
  displayName: Database Operator
  description: |
    The Database Operator manages PostgreSQL, MySQL, and MongoDB instances
    with automated backups, scaling, and failover capabilities.
  version: 1.0.0
  replaces: database-operator.v0.9.0
  customresourcedefinitions:
    owned:
    - name: databases.example.com
      version: v1
      kind: Database
      displayName: Database
      description: Represents a database instance
  install:
    strategy: deployment
    spec:
      deployments:
      - name: database-operator
        spec:
          replicas: 1
          selector:
            matchLabels:
              name: database-operator
          template:
            metadata:
              labels:
                name: database-operator
            spec:
              serviceAccountName: database-operator
              containers:
              - name: database-operator
                image: example/database-operator:v1.0.0
                command:
                - database-operator
Installing with OLM
# Create a CatalogSource
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: database-operators
  namespace: olm
spec:
  sourceType: grpc
  image: example/database-operator-catalog:latest
---
# Create a Subscription
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: database-operator
  namespace: operators
spec:
  channel: stable
  name: database-operator
  source: database-operators
  sourceNamespace: olm
  installPlanApproval: Automatic
Operator Lifecycle Management
Operator Maturity Model
Level 1: Basic Install
Automated application provisioning and configuration
Level 2: Seamless Upgrades
Patch and minor version upgrades supported
Level 3: Full Lifecycle
App lifecycle, storage lifecycle, backups, failure recovery
Level 4: Deep Insights
Metrics, alerts, log processing, workload analysis
Level 5: Auto Pilot
Auto-scaling, auto-tuning, abnormality detection
Testing Operators
package controllers_test

import (
    "context"
    "time"

    . "github.com/onsi/ginkgo"
    . "github.com/onsi/gomega"
    appsv1 "k8s.io/api/apps/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"

    examplev1 "example.com/api/v1"
)

var _ = Describe("Database Controller", func() {
    Context("When creating a Database", func() {
        It("Should create a StatefulSet", func() {
            ctx := context.Background()
            database := &examplev1.Database{
                ObjectMeta: metav1.ObjectMeta{
                    Name:      "test-database",
                    Namespace: "default",
                },
                Spec: examplev1.DatabaseSpec{
                    Engine:   "postgres",
                    Version:  "14",
                    Replicas: 3,
                    Storage: examplev1.StorageSpec{
                        Size:  "10Gi",
                        Class: "standard",
                    },
                },
            }
            Expect(k8sClient.Create(ctx, database)).Should(Succeed())

            statefulSet := &appsv1.StatefulSet{}
            Eventually(func() bool {
                err := k8sClient.Get(ctx, client.ObjectKey{
                    Name:      "test-database-statefulset",
                    Namespace: "default",
                }, statefulSet)
                return err == nil
            }, time.Second*10, time.Second).Should(BeTrue())
            Expect(*statefulSet.Spec.Replicas).Should(Equal(int32(3)))
        })

        It("Should update Database status", func() {
            ctx := context.Background()
            database := &examplev1.Database{}
            Eventually(func() string {
                err := k8sClient.Get(ctx, client.ObjectKey{
                    Name:      "test-database",
                    Namespace: "default",
                }, database)
                if err != nil {
                    return ""
                }
                return database.Status.Phase
            }, time.Second*10, time.Second).Should(Equal("Running"))
        })
    })
})
Monitoring Operators
// Add Prometheus metrics to your controller
import (
    "context"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
    reconciliationDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "database_operator_reconciliation_duration_seconds",
            Help: "Duration of reconciliation in seconds",
        },
        []string{"database", "namespace"},
    )
    databasesTotal = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "database_operator_databases_total",
            Help: "Total number of databases managed",
        },
        []string{"engine", "status"},
    )
)

func init() {
    // controller-runtime serves this registry on the manager's metrics endpoint
    metrics.Registry.MustRegister(reconciliationDuration, databasesTotal)
}

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    start := time.Now()
    defer func() {
        reconciliationDuration.WithLabelValues(
            req.Name,
            req.Namespace,
        ).Observe(time.Since(start).Seconds())
    }()
    // Reconciliation logic...
    return ctrl.Result{}, nil
}
Popular Kubernetes Operators
Prometheus Operator
Manages Prometheus instances, ServiceMonitors, and PrometheusRules for monitoring
PostgreSQL Operator (Zalando)
Creates and manages PostgreSQL clusters with streaming replication and backups
Strimzi Kafka Operator
Deploys and manages Apache Kafka clusters, topics, and users
Elastic Cloud on Kubernetes
Deploy, manage, and orchestrate Elasticsearch clusters
MongoDB Community Operator
Manages MongoDB replica sets with authentication and TLS
Cert-Manager
Automates certificate management using Let's Encrypt and other issuers
Building a Stateful Service Operator
# RedisCluster custom resource
apiVersion: redis.example.com/v1
kind: RedisCluster
metadata:
  name: redis-production
spec:
  replicas: 6  # 3 masters, 3 replicas
  version: "7.0"
  persistence:
    enabled: true
    size: 10Gi
  auth:
    enabled: true
    secretName: redis-auth
  backup:
    enabled: true
    schedule: "0 */6 * * *"
    destination: s3://backups/redis
  monitoring:
    enabled: true
    serviceMonitor: true
---
# The operator will create:
# - StatefulSet for Redis nodes
# - Services for client access
# - ConfigMaps for configuration
# - Secrets for authentication
# - PodDisruptionBudget for availability
# - ServiceMonitor for Prometheus
# - CronJob for backups
Advanced Features Implementation
// Auto-scaling based on memory usage
func (r *RedisReconciler) handleAutoScaling(ctx context.Context, redis *redisv1.RedisCluster) error {
    // Get current metrics
    metrics, err := r.getRedisMetrics(redis)
    if err != nil {
        return err
    }

    currentReplicas := redis.Spec.Replicas
    desiredReplicas := currentReplicas

    // Scale up if memory usage > 80%
    if metrics.MemoryUsagePercent > 80 {
        desiredReplicas = min(currentReplicas+2, redis.Spec.MaxReplicas)
        log.Info("Scaling up Redis cluster", "from", currentReplicas, "to", desiredReplicas)
    }

    // Scale down if memory usage < 30%
    if metrics.MemoryUsagePercent < 30 && currentReplicas > redis.Spec.MinReplicas {
        desiredReplicas = max(currentReplicas-2, redis.Spec.MinReplicas)
        log.Info("Scaling down Redis cluster", "from", currentReplicas, "to", desiredReplicas)
    }

    if desiredReplicas != currentReplicas {
        redis.Spec.Replicas = desiredReplicas
        return r.Update(ctx, redis)
    }
    return nil
}

// Automated failover handling
func (r *RedisReconciler) handleFailover(ctx context.Context, redis *redisv1.RedisCluster) error {
    masters, replicas, err := r.getRedisTopology(redis)
    if err != nil {
        return err
    }

    for _, master := range masters {
        if !master.IsHealthy() {
            log.Info("Master node unhealthy, initiating failover", "node", master.Name)

            // Find best replica to promote
            bestReplica := r.selectBestReplica(replicas, master)
            if bestReplica == nil {
                return fmt.Errorf("no suitable replica found for failover")
            }

            // Promote replica to master
            if err := r.promoteReplica(bestReplica); err != nil {
                return err
            }

            // Update cluster configuration
            redis.Status.Topology = r.updateTopology(masters, replicas, master, bestReplica)
            redis.Status.LastFailover = metav1.Now()
            return r.Status().Update(ctx, redis)
        }
    }
    return nil
}
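The scale-up/scale-down thresholds in handleAutoScaling reduce to a pure decision function, which makes the policy trivial to unit-test in isolation. A sketch under that framing (function and field names are illustrative; the min/max helpers are defined locally so this does not depend on Go 1.21 builtins):

```go
package main

import "fmt"

func minInt(a, b int) int {
    if a < b {
        return a
    }
    return b
}

func maxInt(a, b int) int {
    if a > b {
        return a
    }
    return b
}

// DecideReplicas applies the thresholds from the auto-scaling logic above:
// scale up by 2 above 80% memory usage, scale down by 2 below 30%,
// clamped to the [minR, maxR] range.
func DecideReplicas(memPercent float64, current, minR, maxR int) int {
    switch {
    case memPercent > 80:
        return minInt(current+2, maxR)
    case memPercent < 30 && current > minR:
        return maxInt(current-2, minR)
    default:
        return current
    }
}

func main() {
    fmt.Println(DecideReplicas(85, 4, 2, 10)) // 6
    fmt.Println(DecideReplicas(20, 4, 2, 10)) // 2
    fmt.Println(DecideReplicas(50, 4, 2, 10)) // 4
}
```

Keeping the decision pure and the side effects (Update calls) at the edges is what makes controllers like this easy to test without a cluster.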
Deployment Best Practices
Production Checklist
- RBAC: Use least privilege principle for operator permissions
- Resource Limits: Set CPU/memory limits for operator pods
- High Availability: Run multiple operator replicas with leader election
- Observability: Export metrics, structured logging, tracing
- Upgrades: Support zero-downtime upgrades with conversion webhooks
- Security: Scan images, use network policies, enforce Pod Security Admission (PSPs are removed as of Kubernetes 1.25)
Leader Election Configuration
func main() {
    // These three options are *time.Duration in ctrl.Options
    leaseDuration := 15 * time.Second
    renewDeadline := 10 * time.Second
    retryPeriod := 2 * time.Second

    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        Scheme:                  scheme,
        MetricsBindAddress:      ":8080",
        Port:                    9443,
        HealthProbeBindAddress:  ":8081",
        LeaderElection:          true,
        LeaderElectionID:        "database-operator.example.com",
        LeaderElectionNamespace: "operators",
        LeaseDuration:           &leaseDuration,
        RenewDeadline:           &renewDeadline,
        RetryPeriod:             &retryPeriod,
    })
    if err != nil {
        setupLog.Error(err, "unable to start manager")
        os.Exit(1)
    }

    // Setup controllers...
}
Troubleshooting Operators
Common Issues and Solutions
- Tight Reconciliation Loops: Add requeue delays, implement exponential backoff
- Memory Leaks: Proper cleanup, limit watch scope
- RBAC Errors: Review and update ClusterRole permissions
- Webhook Failures: Check certificates, network policies
- Performance: Use indexers, limit reconciliation frequency
Debugging Commands
# View operator logs
kubectl logs -n operators deployment/database-operator -f
# Check recent events
kubectl get events --sort-by='.lastTimestamp' -A | grep database
# Inspect custom resource status
kubectl describe database.example.com/my-database
# List registered custom resources
kubectl api-resources --api-group=example.com
Practice Problems
Easy Create a Basic CRD
Write a CRD for a "WebApp" resource that includes fields for image, replicas, and port. Apply it to a cluster and create an instance.
Start with the apiextensions.k8s.io/v1 API. Define an openAPIV3Schema with spec.properties for image (string), replicas (integer), and port (integer).
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: webapps.example.com
spec:
  group: example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              image:
                type: string
              replicas:
                type: integer
                minimum: 1
              port:
                type: integer
            required: ["image", "replicas", "port"]
  scope: Namespaced
  names:
    plural: webapps
    singular: webapp
    kind: WebApp
    shortNames: [wa]
Medium Implement a Reconciliation Loop
Write the Reconcile function for a controller that ensures a Deployment exists for each WebApp custom resource, matching the desired replicas and image.
Fetch the WebApp CR, check if a Deployment exists with the same name. If not, create one. If it exists, compare replicas and image, updating if needed.
func (r *WebAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := log.FromContext(ctx)

    webapp := &examplev1.WebApp{}
    if err := r.Get(ctx, req.NamespacedName, webapp); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    deploy := &appsv1.Deployment{}
    err := r.Get(ctx, req.NamespacedName, deploy)
    if errors.IsNotFound(err) {
        // Create Deployment
        log.Info("Creating Deployment", "name", webapp.Name)
        replicas := int32(webapp.Spec.Replicas)
        deploy = &appsv1.Deployment{
            ObjectMeta: metav1.ObjectMeta{
                Name: webapp.Name, Namespace: webapp.Namespace,
            },
            Spec: appsv1.DeploymentSpec{
                Replicas: &replicas,
                Selector: &metav1.LabelSelector{
                    MatchLabels: map[string]string{"app": webapp.Name},
                },
                Template: corev1.PodTemplateSpec{
                    ObjectMeta: metav1.ObjectMeta{
                        Labels: map[string]string{"app": webapp.Name},
                    },
                    Spec: corev1.PodSpec{
                        Containers: []corev1.Container{{
                            Name:  "webapp",
                            Image: webapp.Spec.Image,
                            Ports: []corev1.ContainerPort{{
                                ContainerPort: int32(webapp.Spec.Port),
                            }},
                        }},
                    },
                },
            },
        }
        // Owner reference: the Deployment is garbage-collected with the WebApp
        if err := ctrl.SetControllerReference(webapp, deploy, r.Scheme); err != nil {
            return ctrl.Result{}, err
        }
        return ctrl.Result{}, r.Create(ctx, deploy)
    } else if err != nil {
        return ctrl.Result{}, err
    }

    // Update if needed
    replicas := int32(webapp.Spec.Replicas)
    deploy.Spec.Replicas = &replicas
    deploy.Spec.Template.Spec.Containers[0].Image = webapp.Spec.Image
    return ctrl.Result{}, r.Update(ctx, deploy)
}
Medium Add Finalizer Logic
Extend the WebApp controller with a finalizer that cleans up an external DNS record when the WebApp is deleted.
Check DeletionTimestamp to detect deletion. If the finalizer is present, perform cleanup then remove the finalizer. If not being deleted, ensure the finalizer is added.
const webappFinalizer = "webapp.example.com/dns-cleanup"

func (r *WebAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    webapp := &examplev1.WebApp{}
    if err := r.Get(ctx, req.NamespacedName, webapp); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    if webapp.DeletionTimestamp != nil {
        if controllerutil.ContainsFinalizer(webapp, webappFinalizer) {
            // Clean up external DNS
            if err := r.deleteDNSRecord(webapp.Name); err != nil {
                return ctrl.Result{}, err
            }
            controllerutil.RemoveFinalizer(webapp, webappFinalizer)
            return ctrl.Result{}, r.Update(ctx, webapp)
        }
        return ctrl.Result{}, nil
    }

    if !controllerutil.ContainsFinalizer(webapp, webappFinalizer) {
        controllerutil.AddFinalizer(webapp, webappFinalizer)
        return ctrl.Result{}, r.Update(ctx, webapp)
    }

    // Normal reconciliation...
    return ctrl.Result{}, nil
}
Hard Build a Multi-Version CRD with Conversion Webhook
Design a CRD with v1beta1 and v1 versions. Write a conversion webhook that translates between both versions, handling field renames and new required fields with defaults.
Register a webhook server that handles ConversionReview requests. Map old fields to new ones during v1beta1-to-v1 conversion, and reverse the mapping for v1-to-v1beta1.
func (r *WebApp) ConvertTo(dstRaw conversion.Hub) error {
    dst := dstRaw.(*v1.WebApp)
    dst.ObjectMeta = r.ObjectMeta

    // v1beta1 "size" becomes v1 "replicas"
    dst.Spec.Replicas = r.Spec.Size
    dst.Spec.Image = r.Spec.Image
    dst.Spec.Port = r.Spec.Port

    // New v1 field with default
    if dst.Spec.Strategy == "" {
        dst.Spec.Strategy = "RollingUpdate"
    }
    return nil
}

func (r *WebApp) ConvertFrom(srcRaw conversion.Hub) error {
    src := srcRaw.(*v1.WebApp)
    r.ObjectMeta = src.ObjectMeta
    r.Spec.Size = src.Spec.Replicas
    r.Spec.Image = src.Spec.Image
    r.Spec.Port = src.Spec.Port
    return nil
}
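The rename-plus-default behavior above can be checked without a webhook server by reproducing it with plain structs; the types below are illustrative stand-ins, not the real generated API types. Note the lossy direction: v1's Strategy has nowhere to live in v1beta1 (a real webhook often preserves such fields in an annotation):

```go
package main

import "fmt"

// V1Beta1Spec and V1Spec stand in for the two API versions: "size" was
// renamed to "replicas" in v1, and v1 added Strategy with a default.
type V1Beta1Spec struct {
    Size  int
    Image string
}

type V1Spec struct {
    Replicas int
    Image    string
    Strategy string
}

// ToV1 mirrors ConvertTo: rename the field and default the new one.
func ToV1(src V1Beta1Spec) V1Spec {
    dst := V1Spec{Replicas: src.Size, Image: src.Image}
    if dst.Strategy == "" {
        dst.Strategy = "RollingUpdate"
    }
    return dst
}

// ToV1Beta1 mirrors ConvertFrom: Strategy is dropped, since the old
// version has no field to carry it.
func ToV1Beta1(src V1Spec) V1Beta1Spec {
    return V1Beta1Spec{Size: src.Replicas, Image: src.Image}
}

func main() {
    old := V1Beta1Spec{Size: 3, Image: "nginx:1.25"}
    converted := ToV1(old)
    fmt.Println(converted.Replicas, converted.Strategy) // 3 RollingUpdate
    fmt.Println(ToV1Beta1(converted) == old)            // true
}
```

Round-tripping every field you can, and annotating the rest, is what keeps clients on the old version working while the stored version moves forward.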
Remember
Operators encode human operational knowledge into software. Start simple with a basic install operator and iterate toward higher maturity levels as your understanding deepens.