Monitoring & Logging

Master Kubernetes observability with Prometheus, Grafana, and centralized logging


Observability in Kubernetes

The Three Pillars of Observability

Production Kubernetes clusters need three complementary signals to understand system health: metrics, logs, and traces. Together they give you a complete picture of what is happening and why.

Metrics

Numerical time-series data about system performance: CPU, memory, request rates, error rates, and latency.

Logs

Detailed records of events: application logs, system logs, audit logs, and security events.

Traces

Request flow through services: distributed tracing, service dependencies, performance bottlenecks, and error propagation.

Monitoring Stack Architecture

The standard monitoring pipeline flows from applications through collection to visualization and alerting: applications expose a /metrics endpoint, Prometheus scrapes and stores the samples as time series, Grafana queries them for dashboards, and Alertmanager routes firing alerts to on-call channels.

Logging Stack Architecture

Centralized logging follows a similar pattern: applications write to stdout/stderr, a node-level agent (Fluentd or Promtail) tails and enriches the container logs, a backend (Elasticsearch or Loki) indexes and stores them, and Kibana or Grafana provides search and visualization.

Stack Comparison

| Stack | Components | Use Case | Strengths |
| --- | --- | --- | --- |
| Prometheus Stack | Prometheus + Grafana + Alertmanager | Metrics & monitoring | Native K8s support, powerful queries |
| ELK Stack | Elasticsearch + Logstash + Kibana | Log aggregation | Full-text search, rich visualizations |
| EFK Stack | Elasticsearch + Fluentd + Kibana | K8s logging | Cloud-native, lightweight |
| Loki Stack | Loki + Promtail + Grafana | Lightweight logging | Cost-effective, Prometheus-like |
| Jaeger | Jaeger + OpenTelemetry | Distributed tracing | End-to-end tracing, OpenTracing support |

Key Concepts

  • SLI/SLO/SLA: Service Level Indicators/Objectives/Agreements
  • Golden Signals: Latency, Traffic, Errors, Saturation
  • RED Method: Rate, Errors, Duration (for services)
  • USE Method: Utilization, Saturation, Errors (for resources)

Prometheus Monitoring

Why Prometheus?

Prometheus is the de facto standard for Kubernetes monitoring. It uses a pull-based model to scrape metrics from HTTP endpoints, stores them as time-series data, and provides PromQL for powerful querying.

Installing Prometheus

install-prometheus.sh
# Add Prometheus community Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install kube-prometheus-stack (includes Prometheus, Grafana, Alertmanager)
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
  --set grafana.adminPassword=admin123 \
  --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=10Gi

# Verify installation
kubectl get pods -n monitoring
kubectl get svc -n monitoring

# Port-forward to access UIs
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-alertmanager 9093:9093

ServiceMonitor Configuration

A ServiceMonitor tells Prometheus which services to scrape and how often:

servicemonitor.yaml
# Application with Prometheus metrics endpoint
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
    spec:
      containers:
      - name: app
        image: myapp:latest
        ports:
        - containerPort: 8080
          name: http
        - containerPort: 9090
          name: metrics
---
# Service exposing the metrics port (a ServiceMonitor selects Services, not Pods)
apiVersion: v1
kind: Service
metadata:
  name: sample-app
  labels:
    app: sample-app
spec:
  selector:
    app: sample-app
  ports:
  - name: metrics
    port: 9090
    targetPort: metrics
---
# ServiceMonitor to scrape metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sample-app-monitor
  labels:
    prometheus: kube-prometheus
spec:
  selector:
    matchLabels:
      app: sample-app
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
    honorLabels: true

Custom Metrics in Applications

metrics.go
package main

import (
    "net/http"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    // Counter - only goes up (e.g., total requests)
    requestsTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "endpoint", "status"},
    )

    // Gauge - can go up or down (e.g., active connections)
    activeConnections = prometheus.NewGauge(
        prometheus.GaugeOpts{
            Name: "active_connections",
            Help: "Number of active connections",
        },
    )

    // Histogram - distribution of values (e.g., latency buckets)
    requestDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request latencies in seconds",
            Buckets: prometheus.DefBuckets,
        },
        []string{"method", "endpoint"},
    )
)

func init() {
    prometheus.MustRegister(requestsTotal)
    prometheus.MustRegister(activeConnections)
    prometheus.MustRegister(requestDuration)
}

func main() {
    http.Handle("/metrics", promhttp.Handler())
    // Serve on :9090 to match the "metrics" container port declared
    // in the Deployment above.
    if err := http.ListenAndServe(":9090", nil); err != nil {
        panic(err)
    }
}

PromQL Queries

CPU Usage

rate(container_cpu_usage_seconds_total[5m]) * 100

CPU usage percentage over 5 minutes
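
These queries lean heavily on `rate()`, which turns a monotonically increasing counter into a per-second rate. A simplified Go sketch of the idea (real Prometheus additionally extrapolates to the window boundaries; `sample` and `simpleRate` are illustrative names):

```go
package main

import "fmt"

type sample struct {
	ts  float64 // unix seconds
	val float64 // counter value at that instant
}

// simpleRate approximates PromQL rate(): the per-second increase across
// the samples in the window, treating any decrease as a counter reset.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		d := samples[i].val - samples[i-1].val
		if d < 0 { // counter reset: process restarted, counter began at 0
			d = samples[i].val
		}
		increase += d
	}
	elapsed := samples[len(samples)-1].ts - samples[0].ts
	return increase / elapsed
}

func main() {
	// A CPU-seconds counter scraped every 15s, with one restart at t=45.
	s := []sample{{0, 100}, {15, 103}, {30, 106}, {45, 2}, {60, 5}}
	fmt.Printf("rate = %.2f/s\n", simpleRate(s))
}
```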

Memory Usage

container_memory_working_set_bytes
/ container_spec_memory_limit_bytes * 100

Memory usage percentage

Request Rate

sum(rate(http_requests_total[5m]))
  by (service)

Requests per second by service

Error Rate

sum(rate(http_requests_total{status=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m]))

5xx error percentage

P95 Latency

histogram_quantile(0.95,
  rate(http_request_duration_seconds_bucket[5m]))

95th percentile latency

Pod Restarts

increase(
  kube_pod_container_status_restarts_total[1h])

Container restarts in last hour

Alert Rules

alert-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
  namespace: monitoring
spec:
  groups:
  - name: app.rules
    interval: 30s
    rules:
    # High CPU Usage
    - alert: HighCPUUsage
      expr: |
        (sum(rate(container_cpu_usage_seconds_total[5m])) by (pod, namespace) * 100) > 80
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High CPU usage on pod {{ $labels.pod }}"
        description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} CPU above 80%"

    # High Memory Usage
    - alert: HighMemoryUsage
      expr: |
        (container_memory_working_set_bytes / container_spec_memory_limit_bytes) * 100 > 90
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High memory usage on pod {{ $labels.pod }}"

    # Pod Crash Looping
    - alert: PodCrashLooping
      expr: |
        increase(kube_pod_container_status_restarts_total[15m]) > 3
      labels:
        severity: critical
      annotations:
        summary: "Pod {{ $labels.pod }} is crash looping"

    # High Error Rate
    - alert: HighErrorRate
      expr: |
        (sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
         / sum(rate(http_requests_total[5m])) by (service)) > 0.05
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "High error rate for {{ $labels.service }}"

Prometheus Best Practices

  • Use appropriate metric types (Counter, Gauge, Histogram, Summary)
  • Keep cardinality low - avoid high-cardinality labels
  • Use recording rules for frequently-used complex queries
  • Set appropriate retention periods based on storage capacity
  • Use federation for multi-cluster monitoring
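
The cardinality warning deserves numbers: a metric's worst-case series count is the product of its labels' value counts, so one unbounded label multiplies everything. A quick sketch (label names and counts are hypothetical):

```go
package main

import "fmt"

// seriesCount estimates the worst-case number of time series a metric
// can produce: the product of the value counts of each label.
func seriesCount(labelValues map[string]int) int {
	n := 1
	for _, v := range labelValues {
		n *= v
	}
	return n
}

func main() {
	// Bounded labels keep cardinality manageable...
	good := map[string]int{"method": 5, "endpoint": 20, "status": 6}
	// ...while an unbounded label like user_id multiplies every series.
	bad := map[string]int{"method": 5, "endpoint": 20, "status": 6, "user_id": 100000}
	fmt.Println("bounded labels:", seriesCount(good), "series")
	fmt.Println("with user_id:  ", seriesCount(bad), "series")
}
```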

Grafana Dashboards

Dashboards as Code

Store your Grafana dashboards as JSON in ConfigMaps so they are version-controlled and automatically provisioned when the cluster is rebuilt.

Dashboard Configuration

dashboard.json
{
  "dashboard": {
    "title": "Kubernetes Application Dashboard",
    "panels": [
      {
        "id": 1,
        "title": "CPU Usage",
        "type": "graph",
        "targets": [{
          "expr": "sum(rate(container_cpu_usage_seconds_total{namespace=\"$namespace\"}[5m])) by (pod) * 100",
          "legendFormat": "{{pod}}"
        }]
      },
      {
        "id": 2,
        "title": "Memory Usage",
        "type": "graph",
        "targets": [{
          "expr": "sum(container_memory_working_set_bytes{namespace=\"$namespace\"}) by (pod) / 1024 / 1024",
          "legendFormat": "{{pod}}"
        }]
      },
      {
        "id": 3,
        "title": "Error Rate",
        "type": "singlestat",
        "targets": [{
          "expr": "sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m])) * 100"
        }],
        "thresholds": "1,5",
        "colors": ["green", "yellow", "red"]
      }
    ],
    "templating": {
      "list": [{
        "name": "namespace",
        "type": "query",
        "query": "label_values(kube_pod_info, namespace)"
      }]
    },
    "refresh": "30s"
  }
}

ConfigMap for Dashboard Provisioning

grafana-provisioning.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboards
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  k8s-cluster-dashboard.json: |
    { /* Dashboard JSON here */ }

---
# Data sources configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitoring
data:
  datasources.yaml: |
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://prometheus-kube-prometheus-prometheus:9090
      isDefault: true
    - name: Loki
      type: loki
      url: http://loki:3100
    - name: Jaeger
      type: jaeger
      url: http://jaeger-query:16686

Grafana Tips

  • Use variables for dynamic dashboards
  • Import community dashboards from grafana.com
  • Set up alert notifications (Slack, PagerDuty, etc.)
  • Use annotations to mark deployments and incidents

Centralized Logging

EFK Stack Setup

efk-stack.yaml
# Elasticsearch StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: logging
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      initContainers:
      - name: init-sysctl
        image: busybox
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.15.0
        ports:
        - containerPort: 9200
          name: rest
        - containerPort: 9300
          name: transport
        env:
        - name: cluster.name
          value: k8s-logs
        - name: ES_JAVA_OPTS
          value: "-Xms1g -Xmx1g"
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 30Gi

---
# Fluentd DaemonSet
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      tolerations:
      # Cover both the current and legacy control-plane taints so logs
      # are collected from every node.
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch.logging.svc.cluster.local"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: containers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: containers
        hostPath:
          path: /var/lib/docker/containers

---
# Kibana Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: logging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana:7.15.0
        ports:
        - containerPort: 5601
        env:
        - name: ELASTICSEARCH_HOSTS
          value: "http://elasticsearch:9200"

Fluentd Configuration

fluentd-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: logging
data:
  fluent.conf: |
    # Tail container logs
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    # Enrich with Kubernetes metadata
    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>

    # Output to Elasticsearch
    <match **>
      @type elasticsearch
      host elasticsearch.logging.svc.cluster.local
      port 9200
      logstash_format true
      logstash_prefix k8s
      <buffer>
        @type memory
        flush_interval 5s
        chunk_limit_size 2M
      </buffer>
    </match>

Loki Stack (Lightweight Alternative)

install-loki.sh
# Install Loki Stack with Helm
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm install loki grafana/loki-stack \
  --namespace logging \
  --create-namespace \
  --set grafana.enabled=false \
  --set prometheus.enabled=false \
  --set loki.persistence.enabled=true \
  --set loki.persistence.size=10Gi \
  --set promtail.enabled=true

# Verify installation
kubectl get pods -n logging

# Add Loki data source in Grafana
# URL: http://loki.logging.svc.cluster.local:3100

Logging Considerations

  • Always configure log rotation and retention policies
  • Never log sensitive information (passwords, tokens, PII)
  • Use structured logging (JSON) for easier parsing and searching
  • Implement log sampling for high-volume applications
  • Consider storage costs for long-term retention

Distributed Tracing

Why Distributed Tracing?

In microservices architectures, a single user request may touch dozens of services. Distributed tracing follows a request end-to-end, showing exactly where time is spent and where errors occur.

Jaeger Installation

jaeger.yaml
# Install Jaeger Operator
kubectl create namespace observability
kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.37.0/jaeger-operator.yaml -n observability

---
# Jaeger Instance
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: observability
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200
  agent:
    strategy: DaemonSet
  collector:
    replicas: 2
    autoscale: true
    maxReplicas: 5
  query:
    replicas: 2

OpenTelemetry Integration

tracing.go
package main

import (
    "context"
    "log"
    "net/http"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/exporters/jaeger"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    "go.opentelemetry.io/otel/trace"
)

func initTracer() func() {
    // Create Jaeger exporter
    exp, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint("http://jaeger-collector:14268/api/traces"),
    ))
    if err != nil {
        log.Fatal(err)
    }

    // Create and register trace provider
    tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exp))
    otel.SetTracerProvider(tp)

    return func() { tp.Shutdown(context.Background()) }
}

func handleRequest(w http.ResponseWriter, r *http.Request) {
    tracer := otel.Tracer("my-service")
    ctx, span := tracer.Start(r.Context(), "handleRequest",
        trace.WithAttributes(
            attribute.String("http.method", r.Method),
            attribute.String("http.url", r.URL.String()),
        ),
    )
    defer span.End()

    // Child span for database call
    _, dbSpan := tracer.Start(ctx, "database.query")
    // ... do DB work ...
    dbSpan.End()

    w.WriteHeader(http.StatusOK)
}

func main() {
    cleanup := initTracer()
    defer cleanup()

    http.HandleFunc("/", handleRequest)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

Tracing Best Practices

  • Use sampling to reduce overhead (1-10% in production)
  • Add meaningful span attributes and events
  • Implement context propagation across service boundaries
  • Correlate traces with logs and metrics for full observability
  • Set up trace-based alerting for SLO compliance

Practice Problems

Medium Deploy a Complete Observability Stack

Deploy a full observability stack for a microservices application with Prometheus for metrics, EFK for logging, and Jaeger for tracing.

Start by creating separate namespaces (monitoring, logging, tracing). Use Helm for Prometheus and Loki. Deploy Elasticsearch as a StatefulSet. Wire up ServiceMonitors for metric scraping.

# 1. Create namespaces
kubectl create namespace monitoring
kubectl create namespace logging
kubectl create namespace tracing

# 2. Prometheus Stack (includes Grafana)
helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false

# 3. EFK Stack
# Deploy Elasticsearch, Fluentd DaemonSet, and Kibana
# (use manifests from the Centralized Logging section above)

# 4. Jaeger
kubectl apply -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.37.0/jaeger-operator.yaml -n tracing

# 5. Wire up data sources in Grafana
# Prometheus: http://prometheus-kube-prometheus-prometheus:9090
# Elasticsearch: http://elasticsearch.logging:9200
# Jaeger: http://jaeger-query.tracing:16686

Medium Implement SLO Monitoring

Define SLIs for a web service (availability, latency, error rate), create SLO targets (99.9% availability), and implement error budget tracking with multi-window alerts.

Use Prometheus recording rules to calculate SLIs over different windows (5m, 30m, 1h). The error budget is: 1 - ((1 - actual_availability) / (1 - slo_target)). Use multi-window alerting so fast burns trigger faster alerts.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: slo-rules
  namespace: monitoring
spec:
  groups:
  - name: slo.rules
    rules:
    # Availability SLI
    - record: sli:availability:ratio_rate5m
      expr: |
        sum(rate(http_requests_total{status!~"5.."}[5m])) by (service)
        / sum(rate(http_requests_total[5m])) by (service)

    # Availability SLI over a longer window (used by the multi-window alert)
    - record: sli:availability:ratio_rate1h
      expr: |
        sum(rate(http_requests_total{status!~"5.."}[1h])) by (service)
        / sum(rate(http_requests_total[1h])) by (service)

    # Error Budget remaining
    - record: error_budget:remaining
      expr: |
        1 - ((1 - sli:availability:ratio_rate5m) / (1 - 0.999))

    # Multi-window alert: both the fast and slow windows must breach
    - alert: SLOAvailabilityBreach
      expr: |
        sli:availability:ratio_rate5m < 0.999
        and sli:availability:ratio_rate1h < 0.999
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "SLO breach for {{ $labels.service }}"

Hard Multi-Cluster Observability

Design and implement observability for a multi-cluster Kubernetes deployment with Prometheus federation, centralized logging, cross-cluster tracing, and Thanos for long-term metric storage.

Use Thanos sidecar on each cluster's Prometheus to upload blocks to object storage. A central Thanos Querier aggregates data from all clusters. For logging, ship logs from each cluster's Fluentd to a centralized Elasticsearch cluster. Use Jaeger with shared storage for cross-cluster trace correlation.

# Thanos sidecar for each cluster's Prometheus
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  thanos:
    image: quay.io/thanos/thanos:v0.30.2
    objectStorageConfig:
      name: thanos-objstore-config
      key: thanos.yaml
    version: v0.30.2

---
# Central Thanos Querier
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-querier
spec:
  selector:
    matchLabels:
      app: thanos-querier
  template:
    metadata:
      labels:
        app: thanos-querier
    spec:
      containers:
      - name: thanos-query
        image: quay.io/thanos/thanos:v0.30.2
        args:
        - query
        - --store=dnssrv+_grpc._tcp.thanos-store.monitoring.svc
        - --store=cluster-a-prometheus:10901
        - --store=cluster-b-prometheus:10901

Quick Reference

Essential Commands

| Task | Command |
| --- | --- |
| Check Prometheus targets | `kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090` |
| View pod logs | `kubectl logs -f deployment/app --all-containers` |
| Check cluster events | `kubectl get events --sort-by='.lastTimestamp'` |
| Resource usage | `kubectl top pods --sort-by=cpu` |
| List ServiceMonitors | `kubectl get servicemonitors -n monitoring` |

The Four Golden Signals

Latency

Time to service a request. Track both successful and failed request latencies separately.

Traffic

Demand on the system: HTTP requests/sec, transactions/sec, or sessions.

Errors

Rate of failed requests: explicit (5xx), implicit (wrong content), or policy-based (> 1s latency).

Saturation

How "full" the service is: CPU, memory, I/O, queue depth. Signals capacity limits before failure.

Remember

Good observability is not about collecting everything - it is about collecting the right signals and making them actionable. Start with the Golden Signals, add custom metrics for your business logic, and build dashboards that help you answer questions during incidents.