CI/CD for Microservices | LIZIU Microservices

Why CI/CD Matters for Microservices

Why CI/CD Matters

The Problem: The monolith’s “one big release” works because there is one artifact, one schema, one deployment window. With 100 services owned by 30 teams, a single coordinated release window is an extinction event — you cannot get all teams ready at the same time, and the changes you do ship are too large to debug when something breaks.

The Solution: Each service owns its own pipeline. Every commit produces an immutable artifact. Every artifact can be deployed independently. The pipeline enforces the contracts — tests, scans, signing — that used to be enforced by the release manager.

Real Impact: Amazon reported at the Velocity 2011 conference that it was deploying to production every 11.6 seconds on average. A figure like that is only achievable when the unit of release is a single service and the pipeline runs unattended.

Real-World Analogy

The monolith release is a handcrafting workshop — a master builds one finished cabinet end-to-end, every cut is bespoke, throughput is one a week. CI/CD for microservices is a factory assembly line — each station does one thing (build, test, scan, ship), parts are interchangeable, and every car coming off the line is identical except for VIN and trim.

You don’t scale a handcrafting workshop by hiring more masters. You scale by replacing the workshop with a line. A microservices org without an industrialized pipeline is a workshop pretending to be a factory — and it will produce the worst of both.

The thing CI/CD actually buys you is not speed. It is independence. Each team ships when it is ready, on a cadence it controls, behind quality gates it understands. Coordination cost goes from O(N²) team-pairs in a manual release to O(1) per service in a pipeline.
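
To put rough numbers on the O(N²) claim, here is a small sketch that counts team pairs for a coordinated release. Treating every pair of teams as one coordination path is a simplifying assumption, and the function name is illustrative only.

```python
def coordination_pairs(teams: int) -> int:
    """Number of distinct team pairs that may need to sync: N(N-1)/2."""
    return teams * (teams - 1) // 2


# Growth of pairwise coordination as the org grows.
for n in (10, 30, 100):
    print(n, "teams ->", coordination_pairs(n), "pairs")
```

For the 30 teams in the example above that is 435 possible pairs, while a team shipping through its own pipeline only has to satisfy that one pipeline's gates.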

What changes when you move from monolith to many services

Concern	Monolith	Microservices
Build artifacts	One WAR / JAR / binary	One image per service, hundreds in flight
Release cadence	Weekly or monthly	Per-commit, per-service
Versioning	Version the app	Version every service and every contract
Test scope	Big in-process suite	Unit + integration + contract + smoke
Failure blast radius	The whole app	One service if you did the patterns right
Rollback unit	Previous artifact	Per-service Git revert or image pin

Anatomy of a Microservice Pipeline

Why the Stages Are Standard

The Problem: Every team invents their own pipeline shape, then copies bugs between them. Some skip security scans. Some test against latest. Some have no rollback story.

The Solution: Standardize the stages. The order is not optional — you cannot scan an image you haven’t built, and you cannot promote a tag your tests didn’t see.

A production pipeline for a single service moves through these stages, in this order:

Stage definitions

Source: Trigger on git push to a branch or pull request. The commit SHA is the identity for everything that follows.
Build: Compile, lint, type-check. Fast feedback — under two minutes is the goal.
Unit test: No network, no database, no other services. If it needs Docker to run, it isn’t a unit test.
Container build: Multi-stage Dockerfile. The build context becomes a tagged image.
Security & SBOM scan: Trivy / Grype / Snyk for CVEs; Syft to produce a Software Bill of Materials. Fail on high/critical CVEs in your code; warn on base-image CVEs.
Integration test: Spin up real dependencies via testcontainers (Postgres, Kafka, Redis). The image under test runs against them.
Registry push: Push the immutable image to ECR / GCR / Artifact Registry / Harbor. Sign it with cosign.
Deploy: The pipeline either updates a Kubernetes manifest in Git (GitOps) or pokes a controller (Spinnaker, Argo Rollouts) to start a progressive rollout.
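
The ordering rule can be sketched as a tiny runner that executes stages in a fixed order and stops at the first failure. This is an illustration, not how any particular CI system is implemented; the stage names are taken from the list above and the callables stand in for real build steps.

```python
from typing import Callable, Dict, List, Tuple

# Stage names mirror the list above; the order is the contract.
STAGE_ORDER = [
    "source", "build", "unit-test", "container-build",
    "scan", "integration-test", "registry-push", "deploy",
]


def run_pipeline(steps: Dict[str, Callable[[], bool]]) -> Tuple[List[str], bool]:
    """Run every stage in order and stop at the first one that fails.

    Returns the stages that completed and whether the whole run succeeded.
    """
    completed: List[str] = []
    for name in STAGE_ORDER:
        if not steps[name]():
            return completed, False
        completed.append(name)
    return completed, True


# Hypothetical run: every stage passes except the security scan.
steps = {name: (lambda: True) for name in STAGE_ORDER}
steps["scan"] = lambda: False
completed, ok = run_pipeline(steps)
print(completed, ok)
```

Because the scan fails, the image is never pushed and nothing is deployed, which is the property the fixed order is meant to guarantee.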

Per-Service vs. Monorepo Pipelines

Why This Choice Defines Your Org

The Problem: Monorepo gives you atomic cross-service refactors but a 90-minute “build everything” CI run. Polyrepo gives you fast per-service builds but turns shared libraries into a coordination nightmare.

The Solution: The right answer is rarely “rebuild everything on every commit.” Use change detection — Bazel, Nx, Turborepo, or git-diff-based path filters — so the pipeline only rebuilds what actually changed.

Dimension	One repo per service (polyrepo)	Monorepo
Cross-service refactor	Multiple PRs, careful sequencing	One atomic PR
Pipeline simplicity	Trivial — one service per pipeline	Needs change detection or it’s slow
Ownership boundaries	Hard, enforced by repo permissions	Soft, enforced by CODEOWNERS
Discoverability	Hard — where does that service live?	One `grep` finds anything
Build infra cost	Cheap per build, redundant tooling	One sophisticated build system, more complex
Best at scale	Independent teams, loose coupling	Tight platform team, shared standards

Google, Meta, and Uber run monorepos with custom build systems. Netflix and Amazon lean polyrepo with strong platform tooling per service. Both work; the failure mode is the middle — a monorepo without change detection, or a polyrepo without a paved-road template.

Change detection in practice

The shape of change detection is always the same: compute the affected set, build only that set, and cache the rest. Bazel uses content hashes; Nx and Turborepo use a project graph plus inputs/outputs declarations; the cheapest version is a path filter in the CI config:

# .github/workflows/services.yml — change-detection with path filters
name: services
on:
  push:
    branches: [main]
  pull_request:

jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      services: ${{ steps.filter.outputs.changes }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            payments: 'services/payments/**'
            orders:   'services/orders/**'
            shipping: 'services/shipping/**'

  build:
    needs: changes
    if: needs.changes.outputs.services != '[]'
    strategy:
      matrix:
        service: ${{ fromJSON(needs.changes.outputs.services) }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make -C services/${{ matrix.service }} build test image
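
Underneath, a path filter is just a mapping from changed files to owning services. A minimal sketch of that mapping, assuming the services/<name>/... layout used in the workflow above (the function name is hypothetical):

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def affected_services(changed_files: Iterable[str], root: str = "services") -> List[str]:
    """Map changed file paths to the services that own them.

    A file counts only if it lives under <root>/<service>/...; everything
    else (docs, top-level config) is ignored by this simple filter.
    """
    affected = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if len(parts) >= 3 and parts[0] == root:
            affected.add(parts[1])
    return sorted(affected)


changed = [
    "services/payments/api/handler.go",
    "services/payments/Makefile",
    "services/orders/go.mod",
    "docs/runbook.md",
]
print(affected_services(changed))
```

Note what this version does not do: a change to a shared library outside services/ rebuilds nothing. Handling that requires a dependency graph, which is the part Bazel, Nx, and Turborepo add on top.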

Independent deployability is a property, not a wish

The whole point of separate pipelines is that service A can ship without service B’s consent. If your CI requires “all services pass integration tests against each other before any of them deploys,” you have built a distributed monolith with extra steps. Contract tests, covered under Test Layers in CI, are the way out.

Build Artifacts and Container Registries

Why Image Hygiene Matters

The Problem: A team tags every build service:latest. Production has been running “latest” for six months. Nobody can tell you which commit is in prod, the SBOM is gone, and rollback means “hopefully someone tagged a backup.”

The Solution: Immutable, content-addressable images. Tag with the git SHA. Pin by digest in production. Sign every image. Generate an SBOM for every image.

The non-negotiable rules of container hygiene:

Tag every image with the git SHA — orders:9c4a7b2, never orders:latest. latest is a mutable pointer; the next push overwrites it. You cannot roll back to a tag whose contents have changed.
Pin by digest in production manifests — orders@sha256:e3b0c4…. The SHA tag is for humans; the digest is what the runtime actually trusts.
Multi-arch builds — linux/amd64 and linux/arm64. Graviton, M-series Macs, Ampere — arm64 is no longer optional. docker buildx handles both in one push.
Sign images with cosign — the registry stores a signature alongside the image. Admission controllers (Kyverno, Gatekeeper, Connaisseur) verify it before scheduling.
Generate an SBOM with Syft and scan it with Trivy or Grype. Attach the SBOM to the image as an OCI artifact.
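
The first two rules are easy to check mechanically. The sketch below classifies an image reference as digest-pinned, tagged, mutable, or untagged; it is a simplified illustration with hypothetical function names, not a full parser for the OCI reference grammar.

```python
import re

# A digest-pinned reference ends in @sha256:<64 hex characters>.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def classify_image_ref(ref: str) -> str:
    """Return 'digest', 'tag', 'mutable' (the latest tag), or 'untagged'."""
    if DIGEST_RE.search(ref):
        return "digest"
    # Look only at the last path component so a registry port
    # (localhost:5000/orders) is not mistaken for a tag.
    name = ref.rsplit("/", 1)[-1]
    if ":" not in name:
        return "untagged"  # the runtime would silently resolve this to latest
    tag = name.split(":", 1)[1]
    return "mutable" if tag == "latest" else "tag"


def allowed_in_prod(ref: str) -> bool:
    """Production manifests must pin by digest."""
    return classify_image_ref(ref) == "digest"


for ref in ("orders:latest", "orders:9c4a7b2", "localhost:5000/orders"):
    print(ref, "->", classify_image_ref(ref))
```

A check like this fits naturally in a pre-merge hook on the manifests repository, with the admission controller as the enforcing backstop.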

A realistic multi-stage Dockerfile

# syntax=docker/dockerfile:1.7
# ---- build stage ---------------------------------------------------------
FROM golang:1.22-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod go mod download
COPY . .
ARG GIT_SHA=unknown
RUN CGO_ENABLED=0 go build \
      -ldflags "-s -w -X main.commit=${GIT_SHA}" \
      -o /out/orders ./cmd/orders

# ---- runtime stage -------------------------------------------------------
FROM gcr.io/distroless/static-debian12:nonroot
USER nonroot:nonroot
COPY --from=build /out/orders /orders
EXPOSE 8080
ENTRYPOINT ["/orders"]

Things to notice: distroless base (no shell, no package manager, smaller attack surface), non-root user, the git SHA is baked into the binary so /healthz can report the running version, and BuildKit’s cache mount keeps Go module downloads off the critical path.

Build hashing and image tagging in a Makefile

# Makefile — the same logic CI runs, runnable locally for parity
SHELL      := /bin/bash
SERVICE    := orders
REGISTRY   := ghcr.io/acme
GIT_SHA    := $(shell git rev-parse --short=8 HEAD)
DIRTY      := $(shell git diff --quiet || echo "-dirty")
IMAGE      := $(REGISTRY)/$(SERVICE):$(GIT_SHA)$(DIRTY)

.PHONY: build test image sign sbom
build:
	go build -o bin/$(SERVICE) ./cmd/$(SERVICE)

# Same command the CI workflow runs for unit tests.
test:
	go test ./... -race -count=1

image:
	docker buildx build \
	  --platform linux/amd64,linux/arm64 \
	  --build-arg GIT_SHA=$(GIT_SHA) \
	  -t $(IMAGE) \
	  --push .

sign:
	cosign sign --yes $(REGISTRY)/$(SERVICE)@$$(crane digest $(IMAGE))

sbom:
	syft $(IMAGE) -o spdx-json > sbom-$(GIT_SHA).json
	cosign attach sbom --sbom sbom-$(GIT_SHA).json $(IMAGE)
	trivy image --severity HIGH,CRITICAL --exit-code 1 $(IMAGE)

Never deploy untagged or unsigned images

An untagged image — pushed without a SHA, or with only latest — cannot be rolled back, audited, or correlated to a commit. An unsigned image is one supply-chain attack away from running an attacker’s code with your service account. In production: enforce both at the admission controller. The pipeline should not be allowed to deploy an image the admission policy would reject.

Test Layers in CI

Why the Pyramid Shifts

The Problem: The classic test pyramid — lots of unit tests, some integration, very few end-to-end — still applies, but in a microservices world the most expensive failures live in the seams between services. Pure unit tests do not catch a contract drift.

The Solution: Add a contract-test layer. Each consumer publishes its expectations of each provider; providers verify those expectations in their own pipelines. The end-to-end suite shrinks to a handful of true smoke tests.

Layer	What runs	Where it runs	Speed budget
Unit	Pure functions, mocked I/O	Every commit	< 2 min
Integration	Service + real Postgres / Kafka / Redis via testcontainers	Every commit	< 5 min
Contract	Pact verifications: this provider satisfies these consumer expectations	Every commit on provider; broker-triggered on consumer change	< 3 min
End-to-end smoke	5–20 critical user journeys against a deployed env	Post-deploy	< 10 min
Load / soak	k6 or Gatling against staging	Nightly or pre-release	Hours

Integration test with testcontainers

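// Java + JUnit 5 + Testcontainers: real Postgres in CI, no fixtures, no mocks.
// OrderRepository and Order are the service's own classes (not shown here).
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

import static org.junit.jupiter.api.Assertions.assertEquals;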

@Testcontainers
class OrderRepositoryIT {

    @Container
    static PostgreSQLContainer<?> pg = new PostgreSQLContainer<>("postgres:16-alpine")
        .withDatabaseName("orders")
        .withUsername("app")
        .withPassword("app");

    @Test
    void persistsAndReadsBack() {
        var repo = new OrderRepository(pg.getJdbcUrl(), pg.getUsername(), pg.getPassword());
        var id = repo.save(new Order("sku-1", 2));
        assertEquals(2, repo.findById(id).quantity());
    }
}

Consumer-driven contracts with Pact

# orders consumer publishes a pact — “I expect inventory to respond like this”
# Pact JSON, abbreviated:
{
  "consumer": { "name": "orders" },
  "provider": { "name": "inventory" },
  "interactions": [{
    "description": "a stock check for sku-1",
    "request":  { "method": "GET", "path": "/v1/stock/sku-1" },
    "response": {
      "status": 200,
      "body": { "sku": "sku-1", "available": 42 }
    }
  }]
}

The consumer ships its pact to a Pact Broker. The provider’s pipeline pulls every published pact and verifies its current build satisfies them. If the provider would break a consumer, the provider’s build fails — before the bad image is pushed. This is how independent deployability survives contact with reality.
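
To make the verification step concrete, here is a deliberately simplified sketch of what "the provider satisfies the pact" means: the response must contain every field the consumer expects, and extra fields are tolerated. Real Pact verification adds matchers, provider states, and broker integration; the function names below are hypothetical and exact value comparison is an assumption of this sketch.

```python
def satisfies(expected, actual) -> bool:
    """True if `actual` contains everything in `expected`.

    Extra keys in the provider's response are allowed, so a provider can
    add fields without breaking existing consumers.
    """
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and satisfies(value, actual[key])
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        return (
            isinstance(actual, list)
            and len(actual) == len(expected)
            and all(satisfies(e, a) for e, a in zip(expected, actual))
        )
    return expected == actual


def verify(interaction: dict, status: int, body) -> bool:
    """Check one recorded interaction against the provider's real response."""
    expected = interaction["response"]
    return status == expected["status"] and satisfies(expected["body"], body)


# The interaction from the pact above.
interaction = {
    "description": "a stock check for sku-1",
    "request": {"method": "GET", "path": "/v1/stock/sku-1"},
    "response": {"status": 200, "body": {"sku": "sku-1", "available": 42}},
}

# Adding a field is compatible; renaming one is a breaking change.
print(verify(interaction, 200, {"sku": "sku-1", "available": 42, "warehouse": "eu-1"}))
print(verify(interaction, 200, {"sku": "sku-1", "in_stock": 42}))
```

The asymmetry is the point: providers stay free to extend their responses, but cannot remove or rename anything a consumer has declared it depends on.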

End-to-end tests are not your safety net

A full end-to-end suite that boots all 50 services is slow, flaky, and expensive. Use it for a handful of true journeys: signup, checkout, payment. Push everything else down to contract and integration tests, where the failure mode is fast and clearly attributed to a single service.

Continuous Deployment vs. Continuous Delivery

Why the Distinction Matters

The Problem: The terms get used interchangeably. They are not the same. The difference determines who gets paged at 3 AM.

The Solution: Continuous Delivery — every commit is releasable; a human approves the prod push. Continuous Deployment — every commit that passes the pipeline goes to production unattended. Most orgs run delivery for prod and deployment for lower envs.

The promotion path most mature teams converge on:

# Same image, different envs — promote, don’t rebuild.
dev    <-- auto-deploy on every merge to main
stage  <-- auto-deploy after dev smoke passes
prod   <-- manual approval (CD-as-delivery)  OR
            auto-deploy with progressive rollout (CD-as-deployment)
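
"Promote, don't rebuild" can be expressed as a rule on digests: an environment may only receive an image that is already running, and has passed smoke tests, one step earlier in the path. The sketch below models that rule with a manual approval gate on prod (the delivery variant). All names are illustrative.

```python
from typing import Dict

PROMOTION_ORDER = ["dev", "stage", "prod"]


def promote(running: Dict[str, str], digest: str, target: str, *,
            smoke_passed: bool = False, approved: bool = False) -> bool:
    """Deploy `digest` to `target` if the promotion rules allow it.

    `running` maps environment name to the digest currently deployed there
    and is updated in place on success.
    """
    position = PROMOTION_ORDER.index(target)
    if position > 0:
        previous = PROMOTION_ORDER[position - 1]
        # The exact same artifact must already be healthy one step earlier.
        if running.get(previous) != digest or not smoke_passed:
            return False
    if target == "prod" and not approved:
        return False  # continuous delivery: a human approves the prod push
    running[target] = digest
    return True


running: Dict[str, str] = {}
digest = "sha256:abc"  # placeholder digest for the example
print(promote(running, digest, "dev"))
print(promote(running, digest, "stage", smoke_passed=True))
print(promote(running, digest, "prod", smoke_passed=True))
print(promote(running, digest, "prod", smoke_passed=True, approved=True))
```

Switching to continuous deployment amounts to replacing the approved flag with an automated signal, such as a healthy progressive rollout.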

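Whichever you pick, the four numbers worth tracking are the DORA metrics, popularized by the book Accelerate and the annual State of DevOps Report. The performance bands shift between report years, so read the thresholds below as indicative: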

Metric	Definition	Elite	Low
Deployment frequency	How often you ship to prod	On demand (multiple per day)	Less than monthly
Lead time for changes	Commit to prod	< 1 hour	1–6 months
Change failure rate	% of deploys causing an incident	0–15%	> 30%
MTTR	Time to restore after incident	< 1 hour	> 1 week

The trap is optimizing one number at the expense of another. A team can hit “deploys per day = 100” by removing all gates and accepting a 60% change failure rate. That is not elite; it is an outage factory. Move all four together.
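
All four numbers fall out of the same deploy log, which is one reason to compute them together. A minimal sketch, assuming each deploy record carries a commit time, a deploy time, a failure flag, and a restore time; the record shape is hypothetical, and medians are used here as one reasonable choice of summary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deploy:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def dora(deploys: List[Deploy], window_days: int) -> dict:
    """Compute the four metrics over a non-empty list of deploys."""
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore": median(restores) if restores else None,
    }


t0 = datetime(2024, 1, 1, 9, 0)
deploys = [
    Deploy(t0, t0 + timedelta(minutes=30)),
    Deploy(t0, t0 + timedelta(minutes=60)),
    Deploy(t0, t0 + timedelta(minutes=90), failed=True,
           restored_at=t0 + timedelta(minutes=110)),
    Deploy(t0, t0 + timedelta(minutes=120)),
]
metrics = dora(deploys, window_days=2)
for name, value in metrics.items():
    print(name, value)
```

Reporting the four values side by side makes the trade-off visible: a jump in deploys per day that arrives with a jump in change failure rate is not an improvement.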

A complete GitHub Actions workflow

# .github/workflows/orders.yml
name: orders
on:
  push:
    branches: [main]
    paths: ['services/orders/**']

env:
  REGISTRY: ghcr.io/acme
  IMAGE: ghcr.io/acme/orders

jobs:
  build-test-push:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      id-token: write     # for cosign keyless signing
    steps:
      - uses: actions/checkout@v4

      - name: Setup Go
        uses: actions/setup-go@v5
        with: { go-version: '1.22' }

      - name: Unit tests
        working-directory: services/orders
        run: go test ./... -race -count=1

      - name: Set image tag
        id: tag
        run: echo "sha=$(git rev-parse --short=8 HEAD)" >> $GITHUB_OUTPUT

      - uses: docker/setup-buildx-action@v3

      - name: Login to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

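      - name: Build & push (multi-arch)
        id: build           # exposes outputs.digest for the signing step
        uses: docker/build-push-action@v6
        with:
          context: services/orders
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ${{ env.IMAGE }}:${{ steps.tag.outputs.sha }}
          build-args: GIT_SHA=${{ steps.tag.outputs.sha }}

      - name: Trivy scan
        uses: aquasecurity/trivy-action@master   # pin to a release tag or commit SHA in real pipelines
        with:
          image-ref: ${{ env.IMAGE }}:${{ steps.tag.outputs.sha }}
          severity: HIGH,CRITICAL
          exit-code: '1'

      - name: Install cosign   # do not assume the runner ships it
        uses: sigstore/cosign-installer@v3

      - name: Cosign sign (keyless)
        # Sign the digest, not the tag: a tag can be repointed after signing.
        run: cosign sign --yes ${{ env.IMAGE }}@${{ steps.build.outputs.digest }}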

      - name: Bump GitOps repo
        uses: peter-evans/repository-dispatch@v3
        with:
          token: ${{ secrets.GITOPS_TOKEN }}
          repository: acme/gitops
          event-type: image-update
          client-payload: '{"service":"orders","tag":"${{ steps.tag.outputs.sha }}"}'

Notice that this pipeline never runs kubectl apply against a cluster. It builds, tests, scans, signs, pushes, and then sends an event to the GitOps repo. The actual deployment is a separate concern — which is the next section.

GitOps and Declarative Deploys

Why Push Mode Doesn’t Scale

The Problem: Pipelines that kubectl apply directly into a cluster need wide cluster credentials, leak permissions to CI runners, and have no record of what should be in the cluster vs. what is.

The Solution: GitOps. Git is the source of truth for desired state. A controller in the cluster (Argo CD or Flux) pulls from Git and reconciles. The pipeline only writes Git; it never touches the cluster.

The flow becomes:

CI builds and pushes orders:9c4a7b2 to the registry.
CI opens a PR (or commits directly) to a GitOps repo, bumping the image tag in orders/values.yaml.
A reviewer (or auto-merge bot) merges the PR.
Argo CD or Flux picks up the Git change on its next poll (Argo CD polls every three minutes by default; a Git webhook makes it near-immediate) and reconciles the cluster: new pods come up, old ones drain.
If the deploy goes wrong, rollback is git revert. The cluster catches up automatically.
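
The "bump the image tag" step is a plain text edit on a file in Git. A minimal sketch, assuming a Helm-style values file with a single tag: key; the file layout and the function name are assumptions, and a real pipeline would more likely use a YAML-aware tool than a regular expression.

```python
import re

# Matches a line of the form `<indent>tag: <anything>`.
TAG_LINE = re.compile(r"^([ \t]*tag:[ \t]*).*$", re.MULTILINE)


def bump_image_tag(values_yaml: str, new_tag: str) -> str:
    """Replace the value of the first `tag:` key, leaving the rest untouched."""
    updated, count = TAG_LINE.subn(
        lambda match: f'{match.group(1)}"{new_tag}"', values_yaml, count=1
    )
    if count == 0:
        raise ValueError("no 'tag:' key found in values file")
    return updated


# Hypothetical contents of orders/values.yaml.
values = """image:
  repository: ghcr.io/acme/orders
  tag: "9c4a7b2"
replicas: 3
"""

print(bump_image_tag(values, "1f3e5a7"))
```

Because the deploy is a one-line diff, rollback is the inverse diff: reverting the commit restores the previous tag and the controller converges the cluster back to it.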

An Argo CD Application manifest

# gitops/apps/orders.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: orders
  namespace: argocd
spec:
  project: commerce
  source:
    repoURL:  https://github.com/acme/gitops.git
    path:     services/orders
    targetRevision: main
    helm:
      valueFiles:
        - values.yaml
        - values-prod.yaml
  destination:
    server:    https://kubernetes.default.svc
    namespace: commerce
  syncPolicy:
    automated:
      prune:    true     # delete resources removed from Git
      selfHeal: true     # revert manual cluster edits
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
    retry:
      limit: 5
      backoff: { duration: 5s, factor: 2, maxDuration: 3m }

selfHeal: true is the line that turns drift into a non-event. Someone kubectl edits a deployment in prod? Argo notices within a minute and reverts it. The cluster is no longer where state lives — Git is.

The four GitOps principles

Declarative — the entire system state is described as data, not commands.
Versioned and immutable — every change is a Git commit.
Pulled automatically — an in-cluster agent pulls from Git; nothing pushes into the cluster.
Continuously reconciled — the agent constantly compares desired (Git) and observed (cluster) state and converges them.
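
The fourth principle is a control loop: diff desired against observed, act on the difference, repeat. The sketch below shows one pass of such a loop over plain dictionaries. It illustrates the idea only and is not how Argo CD or Flux are implemented; resource names and fields are made up for the example.

```python
from typing import Dict, List, Tuple

State = Dict[str, dict]


def reconcile(desired: State, observed: State) -> List[Tuple[str, str]]:
    """Return the actions needed to make `observed` match `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != spec:
            actions.append(("update", name))  # drift, e.g. a manual edit
    for name in observed:
        if name not in desired:
            actions.append(("prune", name))   # removed from Git
    return actions


def apply(actions: List[Tuple[str, str]], desired: State, observed: State) -> None:
    """Carry out the actions against the (in-memory) cluster state."""
    for verb, name in actions:
        if verb == "prune":
            del observed[name]
        else:
            observed[name] = dict(desired[name])


desired = {
    "orders": {"image": "orders@sha256:aaa", "replicas": 3},
    "inventory": {"image": "inventory@sha256:bbb", "replicas": 2},
}
observed = {
    "orders": {"image": "orders@sha256:aaa", "replicas": 5},  # someone scaled by hand
    "debug-shell": {"image": "busybox", "replicas": 1},       # not in Git
}

actions = reconcile(desired, observed)
print(actions)
apply(actions, desired, observed)
print(reconcile(desired, observed))
```

After one pass the second reconcile returns no actions, which is the steady state a GitOps controller keeps converging toward.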

The GitOps repo is a production system

It deserves the same care as application code: branch protection, code review, signed commits, audit log. A merge to the GitOps repo is a deploy. If anyone with write access can merge unreviewed, you have given them cluster-admin with extra steps.

Real-World Examples

Spotify organizes around squads — small autonomous teams that own services end-to-end. Backstage, their internal developer portal (now CNCF), provides “golden paths” — opinionated templates that scaffold a new service with a tested pipeline, observability hookup, and on-call rotation pre-wired. The cost of starting a new service is “run the template,” which is the only way an org of their size avoids snowflake services.

Netflix built Spinnaker as their continuous delivery platform. Spinnaker treats deployments as multi-stage pipelines with built-in support for canaries (Kayenta), traffic shifting, automated rollback on metric regression, and multi-region/multi-cloud orchestration. Every Netflix service-to-prod path runs through Spinnaker; the platform team owns the pipeline so the product teams don’t each reinvent it.

GitHub ships GitHub itself with GitHub Actions. The matrix-build pattern — one workflow, many parameter combinations — lets a single YAML file fan out across services, OSes, and language versions. For polyrepo orgs, reusable workflows (uses: acme/.github/.github/workflows/build.yml@main) provide the centralized template Spotify gets from Backstage.

Google runs Blaze, the internal build system it open-sourced as Bazel, on a hermetic build graph: every input is content-addressed, every action is cacheable, every test result is reproducible. The remote cache means a CI build that would take an hour cold completes in minutes warm. The same rigor is what powers Borg deploys: every binary in production is traceable to the exact source revision, with the SBOM and the build provenance attached.

Amazon built Apollo (internal) and CodePipeline / CodeDeploy (AWS-facing) to enable the “you build it, you run it” model. The platform supplies pipelines, deployment safety, monitoring and rollback; the team supplies the service. This is the same shape every mature org converges on — a small platform team multiplied by hundreds of product teams that consume the platform.

Other ecosystems worth knowing: GitLab CI for orgs that want pipeline, registry, and SCM in one product; Jenkins with shared libraries for legacy/on-prem environments; Tekton, a CD Foundation project, as the Kubernetes-native pipeline primitive that other tools (Pipelines as Code, Jenkins X) build on.

Best Practices

The short list

One pipeline per service. Shared mega-pipelines are the seed of a distributed monolith.
Tag images by git SHA, never latest. Pin by digest in production manifests.
Sign every image with cosign. Reject unsigned images at the admission controller.
Generate an SBOM with Syft and scan with Trivy. Fail the build on high/critical CVEs in your code.
Run integration tests with testcontainers, not mocks. A mocked Postgres tests your mock, not your code.
Use Pact for inter-service contracts. The provider build fails before the bad image reaches the registry.
Adopt GitOps for deploys. Argo CD or Flux. The pipeline writes Git; the cluster pulls.
Track DORA metrics on a dashboard. Move all four together; do not optimize one in isolation.
Build a paved road, not a recommendation. A template every team copies is worth more than a wiki page nobody reads.
Make rollback boring. If git revert doesn’t restore prod within minutes, your pipeline is broken.

The single most useful sentence about CI/CD

The pipeline is the only system every deployment touches. Invest in it the way you invest in production — tests, observability, on-call, postmortems — because in a microservices org the pipeline is production’s control plane.