API Gateway | LIZIU Microservices

Why API Gateways Matter

The Problem: Without a gateway, every client — mobile app, web SPA, partner integration, IoT device — has to know about every service. That means hardcoded hostnames, duplicated auth logic in every client, N×M CORS headaches, and a deploy of all your apps every time you split or rename a service.

The Solution: A single front door. Clients talk to one URL. The gateway handles authentication, routing, rate limits, retries, and the boring plumbing once — not in every client and not in every service.

Real Impact: Netflix routes billions of requests per day through its Zuul edge gateway. The gateway is the reason a UI engineer doesn’t need to know how many hundreds of backend services sit behind it.

Real-World Analogy

Think of a hotel concierge desk:

Single front door — guests don’t wander into the kitchen, the laundry room, or the boiler — they go to the desk.
Routes guests to the right room — the concierge knows which staff handles which request and forwards accordingly.
Handles common needs once — checking ID, taking payment, logging visits — the rooms don’t each implement check-in.
Shields the back of house — if housekeeping reorganizes, guests notice nothing because the desk absorbs the change.

An API gateway plays the concierge role for your services. Without one, every guest needs the staff directory, a key to every door, and a credit card reader.

The pain a gateway solves is mostly invisible until you’ve felt it. Picture a 30-service backend with no edge layer. The mobile team needs to call 12 of them to render a home screen. Each one has its own DNS name, its own auth header conventions, its own way of paginating, its own preferred error format. The mobile app ships with a list of 12 hostnames hardcoded into a config — and the day you split orders into orders and order-history, every installed copy of the mobile app needs an update.

That’s the before. The after is one host, one auth scheme, one error envelope, one rate limit, one observability story. Every service-shaped change is invisible to clients because the gateway absorbs it.

What you avoid by not having one

Auth-per-service: Each team rolls their own JWT validation. One of them gets the signature check wrong. You don’t find out until the post-mortem.
CORS forever: Browsers preflight most cross-origin API calls (anything carrying an Authorization header or a JSON body). Multiply that across services. Now multiply it across environments.
Client coupling to topology: Splitting a service requires coordinating a release with every consumer. Backwards compatibility becomes a treadmill.
Inconsistent observability: Some services log JSON, some log text. Latency histograms are bucketed differently. Tracing is patchy.
No throttle: A misbehaving client can hammer one service into the ground because nothing at the edge says “slow down.”

What an API Gateway Does

The gateway is doing a small number of things, repeatedly, very fast. None of these are exotic on their own — the value is that they all live in one place and ship as one configuration.

Responsibility	What it actually does
Request routing	Match an incoming path/host/header to an upstream service and forward.
Authentication	Validate JWTs, exchange OAuth tokens, check API keys — before traffic touches a service.
Authorization	Coarse-grained policy: is this token allowed to call this route at all?
Rate limiting	Per-user, per-IP, per-API-key quotas. Reject overages with 429 fast.
Request/response transformation	Strip headers, rewrite paths, convert protocols (REST ↔ gRPC, JSON ↔ XML).
Aggregation	Compose multiple downstream calls into a single client response when needed.
Observability	Emit access logs, metrics, and tracing spans for every request — with consistent labels.
SSL/TLS termination	Decrypt at the edge, talk plaintext (or mTLS) to internal services.
Caching	Serve cacheable GET responses from an edge cache where it’s safe.

What an API gateway should not do

Never put business logic in the gateway. The moment your gateway starts computing pricing, applying discounts, or deciding who owns a record, you have built a distributed monolith with a fancy reverse proxy. Anything domain-specific belongs in a service. The gateway should be boring.

Gateway vs BFF (Backend for Frontend)

Why Two Patterns Coexist

The Problem: A single shared gateway tries to please mobile, web, partners, and internal tooling all at once. Mobile wants tiny payloads to save bandwidth. Web wants larger ones with embedded relations. Partners want stable contracts. The gateway becomes the place every team fights over.

The Solution: The Backend for Frontend pattern, popularized by Sam Newman, says: each client type gets its own gateway. The mobile BFF talks to the same backend services as the web BFF, but it shapes responses for its own client’s needs. No more compromise contracts.

The shared API gateway and the BFF are not opposites — they’re two points on the same spectrum, and many production systems use both.

Aspect	Single Shared Gateway	Backend for Frontend (BFF)
Owners	Platform team	Each client team owns its BFF
Audience	All clients	One client type (mobile, web, partner)
Response shape	Generic, lowest common denominator	Tailored to one client’s screens/needs
Release cadence	Slow — affects everyone	Fast — in step with client releases
Fan-out logic	Discouraged (keep it boring)	Expected (the BFF aggregates)
When to use	You need a stable edge for many clients	Two client types diverge enough to hurt

A common production pattern: a thin shared gateway at the very edge handles auth, TLS termination, and global rate limits. Behind it sit per-client BFFs that handle aggregation and shaping. Mobile traffic goes edge → mobile-bff → services; web traffic goes edge → web-bff → services. Partners often get their own BFF, with different rate limits and a stricter contract.

How to know you need a BFF

Mobile and web teams routinely argue about response shape in PR review.
You’re returning fields just so one client can ignore them.
The gateway has client-specific branches (if user-agent contains ‘Mobile’).
Mobile releases are blocked behind backend deploys.
Partner clients need different auth, different rate limits, different SLAs from your own apps.

Routing Strategies

Routing is the gateway’s simplest job and the one you’ll touch most often. Five flavors cover almost everything:

Strategy	How it matches	Typical use
Path-based	`/orders/*` → orders service	The default. Clean, predictable, easy to debug.
Host-based	`api.example.com` vs `partners.example.com`	Multi-tenant; isolating partner traffic.
Header-based	`X-Tenant: acme` → tenant-specific upstream	SaaS with per-tenant isolation; A/B cohorts.
Weighted (canary)	95% to v1, 5% to v2	Rolling out a new release behind the gateway.
A/B / cohort	Hash user-id → bucket	Sticky experiments — same user, same variant.
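
The last two rows (weighted canary and sticky cohorts) can share one mechanism: hash a stable identifier into a bucket, then map bucket ranges to variants. The following is a minimal sketch under assumptions of its own: the function names are hypothetical, and FNV-1a is used only because it is small enough to inline; any stable hash would do.

```javascript
// FNV-1a 32-bit hash: small and deterministic, good enough for bucketing.
function fnv1a(str) {
    let hash = 0x811c9dc5;
    for (let i = 0; i < str.length; i++) {
        hash ^= str.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
    }
    return hash;
}

// variants: e.g. [{ name: 'v1', weight: 95 }, { name: 'v2', weight: 5 }]
// The same userId always lands in the same bucket, so the variant is sticky.
function pickVariant(userId, variants) {
    const total = variants.reduce((sum, v) => sum + v.weight, 0);
    const bucket = fnv1a(String(userId)) % total;
    let upper = 0;
    for (const v of variants) {
        upper += v.weight;
        if (bucket < upper) return v.name;
    }
    return variants[variants.length - 1].name; // only reached if weights are not positive
}
```

Changing the weights moves some users between variants, so experiments that must stay sticky across a rollout usually pin the weights for their duration.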

A Kong declarative route

# kong.yaml — declarative config, version-controlled
_format_version: "3.0"

services:
  - name: orders-service
    url: http://orders.svc.cluster.local:8080
    routes:
      - name: orders-route
        paths:
          - /orders
        strip_path: false
        methods: [GET, POST, PUT, DELETE]
    plugins:
      - name: jwt
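      - name: rate-limiting
        config:
          minute: 600
          policy: redis
          # Assumed hostname. The redis policy needs a Redis address;
          # the exact field name depends on the Kong version.
          redis_host: redis.svc.cluster.local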

  - name: payments-service
    url: http://payments.svc.cluster.local:8080
    routes:
      - name: payments-route
        paths:
          - /payments
    plugins:
      - name: jwt
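      - name: rate-limiting
        config:
          minute: 60     # tighter limit on payment writes
          policy: redis
          # Assumed hostname. The redis policy needs a Redis address;
          # the exact field name depends on the Kong version.
          redis_host: redis.svc.cluster.local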

Notice the configuration is declarative — no code, no imperative steps. The gateway is the executor; this file is the truth. That’s the property you want: the gateway behaves the same way wherever you run it from this file.
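
To make the split between "file as truth" and "gateway as executor" concrete, here is a minimal sketch of a matcher driven by a route table shaped like the Kong file above. The field names (pathPrefix, methods, upstream) and the matching rules (longest prefix wins, prefixes match on path-segment boundaries) are assumptions of this sketch, not Kong's actual algorithm.

```javascript
// Route table: plain data, mirroring the declarative file.
const routeTable = [
    {
        name: 'orders-route',
        pathPrefix: '/orders',
        methods: ['GET', 'POST', 'PUT', 'DELETE'],
        upstream: 'http://orders.svc.cluster.local:8080',
    },
    {
        name: 'payments-route',
        pathPrefix: '/payments',
        methods: null, // null means "any method"
        upstream: 'http://payments.svc.cluster.local:8080',
    },
];

// Returns the best matching route, or null when nothing matches.
function matchRoute(routes, method, path) {
    let best = null;
    for (const route of routes) {
        const prefixOk = path === route.pathPrefix || path.startsWith(route.pathPrefix + '/');
        const methodOk = !route.methods || route.methods.includes(method);
        if (!prefixOk || !methodOk) continue;
        if (best === null || route.pathPrefix.length > best.pathPrefix.length) {
            best = route;
        }
    }
    return best;
}
```

The executor never changes when a route is added; only the data does, which is what makes the config reviewable and replayable.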

An Envoy route fragment

# envoy.yaml route_config — weighted routing for a canary
route_config:
  name: api_routes
  virtual_hosts:
    - name: api
      domains: ["api.example.com"]
      routes:
        - match:
            prefix: "/orders"
          route:
            weighted_clusters:
              clusters:
                - name: orders_v1
                  weight: 95
                - name: orders_v2
                  weight: 5          # 5% canary
            timeout: 2s
            retry_policy:
              retry_on: "5xx,reset,connect-failure"
              num_retries: 2
              per_try_timeout: 800ms
        - match:
            prefix: "/payments"
            headers:
              - name: "x-tenant"
                string_match:
                  exact: "acme"          # replaces exact_match, deprecated in the v3 API
          route:
            cluster: payments_acme       # tenant-isolated upstream
        - match:
            prefix: "/payments"
          route:
            cluster: payments_default

Cross-Cutting Concerns at the Edge

Why At The Edge

The Problem: Auth, rate limiting, CORS, and request validation are concerns that every service has. Implementing them N times means N inconsistencies and N bugs.

The Solution: Push them to the edge. The service trusts that anything that arrived already passed the gate. The service code shrinks; the security posture improves; the shared concerns become one team’s responsibility.

Authentication

Two flavors dominate at the edge:

JWT validation — the gateway holds the public key (or a JWKS URL), checks the signature, validates exp / iss / aud, and forwards claims to the upstream as headers.
OAuth introspection — the gateway calls the auth server’s /introspect endpoint per request (or per cache window) to learn whether a token is still valid. Slower but supports revocation.

# Kong JWT plugin — validate signatures, then forward claims
plugins:
  - name: jwt
    config:
      key_claim_name: iss
      claims_to_verify:
        - exp                           # reject expired
      maximum_expiration: 3600           # reject > 1h tokens
      header_names: ["Authorization"]
      uri_param_names: []                # no tokens in query strings
      cookie_names: []
      run_on_preflight: false            # skip OPTIONS
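  - name: request-transformer
    config:
      # Illustrative only: stock request-transformer templates read from
      # headers, query params, URI captures, and shared context, not from
      # JWT claims. Treat $(jwt.claim.*) as a placeholder for whatever
      # mechanism (custom plugin, OIDC plugin) publishes claims in your setup.
      add:
        headers:
          - "X-User-Id:$(jwt.claim.sub)"
          - "X-Tenant:$(jwt.claim.tenant)"
      remove:
        headers:
          - "Authorization"             # don’t leak tokens to upstream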

That last bit matters. The internal service never sees the raw bearer token and trusts the headers the gateway forwards. That trust only holds if nothing else can reach the service and if the gateway drops any client-supplied X-User-Id or X-Tenant header before setting its own. Token handling stays at the edge; auth claims propagate inward as plain identifiers.
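
The checks described above (signature, exp, iss, aud, then forwarding claims as headers) can be sketched in plain Node.js. This is a minimal illustration, not a production validator: it assumes HS256 with a shared secret to stay self-contained, whereas an edge gateway would more typically verify RS256 or ES256 against a JWKS. The function names are hypothetical.

```javascript
const crypto = require('node:crypto');

function b64urlDecode(part) {
    return Buffer.from(part, 'base64url');
}

// Verifies an HS256 token and returns its claims, or throws.
function verifyHs256(token, secret, { issuer, audience, now = Math.floor(Date.now() / 1000) }) {
    const parts = token.split('.');
    if (parts.length !== 3) throw new Error('malformed token');
    const [headerB64, payloadB64, sigB64] = parts;

    const header = JSON.parse(b64urlDecode(headerB64).toString('utf8'));
    if (header.alg !== 'HS256') throw new Error('unexpected alg'); // never trust "none"

    const expected = crypto.createHmac('sha256', secret)
        .update(`${headerB64}.${payloadB64}`)
        .digest();
    const actual = b64urlDecode(sigB64);
    if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) {
        throw new Error('bad signature');
    }

    const claims = JSON.parse(b64urlDecode(payloadB64).toString('utf8'));
    if (typeof claims.exp !== 'number' || claims.exp <= now) throw new Error('token expired');
    if (claims.iss !== issuer) throw new Error('wrong issuer');
    const aud = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
    if (!aud.includes(audience)) throw new Error('wrong audience');
    return claims;
}

// Builds the header set forwarded upstream: the bearer token and any
// client-supplied identity headers are dropped, then trusted ones are set.
function toUpstreamHeaders(incoming, claims) {
    const blocked = new Set(['authorization', 'x-user-id', 'x-tenant']);
    const out = {};
    for (const [name, value] of Object.entries(incoming)) {
        if (!blocked.has(name.toLowerCase())) out[name.toLowerCase()] = value;
    }
    out['x-user-id'] = String(claims.sub);
    if (claims.tenant !== undefined) out['x-tenant'] = String(claims.tenant);
    return out;
}
```

Dropping incoming x-user-id and x-tenant before setting them is the step that keeps a client from impersonating another user by sending those headers itself.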

Rate limiting

Limit dimension	Use it for
Per IP	Crude DDoS defense; anonymous traffic.
Per API key	Partner / B2B contracts — the meter for billing tiers.
Per user (sub claim)	Authenticated traffic; per-account quotas.
Per route	Tighter limits on writes (POST/DELETE) than reads.
Per service (global)	Hard ceiling so one runaway upstream doesn’t take everything down.

Use a distributed store — Redis is the standard — so a multi-instance gateway counts requests across all replicas. A local in-memory limiter on each instance silently allows N× your intended limit, where N is the replica count.
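
The N× effect is easy to reproduce. The sketch below is a fixed-window limiter whose counter store is injected; a Map stands in for Redis here, and all names are hypothetical. With Redis, the incr step would typically be an INCR on the window key plus an expiry.

```javascript
// Stand-in for a shared counter store. Only incr(key) is needed.
class MapStore {
    constructor() {
        this.counts = new Map();
    }
    incr(key) {
        const next = (this.counts.get(key) || 0) + 1;
        this.counts.set(key, next);
        return next;
    }
}

// Fixed-window limiter: at most `limit` requests per client per window.
function makeLimiter(store, limit, windowMs) {
    return function allow(clientId, nowMs) {
        const windowId = Math.floor(nowMs / windowMs);
        return store.incr(`${clientId}:${windowId}`) <= limit;
    };
}

// Sends requests round-robin across gateway replicas and counts admissions.
function countAllowed(limiters, totalRequests, clientId, nowMs) {
    let allowed = 0;
    for (let i = 0; i < totalRequests; i++) {
        const allow = limiters[i % limiters.length];
        if (allow(clientId, nowMs)) allowed += 1;
    }
    return allowed;
}

// Three replicas, each with its own in-memory store: the limit is per replica.
const localReplicas = [1, 2, 3].map(() => makeLimiter(new MapStore(), 10, 60000));

// Three replicas sharing one store: the limit is global.
const sharedStore = new MapStore();
const sharedReplicas = [1, 2, 3].map(() => makeLimiter(sharedStore, 10, 60000));
```

Running 60 requests from one client through each set within the same window shows the difference: the local replicas admit three times the intended limit, while the shared store holds it at 10.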

CORS, body limits, schema validation

CORS — one allowlist at the gateway, not per-service. Preflight responses cached aggressively.
Body size limits — a hard cap (e.g. 1 MB) at the edge defends every upstream from oversized bodies.
Schema validation — reject requests that don’t match the OpenAPI/JSON Schema before they ever touch your service. Cheap to do at the edge; valuable to remove from every service.
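
As a rough illustration of the first two items, the sketch below computes CORS response headers from a single allowlist and checks a declared body size against a cap. The origins, the allowed methods and headers, the 600-second preflight cache, and the function names are all assumptions for the example.

```javascript
const ALLOWED_ORIGINS = new Set([
    'https://app.example.com',
    'https://admin.example.com',
]);

// Returns the CORS headers to attach, or null if the origin is not allowed.
function corsHeaders(origin, method, allowlist, maxAgeSeconds = 600) {
    if (!origin || !allowlist.has(origin)) return null;
    const headers = {
        'Access-Control-Allow-Origin': origin, // echo the origin, never "*" with credentials
        'Vary': 'Origin',                      // caches must key on the Origin header
    };
    if (method === 'OPTIONS') {
        headers['Access-Control-Allow-Methods'] = 'GET, POST, PUT, DELETE';
        headers['Access-Control-Allow-Headers'] = 'Authorization, Content-Type';
        headers['Access-Control-Max-Age'] = String(maxAgeSeconds); // lets browsers cache the preflight
    }
    return headers;
}

// True when the declared Content-Length exceeds the cap (1 MB by default).
// A real gateway also has to enforce the cap while streaming, because the
// header can be absent or wrong.
function bodyTooLarge(contentLengthHeader, maxBytes = 1024 * 1024) {
    const declared = Number(contentLengthHeader);
    return Number.isFinite(declared) && declared > maxBytes;
}
```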

Aggregation and Fan-Out

Why Aggregation Is Tempting and Dangerous

The Problem: The mobile home screen needs data from 6 services. The naive answer is “the gateway calls all 6 and merges the result.” The naive answer also turns one client request into 6 backend dependencies, where the slowest one decides your latency and any one failure becomes your failure.

The Solution: Aggregate when it pays off — saves round trips on slow networks — but treat fan-out as a serious distributed-systems problem with timeouts, partial-result handling, and circuit breakers per leg.

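A simple aggregation handler, sketched as Express-style JavaScript (the ./clients module and req.user are assumed to exist):

// Express-style BFF aggregation for the mobile home screen
const express = require('express');
// Assumed: thin per-service clients that enforce their own timeout + breaker
const { userClient, ordersClient, recsClient, walletClient } = require('./clients');

const app = express();

app.get('/home', async (req, res) => {
    const userId = req.user.id;   // populated by auth middleware (assumed)

    // Fan out in parallel; allSettled never rejects, so each leg is inspected below
    const [profile, orders, recs, balance] = await Promise.allSettled([
        userClient.getProfile(userId),       // required
        ordersClient.recent(userId, 5),       // required
        recsClient.forUser(userId),           // optional, ok to fail
        walletClient.balance(userId),         // optional, ok to fail
    ]);

    // A failed required leg fails the whole response
    if (profile.status === 'rejected' || orders.status === 'rejected') {
        return res.status(503).json({ error: 'home_unavailable' });
    }

    // Optional legs degrade to null instead of failing the screen
    res.json({
        profile: profile.value,
        orders:  orders.value,
        recs:    recs.status    === 'fulfilled' ? recs.value    : null,
        balance: balance.status === 'fulfilled' ? balance.value : null,
    });
});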
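
The handler above leaves "its own timeout + breaker" to the clients. One way that could look is sketched below: a timeout wrapper plus a deliberately simple circuit breaker. The class, its thresholds, and the callLeg helper are hypothetical, and the half-open state is simplified (after the reset period every caller is let through until one fails again).

```javascript
// Rejects if the promise has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`timeout after ${ms}ms`)), ms);
    });
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Opens after `failureThreshold` consecutive failures and rejects calls
// immediately until `resetMs` has passed.
class CircuitBreaker {
    constructor({ failureThreshold = 3, resetMs = 10000, now = () => Date.now() } = {}) {
        this.failureThreshold = failureThreshold;
        this.resetMs = resetMs;
        this.now = now;
        this.failures = 0;
        this.openedAt = null;
    }

    async call(fn) {
        if (this.openedAt !== null && this.now() - this.openedAt < this.resetMs) {
            throw new Error('circuit open'); // fail fast, do not touch the upstream
        }
        try {
            const result = await fn();
            this.failures = 0;
            this.openedAt = null;
            return result;
        } catch (err) {
            this.failures += 1;
            if (this.failures >= this.failureThreshold) this.openedAt = this.now();
            throw err;
        }
    }
}

// One breaker per leg, so a sick recommendations service cannot open
// the circuit for orders.
function callLeg(breaker, fn, timeoutMs) {
    return breaker.call(() => withTimeout(fn(), timeoutMs));
}
```

A client such as recsClient.forUser could then be a thin wrapper around callLeg with its own breaker instance and timeout.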

Synchronous fan-out is a multiplier of failure

If each downstream is 99.9% available, all 6 are required, and failures are independent, your aggregated endpoint is about 99.4% available (0.999^6 ≈ 0.994), whether you call them serially or in parallel. Add timeouts, retries with no jitter, and a sick downstream, and a single bad service can degrade your entire home screen. Always classify each leg as required or optional, fail soft on optional ones, and circuit-break each leg independently.
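
The arithmetic behind that figure, as a small sketch that assumes failures are independent and that only required legs can fail the response (the function name is hypothetical):

```javascript
// Availability of an endpoint that needs every required leg to succeed.
function compoundAvailability(perLegAvailability, requiredLegs) {
    return Math.pow(perLegAvailability, requiredLegs);
}

// Six required legs at 99.9% each.
const allRequired = compoundAvailability(0.999, 6);

// Same six calls, but only two are required and four fail soft.
const twoRequired = compoundAvailability(0.999, 2);

console.log((allRequired * 100).toFixed(2) + '% with 6 required legs');
console.log((twoRequired * 100).toFixed(2) + '% with 2 required legs');
```

Reclassifying four legs as optional recovers most of the lost availability without removing any calls, which is why the required/optional split matters so much.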

GraphQL as an alternative

GraphQL gateways — Apollo Gateway, GraphQL federation, GitHub’s public GraphQL API — turn aggregation into a first-class concern. The client sends one query specifying exactly the fields it wants; the gateway plans the resolver calls, fans out, and stitches the response. The win: clients can reshape responses without backend changes. The cost: a query planner is now in your hot path, and an over-eager client query can fan out into a denial-of-service against your own services. Guard with query cost analysis and persisted queries.
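
Query cost analysis can be sketched without a GraphQL parser by working on an already-parsed selection tree. The cost model below (each field costs 1, a list field multiplies its children by its requested page size) and the node shape ({ name, first, children }) are assumptions for illustration, not the formula used by any particular gateway.

```javascript
// Estimated cost of one field and everything selected beneath it.
function queryCost(field) {
    const multiplier = field.first ?? 1;   // page size for list fields, 1 otherwise
    const children = field.children ?? [];
    const childCost = children.reduce((sum, child) => sum + queryCost(child), 0);
    return 1 + multiplier * childCost;
}

// Throws before any resolver runs if the query is over budget.
function enforceBudget(root, maxCost) {
    const cost = queryCost(root);
    if (cost > maxCost) {
        throw new Error(`query cost ${cost} exceeds budget ${maxCost}`);
    }
    return cost;
}

// Roughly: repositories(first: 50) { issues(first: 100) { title } }
const nestedQuery = {
    name: 'repositories',
    first: 50,
    children: [
        { name: 'issues', first: 100, children: [{ name: 'title' }] },
    ],
};
```

Two nested lists already multiply out to thousands of units, which is the fan-out risk described above; rejecting at the planner is far cheaper than discovering it in the services.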

Gateway Tools Comparison

Tool	Origin	Strengths	Watch out for	When to pick
Kong	Nginx + Lua, OSS & enterprise	Huge plugin ecosystem; declarative config; battle-tested	Plugin quality varies; enterprise features pricey	Plugin-rich, multi-team enterprise edge
Envoy	Lyft, CNCF graduated	L7 features, dynamic xDS config, foundation of Istio	Steep learning curve; YAML voluminous	Service mesh, sophisticated traffic shaping
AWS API Gateway	AWS managed	Zero ops; deep IAM/Lambda integration; auto-scale	Lambda cold starts behind it; per-request pricing; AWS lock-in	Serverless on AWS; low-traffic public APIs
Apigee	Google Cloud	Developer portal, monetization, full API lifecycle	Enterprise pricing; heavyweight	Public API products with billing tiers
Tyk	OSS Go gateway	Lightweight, multi-data-center, dashboard included	Smaller community than Kong	OSS gateway with good UX out of the box
Nginx	OSS reverse proxy	Universal, fast, well understood	Plugins via Lua/njs; not API-aware by default	Simple routing/TLS termination; teams already on Nginx
Traefik	OSS, container-native	Automatic service discovery in Docker/K8s; great defaults	Less mature plugin ecosystem	Container-first stacks; quick start on K8s
Spring Cloud Gateway	Pivotal/VMware, JVM	Native Spring integration; reactive; successor to Zuul within Spring Cloud	JVM resource footprint	Spring shops; replacing Zuul

An AWS API Gateway resource

# SAM/CloudFormation snippet — HTTP API with JWT authorizer
Resources:
  Api:
    Type: AWS::Serverless::HttpApi
    Properties:
      Auth:
        DefaultAuthorizer: JwtAuth
        Authorizers:
          JwtAuth:
            JwtConfiguration:
              issuer: https://auth.example.com/
              audience:
                - api.example.com
            IdentitySource: "$request.header.Authorization"
      RouteSettings:
        "POST /orders":
          ThrottlingBurstLimit: 100
          ThrottlingRateLimit:  50
        "GET /catalog/{proxy+}":
          ThrottlingBurstLimit: 5000
          ThrottlingRateLimit:  2000

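  OrdersFn:
    Type: AWS::Serverless::Function
    Properties:
      Handler: orders.handler
      Runtime: nodejs20.x        # assumed; required unless set in Globals
      CodeUri: ./src             # assumed path to the function code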
      Events:
        CreateOrder:
          Type: HttpApi
          Properties:
            ApiId: !Ref Api
            Path: /orders
            Method: POST

Operational Concerns

Why The Edge Needs Extra Care

The Problem: Every request goes through the gateway. If it’s down, your platform is down — even if every backend service is healthy.

The Solution: Treat the gateway as your most critical service. Multiple replicas, multiple AZs, blue/green deploys, separate change-management process from regular service deploys, and explicit fallback runbooks for the day it misbehaves.

The single-point-of-failure problem

A gateway is by definition a fan-in point. The standard mitigations:

N+1 replicas behind a load balancer — never run a single instance.
Multi-AZ — replicas in at least two availability zones, with health checks that fail traffic over.
Stateless — the gateway should keep no state in local memory that has to outlive a single request; rate-limit counters and session caches live in Redis.
Capacity headroom — size for 2–3× peak. The day you need it, you really need it.
Bypass plan — for the rare disaster scenario, document how an internal team can call services directly. You hope to never use it.

Version skew with downstreams

The gateway and its upstream services rarely deploy at exactly the same instant. A change to a route prefix or a header convention has to land in both. The pattern that works: change is additive on both sides, deploy each side in arbitrary order, then remove the old behavior in a follow-up release. Never make a destructive change in one place that requires the other side to deploy at the same moment.
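
As a small illustration of the additive pattern, suppose a header is being renamed from x-tenant to x-tenant-id (a hypothetical rename, as are the function names). During the migration the gateway sends both and the service accepts either, so neither side depends on the other deploying first.

```javascript
// Gateway side, migration phase: emit the new header and keep the old one.
function withTenantHeaders(headers, tenant) {
    return { ...headers, 'x-tenant-id': tenant, 'x-tenant': tenant };
}

// Service side, migration phase: prefer the new header, fall back to the old.
function readTenant(headers) {
    return headers['x-tenant-id'] ?? headers['x-tenant'] ?? null;
}
```

Once every gateway instance sends x-tenant-id and every service reads it, a follow-up release can drop x-tenant from both functions.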

The gateway-as-monolith risk

The drift to gateway monolith

It starts innocently: “just put the user-id parsing in the gateway, every service needs it.” Six months later the gateway is parsing JSON bodies, calling out to enrichment services, applying region-specific business rules, and rendering response templates. Now every change requires a gateway deploy and a 3-team approval process. Resist this drift. The gateway should grow no faster than the platform team can own it.

Observability for the edge itself

requests_per_second by route — baseline traffic shape.
latency_p50/p95/p99 by route — a healthy gateway typically adds only a few milliseconds of overhead. Watch for drift.
upstream_error_rate by service — isolates whether a 5xx came from the gateway or behind it.
auth_failures — spikes mean misconfigured clients or an attack.
rate_limited_total — spikes mean a client (or attacker) hit a quota.
active_connections — sudden climbs mean upstreams are slow and the queue is filling.

Real-World Examples

Netflix Zuul and Spring Cloud Gateway. Netflix built Zuul as the front door to their cloud edge. Zuul 1 was synchronous and blocking; Zuul 2 moved to an asynchronous, non-blocking model built on Netty. The patterns Netflix codified (dynamic routing, request and response filters, integrated circuit breakers, originally with Hystrix) are now industry-standard. Netflix still maintains Zuul, while much of the Spring community has migrated to Spring Cloud Gateway, the spiritual successor on the JVM.

Amazon API Gateway. AWS’s managed offering ties API routes directly to Lambda, Step Functions, or VPC services. Pay-per-request pricing, built-in throttling, IAM-based auth, JWT authorizers, and WebSocket support. The trade-off is the typical managed-service trade: you trade flexibility for not having to operate it.

GitHub’s GraphQL API. GitHub publishes a single GraphQL endpoint backing essentially their entire product surface. That one endpoint resolves each query against GitHub’s backend and assembles a single response. Rate limits are computed in “points”: a complex query consumes more budget than a simple one. This is the GraphQL-as-gateway pattern at industrial scale.

Twitter’s Finagle. Twitter’s Finagle library blurs the gateway/RPC boundary — it provides routing, load balancing, retries, circuit breakers, and observability as composable filters around any service-to-service call. Many of the patterns now standard in API gateways were validated at Twitter scale through Finagle first.

Kong at Cisco / Yahoo / Expedia. Kong’s public reference list reads like a who’s-who of large web platforms. The common shape: a Kong fleet behind a cloud load balancer, plugins for JWT and rate limiting, declarative config in Git, multi-region for residency.

Best Practices

The short list

Keep the gateway boring. Routing, auth, rate limit, observability. No business logic. Ever.
Declarative config in Git. The gateway is infrastructure; treat its config like infrastructure — reviewed, versioned, replayable.
One auth mechanism per consumer class. JWT for first-party apps, API keys for partners, mTLS for service-to-service. Don’t mix.
Strip the bearer token before forwarding. Let the upstream see X-User-Id and other parsed claims; never the raw token.
Distributed rate limit state. Use Redis (or the gateway’s native equivalent). Local counters lie when you scale out.
Hard body-size limits at the edge. Defends every upstream from a memory-hungry payload.
Per-route timeouts, retries, circuit breakers. A bad upstream should fail fast, not drag down the gateway.
Treat aggregation as a distributed-systems problem. Required vs optional legs, partial-result handling, breakers per leg.
Two replicas per AZ minimum, multi-AZ. The gateway is your top-priority HA target.
Run the BFF when client needs diverge. Don’t bend a shared gateway to fit one client’s screens.
Cap GraphQL query cost. If you expose GraphQL, make over-fetching expensive enough that it can’t happen accidentally.
Have a bypass runbook. The day the gateway misbehaves, you want a known way to get traffic to services directly.

The single most useful sentence about API gateways

The gateway exists so that everything else can stay simple. The moment it starts taking on the complexity its services were supposed to handle, you have made the platform worse, not better. When in doubt: push the logic down, keep the edge thin.