API Gateway

The single front door to a fleet of services. Done well, it absorbs the cross-cutting concerns — auth, rate limiting, routing, observability — so each service can stay focused on its job. Done poorly, it becomes a distributed monolith with a fancy logo.

Medium · 25 min read

Why API Gateways Matter

The Problem: Without a gateway, every client — mobile app, web SPA, partner integration, IoT device — has to know about every service. That means hardcoded hostnames, duplicated auth logic in every client, N×M CORS headaches, and a deploy of all your apps every time you split or rename a service.

The Solution: A single front door. Clients talk to one URL. The gateway handles authentication, routing, rate limits, retries, and the boring plumbing once — not in every client and not in every service.

Real Impact: Netflix routes billions of requests per day through its edge gateway, Zuul. The gateway is the reason a UI engineer doesn’t need to know that there are 700 backend services behind it.

Real-World Analogy

Think of a hotel concierge desk:

  • Single front door — guests don’t wander into the kitchen, the laundry room, or the boiler — they go to the desk.
  • Routes guests to the right room — the concierge knows which staff handles which request and forwards accordingly.
  • Handles common needs once — checking ID, taking payment, logging visits — the rooms don’t each implement check-in.
  • Shields the back of house — if housekeeping reorganizes, guests notice nothing because the desk absorbs the change.

An API gateway plays the concierge role for your services. Without one, every guest needs the staff directory, a key to every door, and a credit card reader.

The pain a gateway solves is mostly invisible until you’ve felt it. Picture a 30-service backend with no edge layer. The mobile team needs to call 12 of them to render a home screen. Each one has its own DNS name, its own auth header conventions, its own way of paginating, its own preferred error format. The mobile app ships with a list of 12 hostnames hardcoded into a config — and the day you split orders into orders and order-history, every installed copy of the mobile app needs an update.

That’s the before. The after is one host, one auth scheme, one error envelope, one rate limit, one observability story. Every service-shaped change is invisible to clients because the gateway absorbs it.

What you avoid by not having one

What an API Gateway Does

The gateway is doing a small number of things, repeatedly, very fast. None of these are exotic on their own — the value is that they all live in one place and ship as one configuration.

[Figure: Client → Gateway → Services. Clients (mobile / web / partner) call the API Gateway, the single front door (auth, routing, rate limit, TLS, logging, tracing, transform, aggregation), which forwards to orders (REST, /orders/*), payments (gRPC, payments.v1), and catalog (REST, /catalog/*).]

| Responsibility | What it actually does |
| --- | --- |
| Request routing | Match an incoming path/host/header to an upstream service and forward. |
| Authentication | Validate JWTs, exchange OAuth tokens, check API keys, all before traffic touches a service. |
| Authorization | Coarse-grained policy: is this token allowed to call this route at all? |
| Rate limiting | Per-user, per-IP, per-API-key quotas. Reject overages with 429 fast. |
| Request/response transformation | Strip headers, rewrite paths, convert protocols (REST ↔ gRPC, JSON ↔ XML). |
| Aggregation | Compose multiple downstream calls into a single client response when needed. |
| Observability | Emit access logs, metrics, and tracing spans for every request, with consistent labels. |
| SSL/TLS termination | Decrypt at the edge, talk plaintext (or mTLS) to internal services. |
| Caching | Serve idempotent GETs from edge cache where it’s safe. |

What an API gateway should not do

Never put business logic in the gateway. The moment your gateway starts computing pricing, applying discounts, or deciding who owns a record, you have built a distributed monolith with a fancy reverse proxy. Anything domain-specific belongs in a service. The gateway should be boring.

Gateway vs BFF (Backend for Frontend)

Why Two Patterns Coexist

The Problem: A single shared gateway tries to please mobile, web, partners, and internal tooling all at once. Mobile wants tiny payloads to save bandwidth. Web wants larger ones with embedded relations. Partners want stable contracts. The gateway becomes the place every team fights over.

The Solution: Sam Newman’s Backend for Frontend pattern says: each client type gets its own gateway. The mobile BFF talks to the same backend services as the web BFF, but it shapes responses for its own client’s needs. No more compromise contracts.

The shared API gateway and the BFF are not opposites — they’re two points on the same spectrum, and many production systems use both.

| Aspect | Single Shared Gateway | Backend for Frontend (BFF) |
| --- | --- | --- |
| Owners | Platform team | Each client team owns its BFF |
| Audience | All clients | One client type (mobile, web, partner) |
| Response shape | Generic, lowest common denominator | Tailored to one client’s screens/needs |
| Release cadence | Slow: affects everyone | Fast: in step with client releases |
| Fan-out logic | Discouraged (keep it boring) | Expected (the BFF aggregates) |
| When to use | You need a stable edge for many clients | Two client types diverge enough to hurt |

A common production pattern: a thin shared gateway at the very edge handles auth, TLS termination, and global rate limits. Behind it sit per-client BFFs that handle aggregation and shaping. Mobile traffic goes edge → mobile-bff → services; web traffic goes edge → web-bff → services. Partners often get their own — with different rate limits and a stricter contract.

How to know you need a BFF

  • Mobile and web teams routinely argue about response shape in PR review.
  • You’re returning fields just so one client can ignore them.
  • The gateway has client-specific branches (if user-agent contains ‘Mobile’).
  • Mobile releases are blocked behind backend deploys.
  • Partner clients need different auth, different rate limits, different SLAs from your own apps.

Routing Strategies

Routing is the gateway’s simplest job and the one you’ll touch most often. Five flavors cover almost everything:

| Strategy | How it matches | Typical use |
| --- | --- | --- |
| Path-based | /orders/* → orders service | The default. Clean, predictable, easy to debug. |
| Host-based | api.example.com vs partners.example.com | Multi-tenant; isolating partner traffic. |
| Header-based | X-Tenant: acme → tenant-specific upstream | SaaS with per-tenant isolation; A/B cohorts. |
| Weighted (canary) | 95% to v1, 5% to v2 | Rolling out a new release behind the gateway. |
| A/B / cohort | Hash user-id → bucket | Sticky experiments: same user, same variant. |
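To make the table concrete, here is a minimal sketch of how path, header, weighted, and hash-bucket matching can combine into one routing decision. The route table, upstream names, and the `resolve` helper are hypothetical; a real gateway expresses this in configuration rather than application code, and this sketch assumes header names arrive lowercased.

```javascript
const crypto = require('node:crypto');

// Hypothetical route table: first match wins, so the tenant-specific
// /payments route must come before the generic one.
const routes = [
  { prefix: '/orders', upstreams: [{ name: 'orders_v1', weight: 95 }, { name: 'orders_v2', weight: 5 }] },
  { prefix: '/payments', header: { name: 'x-tenant', value: 'acme' }, upstreams: [{ name: 'payments_acme', weight: 100 }] },
  { prefix: '/payments', upstreams: [{ name: 'payments_default', weight: 100 }] },
];

// Map a user id to a stable bucket in [0, 100) so the same user always
// lands on the same variant (sticky canary / cohort).
function bucketFor(userId) {
  const digest = crypto.createHash('sha256').update(String(userId)).digest();
  return digest.readUInt32BE(0) % 100;
}

// Path-based match on whole segments, plus an optional header condition.
function matchRoute(path, headers = {}) {
  return routes.find((r) =>
    (path === r.prefix || path.startsWith(r.prefix + '/')) &&
    (!r.header || headers[r.header.name] === r.header.value)) || null;
}

// Walk the cumulative weights until the user's bucket falls inside one.
function pickUpstream(route, userId) {
  const bucket = bucketFor(userId);
  let cumulative = 0;
  for (const upstream of route.upstreams) {
    cumulative += upstream.weight;
    if (bucket < cumulative) return upstream.name;
  }
  return route.upstreams[route.upstreams.length - 1].name;
}

function resolve(path, headers, userId) {
  const route = matchRoute(path, headers);
  return route ? pickUpstream(route, userId) : null;
}
```

Hashing the user id instead of drawing a random number per request is what makes the canary sticky: a given user sees either v1 or v2 consistently, at the cost of the split being only approximately 95/5 for small populations.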

A Kong declarative route

# kong.yaml — declarative config, version-controlled
_format_version: "3.0"

services:
  - name: orders-service
    url: http://orders.svc.cluster.local:8080
    routes:
      - name: orders-route
        paths:
          - /orders
        strip_path: false
        methods: [GET, POST, PUT, DELETE]
    plugins:
      - name: jwt
      - name: rate-limiting
        config:
          minute: 600
          policy: redis  # shared counters; the plugin's Redis connection settings are omitted here

  - name: payments-service
    url: http://payments.svc.cluster.local:8080
    routes:
      - name: payments-route
        paths:
          - /payments
    plugins:
      - name: jwt
      - name: rate-limiting
        config:
          minute: 60     # tighter limit on payment writes
          policy: redis

Notice the configuration is declarative — no code, no imperative steps. The gateway is the executor; this file is the truth. That’s the property you want: the gateway behaves the same way wherever you run it from this file.

An Envoy route fragment

# envoy.yaml route_config — weighted routing for a canary
route_config:
  name: api_routes
  virtual_hosts:
    - name: api
      domains: ["api.example.com"]
      routes:
        - match:
            prefix: "/orders"
          route:
            weighted_clusters:
              clusters:
                - name: orders_v1
                  weight: 95
                - name: orders_v2
                  weight: 5          # 5% canary
            timeout: 2s
            retry_policy:
              retry_on: "5xx,reset,connect-failure"
              num_retries: 2
              per_try_timeout: 0.8s   # protobuf durations are written in seconds
        - match:
            prefix: "/payments"
            headers:
              - name: "x-tenant"
                string_match:
                  exact: "acme"
          route:
            cluster: payments_acme       # tenant-isolated upstream
        - match:
            prefix: "/payments"
          route:
            cluster: payments_default

Cross-Cutting Concerns at the Edge

Why At The Edge

The Problem: Auth, rate limiting, CORS, and request validation are concerns that every service has. Implementing them N times means N inconsistencies and N bugs.

The Solution: Push them to the edge. The service trusts that anything that arrived already passed the gate. The service code shrinks; the security posture improves; the shared concerns become one team’s responsibility.

Authentication

Two flavors dominate at the edge:

# Kong JWT plugin — validate signatures, then forward claims
plugins:
  - name: jwt
    config:
      key_claim_name: iss
      claims_to_verify:
        - exp                           # reject expired
      maximum_expiration: 3600           # reject > 1h tokens
      header_names: ["Authorization"]
      uri_param_names: []                # no tokens in query strings
      cookie_names: []
      run_on_preflight: false            # skip OPTIONS
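  - name: request-transformer
    config:
      add:
        headers:
          # $(jwt.claim.*) is illustrative: the stock jwt plugin does not expose
          # claims as template variables, so this mapping needs a custom plugin.
          - "X-User-Id:$(jwt.claim.sub)"
          - "X-Tenant:$(jwt.claim.tenant)"
      remove:                            # removals are applied before additions
        headers:
          - "Authorization"             # don’t leak tokens to upstream
          - "X-User-Id"                 # drop any client-supplied identity headers
          - "X-Tenant"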

That last bit matters. The internal service can trust the headers the gateway forwards only because nothing else can reach it and because the gateway discards any client-supplied copies of those headers. It should never see the raw bearer token. Token handling stays at the edge; auth claims propagate inward as plain identifiers.
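The edge behaviour described above can be sketched in a few lines: verify the token, reject on failure, then forward identity headers while stripping both the bearer token and any identity headers the client tried to supply. This is a minimal illustration assuming HS256 tokens with a shared secret and lowercased header names; `signHs256`, `verifyHs256`, and `edgeHeaders` are hypothetical helpers, and a production gateway would use its JWT plugin or a vetted library (often with asymmetric keys).

```javascript
const crypto = require('node:crypto');

const b64url = (text) => Buffer.from(text).toString('base64url');

// Helper for the example only: mint an HS256 token. In practice the
// identity provider issues tokens, not the gateway.
function signHs256(claims, secret) {
  const head = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify(claims));
  const sig = crypto.createHmac('sha256', secret).update(`${head}.${body}`).digest('base64url');
  return `${head}.${body}.${sig}`;
}

// Returns the claims if the signature and expiry check out, else null.
function verifyHs256(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [head, body, sig] = parts;
  let header, claims;
  try {
    header = JSON.parse(Buffer.from(head, 'base64url').toString('utf8'));
    claims = JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  } catch {
    return null;
  }
  if (header?.alg !== 'HS256') return null;            // pin the algorithm
  const expected = crypto.createHmac('sha256', secret).update(`${head}.${body}`).digest();
  const given = Buffer.from(sig, 'base64url');
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) return null;
  if (typeof claims?.exp !== 'number' || claims.exp <= nowSeconds) return null;  // reject expired
  return claims;
}

// Decide what reaches the upstream: 401 on a bad token, otherwise the
// original headers minus the token and minus any spoofed identity headers.
function edgeHeaders(incoming, secret, nowSeconds) {
  const auth = incoming['authorization'] || '';
  const token = auth.startsWith('Bearer ') ? auth.slice(7) : '';
  const claims = verifyHs256(token, secret, nowSeconds);
  if (!claims) return { status: 401 };
  const { authorization, 'x-user-id': _spoofedUser, 'x-tenant': _spoofedTenant, ...rest } = incoming;
  return {
    status: 200,
    headers: { ...rest, 'x-user-id': String(claims.sub), 'x-tenant': String(claims.tenant ?? '') },
  };
}
```

The important design choice is that the identity headers are always rebuilt from verified claims and never passed through from the client, which is what lets the upstream treat them as trustworthy.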

Rate limiting

| Limit dimension | Use it for |
| --- | --- |
| Per IP | Crude DDoS defense; anonymous traffic. |
| Per API key | Partner / B2B contracts: the meter for billing tiers. |
| Per user (sub claim) | Authenticated traffic; per-account quotas. |
| Per route | Tighter limits on writes (POST/DELETE) than reads. |
| Per service (global) | Hard ceiling so one runaway upstream doesn’t take everything down. |

Use a distributed store — Redis is the standard — so a multi-instance gateway counts requests across all replicas. A local in-memory limiter on each instance silently allows N× your intended limit, where N is the replica count.
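The N× effect is easy to reproduce in a small simulation. The sketch below uses a fixed-window limiter over a pluggable counter store; `CounterStore` is an in-memory stand-in for a shared store such as Redis (only increment-and-read semantics are modelled), and all class and variable names are hypothetical.

```javascript
// Fixed-window limiter: count requests per key per window, admit while
// the count stays at or under the limit.
class FixedWindowLimiter {
  constructor(store, limit, windowMs) {
    this.store = store;
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key, nowMs) {
    const windowId = Math.floor(nowMs / this.windowMs);
    const count = this.store.incr(`${key}:${windowId}`);
    return count <= this.limit;
  }
}

// In-memory stand-in for a shared counter store.
class CounterStore {
  constructor() { this.counts = new Map(); }
  incr(key) {
    const next = (this.counts.get(key) || 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// Spread requests round-robin across gateway replicas, all inside one
// window, and count how many are admitted in total.
function admitted(limiters, requests) {
  let ok = 0;
  for (let i = 0; i < requests; i++) {
    if (limiters[i % limiters.length].allow('user-1', 0)) ok++;
  }
  return ok;
}

const LIMIT = 10;
const REPLICAS = 3;

// Each replica keeps its own counters.
const local = Array.from({ length: REPLICAS },
  () => new FixedWindowLimiter(new CounterStore(), LIMIT, 60_000));

// All replicas share one counter store.
const sharedStore = new CounterStore();
const shared = Array.from({ length: REPLICAS },
  () => new FixedWindowLimiter(sharedStore, LIMIT, 60_000));

const localAdmitted = admitted(local, 100);    // 30: every replica admits its own 10
const sharedAdmitted = admitted(shared, 100);  // 10: one counter for the whole fleet
```

With three replicas and a limit of 10, local counters admit 30 requests in the window while the shared store admits the intended 10.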

CORS, body limits, schema validation

Aggregation and Fan-Out

Why Aggregation Is Tempting and Dangerous

The Problem: The mobile home screen needs data from 6 services. The naive answer is “the gateway calls all 6 and merges the result.” The naive answer also turns one client request into 6 backend dependencies, where the slowest one decides your latency and any one failure becomes your failure.

The Solution: Aggregate when it pays off — saves round trips on slow networks — but treat fan-out as a serious distributed-systems problem with timeouts, partial-result handling, and circuit breakers per leg.

A simple aggregation handler in pseudocode:

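// Express-style BFF aggregation for the mobile home screen.
// Assumes `app` is an Express app, `req.user` is set by auth middleware, and the
// *Client objects are service clients that enforce their own timeout and breaker.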
app.get('/home', async (req, res) => {
    const userId = req.user.id;

    // Fan out in parallel; each leg has its own timeout + breaker
    const [profile, orders, recs, balance] = await Promise.allSettled([
        userClient.getProfile(userId),       // required
        ordersClient.recent(userId, 5),       // required
        recsClient.forUser(userId),           // optional — ok to fail
        walletClient.balance(userId),         // optional — ok to fail
    ]);

    if (profile.status === 'rejected' || orders.status === 'rejected') {
        return res.status(503).json({ error: 'home_unavailable' });
    }

    res.json({
        profile: profile.value,
        orders:  orders.value,
        recs:    recs.status    === 'fulfilled' ? recs.value    : null,
        balance: balance.status === 'fulfilled' ? balance.value : null,
    });
});
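The handler above relies on each leg having its own timeout. One way to get that, sketched here under the assumption that the service clients return plain promises, is a small `withTimeout` wrapper. Both `withTimeout` and the `fakeCall` stand-in are hypothetical names introduced for illustration.

```javascript
// Reject if `promise` has not settled within `ms`. The timer is cleared
// either way so it does not linger after the race is decided.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical stand-in for a service client call that takes `delayMs`.
const fakeCall = (value, delayMs) =>
  new Promise((resolve) => setTimeout(() => resolve(value), delayMs));

// Two legs with the same 50 ms budget: one answers in time, one does not.
async function demo() {
  const [fast, slow] = await Promise.allSettled([
    withTimeout(fakeCall({ name: 'Ada' }, 5), 50, 'profile'),
    withTimeout(fakeCall(['rec-1'], 200), 50, 'recs'),
  ]);
  return { fast, slow };
}
```

Note that this only stops waiting; it does not cancel the underlying request. A client that supports cancellation (for example via `AbortController`) would also release the connection when the budget expires.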

Synchronous fan-out is a multiplier of failure

If each downstream is 99.9% available and your endpoint needs all 6 of them with no fallback, the aggregated endpoint is only about 99.4% available (0.999^6 ≈ 0.994, assuming independent failures). Add timeouts, retries with no jitter, and a sick downstream, and a single bad service can degrade your entire home screen. Always classify each leg as required or optional, fail soft on optional ones, and circuit-break each leg independently.
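The availability arithmetic is worth running for your own numbers. The sketch below assumes legs fail independently, which is an idealisation (real outages are often correlated), and the function names are hypothetical.

```javascript
// Availability of an endpoint that needs every required leg to succeed,
// assuming independent failures.
function compositeAvailability(perLeg, requiredLegs) {
  return Math.pow(perLeg, requiredLegs);
}

// Expected downtime per 30-day month, in minutes.
function monthlyDowntimeMinutes(availability) {
  return (1 - availability) * 30 * 24 * 60;
}

const allSixRequired = compositeAvailability(0.999, 6);  // about 0.9940
const twoRequired = compositeAvailability(0.999, 2);     // about 0.9980

console.log(monthlyDowntimeMinutes(0.999).toFixed(0));          // one service alone
console.log(monthlyDowntimeMinutes(allSixRequired).toFixed(0)); // six required legs
console.log(monthlyDowntimeMinutes(twoRequired).toFixed(0));    // two required, four optional
```

Under these assumptions, making four of the six legs optional cuts expected downtime from roughly 259 minutes a month to roughly 86, against about 43 for a single service.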

GraphQL as an alternative

GraphQL gateways — Apollo Gateway, GraphQL federation, GitHub’s public GraphQL API — turn aggregation into a first-class concern. The client sends one query specifying exactly the fields it wants; the gateway plans the resolver calls, fans out, and stitches the response. The win: clients can reshape responses without backend changes. The cost: a query planner is now in your hot path, and an over-eager client query can fan out into a denial-of-service against your own services. Guard with query cost analysis and persisted queries.
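Query cost analysis can be sketched as a walk over the query tree that multiplies list sizes down each branch. The tree shape used here (`first` for requested list size, `fields` for child selections) is a hypothetical simplification rather than a real GraphQL AST, and the cost model is one of many possible ones.

```javascript
// Estimate how many nodes a query could touch. Each field contributes
// one node per parent node, multiplied by its requested list size.
function queryCost(selection, multiplier = 1) {
  let cost = 0;
  for (const field of selection) {
    const count = field.first ?? 1;       // list fields multiply the fan-out
    const nodes = multiplier * count;
    cost += nodes;
    if (field.fields) cost += queryCost(field.fields, nodes);
  }
  return cost;
}

// Admit or reject a query against a cost budget before executing it.
function admit(selection, maxCost) {
  const cost = queryCost(selection);
  return { cost, allowed: cost <= maxCost };
}

// viewer { orders(first: 5) { id } }
const modest = [
  { name: 'viewer', fields: [{ name: 'orders', first: 5, fields: [{ name: 'id' }] }] },
];

// users(first: 100) { orders(first: 100) { items(first: 100) } }
const greedy = [
  { name: 'users', first: 100, fields: [
    { name: 'orders', first: 100, fields: [{ name: 'items', first: 100 }] },
  ] },
];

console.log(admit(modest, 1000));  // { cost: 11, allowed: true }
console.log(admit(greedy, 1000));  // { cost: 1010100, allowed: false }
```

The second query looks small on the page but can touch over a million nodes, which is why the check has to run on the parsed query before any resolver is called.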

Gateway Tools Comparison

| Tool | Origin | Strengths | Watch out for | When to pick |
| --- | --- | --- | --- | --- |
| Kong | Nginx + Lua, OSS & enterprise | Huge plugin ecosystem; declarative config; battle-tested | Plugin quality varies; enterprise features pricey | Plugin-rich, multi-team enterprise edge |
| Envoy | Lyft, CNCF graduated | L7 features, dynamic xDS config, foundation of Istio | Steep learning curve; YAML voluminous | Service mesh, sophisticated traffic shaping |
| AWS API Gateway | AWS managed | Zero ops; deep IAM/Lambda integration; auto-scale | Lambda cold starts behind it; per-request pricing; AWS lock-in | Serverless on AWS; low-traffic public APIs |
| Apigee | Google Cloud | Developer portal, monetization, full API lifecycle | Enterprise pricing; heavyweight | Public API products with billing tiers |
| Tyk | OSS Go gateway | Lightweight, multi-data-center, dashboard included | Smaller community than Kong | OSS gateway with good UX out of the box |
| Nginx | OSS reverse proxy | Universal, fast, well understood | Plugins via Lua/njs; not API-aware by default | Simple routing/TLS termination; teams already on Nginx |
| Traefik | OSS, container-native | Automatic service discovery in Docker/K8s; great defaults | Less mature plugin ecosystem | Container-first stacks; quick start on K8s |
| Spring Cloud Gateway | Pivotal/VMware, JVM | Native Spring integration; reactive; Zuul successor in Spring Cloud | JVM resource footprint | Spring shops; replacing Zuul |

An AWS API Gateway resource

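# SAM/CloudFormation snippet: HTTP API with JWT authorizer
# (fragment: a full template also needs Transform: AWS::Serverless-2016-10-31,
#  and OrdersFn needs Runtime and CodeUri)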
Resources:
  Api:
    Type: AWS::Serverless::HttpApi
    Properties:
      Auth:
        DefaultAuthorizer: JwtAuth
        Authorizers:
          JwtAuth:
            JwtConfiguration:
              issuer: https://auth.example.com/
              audience:
                - api.example.com
            IdentitySource: "$request.header.Authorization"
      RouteSettings:
        "POST /orders":
          ThrottlingBurstLimit: 100
          ThrottlingRateLimit:  50
        "GET /catalog/{proxy+}":
          ThrottlingBurstLimit: 5000
          ThrottlingRateLimit:  2000

  OrdersFn:
    Type: AWS::Serverless::Function
    Properties:
      Handler: orders.handler
      Events:
        CreateOrder:
          Type: HttpApi
          Properties:
            ApiId: !Ref Api
            Path: /orders
            Method: POST

Operational Concerns

Why The Edge Needs Extra Care

The Problem: Every request goes through the gateway. If it’s down, your platform is down — even if every backend service is healthy.

The Solution: Treat the gateway as your most critical service. Multiple replicas, multiple AZs, blue/green deploys, separate change-management process from regular service deploys, and explicit fallback runbooks for the day it misbehaves.

The single-point-of-failure problem

A gateway is by definition a fan-in point. The standard mitigations:

Version skew with downstreams

The gateway and its upstream services rarely deploy at exactly the same instant. A change to a route prefix or a header convention has to land in both. The pattern that works: change is additive on both sides, deploy each side in arbitrary order, then remove the old behavior in a follow-up release. Never make a destructive change in one place that requires the other side to deploy at the same moment.
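As an illustration of the additive pattern, suppose a forwarded header is being renamed. The header names `x-user-id` and `x-subject-id` and both helper functions are hypothetical; the point is only that each side tolerates the other being on either version.

```javascript
// Upstream, expand phase: prefer the new header, fall back to the old one.
function readUserId(headers) {
  return headers['x-subject-id'] ?? headers['x-user-id'] ?? null;
}

// Gateway, expand phase: emit both names. Once every upstream reads the
// new name, a follow-up release sets sendLegacy to false (contract phase).
function gatewayHeaders(userId, { sendLegacy = true } = {}) {
  const out = { 'x-subject-id': userId };
  if (sendLegacy) out['x-user-id'] = userId;
  return out;
}

// Every deploy ordering resolves to the same user id:
console.log(readUserId({ 'x-user-id': 'u1' }));                       // old gateway, new upstream
console.log(readUserId(gatewayHeaders('u1')));                        // new gateway, new upstream
console.log(readUserId(gatewayHeaders('u1', { sendLegacy: false }))); // after the contract phase
```

An upstream that has not been upgraded yet still finds `x-user-id` on requests from the new gateway, so neither side has to deploy first.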

The gateway-as-monolith risk

The drift to gateway monolith

It starts innocently: “just put the user-id parsing in the gateway, every service needs it.” Six months later the gateway is parsing JSON bodies, calling out to enrichment services, applying region-specific business rules, and rendering response templates. Now every change requires a gateway deploy and a 3-team approval process. Resist this drift. The gateway should grow no faster than the platform team can own it.

Observability for the edge itself

Real-World Examples

Netflix Zuul → Spring Cloud Gateway. Netflix built Zuul as the front door to their cloud edge. Zuul 1 was synchronous and blocking; Zuul 2 introduced an asynchronous, non-blocking model. The patterns Netflix codified (dynamic routing, request and response filters, integrated circuit breakers, originally with Hystrix) are now industry-standard. Much of the Spring community has since moved to Spring Cloud Gateway, which is the spiritual successor on the JVM.

Amazon API Gateway. AWS’s managed offering ties API routes directly to Lambda, Step Functions, or VPC services. Pay-per-request pricing, built-in throttling, IAM-based auth, JWT authorizers, and WebSocket support. The trade-off is the typical managed-service trade: you trade flexibility for not having to operate it.

GitHub’s GraphQL API. GitHub publishes a single GraphQL endpoint backing essentially their entire product surface. The GraphQL layer resolves each query against GitHub’s backend and assembles a single response. Rate limits are computed in “points”: a complex query consumes more budget than a simple one. This is the GraphQL-as-gateway pattern at industrial scale.

Twitter’s Finagle. Twitter’s Finagle library blurs the gateway/RPC boundary — it provides routing, load balancing, retries, circuit breakers, and observability as composable filters around any service-to-service call. Many of the patterns now standard in API gateways were validated at Twitter scale through Finagle first.

Kong at Cisco / Yahoo / Expedia. Kong’s public reference list reads like a who’s-who of large web platforms. The common shape: a Kong fleet behind a cloud load balancer, plugins for JWT and rate limiting, declarative config in Git, multi-region for residency.

Best Practices

The short list

  • Keep the gateway boring. Routing, auth, rate limit, observability. No business logic. Ever.
  • Declarative config in Git. The gateway is infrastructure; treat its config like infrastructure — reviewed, versioned, replayable.
  • One auth mechanism per consumer class. JWT for first-party apps, API keys for partners, mTLS for service-to-service. Don’t mix.
  • Strip the bearer token before forwarding. Let the upstream see X-User-Id and other parsed claims; never the raw token.
  • Distributed rate limit state. Use Redis (or the gateway’s native equivalent). Local counters lie when you scale out.
  • Hard body-size limits at the edge. Defends every upstream from a memory-hungry payload.
  • Per-route timeouts, retries, circuit breakers. A bad upstream should fail fast, not drag down the gateway.
  • Treat aggregation as a distributed-systems problem. Required vs optional legs, partial-result handling, breakers per leg.
  • Two replicas per AZ minimum, multi-AZ. The gateway is your top-priority HA target.
  • Run the BFF when client needs diverge. Don’t bend a shared gateway to fit one client’s screens.
  • Cap GraphQL query cost. If you expose GraphQL, make over-fetching expensive enough that it can’t happen accidentally.
  • Have a bypass runbook. The day the gateway misbehaves, you want a known way to get traffic to services directly.
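The "fail fast" behaviour in the list above is what a circuit breaker provides. The following is a minimal sketch, not a production implementation: the `CircuitBreaker` class and its options are hypothetical, it counts consecutive failures only, and its half-open state lets any call through after the cool-down rather than a single probe.

```javascript
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetAfterMs = 10_000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;            // injectable clock, handy for tests
    this.failures = 0;
    this.openedAt = null;      // null means the circuit is closed
  }

  async call(fn) {
    if (this.openedAt !== null && this.now() - this.openedAt < this.resetAfterMs) {
      // Open: fail immediately without touching the upstream.
      throw new Error('circuit open');
    }
    // Closed, or cool-down elapsed (half-open): let the call through.
    try {
      const result = await fn();
      this.failures = 0;
      this.openedAt = null;    // a success closes the circuit again
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) this.openedAt = this.now();
      throw err;
    }
  }
}
```

Keeping one breaker instance per upstream (or per aggregation leg) means a sick service trips only its own circuit, while healthy routes keep flowing through the gateway.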

The single most useful sentence about API gateways

The gateway exists so that everything else can stay simple. The moment it starts taking on the complexity its services were supposed to handle, you have made the platform worse, not better. When in doubt: push the logic down, keep the edge thin.