Why API Gateways Matter
The Problem: Without a gateway, every client — mobile app, web SPA, partner integration, IoT device — has to know about every service. That means hardcoded hostnames, duplicated auth logic in every client, N×M CORS headaches, and a deploy of all your apps every time you split or rename a service.
The Solution: A single front door. Clients talk to one URL. The gateway handles authentication, routing, rate limits, retries, and the boring plumbing once — not in every client and not in every service.
Real Impact: Netflix routes billions of requests per day through its Zuul edge gateway. The gateway is the reason a UI engineer doesn’t need to know that there are 700 backend services behind it.
Real-World Analogy
Think of a hotel concierge desk:
- Single front door — guests don’t wander into the kitchen, the laundry room, or the boiler — they go to the desk.
- Routes guests to the right room — the concierge knows which staff handles which request and forwards accordingly.
- Handles common needs once — checking ID, taking payment, logging visits — the rooms don’t each implement check-in.
- Shields the back of house — if housekeeping reorganizes, guests notice nothing because the desk absorbs the change.
An API gateway plays the concierge role for your services. Without one, every guest needs the staff directory, a key to every door, and a credit card reader.
The pain a gateway solves is mostly invisible until you’ve felt it. Picture a 30-service backend with no edge layer. The mobile team needs to call 12 of them to render a home screen. Each one has its own DNS name, its own auth header conventions, its own way of paginating, its own preferred error format. The mobile app ships with a list of 12 hostnames hardcoded into a config — and the day you split orders into orders and order-history, every installed copy of the mobile app needs an update.
That’s the before. The after is one host, one auth scheme, one error envelope, one rate limit, one observability story. Every service-shaped change is invisible to clients because the gateway absorbs it.
What you avoid by not having one
- Auth-per-service: Each team rolls their own JWT validation. One of them gets the signature check wrong. You don’t find out until the post-mortem.
- CORS forever: Browsers preflight every cross-origin call. Multiply that across services. Now multiply it across environments.
- Client coupling to topology: Splitting a service requires coordinating a release with every consumer. Backwards compatibility becomes a treadmill.
- Inconsistent observability: Some services log JSON, some log text. Latency histograms are bucketed differently. Tracing is patchy.
- No throttle: A misbehaving client can hammer one service into the ground because nothing at the edge says “slow down.”
What an API Gateway Does
The gateway is doing a small number of things, repeatedly, very fast. None of these are exotic on their own — the value is that they all live in one place and ship as one configuration.
| Responsibility | What it actually does |
|---|---|
| Request routing | Match an incoming path/host/header to an upstream service and forward. |
| Authentication | Validate JWTs, exchange OAuth tokens, check API keys — before traffic touches a service. |
| Authorization | Coarse-grained policy: is this token allowed to call this route at all? |
| Rate limiting | Per-user, per-IP, per-API-key quotas. Reject overages with 429 fast. |
| Request/response transformation | Strip headers, rewrite paths, convert protocols (REST ↔ gRPC, JSON ↔ XML). |
| Aggregation | Compose multiple downstream calls into a single client response when needed. |
| Observability | Emit access logs, metrics, and tracing spans for every request — with consistent labels. |
| SSL/TLS termination | Decrypt at the edge, talk plaintext (or mTLS) to internal services. |
| Caching | Serve idempotent GETs from edge cache where it’s safe. |
What an API gateway should not do
Never put business logic in the gateway. The moment your gateway starts computing pricing, applying discounts, or deciding who owns a record, you have built a distributed monolith with a fancy reverse proxy. Anything domain-specific belongs in a service. The gateway should be boring.
Gateway vs BFF (Backend for Frontend)
Why Two Patterns Coexist
The Problem: A single shared gateway tries to please mobile, web, partners, and internal tooling all at once. Mobile wants tiny payloads to save bandwidth. Web wants larger ones with embedded relations. Partners want stable contracts. The gateway becomes the place every team fights over.
The Solution: Sam Newman’s Backend for Frontend pattern says: each client type gets its own gateway. The mobile BFF talks to the same backend services as the web BFF, but it shapes responses for its own client’s needs. No more compromise contracts.
The shared API gateway and the BFF are not opposites — they’re two points on the same spectrum, and many production systems use both.
| Aspect | Single Shared Gateway | Backend for Frontend (BFF) |
|---|---|---|
| Owners | Platform team | Each client team owns its BFF |
| Audience | All clients | One client type (mobile, web, partner) |
| Response shape | Generic, lowest common denominator | Tailored to one client’s screens/needs |
| Release cadence | Slow — affects everyone | Fast — in step with client releases |
| Fan-out logic | Discouraged (keep it boring) | Expected (the BFF aggregates) |
| When to use | You need a stable edge for many clients | Two client types diverge enough to hurt |
A common production pattern: a thin shared gateway at the very edge handles auth, TLS termination, and global rate limits. Behind it sit per-client BFFs that handle aggregation and shaping. Mobile traffic goes edge → mobile-bff → services; web traffic goes edge → web-bff → services. Partners often get their own — with different rate limits and a stricter contract.
How to know you need a BFF
- Mobile and web teams routinely argue about response shape in PR review.
- You’re returning fields just so one client can ignore them.
- The gateway has client-specific branches (`if user-agent contains 'Mobile'`).
- Mobile releases are blocked behind backend deploys.
- Partner clients need different auth, different rate limits, different SLAs from your own apps.
Routing Strategies
Routing is the gateway’s simplest job and the one you’ll touch most often. Five flavors cover almost everything:
| Strategy | How it matches | Typical use |
|---|---|---|
| Path-based | /orders/* → orders service | The default. Clean, predictable, easy to debug. |
| Host-based | api.example.com vs partners.example.com | Multi-tenant; isolating partner traffic. |
| Header-based | X-Tenant: acme → tenant-specific upstream | SaaS with per-tenant isolation; A/B cohorts. |
| Weighted (canary) | 95% to v1, 5% to v2 | Rolling out a new release behind the gateway. |
| A/B / cohort | Hash user-id → bucket | Sticky experiments — same user, same variant. |
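The five strategies in the table can be sketched as one first-match-wins lookup. This is a minimal illustration, not how any particular gateway implements routing: the route table, upstream names, and the `matchRoute`, `bucketFor`, and `pickWeighted` helpers are all hypothetical. It also combines the last two rows by hashing the user id into a bucket, so the 95/5 split is sticky per user rather than random per request.

```javascript
const crypto = require('node:crypto');

// Hypothetical route table. First match wins, so specific rules go first.
const routes = [
  { host: 'partners.example.com', upstream: 'partner-api' },                        // host-based
  { pathPrefix: '/payments', header: ['x-tenant', 'acme'], upstream: 'payments-acme' }, // header-based
  { pathPrefix: '/payments', upstream: 'payments-default' },                        // path-based
  { pathPrefix: '/orders', weighted: [['orders-v1', 95], ['orders-v2', 5]] },       // weighted canary
];

// Map a user id to a stable number in [0, 100) so the same user
// always lands in the same bucket (sticky cohorts).
function bucketFor(userId) {
  const digest = crypto.createHash('sha256').update(String(userId)).digest();
  return digest.readUInt32BE(0) % 100;
}

// Walk the cumulative weights until the bucket falls inside one of them.
function pickWeighted(weighted, bucket) {
  let ceiling = 0;
  for (const [upstream, weight] of weighted) {
    ceiling += weight;
    if (bucket < ceiling) return upstream;
  }
  return weighted[weighted.length - 1][0];
}

// Returns the upstream name for a request, or null when nothing matches.
function matchRoute(req) {
  const headers = req.headers || {};
  for (const r of routes) {
    if (r.host && r.host !== req.host) continue;
    // Simplification: a real gateway matches on path segments, not raw prefixes.
    if (r.pathPrefix && !req.path.startsWith(r.pathPrefix)) continue;
    if (r.header && headers[r.header[0]] !== r.header[1]) continue;
    if (r.weighted) return pickWeighted(r.weighted, bucketFor(req.userId));
    return r.upstream;
  }
  return null; // no route: the gateway would answer 404
}

console.log(matchRoute({ host: 'api.example.com', path: '/payments/42', headers: { 'x-tenant': 'acme' } }));
```

Ordering carries the logic here: the tenant-specific `/payments` rule has to sit above the generic one, which is the same reason the order of route entries matters in real gateway configs.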
A Kong declarative route
```yaml
# kong.yaml: declarative config, version-controlled
_format_version: "3.0"
services:
  - name: orders-service
    url: http://orders.svc.cluster.local:8080
    routes:
      - name: orders-route
        paths:
          - /orders
        strip_path: false
        methods: [GET, POST, PUT, DELETE]
    plugins:
      - name: jwt
      - name: rate-limiting
        config:
          minute: 600
          policy: redis
  - name: payments-service
    url: http://payments.svc.cluster.local:8080
    routes:
      - name: payments-route
        paths:
          - /payments
    plugins:
      - name: jwt
      - name: rate-limiting
        config:
          minute: 60 # tighter limit on payment writes
          policy: redis
```
Notice the configuration is declarative — no code, no imperative steps. The gateway is the executor; this file is the truth. That’s the property you want: the gateway behaves the same way wherever you run it from this file.
An Envoy route fragment
```yaml
# envoy.yaml route_config: weighted routing for a canary
route_config:
  name: api_routes
  virtual_hosts:
    - name: api
      domains: ["api.example.com"]
      routes:
        - match:
            prefix: "/orders"
          route:
            weighted_clusters:
              clusters:
                - name: orders_v1
                  weight: 95
                - name: orders_v2
                  weight: 5 # 5% canary
            timeout: 2s
            retry_policy:
              retry_on: "5xx,reset,connect-failure"
              num_retries: 2
              per_try_timeout: 0.8s # durations are written in seconds
        - match:
            prefix: "/payments"
            headers:
              - name: "x-tenant"
                string_match:
                  exact: "acme" # replaces the deprecated exact_match field
          route:
            cluster: payments_acme # tenant-isolated upstream
        - match:
            prefix: "/payments"
          route:
            cluster: payments_default
```
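To make the retry settings on the `/orders` route concrete, here is a rough sketch of what "2 retries, 800 ms per try, 2 s overall" means in terms of control flow. It is an illustration only, not Envoy's implementation: `callWithRetries` and `withTimeout` are hypothetical helpers, every failure is treated as retryable, and the backoff a real proxy applies between attempts is left out.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
// Note: this abandons the attempt, it does not cancel the underlying work.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try `attempt` up to 1 + numRetries times. Each try is capped by perTryMs,
// and the whole sequence is capped by totalMs.
async function callWithRetries(attempt, { numRetries = 2, perTryMs = 800, totalMs = 2000 } = {}) {
  const deadline = Date.now() + totalMs;
  let lastError;
  for (let i = 0; i <= numRetries; i++) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) break; // overall budget spent: stop retrying
    try {
      return await withTimeout(attempt(i), Math.min(perTryMs, remaining));
    } catch (err) {
      lastError = err; // remember the failure and fall through to the next try
    }
  }
  throw lastError || new Error('timeout');
}

// Usage: an upstream that fails twice, then succeeds on the third try.
let calls = 0;
callWithRetries(async () => {
  calls += 1;
  if (calls < 3) throw new Error('503 from upstream');
  return 'ok';
}).then((result) => console.log(result, 'after', calls, 'tries'));
```

The per-try cap is what keeps one hung connection from consuming the whole request budget; without it, the first slow attempt would leave no time for a retry.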
Cross-Cutting Concerns at the Edge
Why At The Edge
The Problem: Auth, rate limiting, CORS, and request validation are concerns that every service has. Implementing them N times means N inconsistencies and N bugs.
The Solution: Push them to the edge. The service trusts that anything that arrived already passed the gate. The service code shrinks; the security posture improves; the shared concerns become one team’s responsibility.
Authentication
Two flavors dominate at the edge:
- JWT validation: the gateway holds the public key (or a JWKS URL), checks the signature, validates `exp`/`iss`/`aud`, and forwards claims to the upstream as headers.
- OAuth introspection: the gateway calls the auth server’s `/introspect` endpoint per request (or per cache window) to learn whether a token is still valid. Slower but supports revocation.
```yaml
# Kong JWT plugin: validate signatures, then forward claims
plugins:
  - name: jwt
    config:
      key_claim_name: iss
      claims_to_verify:
        - exp # reject expired
      maximum_expiration: 3600 # reject > 1h tokens
      header_names: ["Authorization"]
      uri_param_names: [] # no tokens in query strings
      cookie_names: []
      run_on_preflight: false # skip OPTIONS
  - name: request-transformer
    config:
      add:
        headers:
          # The $(jwt.claim.*) placeholders are illustrative. The stock plugin's
          # templates cover headers, query_params, uri_captures, and shared, so
          # forwarding claims usually needs a small custom plugin or function step.
          - "X-User-Id:$(jwt.claim.sub)"
          - "X-Tenant:$(jwt.claim.tenant)"
      remove:
        headers:
          - "Authorization" # don't leak tokens to upstream
```
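That last bit matters. The internal service can trust the headers the gateway forwards only because nothing else can reach it, which also means the gateway must drop any client-supplied X-User-Id or X-Tenant header before setting its own. The service should never see the raw bearer token. Token handling stays at the edge; auth claims propagate inward as plain identifiers.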
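For readers who want to see what "check the signature, validate exp/iss/aud, forward claims as headers" involves, here is a minimal sketch using only Node's built-in `crypto` module. It assumes RS256 tokens and a public key already in hand. `verifyJwt` and `signJwt` are hypothetical helpers, the header names follow the config above, and JWKS fetching, key rotation, and `nbf` checks are deliberately left out. In production a maintained JWT library is the safer choice.

```javascript
const crypto = require('node:crypto');

const b64url = (input) => Buffer.from(input).toString('base64url');
const decodeJson = (part) => JSON.parse(Buffer.from(part, 'base64url').toString('utf8'));

// Verify an RS256 token and return the headers to forward upstream.
// Throws on any failure so the caller can answer 401.
function verifyJwt(token, publicKey, { issuer, audience, now = Date.now() / 1000 }) {
  const parts = token.split('.');
  if (parts.length !== 3) throw new Error('malformed token');
  const [headerB64, payloadB64, sigB64] = parts;

  // Pin the algorithm; never let the token's own `alg` choose the check.
  if (decodeJson(headerB64).alg !== 'RS256') throw new Error('unexpected alg');

  const signed = Buffer.from(`${headerB64}.${payloadB64}`);
  const signature = Buffer.from(sigB64, 'base64url');
  if (!crypto.verify('sha256', signed, publicKey, signature)) throw new Error('bad signature');

  // Only read the claims after the signature has been verified.
  const claims = decodeJson(payloadB64);
  if (typeof claims.exp !== 'number' || claims.exp <= now) throw new Error('expired');
  if (claims.iss !== issuer) throw new Error('wrong issuer');
  const aud = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (!aud.includes(audience)) throw new Error('wrong audience');

  const forwarded = { 'x-user-id': String(claims.sub) };
  if (claims.tenant !== undefined) forwarded['x-tenant'] = String(claims.tenant);
  return forwarded;
}

// Demo helper: mint a token with a throwaway key pair.
function signJwt(claims, privateKey) {
  const head = b64url(JSON.stringify({ alg: 'RS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify(claims));
  const sig = crypto.sign('sha256', Buffer.from(`${head}.${body}`), privateKey);
  return `${head}.${body}.${b64url(sig)}`;
}

const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });
const token = signJwt(
  { sub: 'u1', tenant: 'acme', iss: 'https://auth.example.com/', aud: 'api.example.com', exp: Date.now() / 1000 + 60 },
  privateKey,
);
console.log(verifyJwt(token, publicKey, { issuer: 'https://auth.example.com/', audience: 'api.example.com' }));
```

The order of checks is the point: signature first, claims second. Reading `exp` or `sub` from an unverified payload is the kind of mistake the auth-per-service scenario earlier in this article warns about.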
Rate limiting
| Limit dimension | Use it for |
|---|---|
| Per IP | Crude DDoS defense; anonymous traffic. |
| Per API key | Partner / B2B contracts — the meter for billing tiers. |
| Per user (sub claim) | Authenticated traffic; per-account quotas. |
| Per route | Tighter limits on writes (POST/DELETE) than reads. |
| Per service (global) | Hard ceiling so one runaway upstream doesn’t take everything down. |
Use a distributed store (Redis is the standard) so a multi-instance gateway counts requests across all replicas. A local in-memory limiter on each instance silently allows up to N× your intended limit, where N is the replica count, because each replica only sees its own share of the traffic.
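The N× effect is easy to reproduce. The sketch below uses a fixed-window counter with a pluggable store; `MemoryStore` is a stand-in for a shared Redis counter, and `makeLimiter` and `countAllowed` are hypothetical names. Three replicas with a limit of 10 per minute are compared twice: once sharing one store, once with a private store each.

```javascript
// Minimal counter store: incr(key) returns the new count.
// In a real deployment this would be a shared Redis counter with a key expiry.
class MemoryStore {
  constructor() {
    this.counts = new Map();
  }
  incr(key) {
    const next = (this.counts.get(key) || 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

// Fixed-window limiter: one counter per client per minute.
function makeLimiter(store, limitPerMinute) {
  return function allow(clientKey, nowMs = Date.now()) {
    const window = Math.floor(nowMs / 60000);
    return store.incr(`${clientKey}:${window}`) <= limitPerMinute;
  };
}

// Send `requests` calls from one client, round-robin across the replicas,
// all inside the same one-minute window.
function countAllowed(replicas, requests) {
  let allowed = 0;
  for (let i = 0; i < requests; i++) {
    const allow = replicas[i % replicas.length];
    if (allow('api-key-1', 0)) allowed += 1;
  }
  return allowed;
}

const shared = new MemoryStore();
const sharedReplicas = [1, 2, 3].map(() => makeLimiter(shared, 10));
const localReplicas = [1, 2, 3].map(() => makeLimiter(new MemoryStore(), 10));

console.log(countAllowed(sharedReplicas, 100)); // 10: the limit holds
console.log(countAllowed(localReplicas, 100));  // 30: three replicas, three times the limit
```

A fixed window is the simplest scheme and allows bursts at window boundaries; sliding windows or token buckets smooth that out, but the shared-state requirement is the same.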
CORS, body limits, schema validation
- CORS — one allowlist at the gateway, not per-service. Preflight responses cached aggressively.
- Body size limits — a hard cap (e.g. 1 MB) at the edge defends every upstream from oversized bodies.
- Schema validation — reject requests that don’t match the OpenAPI/JSON Schema before they ever touch your service. Cheap to do at the edge; valuable to remove from every service.
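As a rough sketch of how the three checks in the list above fit together at the edge: the allowlist, the 1 MB cap, and the field-type "schema" below are all hypothetical, and `matchesSchema` is a hand-rolled stand-in for a real OpenAPI or JSON Schema validator.

```javascript
const ALLOWED_ORIGINS = new Set(['https://app.example.com']); // hypothetical allowlist
const MAX_BODY_BYTES = 1024 * 1024; // 1 MB hard cap

// CORS: echo the origin back only when it is on the allowlist.
// With no CORS headers in the response, the browser blocks the cross-origin read.
function corsHeaders(origin) {
  if (!ALLOWED_ORIGINS.has(origin)) return {};
  return { 'access-control-allow-origin': origin, vary: 'Origin' };
}

// Stand-in for schema validation: required fields and their primitive types.
function matchesSchema(body, schema) {
  return Object.entries(schema).every(([field, type]) => typeof body[field] === type);
}

// Returns the HTTP status to reject with, or null to forward upstream.
function checkRequest(rawBody, schema) {
  if (Buffer.byteLength(rawBody) > MAX_BODY_BYTES) return 413; // payload too large
  let body;
  try {
    body = JSON.parse(rawBody);
  } catch {
    return 400; // not JSON at all
  }
  if (body === null || typeof body !== 'object') return 400;
  return matchesSchema(body, schema) ? null : 400;
}

const orderSchema = { sku: 'string', qty: 'number' };
console.log(checkRequest('{"sku":"abc","qty":2}', orderSchema)); // null: forward it
console.log(checkRequest('{"sku":"abc"}', orderSchema));         // 400: qty missing
```

The size check runs before parsing on purpose: the cheapest rejection comes first, so an oversized body never reaches the JSON parser.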
Aggregation and Fan-Out
Why Aggregation Is Tempting and Dangerous
The Problem: The mobile home screen needs data from 6 services. The naive answer is “the gateway calls all 6 and merges the result.” The naive answer also turns one client request into 6 backend dependencies, where the slowest one decides your latency and any one failure becomes your failure.
The Solution: Aggregate when it pays off — saves round trips on slow networks — but treat fan-out as a serious distributed-systems problem with timeouts, partial-result handling, and circuit breakers per leg.
A simple aggregation handler, sketched in Express-style JavaScript:

```javascript
// Express-style BFF aggregation for the mobile home screen.
// Assumed: auth middleware sets req.user, and userClient, ordersClient,
// recsClient, and walletClient are thin wrappers around the downstream services.
const express = require('express');
const app = express();

app.get('/home', async (req, res) => {
  const userId = req.user.id;
  // Fan out in parallel; each leg has its own timeout + breaker
  const [profile, orders, recs, balance] = await Promise.allSettled([
    userClient.getProfile(userId),   // required
    ordersClient.recent(userId, 5),  // required
    recsClient.forUser(userId),      // optional: ok to fail
    walletClient.balance(userId),    // optional: ok to fail
  ]);
  // A failed required leg fails the whole screen
  if (profile.status === 'rejected' || orders.status === 'rejected') {
    return res.status(503).json({ error: 'home_unavailable' });
  }
  // Optional legs degrade to null instead of failing the response
  res.json({
    profile: profile.value,
    orders: orders.value,
    recs: recs.status === 'fulfilled' ? recs.value : null,
    balance: balance.status === 'fulfilled' ? balance.value : null,
  });
});
```
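The handler relies on each client carrying its own breaker. A minimal circuit breaker might look like the sketch below; the class, its thresholds, and the injectable clock are assumptions for illustration, and production libraries add a proper half-open state that admits only one trial call at a time.

```javascript
// Minimal circuit breaker: after `failureThreshold` consecutive failures it
// rejects calls immediately for `resetAfterMs`, then lets a trial call through.
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetAfterMs = 10000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now; // injectable clock, handy for tests
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  async call(fn) {
    if (this.openedAt !== null && this.now() - this.openedAt < this.resetAfterMs) {
      throw new Error('circuit_open'); // fail fast without touching the upstream
    }
    try {
      const result = await fn();
      this.failures = 0; // any success closes the circuit
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) this.openedAt = this.now();
      throw err;
    }
  }
}

// Usage: one breaker per downstream leg, so a sick recommendations
// service cannot trip the breaker that guards the profile service.
const recsBreaker = new CircuitBreaker({ failureThreshold: 2 });
const failingCall = () => Promise.reject(new Error('recs down'));

(async () => {
  for (let i = 0; i < 3; i++) {
    try {
      await recsBreaker.call(failingCall);
    } catch (err) {
      console.log(err.message); // "recs down" twice, then "circuit_open"
    }
  }
})();
```

Once the circuit is open, the optional leg fails in microseconds instead of waiting out a timeout, which is what keeps the slowest dependency from setting the latency of the whole screen.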
Synchronous fan-out is a multiplier of failure
If each downstream is 99.9% available and all 6 must succeed with no fallback, your aggregated endpoint is about 99.4% available (0.999^6 ≈ 0.994), whether you call them serially or in parallel. Leave out timeouts, retry without jitter, add one sick downstream, and a single bad service can degrade your entire home screen. Always classify each leg as required or optional, fail soft on optional ones, and circuit-break each leg independently.
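The arithmetic behind that figure is a product of probabilities. The sketch below assumes failures are independent, which real outages often are not, so treat the output as a rough guide. The function names are illustrative.

```javascript
// Availability of an endpoint that needs every one of `requiredLegs`
// downstreams to succeed, each with the same availability.
function compositeAvailability(perService, requiredLegs) {
  return Math.pow(perService, requiredLegs);
}

// Expected minutes of downtime in a 30-day month.
function downtimeMinutesPerMonth(availability) {
  return (1 - availability) * 30 * 24 * 60;
}

const allRequired = compositeAvailability(0.999, 6);
const twoRequired = compositeAvailability(0.999, 2);

console.log((allRequired * 100).toFixed(2)); // "99.40" with all six legs required
console.log((twoRequired * 100).toFixed(2)); // "99.80" with only two legs required
console.log(downtimeMinutesPerMonth(allRequired).toFixed(0), 'vs', downtimeMinutesPerMonth(0.999).toFixed(0));
```

Marking four of the six legs optional moves the endpoint from the six-leg figure to the two-leg figure, which is the quantitative case for failing soft.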
GraphQL as an alternative
GraphQL gateways — Apollo Gateway, GraphQL federation, GitHub’s public GraphQL API — turn aggregation into a first-class concern. The client sends one query specifying exactly the fields it wants; the gateway plans the resolver calls, fans out, and stitches the response. The win: clients can reshape responses without backend changes. The cost: a query planner is now in your hot path, and an over-eager client query can fan out into a denial-of-service against your own services. Guard with query cost analysis and persisted queries.
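Query cost analysis can be sketched without a GraphQL parser by working on a simplified selection tree. The tree shape (`fields`, `first`), the scoring rule of one point per resolved field instance, and the `queryCost` and `admit` helpers are assumptions for illustration; real gateways derive the tree from the parsed query and use their own scoring.

```javascript
// Estimate cost as the number of field resolutions a query could trigger.
// A list field with `first: n` multiplies everything selected beneath it by n.
function queryCost(selection, multiplier = 1) {
  let cost = 0;
  for (const field of Object.values(selection.fields || {})) {
    cost += multiplier; // this field resolves once per parent node
    cost += queryCost(field, multiplier * (field.first || 1));
  }
  return cost;
}

// Gate a query before planning it: reject anything over budget.
function admit(selection, maxCost) {
  const cost = queryCost(selection);
  return { allowed: cost <= maxCost, cost };
}

// Roughly: viewer { name repos(first: 50) { name issues(first: 20) { title } } }
const query = {
  fields: {
    viewer: {
      fields: {
        name: {},
        repos: {
          first: 50,
          fields: {
            name: {},
            issues: { first: 20, fields: { title: {} } },
          },
        },
      },
    },
  },
};

console.log(admit(query, 500)); // { allowed: false, cost: 1103 }
```

Nested list fields dominate the score: the single `title` selection accounts for 1000 of the 1103 points, which is why an innocent-looking query can fan out so badly.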
Gateway Tools Comparison
| Tool | Origin | Strengths | Watch out for | When to pick |
|---|---|---|---|---|
| Kong | Nginx + Lua, OSS & enterprise | Huge plugin ecosystem; declarative config; battle-tested | Plugin quality varies; enterprise features pricey | Plugin-rich, multi-team enterprise edge |
| Envoy | Lyft, CNCF graduated | L7 features, dynamic xDS config, foundation of Istio | Steep learning curve; YAML voluminous | Service mesh, sophisticated traffic shaping |
| AWS API Gateway | AWS managed | Zero ops; deep IAM/Lambda integration; auto-scale | Lambda cold starts behind it; per-request pricing; AWS lock-in | Serverless on AWS; low-traffic public APIs |
| Apigee | Google Cloud | Developer portal, monetization, full API lifecycle | Enterprise pricing; heavyweight | Public API products with billing tiers |
| Tyk | OSS Go gateway | Lightweight, multi-data-center, dashboard included | Smaller community than Kong | OSS gateway with good UX out of the box |
| Nginx | OSS reverse proxy | Universal, fast, well understood | Plugins via Lua/njs; not API-aware by default | Simple routing/TLS termination; teams already on Nginx |
| Traefik | OSS, container-native | Automatic service discovery in Docker/K8s; great defaults | Less mature plugin ecosystem | Container-first stacks; quick start on K8s |
| Spring Cloud Gateway | Pivotal/VMware, JVM | Native Spring integration; reactive; Netflix Zuul successor | JVM resource footprint | Spring shops; replacing Zuul |
An AWS API Gateway resource
```yaml
# SAM/CloudFormation snippet: HTTP API with JWT authorizer
Resources:
  Api:
    Type: AWS::Serverless::HttpApi
    Properties:
      Auth:
        DefaultAuthorizer: JwtAuth
        Authorizers:
          JwtAuth:
            JwtConfiguration:
              issuer: https://auth.example.com/
              audience:
                - api.example.com
            IdentitySource: "$request.header.Authorization"
      RouteSettings:
        "POST /orders":
          ThrottlingBurstLimit: 100
          ThrottlingRateLimit: 50
        "GET /catalog/{proxy+}":
          ThrottlingBurstLimit: 5000
          ThrottlingRateLimit: 2000
  OrdersFn:
    Type: AWS::Serverless::Function
    Properties:
      Handler: orders.handler
      Events:
        CreateOrder:
          Type: HttpApi
          Properties:
            ApiId: !Ref Api
            Path: /orders
            Method: POST
```
Operational Concerns
Why The Edge Needs Extra Care
The Problem: Every request goes through the gateway. If it’s down, your platform is down — even if every backend service is healthy.
The Solution: Treat the gateway as your most critical service. Multiple replicas, multiple AZs, blue/green deploys, separate change-management process from regular service deploys, and explicit fallback runbooks for the day it misbehaves.
The single-point-of-failure problem
A gateway is by definition a fan-in point. The standard mitigations:
- N+1 replicas behind a load balancer — never run a single instance.
- Multi-AZ — replicas in at least two availability zones, with health checks that fail traffic over.
- Stateless — the gateway should hold no per-request state in memory; rate-limit counters and session caches live in Redis.
- Capacity headroom — size for 2–3× peak. The day you need it, you really need it.
- Bypass plan — for the rare disaster scenario, document how an internal team can call services directly. You hope to never use it.
Version skew with downstreams
The gateway and its upstream services rarely deploy at exactly the same instant. A change to a route prefix or a header convention has to land in both. The pattern that works: change is additive on both sides, deploy each side in arbitrary order, then remove the old behavior in a follow-up release. Never make a destructive change in one place that requires the other side to deploy at the same moment.
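As a small illustration of the additive pattern, suppose a forwarded header is being renamed from `x-user` to `x-user-id`. Both names and both helpers are hypothetical. During the migration window the gateway sends both headers and the service accepts either, so neither side depends on the other deploying first.

```javascript
// Gateway side: during the migration window, send both header names.
function forwardedHeaders(userId, { sendLegacy = true } = {}) {
  const headers = { 'x-user-id': userId };
  if (sendLegacy) headers['x-user'] = userId; // dropped in the follow-up release
  return headers;
}

// Service side: prefer the new name, fall back to the old one.
function readUserId(headers) {
  return headers['x-user-id'] ?? headers['x-user'] ?? null;
}

// Every deploy order works:
console.log(readUserId({ 'x-user': 'u1' }));                               // old gateway, new service
console.log(forwardedHeaders('u1')['x-user']);                             // new gateway, old service still finds x-user
console.log(readUserId(forwardedHeaders('u1', { sendLegacy: false })));    // after cleanup
```

The cleanup step, setting `sendLegacy` to false and later deleting the fallback read, only ships once both sides are confirmed to be on the new name.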
The gateway-as-monolith risk
The drift to gateway monolith
It starts innocently: “just put the user-id parsing in the gateway, every service needs it.” Six months later the gateway is parsing JSON bodies, calling out to enrichment services, applying region-specific business rules, and rendering response templates. Now every change requires a gateway deploy and a 3-team approval process. Resist this drift. The gateway should grow no faster than the platform team can own it.
Observability for the edge itself
- `requests_per_second` by route: baseline traffic shape.
- `latency_p50/p95/p99` by route: the gateway adds 1–5 ms of overhead. Watch for drift.
- `upstream_error_rate` by service: isolates whether a 5xx came from the gateway or behind it.
- `auth_failures`: spikes mean misconfigured clients or an attack.
- `rate_limited_total`: spikes mean a client (or attacker) hit a quota.
- `active_connections`: sudden climbs mean upstreams are slow and the queue is filling.
Real-World Examples
Netflix Zuul → Spring Cloud Gateway. Netflix built Zuul as the front door to their cloud edge. Zuul 1 was synchronous and blocking; Zuul 2 introduced a reactive, non-blocking model. The patterns Netflix codified — dynamic routing, request and response filters, integrated circuit breakers (with Hystrix originally) — are now industry-standard. The community has largely migrated to Spring Cloud Gateway, which is the spiritual successor on the JVM.
Amazon API Gateway. AWS’s managed offering ties API routes directly to Lambda, Step Functions, or VPC services. Pay-per-request pricing, built-in throttling, IAM-based auth, JWT authorizers, and WebSocket support. The trade-off is the typical managed-service trade: you trade flexibility for not having to operate it.
GitHub’s GraphQL API. GitHub publishes a single GraphQL endpoint backing essentially their entire product surface. The GraphQL gateway plans queries across many internal services and assembles a single response. Cost analysis and rate limits are computed in “points” — a complex query consumes more budget than a simple one. This is the GraphQL-as-gateway pattern at industrial scale.
Twitter’s Finagle. Twitter’s Finagle library blurs the gateway/RPC boundary — it provides routing, load balancing, retries, circuit breakers, and observability as composable filters around any service-to-service call. Many of the patterns now standard in API gateways were validated at Twitter scale through Finagle first.
Kong at Cisco / Yahoo / Expedia. Kong’s public reference list reads like a who’s-who of large web platforms. The common shape: a Kong fleet behind a cloud load balancer, plugins for JWT and rate limiting, declarative config in Git, multi-region for residency.
Best Practices
The short list
- Keep the gateway boring. Routing, auth, rate limit, observability. No business logic. Ever.
- Declarative config in Git. The gateway is infrastructure; treat its config like infrastructure — reviewed, versioned, replayable.
- One auth mechanism per consumer class. JWT for first-party apps, API keys for partners, mTLS for service-to-service. Don’t mix.
- Strip the bearer token before forwarding. Let the upstream see `X-User-Id` and other parsed claims; never the raw token.
- Distributed rate limit state. Use Redis (or the gateway’s native equivalent). Local counters lie when you scale out.
- Hard body-size limits at the edge. Defends every upstream from a memory-hungry payload.
- Per-route timeouts, retries, circuit breakers. A bad upstream should fail fast, not drag down the gateway.
- Treat aggregation as a distributed-systems problem. Required vs optional legs, partial-result handling, breakers per leg.
- Two replicas per AZ minimum, multi-AZ. The gateway is your top-priority HA target.
- Run the BFF when client needs diverge. Don’t bend a shared gateway to fit one client’s screens.
- Cap GraphQL query cost. If you expose GraphQL, make over-fetching expensive enough that it can’t happen accidentally.
- Have a bypass runbook. The day the gateway misbehaves, you want a known way to get traffic to services directly.
The single most useful sentence about API gateways
The gateway exists so that everything else can stay simple. The moment it starts taking on the complexity its services were supposed to handle, you have made the platform worse, not better. When in doubt: push the logic down, keep the edge thin.