Why API Versioning Matters
The Problem: Once a single client calls your endpoint, the request and response shape are part of a contract. The day you rename a field, drop a property, or tighten a validation rule, that client breaks — and in microservices you usually do not know who all the clients are.
The Solution: Pick a versioning scheme up front, treat additive changes as cheap and breaking changes as expensive, and run an explicit deprecation lifecycle so callers find out from a header six months early instead of from a 500 at midnight.
Real Impact: Stripe still serves API versions from 2011. That is not nostalgia — it is the reason large customers stay on Stripe instead of rewriting their integration every year.
Real-World Analogy
Think about software releases on your laptop:
- macOS 14.4 -> 14.5 = patch release. Your apps keep working.
- macOS 14 -> 15 = major release. A few legacy apps stop working; the rest are fine.
- 32-bit Intel apps as of Catalina = removed. Apple announced it years in advance, in release notes, in build warnings, in popup dialogs.
An HTTP API works the same way. Patch and minor changes are silent. Major changes are loud, scheduled, and announced in writing. The product SKU on a shelf tells you the same story — the model number is the version, and the manufacturer keeps replacement parts in stock for the old SKU long after the new one ships.
The cost of getting versioning wrong is not theoretical. Internal services that change shape without notice are how a single deploy turns into a multi-team incident. Public APIs that break their contract are how integrations get ripped out. Either way, the work to fix it is paid by other teams — which means it does not get prioritized, which means callers get stuck, which means the next change is even harder.
What you are actually versioning
Most engineers think of “the API version” as a string in a URL. In production, you are versioning at least four layers, and they evolve at different speeds:
| Layer | Example | Who Cares |
|---|---|---|
| Wire protocol | HTTP/1.1, HTTP/2, gRPC | Infra team, load balancers |
| Endpoint shape | URL path, verbs, status codes | Every caller |
| Payload schema | JSON fields, Protobuf messages | Every caller, every event consumer |
| Semantics | What does “PUT /order” actually do? | Every caller; the silent killer |
The fourth row is where most production incidents come from. The shape did not change. The status code did not change. But yesterday a 200 meant “queued” and today it means “committed.” That is a breaking change with no syntactic signal.
Versioning Strategies
Pick One and Be Consistent
The Problem: The schemes below all work. The schemes mixed together do not. A service that accepts both /v2/users and Accept: application/vnd.acme.v2+json will eventually serve different responses for the “same” request because two code paths drifted.
The Solution: Pick one strategy per surface (public REST, internal RPC, async events) and write it down. The choice is less important than the consistency.
The five common schemes
| Strategy | Example | Pros | Cons |
|---|---|---|---|
| URL path | GET /v1/users/42 | Visible in logs, browser, curl. Easy to route. Easy to cache. | Implies the whole API moves together. URLs are not really resource identifiers anymore. |
| Query parameter | GET /users/42?api-version=2024-08-01 | Default version when omitted. Easy to A/B test. | Pollutes query strings. Cache keys get awkward. Easy to forget. |
| Custom header | X-API-Version: 2 | URL stays clean and resource-shaped. | Invisible in logs unless you log headers. CDN caching needs Vary. |
| Accept header (media type) | Accept: application/vnd.github.v3+json | The HTTP-correct answer; URL is the resource. | Hard to test in a browser. Most client libraries hide it. Caching needs Vary: Accept. |
| Hostname | api-v2.acme.com | Hard isolation. Different infra per version. | DNS sprawl. CORS and cookies become per-version problems. |
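To make the media-type row concrete, here is a minimal sketch of how a server might pull the version out of an Accept header. The function name, the acme vendor string, and the exact media-type pattern are assumptions for illustration; quality parameters such as ;q=0.9 are ignored rather than honored.

```python
import re
from typing import Optional

# Assumed vendor media-type shape: application/vnd.<vendor>.v<N>+json
_VENDOR_TYPE = re.compile(
    r"^application/vnd\.(?P<vendor>[a-z0-9-]+)\.v(?P<version>\d+)\+json$"
)


def version_from_accept(accept: str, vendor: str = "acme") -> Optional[str]:
    """Return the version requested via a vendor media type, or None."""
    for item in accept.split(","):
        # Drop parameters such as ";q=0.9" and normalize case.
        media_type = item.split(";", 1)[0].strip().lower()
        match = _VENDOR_TYPE.match(media_type)
        if match and match.group("vendor") == vendor:
            return match.group("version")
    return None
```

A None result is the "no version requested" case; whether that maps to a pinned default or to a 400 is a policy decision, not a parsing one.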
What a header-versioned request actually looks like
    # curl with explicit media-type versioning (GitHub-style)
    curl -i https://api.acme.com/users/42 \
      -H "Accept: application/vnd.acme.v2+json" \
      -H "Authorization: Bearer ${TOKEN}"

    # Response (no deprecation header: this version is not deprecated)
    HTTP/2 200
    content-type: application/vnd.acme.v2+json
    vary: accept
    x-api-version: 2

    {
      "id": "usr_42",
      "display_name": "Ada Lovelace",
      "email": "ada@example.com"
    }

    # Same request without a version header falls back to the server's default
    curl -i https://api.acme.com/users/42 \
      -H "Accept: application/json"

    # Server replies with the default version and tells you what it picked
    HTTP/2 200
    content-type: application/json
    vary: accept
    x-api-version: 2
    warning: 299 - "No version requested; defaulting to v2"
The non-negotiables, regardless of strategy
- Echo the version in the response. Always. X-API-Version on every response means a caller can look at a log line and know which contract was served.
- Set Vary if you version by header. Without it, your CDN will happily hand a v1 response to a v2 caller.
- Reject unknown versions explicitly. Return 400 with a body that lists the supported versions. Do not silently fall back.
- Pin the default version. “Latest” as a default is how you ship breaking changes by accident.
A versioned router in Python (FastAPI)
    from typing import Annotated

    from fastapi import Depends, FastAPI, Header, HTTPException
    from fastapi.responses import JSONResponse

    app = FastAPI()

    SUPPORTED = {"1", "2"}
    DEFAULT_VERSION = "2"
    DEPRECATED = {"1"}  # still served, but on notice


    def resolve_version(x_api_version: Annotated[str | None, Header()] = None) -> str:
        # The only place where the X-API-Version request header becomes a version.
        version = x_api_version or DEFAULT_VERSION
        if version not in SUPPORTED:
            raise HTTPException(
                status_code=400,
                detail={
                    "error": "unsupported_api_version",
                    "requested": version,
                    "supported": sorted(SUPPORTED),
                },
            )
        return version


    @app.get("/users/{user_id}")
    async def get_user(user_id: str, version: Annotated[str, Depends(resolve_version)]):
        # `repo` is the application's async data-access layer, defined elsewhere.
        user = await repo.fetch(user_id)
        if version == "1":
            body = {"id": user.id, "name": user.display_name, "email": user.email}
        else:
            body = {
                "id": user.id,
                "display_name": user.display_name,
                "email": user.email,
                "created_at": user.created_at.isoformat(),
            }
        headers = {"X-API-Version": version, "Vary": "X-API-Version"}
        if version in DEPRECATED:
            # RFC 9745 structured date; 2025-04-01T00:00:00Z is an illustrative value.
            headers["Deprecation"] = "@1743465600"
            headers["Sunset"] = "Wed, 01 Apr 2026 00:00:00 GMT"
            headers["Link"] = '<https://docs.acme.com/migrate-v1-v2>; rel="deprecation"'
        return JSONResponse(content=body, headers=headers)
Two things to notice. First, the version-resolution logic is one function, not scattered across handlers — the moment you put it in two places, they will drift. Second, the v1 branch is deliberately small. The longer it stays in the codebase, the more you owe a migration.
Breaking vs Non-Breaking Changes
Most Changes Should Be Non-Breaking
The Problem: If every change is a new major version, you end up with v17 in two years and no team has the energy to migrate. If no change is a new major version, you ship breaking changes silently.
The Solution: Default to additive (non-breaking) changes. Reserve a major-version bump for changes that genuinely cannot be expressed additively.
The classification
| Change | Breaking? | Why |
|---|---|---|
| Add a new optional field to a response | No | Old clients ignore unknown fields (Tolerant Reader). |
| Add a new endpoint | No | Nobody is calling it yet. |
| Add a new optional request parameter | No | Old clients omit it; server applies a default. |
| Add a new value to an enum | Maybe | Breaks any client that switches exhaustively over the enum. |
| Remove a field from a response | Yes | Clients that read it get null/undefined. |
| Rename a field | Yes | Equivalent to remove + add. |
| Tighten a validation rule | Yes | Requests that used to succeed now 4xx. |
| Change a field’s type (e.g., int -> string) | Yes | Parsers fail on the wire. |
| Change semantics without changing shape | Yes | The silent killer. Document loudly even if no schema change. |
| Change default value of an optional field | Yes | Clients that relied on the old default get different behavior. |
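A few rows of this table can be checked mechanically. The sketch below compares two response shapes and reports removals and type changes; the function name and the {field: type} representation are assumptions, and it deliberately cannot see validation, default, or semantic changes, which still need a human.

```python
def classify_response_change(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Compare two response shapes ({field: type}) and list breaking changes."""
    problems = []
    for field, old_type in old.items():
        if field not in new:
            # A rename shows up here too: it is a removal plus an addition.
            problems.append(f"removed field: {field}")
        elif new[field] != old_type:
            problems.append(f"type changed: {field} {old_type} -> {new[field]}")
    # Fields that exist only in `new` are additive, so they are not reported.
    return problems
```

An empty list means "no breaking change detected", not "safe": the last two rows of the table are invisible to any shape diff.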
The Tolerant Reader principle
Originally articulated by Martin Fowler, the Tolerant Reader rule says: read what you need; ignore what you do not. A v1 client that parses only id and email should keep working when v2 adds a created_at field. That is what makes additive changes safe.
This is a contract between client and server. Strict-schema clients (Protobuf with unknown-field rejection enabled, JSON Schema with additionalProperties: false) opt out of tolerance — which means every additive change is a breaking change for them. If you write client code, default to tolerant reading. If you author the schema, document which mode you assume.
    # Tolerant Python client: survives the server adding fields
    from dataclasses import dataclass


    @dataclass
    class User:
        id: str
        email: str

        @classmethod
        def from_api(cls, payload: dict) -> "User":
            # Read only what we need. Ignore the rest. No KeyError on new fields.
            return cls(id=payload["id"], email=payload["email"])
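The same tolerance applies to enum values, the "Maybe" row in the classification table. One possible approach, sketched here with a hypothetical OrderStatus enum, is to map any value the client does not recognize onto an explicit fallback member instead of raising.

```python
from enum import Enum


class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    UNKNOWN = "unknown"  # fallback for values added after this client shipped

    @classmethod
    def parse(cls, raw: str) -> "OrderStatus":
        """Return the matching member, or UNKNOWN for unrecognized values."""
        try:
            return cls(raw)
        except ValueError:
            return cls.UNKNOWN
```

Callers then have one branch for "a status I do not understand yet" rather than a crash when the server adds a value.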
Never reuse a deprecated version’s URL for a new endpoint
Once /v1/orders has shipped, that path belongs to v1 forever — even after sunset. Reusing it (“v1 is gone, so /v1/orders is free real estate”) means a stale client somewhere will hit it and silently get a different schema. The same applies to enum values, error codes, and event types: once published, retired identifiers are radioactive. Pick a new path.
Schema Evolution
Schema Languages Have Opinions About Versioning
The Problem: “Just add a field” means different things in JSON, Protobuf, and Avro. Each has its own rules for what is forward-compatible and what is backward-compatible, and the rules are easy to violate.
The Solution: Learn the rules of the schema you actually use. Enforce them with tooling so violations fail in CI, not in production.
JSON and JSON Schema
JSON itself is permissive — the runtime does not enforce a schema. JSON Schema is what you use to make compatibility checkable. The two settings that decide your evolution story are additionalProperties and required.
- additionalProperties: true (the default): tolerant-reader friendly. Servers can add fields without failing validation in clients that use this schema.
- additionalProperties: false: strict. Adding a field is a breaking change for any client validating with this schema.
- required: ["id"]: promoting an optional field to required is always breaking. Demoting required to optional is breaking for clients that relied on its presence.
Tooling matters here. openapi-diff and oasdiff compare two OpenAPI documents and tell you whether the change is breaking. Run them in CI on every pull request that touches a spec.
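To see why those two keywords decide the outcome, here is a deliberately tiny, hand-rolled check that looks only at required and additionalProperties. It is not a JSON Schema implementation (a real validator library covers far more); the function name and sample payload are assumptions.

```python
def validate(payload: dict, schema: dict) -> list[str]:
    """Check only `required` and `additionalProperties`; nothing else."""
    errors = [
        f"missing required: {name}"
        for name in schema.get("required", [])
        if name not in payload
    ]
    # JSON Schema allows extra properties unless additionalProperties is false.
    if schema.get("additionalProperties", True) is False:
        known = schema.get("properties", {})
        errors += [f"unexpected property: {name}" for name in payload if name not in known]
    return errors


base = {"properties": {"id": {}, "email": {}}, "required": ["id"]}
tolerant = {**base, "additionalProperties": True}
strict = {**base, "additionalProperties": False}

# A v2 server added created_at; only the strict schema objects.
v2_payload = {"id": "usr_42", "email": "ada@example.com", "created_at": "2024-08-01T00:00:00Z"}
```

The same payload passes the tolerant schema and fails the strict one, which is the whole evolution story in two lines of configuration.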
Protobuf: field numbers are forever
Protobuf encodes wire compatibility into the schema itself. The field number, not the field name, is what gets serialized. That means renaming is free; reusing a number is a disaster.
    // Order schema, v3
    syntax = "proto3";

    package acme.orders.v1;

    message Order {
      string id = 1;
      string customer_id = 2;
      int64 amount_cents = 3;
      string currency = 4;

      // Field 5 was "notes" (string). Removed in v2 because PII.
      // NEVER reuse number 5 or the name "notes": old clients still
      // have it in their generated code and would deserialize garbage.
      reserved 5;
      reserved "notes";

      // Added in v2; safe because field 6 is new.
      string idempotency_key = 6;

      // Added in v3. Marked deprecated because we're moving to a richer
      // LineItems message; old clients can still read it.
      int32 item_count = 7 [deprecated = true];
      repeated LineItem line_items = 8;
    }

    message LineItem {
      string sku = 1;
      int32 quantity = 2;
      int64 unit_price_cents = 3;
    }
Protobuf rules to internalize
- Never change a field’s number. The number is the wire identity.
- Never change a field’s type across the categories that change wire format (string <-> bytes is fine as long as the bytes are valid UTF-8; int32 <-> string is not).
- Always mark deleted field numbers and names as reserved. The compiler will refuse a future re-add, which is the point.
- Mark replaced fields [deprecated = true] instead of deleting them. Generated code emits a warning so callers find out at compile time.
- Use v1 in the package name (acme.orders.v1) so a true breaking redesign can live as acme.orders.v2 alongside it.
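The "number is the wire identity" rule is easier to believe after encoding a field by hand. This sketch implements only the varint and length-delimited pieces of the Protobuf wire format, enough to serialize one string field; the helper names are made up, and real code should use generated classes.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Serialize one string field: tag, length, then UTF-8 bytes."""
    data = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(data)) + data
```

The field name never reaches the wire: renaming id changes nothing in the output, while moving it from number 1 to number 2 changes the first byte and with it the meaning for every existing reader.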
Avro: forward and backward compatibility
Avro distinguishes the schema used to write a record from the schema used to read it, and resolves between them at decode time. That makes the compatibility model explicit:
- Backward compatible: the new schema can read old data. Achieved by giving every new field a default.
- Forward compatible: the old schema can read new data. Achieved by ensuring deletions only target fields that had defaults.
- Full compatible: both at once. Confluent Schema Registry defaults to backward compatibility, so full compatibility has to be configured explicitly.
Schema registries enforce this at publish time: try to register a non-compatible schema and the registry rejects the change before any producer can ship it. That is the right place to draw the line — not in code review.
Versioning Events and Messages
Async Versioning Is Harder Than Sync Versioning
The Problem: An HTTP caller can be told to upgrade. An event in a Kafka topic, written six months ago by a producer that no longer exists, has to keep deserializing. Consumers cannot “negotiate.”
The Solution: Treat the schema registry as the source of truth, version every event, and write upcasters that turn old shapes into the current shape on the read side.
Schema registries
A schema registry stores every version of every event schema and gives each one a stable id. Producers serialize [schema_id, payload]. Consumers fetch the schema by id, decode the payload, and (optionally) reshape it into the current internal model.
The two production options are Confluent Schema Registry (Avro, Protobuf, JSON Schema; ships with Confluent Platform and works with any Kafka) and AWS Glue Schema Registry (same idea, integrated with MSK, Kinesis, and Lambda). Both enforce a configurable compatibility mode (backward / forward / full / none) on schema registration.
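As a concrete picture of [schema_id, payload], the sketch below follows the framing used by Confluent's serializers: one zero "magic" byte, a 4-byte big-endian schema id, then the encoded payload. Other registries (Glue included) frame messages differently, and the function names are assumptions.

```python
import struct

MAGIC_BYTE = 0
_HEADER = struct.Struct(">bI")  # 1-byte magic + 4-byte big-endian schema id


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-serialized payload with the schema-id header."""
    return _HEADER.pack(MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> tuple[int, bytes]:
    """Split a framed message into (schema_id, payload)."""
    if len(message) < _HEADER.size:
        raise ValueError("message too short for a schema-id header")
    magic, schema_id = _HEADER.unpack(message[: _HEADER.size])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[_HEADER.size:]
```

Five bytes of overhead per message buy the consumer an exact pointer to the schema that wrote it, which is what makes old events decodable years later.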
Upcasting
An upcaster is a small function that takes an old event and returns the current shape. Run it once on read, cache the result, and the rest of your code only ever sees the latest version.
    # Upcaster chain for OrderPlaced events
    from typing import Callable

    UPCASTERS: dict[int, Callable[[dict], dict]] = {}
    CURRENT_VERSION = 3


    def upcast(from_version: int):
        """Register the function that upgrades events from `from_version`."""
        def decorator(fn):
            UPCASTERS[from_version] = fn
            return fn
        return decorator


    @upcast(1)
    def v1_to_v2(payload: dict) -> dict:
        # v1 used "amount" in dollars; v2 uses "amount_cents" in integer cents.
        # Copy first so the caller's event is not mutated and the old key is
        # really gone; round() because float math can land just below a whole cent.
        payload = dict(payload)
        payload["amount_cents"] = round(payload.pop("amount") * 100)
        payload["_schema_version"] = 2
        return payload


    @upcast(2)
    def v2_to_v3(payload: dict) -> dict:
        # v3 split full_name -> first_name + last_name
        payload = dict(payload)
        full = payload.pop("full_name", "")
        first, _, last = full.partition(" ")
        payload["first_name"] = first
        payload["last_name"] = last
        payload["_schema_version"] = 3
        return payload


    def to_current(event: dict) -> dict:
        # Events written before versioning existed carry no marker: treat as v1.
        version = event.get("_schema_version", 1)
        while version < CURRENT_VERSION:
            upcaster = UPCASTERS.get(version)
            if upcaster is None:
                raise ValueError(f"no upcaster from v{version}")
            event = upcaster(event)
            version = event["_schema_version"]
        return event
Dead-letter unknown versions
If a consumer encounters a schema id it has never seen — the producer is newer than the consumer — do not crash and do not guess. Route the message to a dead-letter topic with the original payload, schema id, and timestamp. Alert on dead-letter rate. The consumer team can replay after deploying support for the new version.
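A minimal sketch of that routing decision, with an in-memory list standing in for the dead-letter topic. The schema ids, the route function, and the record layout are all hypothetical; a real consumer would publish to a topic and emit a metric for the alert.

```python
import time

KNOWN_SCHEMA_IDS = {101, 102}  # hypothetical ids this consumer can decode


def route(schema_id: int, payload: bytes, handle, dead_letters: list) -> bool:
    """Handle the message if its schema is known, otherwise park it.

    Returns True when the message was handled, False when dead-lettered.
    """
    if schema_id in KNOWN_SCHEMA_IDS:
        handle(schema_id, payload)
        return True
    # Keep the original bytes untouched so the message can be replayed later.
    dead_letters.append(
        {"schema_id": schema_id, "payload": payload, "received_at": time.time()}
    )
    return False
```

The important property is that the unknown payload is stored byte-for-byte; any attempt to partially decode it would be the "guess" the rule forbids.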
Deprecation Lifecycle
Deprecation Is a Process, Not an Email
The Problem: “Email the integrators” is not a deprecation policy. People miss emails. Bots cannot read emails. The 4 AM team that owns the legacy script does not work for you.
The Solution: A written, machine-readable lifecycle — HTTP headers per response, a sunset date in the future, and per-caller telemetry so you know who is still on the old version before you turn it off.
The standard headers
Two RFCs codify the in-band signals every API should emit during deprecation: RFC 8594 for Sunset and RFC 9745 for Deprecation. Together with two Link relations, that gives four signals:
- RFC 8594: the Sunset header. An HTTP date after which the resource will no longer be available. Tells callers when to migrate.
- RFC 9745: the Deprecation header. The point in time from which the resource counts as deprecated, written as a structured-field date (@ followed by Unix seconds); earlier drafts also allowed a boolean or an HTTP date. Tells callers that they need to migrate.
- The Link header with rel="deprecation". Points at human-readable migration docs.
- The Link header with rel="successor-version". Points at the URL or media type that replaces the deprecated resource.
    # A deprecated v1 endpoint, properly labeled
    # deprecation is an RFC 9745 structured date (@ + Unix seconds): 2025-04-01T00:00:00Z
    # sunset is a plain, unquoted HTTP date (RFC 8594)
    HTTP/2 200
    content-type: application/vnd.acme.v1+json
    deprecation: @1743465600
    sunset: Wed, 01 Oct 2025 00:00:00 GMT
    link: <https://docs.acme.com/migrate-v1-v2>; rel="deprecation"; type="text/html"
    link: <https://api.acme.com/v2/users/42>; rel="successor-version"
    warning: 299 - "v1 deprecated 2025-04-01; sunset 2025-10-01"
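These headers only help if clients read them. A hedged sketch of the client side: parse Sunset with the standard library and turn it into a countdown that can feed a log line or an alert. The function name is hypothetical, and the headers dict is assumed to have lower-cased keys.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def days_until_sunset(headers: dict, now: datetime) -> Optional[int]:
    """Return whole days left before the Sunset date, or None if no header."""
    raw = headers.get("sunset")
    if raw is None:
        return None
    sunset = parsedate_to_datetime(raw)  # timezone-aware for "... GMT" dates
    return (sunset - now).days


response_headers = {"sunset": "Wed, 01 Oct 2025 00:00:00 GMT"}
today = datetime(2025, 9, 1, tzinfo=timezone.utc)
remaining = days_until_sunset(response_headers, today)
```

Logging this number on every call to a deprecated endpoint turns a header nobody looks at into a countdown somebody owns.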
Timeline
| Phase | Notice | What Happens |
|---|---|---|
| Announcement | T − 90 days minimum (T − 12 months for paid public APIs) | Deprecation header on every response. Changelog entry. Email to known integrators. |
| Active deprecation | T − 60 days | Migration guide published. Per-caller telemetry surfaces top users still on v1. |
| Final warning | T − 30 days | Direct outreach to remaining users. Optional brownouts (return 503 for one hour, escalating). |
| Sunset | T = 0 | 410 Gone with a body that links to v2 docs. Never 404 — 404 is ambiguous. |
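The table reduces to date arithmetic from the sunset date. The sketch below uses the 90/60/30-day minimums from the table; the phase names and the sample sunset date are illustrative, and a paid public API would use longer leads.

```python
from datetime import date, timedelta

# (phase, days before sunset), in chronological order
PHASES = [
    ("announcement", 90),
    ("active deprecation", 60),
    ("final warning", 30),
    ("sunset", 0),
]


def schedule(sunset: date) -> dict[str, date]:
    """Return the start date of each phase for a given sunset date."""
    return {name: sunset - timedelta(days=lead) for name, lead in PHASES}


def current_phase(today: date, sunset: date) -> str:
    """Return the latest phase that has started, or 'supported' if none."""
    phase = "supported"
    for name, lead in PHASES:
        if today >= sunset - timedelta(days=lead):
            phase = name
    return phase
```

Deriving every date from one sunset value keeps the changelog, the headers, and the outreach calendar from disagreeing with each other.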
Per-caller telemetry
You cannot retire what you cannot measure. Tag every request with the caller (API key, OAuth client id, internal service identity from mTLS) and the version they used. Aggregate by (caller, version, day) so you can answer two questions: who is still calling v1? and is that number going down?
    # Prometheus exposition for per-caller version usage
    api_requests_total{caller="acme-mobile", version="v1", status="200"} 182734
    api_requests_total{caller="acme-mobile", version="v2", status="200"} 9482710
    api_requests_total{caller="partner-acme-corp", version="v1", status="200"} 812

    # Alert: a known-deprecated version is still seeing > 1k QPS, 30 days from sunset
    - alert: DeprecatedVersionStillBusy
      expr: sum by (caller) (rate(api_requests_total{version="v1"}[5m])) > 1000
      for: 10m
      labels: { severity: page }
      annotations:
        summary: "{{ $labels.caller }} still on v1 with sunset 30 days out"
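If a metrics stack is not available yet, the (caller, version, day) aggregation can be approximated in-process. This is a sketch only: the function names are hypothetical, and a production system would export counters rather than hold them in memory.

```python
from collections import Counter
from datetime import date


def record(usage: Counter, caller: str, version: str, day: date) -> None:
    """Count one request under its (caller, version, day) key."""
    usage[(caller, version, day)] += 1


def callers_still_on(usage: Counter, version: str) -> dict[str, int]:
    """Total requests per caller for one API version, across all days."""
    totals: Counter = Counter()
    for (caller, seen_version, _day), count in usage.items():
        if seen_version == version:
            totals[caller] += count
    return dict(totals)
```

Keeping the day in the key is what answers the second question: comparing totals for consecutive days shows whether v1 traffic is actually shrinking.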
Brownouts before blackouts
A common production pattern: in the final week of deprecation, the deprecated version returns 503 for a scheduled window (one hour, then four, then a full day). The brownouts force every still-running caller to surface in someone’s on-call queue while there is still time to fix it. Quietly turning the lights off at midnight on sunset day is how you create incidents for other teams.
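One way to implement the schedule is a list of (start, duration) windows checked on each request to a deprecated version. The dates below are invented for a 2025-10-01 sunset, and status_for is a hypothetical helper, not a framework hook.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical escalating windows in the final week: 1 hour, 4 hours, 1 day.
BROWNOUTS = [
    (datetime(2025, 9, 24, 15, 0, tzinfo=timezone.utc), timedelta(hours=1)),
    (datetime(2025, 9, 26, 15, 0, tzinfo=timezone.utc), timedelta(hours=4)),
    (datetime(2025, 9, 29, 0, 0, tzinfo=timezone.utc), timedelta(days=1)),
]


def status_for(now: datetime, version: str, deprecated: set) -> int:
    """Return 503 for deprecated versions inside a brownout window, else 200."""
    in_window = any(start <= now < start + length for start, length in BROWNOUTS)
    if version in deprecated and in_window:
        return 503
    return 200
```

Publishing the window list alongside the migration guide matters as much as the code: a brownout nobody was told about is just an outage.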
Real-World Examples
Four Strategies, All Working in Production
The Problem: Tutorials usually pick one versioning style and call the others wrong. Production proves all of these can work — the choice depends on your audience and your release cadence.
The Solution: Look at the trade-offs the giants made. The patterns repeat.
Stripe — per-account pinned versions
Stripe versions its API by date (2024-06-20) and pins each account to the version that was current when the account was created. Your account’s requests get that version’s response shape forever, unless you explicitly upgrade in the dashboard or pass Stripe-Version per request. The result: an integration written in 2014 keeps working in 2026 with no changes.
Internally, Stripe transforms responses through a chain of version-to-version compatibility shims. Adding a new version means writing one new shim and shipping it; older shims stay in place. The cost of the approach is real engineering work on every breaking change — but the cost of not doing it would be lost integrations.
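The shim-chain idea can be sketched in a few lines. This is an illustration of the pattern as described above, not Stripe's code: the version dates, field names, and function names are all invented. The handler always builds the newest shape, and each shim knows only how to step back one version.

```python
VERSIONS = ["2023-01-01", "2024-01-01", "2024-06-20"]  # oldest -> newest


def downgrade_2024_06_20(body: dict) -> dict:
    """2024-06-20 -> 2024-01-01: display_name used to be called name."""
    body = dict(body)
    body["name"] = body.pop("display_name")
    return body


def downgrade_2024_01_01(body: dict) -> dict:
    """2024-01-01 -> 2023-01-01: created_at did not exist yet."""
    body = dict(body)
    body.pop("created_at", None)
    return body


# Each shim converts a response in that version's shape to the previous one.
DOWNGRADES = {
    "2024-06-20": downgrade_2024_06_20,
    "2024-01-01": downgrade_2024_01_01,
}


def render(latest_body: dict, pinned: str) -> dict:
    """Walk the newest response shape back to the caller's pinned version."""
    if pinned not in VERSIONS:
        raise ValueError(f"unknown version: {pinned}")
    body = latest_body
    for version in reversed(VERSIONS[VERSIONS.index(pinned) + 1:]):
        body = DOWNGRADES[version](body)
    return body
```

A new breaking change costs one more entry in DOWNGRADES; nothing written for older versions has to be touched.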
GitHub — media-type versioning
GitHub’s REST API v3 is selected with an Accept: application/vnd.github.v3+json header. The URL is the resource (/repos/{owner}/{repo}); the version is in the Accept header. New media types layer in opt-in features (application/vnd.github.machine-man-preview+json for early access). Since late 2022 GitHub has also offered date-based versions selected with an X-GitHub-Api-Version header, but the media-type scheme is the one examined here. The trade-off: cleanest URLs, but you cannot test a versioned response by pasting a URL into a browser.
AWS — service-by-service, in the API itself
AWS has hundreds of services and no global API version. Each service publishes a versioned API spec (2012-11-05 for SQS, 2006-03-01 for S3) and the SDK pins the version it was generated against. When a service ships a new spec, the SDK does not pick it up until you regenerate. Old specs continue to work essentially forever; some S3 callers have been on the same version since 2006. The price is consistency: you cannot reason about “the AWS API” as a single thing.
Twilio — date-based with default-locked accounts
Twilio uses date-stamped API versions (/2010-04-01/Accounts) baked into the URL path. Like Stripe, accounts are pinned: the first time you use the API, your account locks to that version. Twilio then exposes opt-in flags to upgrade individual capabilities without taking the rest of the new version. The path-based scheme makes it trivially debuggable from logs, at the cost of URL longevity.
| Vendor | Scheme | Pinning | Lifetime |
|---|---|---|---|
| Stripe | Header (Stripe-Version: 2024-06-20) | Per account | Indefinite |
| GitHub | Accept media type (vnd.github.v3+json) | Per request | Years; long previews |
| AWS | Date in spec, baked into SDK | Per generated SDK | Effectively forever |
| Twilio | URL path (/2010-04-01/...) | Per account | Indefinite |
The common thread: none of them break old callers. The versioning style varies; the willingness to keep paying the cost of compatibility does not.
Best Practices
The short list
- Pick one versioning scheme per surface and write it down. Mixing path and header versioning in the same API is how drift happens.
- Prefer additive changes. A v2 every five years is a sign of a healthy API. A v7 every twelve months is a sign you are using major versions to avoid the discipline of additive design.
- Version internally too. Service-to-service traffic deserves the same discipline as public APIs — especially when teams own services independently.
- Echo the version in every response. X-API-Version on the way out, always.
- Reject unknown versions explicitly. Return 400 with the supported list in the body; never silently fall back.
- Use Sunset and Deprecation headers (RFC 8594 and RFC 9745). Machines can read them; humans can grep them; clients can build tooling around them.
- Reserve removed Protobuf field numbers and names. The compiler is your second brain.
- Enforce schema compatibility in CI. oasdiff for OpenAPI, the schema registry’s compatibility check for Avro/Protobuf. Breaking changes should fail the pipeline, not the deploy.
- Track per-caller version usage. You cannot retire what you cannot measure.
- Brownouts before blackouts. Force the issue while there is still time to fix it.
- Never reuse a deprecated identifier. Old paths, old enum values, old field numbers belong to history.
- Document semantics, not just shape. A status code change is breaking even if the body is identical.
The single most useful sentence about API versioning
Every endpoint you ship is a promise to keep that endpoint working. The versioning scheme is just how you keep score — the actual work is staying disciplined enough that adding a field never tempts you to remove one.