API Versioning, Gateways, and Evolution
A field rename can break a mobile client you shipped eight months ago.
The route still returns 200, the response still parses as JSON, and the server logs show success. The handler even has better names now.
{
  "id": "user_42",
  "name": "Asha Rao"
}
An older client asks its generated model for displayName. It gets undefined, runs its empty-state path, and removes the account owner from the screen. One field changed on the server, and the client saw a different contract.
API evolution means changing an API contract over time while existing consumers keep making requests. The term covers a lot, but the mechanism behind it is specific. Consumers remember routes, fields, status codes, error codes, enum values, cursor formats, idempotency behavior, GraphQL fields, protobuf field numbers, and generated client types. A server deploys faster than most of its consumers can follow. Mobile clients wait on app-store review and user adoption, partner integrations sit in someone else's release calendar, and internal services can have pinned generated clients and old containers still running in some region.
An internal change becomes an external one as soon as a consumer can observe it.
Some of those changes never reach a consumer. You can rename a private variable, reorder internal middleware, or move a handler to another service, and it stays local as long as the route, payload, status behavior, deadlines, and side effects hold steady. Other changes reach the contract. If you rename a response field, old code reading the old name breaks. A client that branches on a validation error code sees something new when you change which code comes back. And if you change the cursor format on a paginated endpoint, you have changed the contract even though the route and JSON Schema never moved.
API contracts stick around in client code. A database schema can migrate in one controlled window if every writer and reader moves together. Public APIs rarely move that cleanly. They pile up clients, generated code, dashboards, examples, scripts, and tests, all built around old responses. Even a private API inside one company starts behaving like a public one once enough teams depend on it.
The observed contract is larger than the source diff. A server patch might touch one serializer, but what a consumer observes is the combined output of routing, validation, handler behavior, serialization, status selection, and gateway processing. The contract sits at that combined output. It is the set of stable facts a consumer gets to build against.
For a JSON HTTP API, those facts include route names, path parameters, query parameters, request headers, accepted request bodies, response fields, nullability, enum values, status codes, error envelopes, pagination rules, idempotency behavior, and cache-relevant headers. For GraphQL, they include fields, argument names, input types, nullability, resolver error structures, and schema deprecations. For gRPC, they include service names, method names, message types, field numbers, streaming mode, metadata, status codes, trailers, and deadline behavior.
The implementation can move freely while the observed contract stays still. A service can replace a database table, split a handler into smaller modules, swap a validation library, or run behind a different gateway target, and consumers still see the same API, as long as the same request produces the same documented behavior. The same logic runs the other way. A small serializer change counts as a contract change when old consumers come back with a different field, a different error code, or a different cursor token.
Compatibility comes down to observation. Start from the request an existing consumer already sends. Trace it through the route, the version selector, the validation path, the handler, and the serializer, then compare the response and the side effects against the old contract. That trace tells you more than a source diff does, because consumers run against your behavior rather than your commits.
One rule carries most of the weight. Classify the change at the point where the consumer sees it. A handler refactor stays local while the public request and response hold steady. Rename a response field and you break the consumer the moment its old code reads the old name. Adding a new route stays additive as long as the old routes still behave the way they did. And a new enum value only counts as backward-compatible when consumers already have a documented path for values they do not recognize.
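The trace-and-compare idea can be sketched as a small check that runs against recorded responses. This is a minimal illustration, not a full contract-testing tool: `findBrokenReads` is a hypothetical helper, the old payload with displayName is inferred from the opening example, and the list of fields a consumer reads is something you would have to gather from that consumer.

```javascript
// Hypothetical helper: given the fields an existing consumer is known to read,
// report which of them went missing or changed type between two responses.
function findBrokenReads(oldBody, newBody, fieldsRead) {
  const problems = [];
  for (const field of fieldsRead) {
    const hadField = field in oldBody;
    const hasField = field in newBody;
    if (hadField && !hasField) {
      problems.push({ field, problem: 'missing' });
    } else if (hadField && hasField && typeof oldBody[field] !== typeof newBody[field]) {
      problems.push({ field, problem: 'type-changed' });
    }
  }
  return problems;
}

// Assumed old payload (before the rename) and the new payload from the opening example.
const oldBody = { id: 'user_42', displayName: 'Asha Rao' };
const newBody = { id: 'user_42', name: 'Asha Rao' };

// The old mobile client reads id and displayName.
const broken = findBrokenReads(oldBody, newBody, ['id', 'displayName']);
console.log(broken);
```

The check only covers field presence and JavaScript type. Status codes, error codes, ordering, and side effects need their own comparisons, because they are part of the observed contract too.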
Change Classification
Every change lands in one of a few categories, and which one depends on what the change does to a consumer that already worked.
A breaking change makes an existing, correct consumer fail. The consumer rejects the response, sends a wrong request, or shows different user-visible behavior, all under the same contract it was built against. The consumer did nothing wrong. The server changed the rules underneath it.
A backward-compatible change leaves existing behavior alone while it adds to the contract or makes it more precise. The old request still works, the old response fields still mean what they meant, and error codes hold their meaning. Cursor tokens keep working for their documented lifetime.
Additive changes add to the surface without taking anything away: a new optional response field, a new endpoint, or a new optional request field that old clients can leave out. The server should reject a missing value for that new field only on rollout paths it has planned for. Most additive changes end up backward-compatible, though that depends on how the client behaves and not on what the server intended.
A small response change shows the risk.
{
  "id": "ord_9",
  "status": "paid",
  "currency": "USD"
}
Adding currency to an order response is usually safe for JSON clients that ignore unknown fields. A generated client with strict decoding can reject the very same payload if its decoder treats extra fields as invalid. An additive server change runs into a stricter consumer contract.
Enums carry their own risk.
{
  "id": "ord_9",
  "status": "refunded"
}
Adding refunded to an existing status enum looks additive from the server side. A TypeScript client with an exhaustive switch over the known statuses may fall into a default error branch. A generated Swift or Kotlin client may fail to decode the value when the enum is closed, and a metrics pipeline may bucket it as unknown and drop it from reports. A single new value on the server produces breakage across several consumers.
A new enum value is safe only when every consumer already has a documented path for values it does not recognize. Without that, strict generated clients reject or mishandle the unknown variant, closed Swift and Kotlin enums fail to decode, an exhaustive TypeScript switch drops into its default branch, and metrics pipelines bucket the value as unknown and lose it from reports.
Ship the unknown-handling contract first, let consumers adopt it, and only then start emitting new values. Add the value before that tolerance exists and a one-line server change turns into a multi-client break.
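On the consumer side, an unknown-handling path might look like the sketch below. It is an assumption-heavy illustration: the pending and shipped statuses, the labels, and the function names are made up for the example, and only paid and refunded come from the payloads above.

```javascript
// Statuses this client version was built against (pending and shipped are assumed).
const KNOWN_STATUSES = new Set(['pending', 'paid', 'shipped']);

// Map any unrecognized wire value to a documented fallback, and keep the raw
// value so logs and metrics can still report what the server actually sent.
function decodeOrderStatus(raw) {
  if (KNOWN_STATUSES.has(raw)) {
    return { status: raw, raw };
  }
  return { status: 'unknown', raw };
}

function statusLabel(decoded) {
  switch (decoded.status) {
    case 'pending': return 'Awaiting payment';
    case 'paid': return 'Paid';
    case 'shipped': return 'Shipped';
    // Planned path for values this client does not recognize.
    default: return 'Status unavailable';
  }
}

const decoded = decodeOrderStatus('refunded');
console.log(decoded.status, decoded.raw, statusLabel(decoded));
```

Keeping the raw value next to the fallback lets a metrics pipeline report refunded by name instead of folding it into an anonymous unknown bucket.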
Behavior contracts from earlier in this chapter follow the same rule. Rename an error code from PAYMENT_DECLINED to CARD_DECLINED, and any client branching on that machine-readable code breaks. A client that saved a page token and came back later breaks when you switch cursor pagination from createdAt to id. Shorten idempotency-key retention and repeated writes start to behave differently. The same GET /orders request means something else once you flip the default sort order.
A route can stay stable while the contract breaks.
GET /orders?limit=50
If the endpoint used to return newest orders first and now returns oldest orders first, the client receives valid JSON from the same route while the visible result changed, which makes it a breaking behavior change.
Request-side changes get the same treatment. Add a required field to an existing POST route and old clients break, because they keep sending the old body. An optional field can stay backward-compatible, as long as the server still accepts old bodies and the new field has a sensible default. Tightening validation breaks clients that were sending loose values the server used to accept. Loosening it can break downstream code that counted on the stricter input and fails on data it never expected.
Response-side changes need a consumer-by-consumer read. A nullable field is usually easier to add to generated models than a required one. Remove a field and readers break. A rename is a removal and an addition at the same time. Changing a field's type breaks decoders. If you change a field from an explicit null to absent, code that checks for the explicit null breaks. Move money from cents to decimal strings and the meaning changes even though both forms stay valid JSON. The syntactic category only tells you part of what happened.
Status codes belong to the contract too.
old: POST /payments -> 409 IDEMPOTENCY_CONFLICT
new: POST /payments -> 422 VALIDATION_FAILED
Both statuses are client errors, and existing clients may branch on them differently. One path tells the user to retry with a different idempotency key, while another highlights form fields. The status code and the machine-readable code together are part of the behavior.
Pagination changes are often breaking because clients store position. An offset client may request page=4 after a user returns to a list. A cursor client may store cursor=abc and resume after a restart. Changing token encoding, token lifetime, sort key, or default filters can invalidate that saved position. A migration path might accept both token formats for a while and emit the newer format in responses. This counts as API evolution work, even when the handler's database query came out cleaner.
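The accept-both, emit-new migration for cursor tokens can be sketched as follows. Both token formats are assumptions made for the example: legacy tokens are taken to be base64 of an ISO createdAt timestamp, and new tokens are "v2." followed by base64 of a small JSON object keyed by id.

```javascript
// New tokens always use the v2 format.
function encodeCursor(position) {
  const json = JSON.stringify({ id: position.id });
  return 'v2.' + Buffer.from(json, 'utf8').toString('base64');
}

// Decoding accepts both formats during the migration window.
function decodeCursor(token) {
  if (token.startsWith('v2.')) {
    const json = Buffer.from(token.slice(3), 'base64').toString('utf8');
    return { format: 'v2', id: JSON.parse(json).id };
  }
  // Legacy path: kept only for the documented token lifetime.
  const createdAt = Buffer.from(token, 'base64').toString('utf8');
  if (Number.isNaN(Date.parse(createdAt))) {
    throw new Error('INVALID_CURSOR');
  }
  return { format: 'legacy', createdAt };
}

const legacyToken = Buffer.from('2026-01-15T10:00:00.000Z', 'utf8').toString('base64');
console.log(decodeCursor(legacyToken));
console.log(decodeCursor(encodeCursor({ id: 'ord_9' })));
```

The "v2." prefix cannot collide with a legacy token because a period is not part of the base64 alphabet. The handler still has to resolve a legacy createdAt position against the new sort key, and that mapping is the part that needs a removal date.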
A short table makes the classification easier to run before you write any code.
change              old clients             new clients      class
add optional field  ignore or fail on read  read field       depends
remove field        fail or degrade         use replacement  breaking
new enum value      unknown branch          handle value     depends
new route           keep old route          call new route   additive
The table centers on the consumer on purpose, because the same server change can fall into different classes for different consumers. A tolerant JavaScript client may ignore an added field. A strict generated Go client may reject the same field, and a partner SDK may expose it only after regeneration. The migration plan has to account for those differences.
Protobuf adds another layer, because the field number is the on-wire identity, the value that the encoded bytes actually carry. The field name only helps the generated code.
message Order {
  string id = 1;
  reserved 2;
  string status = 3;
}
reserved 2 stops anyone from reusing field number 2 after the field is gone. Reuse it for a new meaning and older binaries will decode the new data as the old field. An earlier subchapter covered protobuf encoding. Field-number changes need migration discipline, because generated clients and servers compile those numbers straight into code.
In protobuf the field number is the wire identity. Reuse a removed number for a new meaning and old binaries decode the new bytes into the old field, with no error, no type mismatch, and no log line. The corruption reaches persisted and in-flight data. Mark both the number and the old name as reserved when you delete a field, so a later author cannot reuse either one by accident.
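A review-time check for that rule could look like the sketch below. It works on a simplified, hand-written description of a message rather than on real protobuf descriptors, and the deleted field's name (total) is hypothetical because the example above shows only its number.

```javascript
// Report fields from the previous definition that are no longer declared
// and were not added to the reserved numbers or reserved names.
function findUnreservedDeletions(previous, next) {
  const nextNumbers = new Set(next.fields.map((f) => f.number));
  const nextNames = new Set(next.fields.map((f) => f.name));
  const problems = [];
  for (const field of previous.fields) {
    if (!nextNumbers.has(field.number) && !next.reservedNumbers.includes(field.number)) {
      problems.push(`field number ${field.number} is no longer declared and is not reserved`);
    }
    if (!nextNames.has(field.name) && !next.reservedNames.includes(field.name)) {
      problems.push(`field name "${field.name}" is no longer declared and is not reserved`);
    }
  }
  return problems;
}

const previousOrder = {
  fields: [
    { name: 'id', number: 1 },
    { name: 'total', number: 2 }, // hypothetical name for the deleted field
    { name: 'status', number: 3 },
  ],
};
const nextOrder = {
  fields: [{ name: 'id', number: 1 }, { name: 'status', number: 3 }],
  reservedNumbers: [2],
  reservedNames: [], // the old name was not reserved
};

console.log(findUnreservedDeletions(previousOrder, nextOrder));
```

With these inputs the check flags only the name, since number 2 is already reserved. A real version would read the compiled descriptors for the two schema revisions instead of hand-written objects.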
GraphQL has a different contract form. The schema is the API surface, and clients choose fields from it.
type User {
  fullName: String @deprecated(reason: "Use displayName")
  displayName: String!
}
The deprecated field still exists, so existing queries keep running while new clients move to displayName. Removing fullName later is the breaking change. The @deprecated marker tells consumers to migrate, and pulling the field is what actually breaks them.
Compatibility depends on who consumes the contract.
A browser frontend deployed with the backend can move quickly, while a mobile app installed on older devices moves slowly. A generated SDK adds compile-time coupling. Partner integrations bring their own release calendars and support agreements. Some consumers read very little of the response. A batch job may parse one field and ignore the rest, and a GraphQL client may ask for exactly two fields yet cache the result under operation names. A gRPC client can carry generated stubs for months.
Useful classification starts from the consumers, because the size of the server diff tells you very little.
Versioning Strategy
A versioning strategy answers two questions: which contract the consumer is using, and how the server should route, validate, serialize, and document that contract. The label is only useful for what it selects.
Two API versions might live in one codebase, or in one handler that serializes differently per version, or in two separate deployments. They might share one OpenAPI document with tagged operations, or each get its own. Every one of those choices follows from the version label.
The most visible form puts the version right in the path.
GET /v1/orders/ord_9
GET /v2/orders/ord_9
This is URI versioning. The version reaches the route table early, so reverse proxies and gateways can route by the prefix and logs show the version in the request target. Because the URIs differ, caches separate the two paths on their own. Documentation can publish v1 and v2 route groups without much extra machinery.
The cost is that the version leaks into every link and every route pattern. A resource identifier can end up tied to the versioned path, and a client may copy /v1 links into storage. Server code often duplicates route registration for small differences. Large teams handle that by treating the duplicated path as a compatibility branch over a shared internal implementation.
Header versioning keeps the path stable and places the version in an explicit request header.
GET /orders/ord_9 HTTP/1.1
Host: api.example.com
API-Version: 2026-06-01
The route stays resource-oriented, and the handler or gateway reads API-Version to pick the contract behavior. Your logs have to capture that header, or the version disappears from basic request-path analytics. Caches need a Vary policy that lists the version header whenever responses can differ by it. And generated clients need that header baked into their transport layer or config by default.
Header versioning works well when you want stable URLs and strong client configuration. It gives weaker visibility at systems that only inspect paths. Many tools can handle it, but the team has to wire that visibility up on purpose.
Media type versioning uses content negotiation, the HTTP mechanism by which a client asks for a particular representation. Usually the version rides in Accept for the response schema, and sometimes in Content-Type for the request body schema.
GET /orders/ord_9 HTTP/1.1
Accept: application/vnd.example.order+json; version=2
The version travels with representation selection, which fits APIs where one resource has several response representations. It does tie your versioning to HTTP content negotiation, something many teams use only lightly. Clients have to set more detailed headers, servers have to parse and validate them, and docs have to spell the media type contract out. Caches again need the right Vary, because the path alone does not say which representation came back.
Version negotiation is how the server picks a contract version out of the request. The request might name one exact version, give a date, send a supported range, or carry nothing and fall back to a default.
Exact versions are easy to audit.
API-Version: 2
Date-based versions can work when an API publishes regular compatibility snapshots.
API-Version: 2026-06-01
The server maps that value to a bundle of routes, schemas, serializers, error codes, and behavior choices. Many APIs also keep a default version, and defaults carry risk. A client that omits the version is running on an implicit contract, so changing the default can break every client that was quietly relying on it.
Treat the default version as an undocumented contract that some unknown set of clients depends on. Moving the default is a breaking change for every consumer that never sent a version header, even though no client code changed. Pin the default explicitly, log defaulted traffic, and migrate the default the same way you would retire a named version, with an announcement, a deprecation period, and enough time for consumers to move.
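Pinning and logging the default might be sketched like this. The supported set, the error code, and the function name are assumptions for illustration, and the 400 status for an unsupported version is one possible choice rather than a rule.

```javascript
// Assumed values: the pinned default and the supported set are illustrative.
const PINNED_DEFAULT_VERSION = '2025-01-01';
const SUPPORTED_VERSIONS = new Set(['2025-01-01', '2026-06-01']);

// headers: a plain object with lowercased header names, as Node's http
// module provides on req.headers.
function resolveApiVersion(headers, log = console.warn) {
  const requested = headers['api-version'];
  if (requested === undefined) {
    // Record defaulted traffic so the default can be migrated like a named version.
    log('api version defaulted', { version: PINNED_DEFAULT_VERSION });
    return { ok: true, version: PINNED_DEFAULT_VERSION, defaulted: true };
  }
  if (!SUPPORTED_VERSIONS.has(requested)) {
    return { ok: false, status: 400, code: 'UNSUPPORTED_API_VERSION' };
  }
  return { ok: true, version: requested, defaulted: false };
}

console.log(resolveApiVersion({ 'api-version': '2026-06-01' }));
console.log(resolveApiVersion({}));
```

The defaulted flag matters as much as the version. Counting requests where it is true tells you how many clients would be affected if the pinned default ever moved.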
Response metadata should make the selected version visible.
HTTP/1.1 200 OK
API-Version: 2026-06-01
Vary: API-Version
The response says which contract generated the body. Vary tells caches that the version header affects the representation. The exact header names are the API's own convention, unless a standard or platform rule is already in use. What counts is consistency across clients, gateways, docs, and caches.
The strategy drives your code layout too.
Path versions often map cleanly to route groups.
app.get('/v1/orders/:id', getOrderV1);
app.get('/v2/orders/:id', getOrderV2);
That layout is simple to route and easy to delete later. It can also duplicate validation and authorization wiring if the team copies whole route modules. Shared domain work should sit past version-specific request and response adapters.
Header and media type versions often push branching into middleware, hooks, serializers, or gateway route rules. That keeps paths compact, but it can bury version behavior inside conditionals.
const version = req.headers['api-version'] ?? '2025-01-01';
const body = serializeOrder(order, version);
The branch belongs near the contract edge. Push version checks down into domain logic and later removal gets harder, because business code starts depending on the client version. The handler should translate versioned API input into internal commands, then translate internal results back into versioned output.
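One way to keep the branch at the edge is a pair of adapters per version, with domain code that never sees the version string. The sketch below assumes two date-based versions and reuses the displayName-to-name rename from the opening example. All function and command names are hypothetical.

```javascript
// Domain code: knows nothing about API versions.
function renameUser(command) {
  return { id: command.userId, displayName: command.displayName };
}

// Version-specific adapters translate at the contract edge only.
const adapters = new Map([
  ['2025-01-01', {
    toCommand: (params, body) => ({ userId: params.id, displayName: body.displayName }),
    toResponse: (user) => ({ id: user.id, displayName: user.displayName }),
  }],
  ['2026-06-01', {
    toCommand: (params, body) => ({ userId: params.id, displayName: body.name }),
    toResponse: (user) => ({ id: user.id, name: user.displayName }),
  }],
]);

function handleRenameUser(version, params, body) {
  const adapter = adapters.get(version);
  if (!adapter) throw new Error(`unsupported version: ${version}`);
  const command = adapter.toCommand(params, body);
  return adapter.toResponse(renameUser(command));
}

console.log(handleRenameUser('2025-01-01', { id: 'user_42' }, { displayName: 'Asha Rao' }));
console.log(handleRenameUser('2026-06-01', { id: 'user_42' }, { name: 'Asha Rao' }));
```

Retiring the older version then means deleting one map entry, and renameUser does not change.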
Every versioning strategy still depends on compatibility discipline. URI versions can break old clients when /v1 changes behavior. Header versions fail when caches ignore Vary, and media type versions become unreadable once every resource invents its own format names. All the strategy gives you is a selection mechanism, and the contract still needs its own change rules.
Operational visibility should influence the choice. Path versions show up in basic logs, routing dashboards, CDN keys, gateway route stats, and support screenshots. Header versions need structured logging and gateway metrics that capture the header value, and media type versions need parsing at every layer that reports API usage. A team that has no current v1 usage data is running an under-instrumented versioning strategy.
Documentation layout changes with the strategy too. URI versioning often creates separate pages or route groups. Header versioning needs every operation to state the required version header and the default behavior. Media type versioning needs examples that show exact Accept and Content-Type values because clients copy examples. GraphQL may use schema deprecations more than URL versions. gRPC often uses package names, service names, and protobuf field evolution rather than a public /v1 prefix, although HTTP transcoding can add URI versions at the gateway.
Generated clients turn version strategy into package behavior. A generated TypeScript client can set a version header by default, a generated Java client can encode /v2 paths, and a gRPC client stub can carry a package namespace. The API version and the client package version should be documented as separate things.
api contract: 2026-06-01
client package: @example/api-client@4.2.0
That mapping helps support. A consumer can be using the newest npm package against an older API contract if the package supports multiple versions. Another consumer can be using an old package that defaults to a deprecated version. Version strings need enough context for that conversation.
Caching can turn a versioning bug into a stale-data bug. URI versioning gets separate cache keys on its own, through separate paths. Header and media type versioning depend on a cache config that varies by the selected header. If the response body differs by API-Version and the cache key ignores that header, one consumer can receive another version's representation. The server response needs metadata that matches the strategy, and the gateway or CDN needs cache-key rules to match. Full cache design comes later, but contract selection still has to expose the right signal.
Header and media-type versioning share one URL across versions. A cache that does not key on the version signal can hand one consumer another version's body. Set Vary and the gateway or CDN cache key to include the exact version header or media type whenever responses can differ by it. Forget this and a single cached entry can be served to clients that asked for a different version.
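The cache-key rule can be shown with a small key builder. This is a simplified sketch of what a gateway or CDN does internally: `cacheKey` is a hypothetical function, and real caches normalize header values more carefully than this.

```javascript
// Build a cache key from the method, the path, and every header named in Vary.
// headers uses lowercased names, as Node's http module provides them.
function cacheKey(method, path, headers, varyHeaders) {
  const parts = [method.toUpperCase(), path];
  const names = varyHeaders.map((name) => name.toLowerCase()).sort();
  for (const name of names) {
    parts.push(`${name}=${headers[name] ?? ''}`);
  }
  return parts.join('|');
}

const v1Request = { 'api-version': '1' };
const v2Request = { 'api-version': '2' };

// Without the version header in the key, both versions share one cache entry.
const sharedV1 = cacheKey('GET', '/orders/ord_9', v1Request, []);
const sharedV2 = cacheKey('GET', '/orders/ord_9', v2Request, []);
console.log(sharedV1 === sharedV2); // true: a v1 body could be served to a v2 client

// With Vary: API-Version reflected in the key, the entries are separate.
const keyedV1 = cacheKey('GET', '/orders/ord_9', v1Request, ['API-Version']);
const keyedV2 = cacheKey('GET', '/orders/ord_9', v2Request, ['API-Version']);
console.log(keyedV1 === keyedV2); // false
```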
Rollout behavior also differs. URI versions let a gateway route /v2 to a new upstream while /v1 stays on the old one. Header versions let the same route split by version metadata. Media type versions can split by Accept, which works only when the gateway parses it correctly. A path split is simpler for operators, while a header or media type split keeps resource URLs cleaner. The right choice comes from client behavior, gateway capabilities, cache behavior, and documentation workflow.
Date-based strategies need one clarification. Dates should represent published contract snapshots rather than deployment timestamps. A deployment can happen ten times in one day while the contract version remains 2026-06-01. Another deployment can add a backward-compatible field while keeping the same date because old consumers still work. The date should move when the published contract set changes in a way clients need to request or understand.
Deprecation, Sunset, and Migration
Deprecation is a promise about what happens next. A field, route, value, or behavior stays available for now, and it ships with a preferred replacement or a removal plan.
The deprecation policy writes that promise down. It has to answer the mechanical questions. Which changes get marked deprecated, how consumers see the warning, how long the old behavior stays live, what usage data gates removal, which response or schema artifacts carry the status, and who signs off on the final removal.
Sunset is the removal side of that plan. A sunset policy states when a deprecated contract stops being supported and what the API returns after that point. It can apply to a whole version, one route, one field, one enum value, or one behavior path.
Deprecation on its own leaves stale branches sitting in the code, and a sunset with no warning period before it causes outages.
Every contract element moves through a lifecycle of states, and a practical set of them stays small.
experimental -> active -> deprecated -> sunset -> removed
Experimental means consumers should expect change under stated terms. Active means the contract is supported. Deprecated means the contract is still accepted or returned while consumers migrate. Sunset means the removal date or removal condition has been published. A removed contract path has gone away or returns a planned error.
Migration is whatever work the consumer has to do to move from one contract path to another. The server team can publish a replacement, but the consuming code still has to regenerate clients, release apps, change parsing logic, rotate feature flags, update tests, and deploy.
A compatibility matrix keeps those facts visible.
consumer     supported versions      target
web-app      2026-06-01, 2026-01-01  2026-06-01
ios-app      2026-01-01              2026-06-01
partner-sdk  v1                      v2
The matrix can use any format the team keeps current. It needs one owner and current data. It should show which consumers still use the old contract, which replacement they should adopt, and which versions the API still supports.
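Kept as data, the matrix can answer the sunset question directly. The sketch below reads the columns one particular way, which is an assumption: "supported versions" is taken to mean the versions a consumer can currently speak, and a consumer blocks a sunset when it speaks the old version and cannot yet speak its target.

```javascript
const matrix = [
  { consumer: 'web-app', supported: ['2026-06-01', '2026-01-01'], target: '2026-06-01' },
  { consumer: 'ios-app', supported: ['2026-01-01'], target: '2026-06-01' },
  { consumer: 'partner-sdk', supported: ['v1'], target: 'v2' },
];

// Consumers that still depend on `version` and have not adopted their target.
function sunsetBlockers(rows, version) {
  return rows
    .filter((row) => row.supported.includes(version) && !row.supported.includes(row.target))
    .map((row) => row.consumer);
}

console.log(sunsetBlockers(matrix, '2026-01-01'));
console.log(sunsetBlockers(matrix, 'v1'));
```

Under that reading, web-app does not block the 2026-01-01 sunset because it already speaks 2026-06-01, while ios-app does.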
Telemetry plays a part here. Before you remove a deprecated route or field, you need evidence that traffic has actually moved off it. That evidence usually comes from request logs, version headers, client identifiers, gateway metrics, or generated-client user agents. Chapter 29 covers the observability mechanics in full. For versioning, the narrow point is that removal decisions need usage signals.
Deprecation also needs response behavior.
A deprecated endpoint might keep returning success and add warning metadata. A deprecated request field might be accepted and then silently ignored. A response field can stay populated right up until the sunset window closes, and a deprecated enum value can still be accepted in writes while responses stop using it. Whatever you pick belongs to the contract, and it has to be explicit.
Generated clients change the migration schedule. A client generated from a new OpenAPI document may lose access to a deprecated field if the generator omits it or changes its type. A GraphQL client may show deprecation warnings in editor tooling while still compiling. A gRPC client regenerated from a .proto file may keep reserved fields out of new code. Those artifacts can help consumers move, but they can also break builds earlier than the runtime API would.
The migration window should account for release mechanics.
Server-to-server consumers can usually deploy within days. Mobile apps can take weeks or months because old binaries keep running. Third-party integrations may need a contract period. Batch jobs may run weekly or monthly and only touch the deprecated path on certain data. A field can show zero request traffic for seven days, and removing it can still break a monthly job.
A short window with no recent traffic hides low-frequency consumers such as weekly batch jobs, monthly reconciliations, and partner integrations that only touch the deprecated path on certain data. Gate removal on both a schedule and an evidence condition, for example deprecated for 90 days and zero usage for 30 days. Make the observation window longer than your slowest consumer's run interval, or you will miss that consumer entirely.
Lifecycle policy should name both a time and a piece of evidence. "Deprecated for 90 days" is a schedule. "Deprecated until usage reaches zero for 30 days" is evidence. Many teams need both.
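A removal gate that checks both conditions could be sketched like this. The 90-day and 30-day thresholds come from the example policy above. The function shape, and the choice to treat "never seen since deprecation" as quiet for the whole deprecation period, are assumptions.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// deprecatedAt, lastUsedAt, and now are Date objects; lastUsedAt is null when
// telemetry has recorded no call since the deprecation started.
function canRemove({ deprecatedAt, lastUsedAt, now }, policy = { minDeprecatedDays: 90, minQuietDays: 30 }) {
  const deprecatedDays = (now - deprecatedAt) / DAY_MS;
  const quietDays = lastUsedAt === null ? deprecatedDays : (now - lastUsedAt) / DAY_MS;
  // Both the schedule and the evidence condition must hold.
  return deprecatedDays >= policy.minDeprecatedDays && quietDays >= policy.minQuietDays;
}

const now = new Date('2026-04-15T00:00:00Z');
const deprecatedAt = new Date('2026-01-01T00:00:00Z'); // 104 days before now

// Used 14 days ago: the schedule is met, the evidence is not.
console.log(canRemove({ deprecatedAt, lastUsedAt: new Date('2026-04-01T00:00:00Z'), now }));
// Last used 45 days ago: both conditions hold.
console.log(canRemove({ deprecatedAt, lastUsedAt: new Date('2026-03-01T00:00:00Z'), now }));
```

The gate is only as good as lastUsedAt. If telemetry misses a consumer, for example a job that runs less often than the quiet window, the function reports a safe removal that is not safe.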
Removal should also be predictable at runtime. A removed route should return a planned error response, and a removed version should have a clear gateway or handler rule. A removed field should fail validation when clients still send it, or be ignored only when the contract says so. Accidental route misses produce noisy 404 behavior and confused consumers.
A migration plan usually runs along one of three paths.
The first path is dual-read, or dual-acceptance, at the contract layer. The API takes both the old and new request structures and normalizes them into one internal command. That works for field renames, enum replacements, and token format changes, as long as the old meaning still maps onto the new one.
const displayName = body.displayName ?? body.name;
updateUser({ displayName });
The handler accepts both fields and writes one internal value. This is compatibility code, and it should carry a deletion date or deletion condition, because every accepted input structure becomes part of the API surface during the migration.
The second path is dual-write or dual-emit. The API emits old and new response fields together while consumers move.
{
  "displayName": "Asha Rao",
  "name": "Asha Rao"
}
Old clients read displayName while new clients read name. The server owns consistency between the two until the old field sunsets. This path is common, though it should stay short-lived, because two fields with the same intended meaning can drift when later code updates one serializer branch and misses the other.
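One way to limit drift is to derive the deprecated field from the new one inside a single serializer, behind a switch that the sunset flips off. This is a sketch under assumptions: the internal user object and the option name are hypothetical.

```javascript
// Serialize a user with the new field, plus the deprecated alias while
// old clients are still migrating.
function serializeUser(user, { emitLegacyDisplayName = true } = {}) {
  const body = { id: user.id, name: user.displayName };
  if (emitLegacyDisplayName) {
    // Deprecated alias, copied from the new field so the two cannot drift.
    body.displayName = body.name;
  }
  return body;
}

const user = { id: 'user_42', displayName: 'Asha Rao' };
console.log(serializeUser(user));
console.log(serializeUser(user, { emitLegacyDisplayName: false }));
```

Because the alias is assigned from body.name, a later change to how name is computed carries over to displayName automatically, and removing the old field is a one-line change.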
The third path is a version split. The API routes old consumers down one contract path and new consumers down another. That split can live in the gateway, the framework router, or the serializer. It costs more than dual fields, but each contract stays cleaner.
API-Version: 1 -> serializeOrderV1(order)
API-Version: 2 -> serializeOrderV2(order)
Version split fits larger representation changes, behavior changes, and removal of old fields. It also gives usage data by version, which helps sunset decisions.
Deprecation metadata should sit close to the response. A GraphQL field can carry @deprecated, and an OpenAPI description can mark a parameter or operation as deprecated. A response can include API-specific deprecation metadata when the API standardizes it, and a documentation page can explain the replacement steps. Those channels should agree. Mixed signals cause slow migrations, because consumers trust the artifact they see most often.
The deprecation policy should also define request behavior for old clients after sunset. A removed version can return 410 Gone, a version-specific error envelope, or another documented status. A removed field in a request body can fail validation with a machine-readable code, and a removed GraphQL field will fail validation before resolver execution. A removed gRPC field number should stay reserved in the .proto file even after generated code stops exposing it.
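A planned response for a removed version might be produced by a lookup like the one below. The removal date, the error code, and the envelope fields are all hypothetical. The only element taken from the text is the choice of 410 Gone as one documented option.

```javascript
// Versions that have passed their sunset, with the documented replacement.
const REMOVED_VERSIONS = new Map([
  ['1', { removedOn: '2026-06-01', replacement: '2' }],
]);

// Returns a planned error response for removed versions, or null when the
// version is still served normally.
function removedVersionResponse(version) {
  const entry = REMOVED_VERSIONS.get(version);
  if (!entry) return null;
  return {
    status: 410,
    body: {
      error: {
        code: 'API_VERSION_REMOVED',
        message: `API version ${version} was removed on ${entry.removedOn}.`,
        replacementVersion: entry.replacement,
      },
    },
  };
}

console.log(JSON.stringify(removedVersionResponse('1'), null, 2));
console.log(removedVersionResponse('2'));
```

A machine-readable code plus a named replacement gives an old client something to log and a support engineer something to search for, which an accidental 404 does not.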
Consumer communication has a mechanical part. Tell consumers which contract element is deprecated, which replacement exists, which versions are affected, how to detect usage, how to test the new path, and when removal happens. A vague changelog entry creates support load, while a migration note with old and new request examples gives consumers a concrete diff.
The migration plan should also spell out how rollback works. If the server removes /v1 and a major consumer is still calling /v1, the team needs an answer ready. That answer might be "restore the route," "extend the gateway rule," or "contact the consumer and keep returning the planned sunset error." It belongs in the policy before the removal deploys.
Gateways in the API Path
An API gateway is a reverse proxy, but tuned for API traffic. It takes the client request, matches it to a gateway route, attaches whatever policy that route carries, and forwards it to a backend service or a local handler that implements the contract.
Chapter 10 covered reverse proxies as HTTP intermediaries. A gateway adds routing that is aware of the contract. It can tell that /v1/orders and /v2/orders belong to different upstreams, normalize request headers before the backend ever sees them, pick a backend from version metadata, and pin a coarse policy onto a whole route group. The security enforcement details are Chapter 25's job. The point here is placement. The gateway can sit in front of every framework handler in the request path.
A gateway route is just a matching rule that runs at the gateway. It can match on host, path, method, headers, or media type. What comes out is an upstream target plus whatever request transformations the gateway owns.
A gateway policy is config attached to that route or route group. For versioning, keep the meaning small. It covers version selection, header normalization, path rewrite, request-size limits in passing, and which backend target to use. Authentication checks, OAuth token validation, WAF rules, abuse controls, and rate limits all have later owners.
Routing happens in two stages.
client
-> gateway route
-> version selection
-> upstream service
-> framework route
-> handler
The gateway route runs before the Node framework route. A gateway might receive GET /v2/orders/ord_9, strip the /v2, add API-Version: 2, and forward GET /orders/ord_9 to an internal service. The framework route never receives the original request. It receives only the upstream request the gateway produced.
This works when the gateway and service share a clear contract. It gets harder when the gateway takes on too much transformation. Say the gateway rewrites a v2 request into a v1 backend path and patches the response body on the way out. The backend looks stable, but the API contract now depends on proxy-side transformation. That can be acceptable during a migration, as long as you name it as compatibility code, test it as contract code, and remove it when the migration is done.
The gateway carries its own runtime state, including a listener, route config, upstream connection pools, timeout rules, and header normalization logic. It may stream request and response bodies, or buffer them for a transformation. Each of those choices changes where a failure shows up.
A gateway timeout can mean a backend connection failed, response headers came late, or streaming stalled. A gateway 404 can mean the request missed the gateway route table even though the backend framework has a matching route. A backend 404 is different. It means the gateway matched and forwarded, and then the service route missed. Your logs have to tell those two owners apart.
Path rewriting is a common source of confusion.
public: GET /v2/orders/ord_9
upstream: GET /orders/ord_9
The client contract has /v2 in it. The contract between gateway and backend has /orders. Both are real, and debugging needs both. A request log that records only the upstream path loses the public contract, and a gateway log that records only the public path loses the backend route.
Header normalization works the same way. During a migration, a gateway might accept several client version signals at once.
/v2/orders/ord_9
API-Version: 2
Accept: application/vnd.example.order+json; version=2
The gateway can collapse all three into one internal header, say x-api-contract-version: 2. That keeps backend code simple. It also gives the gateway the job of rejecting ambiguous requests. A request that sends /v2 and API-Version: 1 should fail with a planned client error instead of silently picking whichever value got parsed last.
When a request carries conflicting version signals, such as a /v2 path plus API-Version: 1 or a mismatched media type, reject it with a planned client error instead of letting parse order decide. Implicit precedence rules such as "path wins" or "header wins" route the request to a contract the client never asked for. The failure then surfaces as wrong data far downstream instead of a clear 4xx at the edge.
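A conflict check of this kind fits in a few lines. The sketch below assumes the gateway has already extracted each signal as a string; the VersionSignals shape and the error codes version_missing and version_conflict are hypothetical names chosen for the example.

```typescript
interface VersionSignals {
  pathVersion?: string;   // from a /v2/... prefix
  headerVersion?: string; // from API-Version
  mediaVersion?: string;  // from the Accept version parameter
}

type VersionResult =
  | { ok: true; version: string }
  | { ok: false; status: 400; code: string };

// Accepts any number of signals as long as they all agree.
// No precedence rule exists here, so parse order cannot pick a winner.
function resolveVersion(signals: VersionSignals): VersionResult {
  const present = [signals.pathVersion, signals.headerVersion, signals.mediaVersion]
    .filter((value): value is string => value !== undefined);
  const first = present[0];
  if (first === undefined) {
    return { ok: false, status: 400, code: "version_missing" };
  }
  if (present.some((value) => value !== first)) {
    return { ok: false, status: 400, code: "version_conflict" };
  }
  return { ok: true, version: first };
}
```

On success the gateway would write the resolved value into its single internal header, such as x-api-contract-version, so the backend reads one signal.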
Body handling is the next decision point. A gateway that routes only by method, path, and headers can stream bodies straight through using the proxy mechanics from Chapter 10, with backpressure (the flow control that slows reads when the far side cannot keep up) left intact. A gateway that rewrites JSON bodies has to buffer or parse them first, which pulls payload validation and memory risk into the gateway. For a version migration, keep the gateway deciding on method, path, and headers. Body transformations can help during short migrations, but they turn the gateway into contract code that needs the same review as a handler.
Coarse gateway policies keep route ownership visible. A route group can say "all /v1 traffic goes to orders-v1," or "all requests with API-Version: 2026-06-01 go to orders-current." That is a different decision from the backend handler picking which serializer to run. The gateway selects the upstream contract path, and the backend implements the contract it was handed.
Small gateway configs usually read as routing tables, even when the real product has its own syntax.
routes:
  - match: { pathPrefix: "/v1/orders" }
    upstream: "orders-v1"
  - match: { pathPrefix: "/v2/orders" }
    upstream: "orders-v2"
The rule picks an upstream from the public path prefix. A real gateway needs more than this: host matching, method handling, timeouts, and header rules. The compatibility point is narrower than all of that. Version routing gets decided before the request ever reaches the service.
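To make the matching step concrete, here is a small sketch of how a table like that could be evaluated. The Route type and the selectUpstream and matchesPrefix names are assumptions for illustration, and matching on a "/" boundary is one reasonable choice, not a rule every gateway follows.

```typescript
interface Route {
  pathPrefix: string;
  upstream: string;
}

const routes: Route[] = [
  { pathPrefix: "/v1/orders", upstream: "orders-v1" },
  { pathPrefix: "/v2/orders", upstream: "orders-v2" },
];

// True when the path equals the prefix or continues past it at a "/" boundary,
// so "/v1/orders-archive" does not match "/v1/orders".
function matchesPrefix(path: string, prefix: string): boolean {
  return path === prefix || path.startsWith(prefix + "/");
}

// Returns the first matching upstream, or null when the gateway has no route.
function selectUpstream(path: string, table: Route[]): string | null {
  for (const route of table) {
    if (matchesPrefix(path, route.pathPrefix)) return route.upstream;
  }
  return null; // a miss here is a gateway 404, before any backend is involved
}
```

A null result is the gateway-owned 404 described earlier: the request never reached a service, so the backend's own route table is irrelevant to the failure.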
Putting a gateway in front helps a migration, because the API keeps one public endpoint while the backend targets move around. Old clients route to old handlers, and new clients route to new ones. One version can route to a compatibility facade. The gateway can shadow-read a new service for testing, though those mechanics come later, and it can cut traffic to a sunset version in one place and return the planned error from there.
The gateway still handles only a slice of API evolution. The contract itself still needs schemas, docs, generated clients, error behavior, pagination behavior, and migration policy. The gateway enforces or selects some of those, and the rest has to live in artifacts that humans and tools can read.
Down at the mechanism level, this is a two-hop request path with contract selection wedged between the hops.
The client opens a connection to the public endpoint. TLS may terminate before the gateway or at it, depending on the deployment. The gateway reads the HTTP request and builds a request context from method, host, path, query string, headers, and sometimes client metadata that trusted infrastructure supplied. At this point the body bytes may still be unread. Version selection should run off metadata that arrives early whenever it can, because selecting early keeps streaming intact.
The route matcher walks its configured rules. Host rules usually run first, because the same path can mean different APIs under different hostnames. Method rules narrow it down. Header or media-type rules can pick a version. The route that wins carries the upstream target and its transformations. When several routes match, each gateway product applies its own priority rules, but API teams should still make the intended winner obvious through route specificity and tests.
Once a route is chosen, the gateway builds an upstream request. That request has its own method, path, headers, timeout budget, and connection to acquire. A URI-versioned public request can turn into an internal path with the version stripped off. A header-versioned request can gain a normalized internal header, and a media-type-versioned request can pass Accept through untouched or have it converted into internal version metadata.
What happens to the body depends on whether anything transforms it. With pure routing, the gateway streams the inbound body to the upstream request. Backpressure then travels through the same stream chain Chapter 10 covered. Upstream socket pressure pauses the gateway's reads, those paused reads pause the client-side request stream, and TCP flow control slows the sender down. With body transformation, the gateway has to parse or buffer at least part of the body before it forwards anything. That shifts memory use, validation ownership, and the timing of failures.
Responses run the same path in reverse. The upstream service returns headers and a body. The gateway can pass them through, normalize headers, remap status and error envelopes, or transform the body. A pure routing gateway streams the response the moment headers land. A transforming gateway often waits for the whole body before it can emit a corrected public representation. That wait costs latency, and it changes what happens when the upstream response is large or only partial.
Keep version-routing transformations shallow where you can. Header normalization and path rewrite are easy to reason about. Response field mapping is heavier, and request body translation is heavier still. Error mapping is the one people forget, because the success path gets all the attention. A gateway that maps success responses between versions also needs a plan for validation errors, conflict responses, pagination errors, and upstream failures.
The gateway's logs need to capture both sides of the path.
public=/v2/orders/ord_9
upstream=/orders/ord_9
version=2
target=orders-v2
Each of those four values answers a different question. public is the client contract, upstream is the backend route, and version is the API contract that got selected. target names the service or pool that handled the request. Drop any one of them and migration debugging gets harder than it needs to be.
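One way to keep the four values together is to make them required fields of a single record, so a log call cannot omit one. The RouteDecision type and formatRouteLog function below are hypothetical names for this sketch, and the key=value output format is an assumption.

```typescript
// Hypothetical record of one gateway routing decision.
interface RouteDecision {
  publicPath: string;   // what the client sent
  upstreamPath: string; // what the backend received
  version: string;      // contract version the gateway selected
  target: string;       // service or pool that handled the request
}

// Emits key=value pairs in a fixed order so the four fields always travel together.
function formatRouteLog(decision: RouteDecision): string {
  return [
    `public=${decision.publicPath}`,
    `upstream=${decision.upstreamPath}`,
    `version=${decision.version}`,
    `target=${decision.target}`,
  ].join(" ");
}
```

Because every field is required by the type, dropping one becomes a compile error instead of a gap someone finds during an incident.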
Gateway policy also creates an ordering problem. Version selection has to run before any auth or rate policy that varies by API version, though Chapter 24 and Chapter 25 cover those details. For now, the point is that policy attachment depends on route identity. Misclassify the version at the gateway and every later enforcement step, along with the logs, attaches to the wrong contract path.
Config deployment is part of the mechanism too. Gateway routes usually deploy on their own schedule, separate from service code. That gives teams flexibility and adds coordination risk between the gateway contract and the backend contract. A gateway can start routing v2 before the backend that serves it is even deployed. A backend can drop a compatibility serializer while the gateway is still sending it v1 traffic. Release plans should treat gateway config as code that takes part in the API lifecycle.
Gateway Version Routing
Gateway version routing turns the versioning strategy into a concrete upstream path.
URI versioning is the easiest to picture.
client -> /v1/orders/ord_9 -> gateway -> orders-v1
client -> /v2/orders/ord_9 -> gateway -> orders-v2
The public path decides the upstream. The gateway route table can make that call from the request target alone. The backend services can keep the version prefix in their internal paths, or drop it and take the version through metadata instead. Either way, the public contract stays at the gateway.
Header versioning moves the choice into metadata.
GET /orders/ord_9
API-Version: 1 -> orders-v1
API-Version: 2 -> orders-v2
Now the gateway has to parse the header before route selection can finish. It also needs a fallback for requests that arrive with no version. Make that fallback explicit. Either reject missing versions once the migration period is over, or route them to a named default and log them as defaulted traffic. A hidden default quietly becomes an undocumented version.
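Both fallback options can live behind one parameter. In this sketch, passing null as the default means the migration window has closed and missing versions are rejected. The Selection type, the error codes, and the selectByHeader name are assumptions for the example.

```typescript
type Selection =
  | { kind: "routed"; upstream: string; defaulted: boolean }
  | { kind: "rejected"; status: 400; code: string };

const upstreamByVersion = new Map<string, string>([
  ["1", "orders-v1"],
  ["2", "orders-v2"],
]);

// defaultVersion names the explicit fallback; null means "reject missing versions".
function selectByHeader(
  apiVersion: string | undefined,
  defaultVersion: string | null,
): Selection {
  const version = apiVersion ?? defaultVersion;
  if (version === null) {
    return { kind: "rejected", status: 400, code: "version_required" };
  }
  const upstream = upstreamByVersion.get(version);
  if (upstream === undefined) {
    return { kind: "rejected", status: 400, code: "version_unsupported" };
  }
  // defaulted=true marks traffic that never stated a version, so logs can count it.
  return { kind: "routed", upstream, defaulted: apiVersion === undefined };
}
```

The defaulted flag is the part that keeps the default from turning into an undocumented version: it gives you a number to watch before you flip the parameter to null.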
Media type versioning adds parsing work.
Accept: application/vnd.example.order+json; version=2
The gateway can parse the Accept value and pick an upstream from it, or it can pass the header through and let the backend pick the serializer. Both work. The difference is who owns the choice. Gateway selection gives operations a clean route split, while backend selection keeps representation logic next to the code. Do both and you need exact rules, because the gateway and backend can disagree about the same request.
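The parsing work looks roughly like this. The sketch is deliberately simplified and is an assumption about how a team might do it: it ignores q-values, quoted parameter values, and wildcard ranges, all of which a production Accept parser has to handle. The versionFromAccept name is hypothetical.

```typescript
// Extracts the version parameter from the first media range whose type matches
// mediaType. Simplified: no q-value ordering, no quoted strings, no wildcards.
function versionFromAccept(accept: string, mediaType: string): string | null {
  for (const range of accept.split(",")) {
    const [type, ...params] = range.split(";").map((part) => part.trim());
    if (type === undefined || type.toLowerCase() !== mediaType) continue;
    for (const param of params) {
      const eq = param.indexOf("=");
      if (eq === -1) continue;
      if (param.slice(0, eq).trim().toLowerCase() === "version") {
        return param.slice(eq + 1).trim();
      }
    }
  }
  return null;
}

const accept = "application/vnd.example.order+json; version=2";
const picked = versionFromAccept(accept, "application/vnd.example.order+json");
// picked is "2"; a plain "application/json" header yields null
```

If the gateway and the backend both parse this header, they should share one implementation or one set of test vectors, since two parsers with different edge-case behavior are how the two hops come to disagree.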
Version routing can also send two versions to one service.
routes:
  - match: { header: { API-Version: "1" } }
    upstream: "orders-api"
  - match: { header: { API-Version: "2" } }
    upstream: "orders-api"
Here the gateway splits the contract versions apart for visibility while one backend handles both. That helps when the service already has versioned serializers and validators inside a single deployment. The gateway logs still show version adoption, and if the implementation later changes, the gateway can route v2 somewhere else without touching clients.
Path rewrites need discipline during version routing.
public: /v2/orders/ord_9
internal: /orders/ord_9
version: 2
That mapping is fine when the backend gets the version through a header or route context. It gets risky when the backend gets no version at all and falls back to an internal default. The gateway stripped the public version off, so the backend lost its contract selector.
Response transformations carry more risk than request routing. Suppose v2 renamed displayName to name, and the gateway patches the v1 backend response for v2 clients. That buys migration time. It also means response validation has to run against the public v2 contract after the transformation rather than before it. Skip that and the gateway can emit a response that no backend test ever covered.
Rewrites run into errors too. If the backend returns a v1 machine-readable error code, the gateway may have to map it to the v2 error contract. Map the success response but leave the error mapping stale and you have shipped a partial version. Clients reach those paths during failures, when a split contract is hardest to diagnose.
A gateway or facade that maps success responses between versions has to map the failure side too, including validation errors, conflict and idempotency codes, pagination errors, and upstream failures. Map only the success path and you ship a partial version that breaks during failures. Run response validation against the public version contract after the transformation, error envelopes included.
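A transformation that covers both sides might look like the sketch below. The displayName to name rename comes from the example above; the error codes in the table, the unmapped_upstream_error fallback, and the toV2Response name are all made up for illustration.

```typescript
interface GatewayResponse {
  status: number;
  body: Record<string, unknown>;
}

// Hypothetical v1 -> v2 error codes; a real table comes from both contracts.
const v2CodeByV1Code = new Map<string, string>([
  ["display_name_required", "name_required"],
  ["display_name_too_long", "name_too_long"],
]);

function toV2Response(res: GatewayResponse): GatewayResponse {
  const body: Record<string, unknown> = { ...res.body };
  if (res.status < 400) {
    // Success path: apply the v2 field rename.
    if ("displayName" in body) {
      body["name"] = body["displayName"];
      delete body["displayName"];
    }
    return { status: res.status, body };
  }
  // Failure path: never pass a raw v1 code through to a v2 client.
  const v1Code = body["code"];
  const mapped = typeof v1Code === "string" ? v2CodeByV1Code.get(v1Code) : undefined;
  body["code"] = mapped ?? "unmapped_upstream_error";
  return { status: res.status, body };
}
```

The explicit fallback code matters as much as the table. An unmapped v1 code shows up as one recognizable v2 value that you can alert on, instead of reaching clients as a string their generated enum has never seen.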
Cursor formats need extra care. A gateway can route old cursors to old backends and new cursors to new backends only if it can tell the cursor versions apart safely. The client should still treat a cursor as opaque, but the server side needs a parser or a prefix that can route old tokens during the migration. Change the cursor format in the middle of a path rewrite and any client that stored an old page token gets broken resume behavior.
One clean pattern is to version tokens internally.
cursor=v1.aWQ6OTk=
cursor=v2.c3RhcnQ6MjAyNi0wNi0wMQ==
The client still sees an opaque cursor. The server or gateway routes on the prefix. That prefix is part of the server-side migration design, an internal contract selector that clients never have to know about.
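Routing on the prefix needs no knowledge of the payload. The sketch below splits at the first dot, which is safe for these tokens because the standard base64 alphabet contains no dot. The backend names and the routeCursor function are assumptions for the example.

```typescript
interface CursorRoute {
  backend: string;
  payload: string; // still opaque here; only the owning backend decodes it
}

const backendByCursorPrefix = new Map<string, string>([
  ["v1", "orders-v1"],
  ["v2", "orders-v2"],
]);

// Splits "<prefix>.<payload>" at the first dot and looks up the backend.
// Returns null for tokens with no prefix or an unknown prefix.
function routeCursor(cursor: string): CursorRoute | null {
  const dot = cursor.indexOf(".");
  if (dot <= 0) return null;
  const backend = backendByCursorPrefix.get(cursor.slice(0, dot));
  if (backend === undefined) return null;
  return { backend, payload: cursor.slice(dot + 1) };
}
```

A null result should map to the planned invalid-cursor error for that contract version, so a client holding a token from before the prefix existed gets a documented failure.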
Gateway routing gives an API team flexibility in implementation. The compatibility obligations do not go away. If v1 and v2 differ, both contracts need tests, docs, generated artifacts, and removal plans. The gateway is one place where the decision gets executed, and not the whole job.
Traffic splitting needs a scope limit here. A gateway can send all v1 traffic to one upstream and all v2 traffic to another. Some platforms can also split a percentage of one version's traffic across two deployments. Percentage rollout is a deployment topic, and load-balancing algorithms come later. The API design concern is narrower. The same version should keep the same contract while the deployment behind it changes.
Route order can cause accidental breaks. A broad route like /v1/* can catch a more specific sunset route like /v1/legacy-report when the priority is wrong. Header matching can collide when a client sends both API-Version and a media type version. Loose wildcard host rules can send staging clients to production upstreams. These are contract bugs that live at the gateway while the backend code is perfectly healthy.
A good route table leaves no ambiguity about which rule owns a request.
host=api.example.com path=/v1/orders/* -> orders-v1
host=api.example.com path=/v2/orders/* -> orders-v2
host=api.example.com path=/v1/reports/* -> sunset-handler
The table shows three contract paths. The sunset handler is a deliberate route. It returns the planned error envelope with migration instructions. Consumers get a predictable failure, and support gets a clear owner for it.
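Route-order mistakes are cheap to catch with a test over the table itself. The sketch below assumes a first-match-wins gateway and plain path prefixes, both of which are simplifications since products differ on priority rules. The OrderedRoute type and the findShadowedRoutes name are hypothetical.

```typescript
interface OrderedRoute {
  pathPrefix: string;
  target: string;
}

// True when every path under `narrow` is also under `broad`.
function covers(broad: string, narrow: string): boolean {
  const base = broad.endsWith("/") ? broad : broad + "/";
  return narrow === broad || narrow.startsWith(base);
}

// Under first-match-wins, a route is shadowed when an earlier route covers it,
// which means it can never receive traffic.
function findShadowedRoutes(table: OrderedRoute[]): OrderedRoute[] {
  return table.filter((route, index) =>
    table.slice(0, index).some((earlier) => covers(earlier.pathPrefix, route.pathPrefix)),
  );
}

const misordered: OrderedRoute[] = [
  { pathPrefix: "/v1", target: "orders-v1" },
  { pathPrefix: "/v1/reports", target: "sunset-handler" },
];
// findShadowedRoutes(misordered) returns the sunset-handler route
```

Running a check like this in the gateway config's CI turns a silent priority bug into a failed build before any sunset route goes dark.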
Version routing reaches generated clients too. A client built for URI versioning bakes /v2 into its paths. A header-versioned client sets the header in one transport wrapper, and a media-type-versioned client sets Accept and Content-Type per operation. If the gateway expects one strategy and the generated client emits another, every request can fall through to the default. Treat client generation templates as part of the gateway contract.
BFFs and Compatibility Facades
A backend-for-frontend, or BFF, is an API surface built for one specific client category. The usual categories are web app, mobile app, admin UI, partner integration, or device class.
A BFF helps API evolution, because one client group can move while every other group stays on its current representation. The mobile BFF can hold onto old fields while the web BFF moves to a newer representation. The admin BFF can expose operational views that never appear in the public API, and the shared backend domain code keeps its own internal structure underneath all of them.
A small example.
GET /mobile/v1/orders/ord_9
GET /web/v2/orders/ord_9
Those paths carry two facts, the client category and the contract version. Some teams move the client category into hostnames instead, like mobile-api.example.com. The placement counts for less than the ownership. A BFF owns its client-specific representation and its own migration timing.
BFFs cost something. They multiply the number of API surfaces, they can duplicate validation and error behavior, and they can scatter divergent business rules across client-specific code. Draw the line at representation and workflow fit, instead of standing up a BFF for every client that asks for a custom field.
A compatibility facade is a layer that keeps presenting an old contract after the implementation behind it has changed. It can be a handler module, a service, or a gateway-backed adapter. It takes old requests, translates them into the new internal command or query, then serializes the response back into the old structure.
For example, v1 might expose fullName while the new internal model stores givenName and familyName.
{
  "id": "user_42",
  "fullName": "Asha Rao"
}
The facade assembles that old response from the newer data. The old client keeps its contract, while the new implementation sits on the far side of the translation. Give the facade an owner and a removal plan, because a compatibility branch tends to live on until someone deletes it on purpose.
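The serializer side of that facade is a small field join. This sketch assumes given-name-then-family-name order joined by a space, which reproduces the example but is not correct for every locale. The InternalUser and UserV1 types and the toUserV1 name are hypothetical.

```typescript
// New internal model, as described above.
interface InternalUser {
  id: string;
  givenName: string;
  familyName: string;
}

// Old v1 representation that existing clients still read.
interface UserV1 {
  id: string;
  fullName: string;
}

// Joins the split fields back into the v1 field. Assumes given-then-family
// order; empty parts are skipped so no stray spaces appear.
function toUserV1(user: InternalUser): UserV1 {
  const fullName = [user.givenName, user.familyName]
    .filter((part) => part.length > 0)
    .join(" ");
  return { id: user.id, fullName };
}

const v1User = toUserV1({ id: "user_42", givenName: "Asha", familyName: "Rao" });
// v1User is { id: "user_42", fullName: "Asha Rao" }
```

Keeping the join in one named function gives the removal plan a concrete target: when v1 traffic ends, this function and its type are what get deleted.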
The strangler pattern is a migration approach where traffic moves from an old implementation to a replacement one route, capability, or consumer group at a time. In API evolution, the gateway or facade usually controls that movement.
/v1/orders -> old service
/v1/payments -> old service
/v2/orders -> new service
/v2/payments -> old service
That table shows a half-finished migration. Orders moved to the new service, and payments stayed on the old one. The public API can surface that movement as a new version, or the gateway can hold the public contract steady while one backend capability quietly changes target. Either choice needs clear contract ownership.
Strangler migrations go well with compatibility matrices. Route by route, the matrix shows which consumers still depend on old behavior, which routes have moved, and which contract artifacts are current. Keeping that list current keeps the compatibility code that is still in service visible.
BFFs and facades are API-contract work rather than service decomposition. Chapter 32 covers how services get divided. The concern here is narrower. Client-specific surfaces and compatibility layers can keep old consumers working while the internal implementation changes underneath them.
BFFs also move where versioning shows up. A mobile BFF can hold a stable /mobile/v1 path for years while the shared backend cycles through several internal contracts. The mobile API version then names the mobile representation rather than the internal service version. That split shows up in docs and logs. A mobile support ticket should cite the mobile BFF contract, and a backend deployment ticket should cite the internal service contract.
Facades work best when the translation is mechanical, such as a field rename, a field split, a field join, a status-code mapping, or an old cursor parser feeding a new query path. Those are all manageable. A facade that rewrites business meaning, though, becomes a second implementation of the product rules, and that tends to produce conflicting behavior between old and new consumers.
Put validation on the public side of the facade. The facade receives an old contract, so it should validate the old request structure and return old error envelopes. Once that passes, it translates into a new internal command, and the newer backend validates its own internal input. Old consumers get old errors, and the new internals keep their current structure.
Response serialization follows the same rule. The facade can call new code, then serialize old fields and old error codes on the way out. What it should not do is pass raw new errors straight to old consumers, because those errors can carry new machine-readable codes or new validation paths. A compatibility facade owns both the success and the failure representation.
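Put together, a facade handler has three steps: validate the old request, translate it, and serialize the outcome back into the old contract. The sketch below is an illustration under stated assumptions. The v1 and internal error codes are invented, splitting fullName on the first space is a simplification, and the createUser callback stands in for whatever the new backend exposes.

```typescript
interface CreateUserCommand {
  givenName: string;
  familyName: string;
}

type InternalResult = { ok: true; id: string } | { ok: false; code: string };

type V1Response =
  | { status: 201; body: { id: string; fullName: string } }
  | { status: 422; body: { code: string } };

// Hypothetical mapping from new internal error codes to old v1 codes.
const v1CodeByInternalCode = new Map<string, string>([
  ["given_name.too_long", "NAME_TOO_LONG"],
  ["family_name.too_long", "NAME_TOO_LONG"],
]);

function createUserV1(
  body: { fullName?: unknown },
  createUser: (command: CreateUserCommand) => InternalResult,
): V1Response {
  // 1. Validate against the old contract and answer with an old error code.
  const fullName = typeof body.fullName === "string" ? body.fullName.trim() : "";
  if (fullName === "") {
    return { status: 422, body: { code: "FULL_NAME_REQUIRED" } };
  }
  // 2. Translate into the new internal command (first-space split is an assumption).
  const space = fullName.indexOf(" ");
  const command: CreateUserCommand = space === -1
    ? { givenName: fullName, familyName: "" }
    : { givenName: fullName.slice(0, space), familyName: fullName.slice(space + 1).trim() };
  // 3. Serialize the outcome back into v1, failures included.
  const result = createUser(command);
  if (!result.ok) {
    const code = v1CodeByInternalCode.get(result.code) ?? "REQUEST_REJECTED";
    return { status: 422, body: { code } };
  }
  return { status: 201, body: { id: result.id, fullName } };
}
```

Passing the backend call in as a function keeps the facade testable on its own, so the v1 contract tests do not need the new service running.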
Deleting a facade should be a routine lifecycle event. When traffic hits zero and the sunset window closes, remove the gateway route, the facade handler, the old schemas, and the old generated-client docs in one pass. Partial deletion leaves confusing artifacts behind. An OpenAPI document that still lists v1 after the gateway rejects it is schema drift, and a gateway route that still points at a deleted facade is an outage. A facade that outlives its docs keeps an undocumented contract alive.
Contract Artifacts During Evolution
Contract artifacts are the files and generated outputs that turn API design into build-time and runtime behavior. For evolution work, the main ones are OpenAPI documents, JSON Schemas, GraphQL schemas, .proto files, generated clients, generated server types, validators, serializers, examples, and changelogs.
An API change should move through those artifacts in a controlled order.
contract change
  -> spec/schema/proto
  -> generated client
  -> server validator or stub
  -> examples and docs
  -> release notes
The exact order depends on whether the workflow is contract-first or code-first, but the whole artifact set has to agree before consumers depend on the change. Change the handler and leave the OpenAPI document stale, and you get schema drift. Update the schema while the server serializers stay stale, and you ship generated clients that expect fields the server never sends.
In OpenAPI, versioning can show up as separate documents, separate server URLs, path groups, header parameters, or media type definitions. Match the choice to the versioning strategy. URI versioning usually publishes /v1 and /v2 paths, header versioning should document the version header as a required parameter once the default window closes, and media type versioning should spell out the exact request and response media types.
Generated API clients bake in assumptions. A single client can encode route paths, required fields, enum unions, response types, and error schemas. When the contract changes, the client package version changes with it. That package version is its own thing, separate from the API version, but the two interact. A client@4.2.0 package might target API contract 2026-06-01. When generated clients are part of the support model, the compatibility matrix should capture that mapping.
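The package-to-contract mapping can be kept as plain data next to the release tooling. In the sketch below only the client@4.2.0 row comes from the text; the other rows, the MatrixRow type, and the clientsTargeting function are invented for illustration.

```typescript
interface MatrixRow {
  clientPackage: string; // generated client package and version
  apiContract: string;   // API contract that package targets
}

// Only the client@4.2.0 row comes from the text; the other rows are made up.
const compatibilityMatrix: MatrixRow[] = [
  { clientPackage: "client@3.8.1", apiContract: "2025-12-01" },
  { clientPackage: "client@4.1.0", apiContract: "2026-06-01" },
  { clientPackage: "client@4.2.0", apiContract: "2026-06-01" },
];

// Lists the client packages that still depend on a given contract.
function clientsTargeting(apiContract: string, matrix: MatrixRow[]): string[] {
  return matrix
    .filter((row) => row.apiContract === apiContract)
    .map((row) => row.clientPackage);
}
```

Asking which packages still target a contract is the question a sunset review needs answered, and it is easier to answer from a table than from release notes.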
GraphQL evolution runs through the schema. Adding a nullable field is usually additive. Adding a required input field to an existing mutation breaks old callers, because they have no value to supply for it. Removing a field breaks persisted queries and generated types. Deprecation markers hand clients and tooling a migration signal while the field stays callable.
GraphQL has one extra pressure point. Clients choose their own fields. A server team can believe a field has low usage because one frontend stopped rendering it, while a different operation still selects it for a cached path. Schema usage needs operation-level data before any removal. The contract-test mechanics come later, though the design call is here. A GraphQL field stays until the consumer queries have actually moved off it.
Protobuf evolution runs through .proto files and generated stubs. Adding a new field with a new field number is the ordinary backward-compatible move. Renaming a field but keeping its number mostly affects generated code and JSON mapping, depending on how it is used. Changing a field number changes the wire contract. Reusing a removed number can corrupt the meaning of the data. Reserving the removed numbers and names stops a future author from reusing them by accident.
The artifact flow should include examples. Examples are the client code humans actually copy. Show an old error code in the docs after the API changed and someone will paste it into a branch. Say "renamed field" in the changelog while the OpenAPI document still shows both fields, and generated clients and humans end up reading different signals.
Testing gets only a brief mention here. Contract tests and consumer-driven contract tests are Chapter 28's territory. For API evolution, the point is that every supported contract version needs some automated evidence behind it. A route that still accepts v1 traffic should rest on more than a stale handler branch.
Release notes count as a contract artifact too. They are less formal than schemas, but consumers read them to decide whether to upgrade. A useful changelog entry names the old behavior, the new behavior, the affected versions, the migration action, and the sunset date if there is one.
deprecated: User.fullName
replacement: User.displayName
sunset: 2026-12-01
affected: GraphQL schema 2026-06-01
That is enough for a consumer to grep its codebase and plan a release. It also gives support teams a stable reference when old clients keep hitting the deprecated path.
Examples need version labels. A curl example with /orders and no version header teaches consumers to leave the version off. A JSON response example that carries both old and new fields should say which migration window it belongs to. A GraphQL example should show deprecated fields only in a section about migration, and a protobuf example should show the reserved field numbers after a removal.
Artifact ownership should be explicit. If OpenAPI is contract-first, the spec change leads and the server code follows it. With code-first generation, the code change leads and the generated docs have to ride in the same review. Hand-authored GraphQL schema files make schema review the same thing as API review. And when .proto files sit in a shared repository, package release cadence becomes part of the API lifecycle.
Schema drift during evolution shows up in predictable ways. The server accepts a new request field before the spec documents it. The spec documents a new response field before the serializer emits it. A generated client rejects a field the server now returns. The gateway routes v2 while the docs still show v1. A changelog mentions a deprecation that the GraphQL schema never marked. Every one of those mismatches points consumers at a different source of truth.
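The field-level cases of drift can be caught by comparing what the spec documents with what a serializer actually emits. The sketch below is a rough check on top-level field names only, and it assumes you can obtain the documented field list from your spec tooling. The FieldDrift type and the detectFieldDrift name are hypothetical.

```typescript
interface FieldDrift {
  undocumented: string[]; // emitted by the serializer, absent from the spec
  missing: string[];      // promised by the spec, absent from the response
}

// Compares top-level field names only; nested objects and types are out of scope.
function detectFieldDrift(
  documented: string[],
  response: Record<string, unknown>,
): FieldDrift {
  const documentedSet = new Set(documented);
  const emitted = Object.keys(response);
  const emittedSet = new Set(emitted);
  return {
    undocumented: emitted.filter((field) => !documentedSet.has(field)),
    missing: documented.filter((field) => !emittedSet.has(field)),
  };
}

const drift = detectFieldDrift(["id", "name"], { id: "user_42", displayName: "Asha Rao" });
// drift.undocumented is ["displayName"], drift.missing is ["name"]
```

This is a smoke check and no substitute for contract tests. It only covers optional-free, top-level fields, but it is enough to flag a rename that reached the handler and never reached the spec.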
Contract artifacts should converge before a public release. Plenty of toolchains can help with that convergence, but the release path still needs a checklist that covers the route rule, the schema, the generated client, the examples, the deprecation metadata, the measured usage of the old version, and the defined sunset behavior. Chapter 28 covers automated contract tests. The design chapter only needs the ownership clear enough that those tests have something stable to check against.
Where API Design Ends
API design ends once contract decisions turn into production controls.
A route version is API design. The routing table that picks /v1 or /v2 can live in a gateway. From there the work hands off to other systems. WAF inspection of malicious payloads is security, OAuth token validation is identity, and rate limiting is abuse control. Distributed tracing is observability, Kubernetes ingress and service mesh routing are platform engineering, and load-balancing algorithms are scaling.
Those later systems still touch API evolution. A gateway can reject sunset versions, logs can surface old client usage, and auth policy can vary by version. Traces can show which backend served which version, and platform routing can shift traffic between deployments. None of them can enforce anything useful until the API contract itself is legible.
The final check is mechanical. A consumer sends a request, and the API version gets selected. The gateway and the service agree on that version. The validator accepts the documented input. The handler runs internal code while keeping the version branches out of the core domain logic, and the serializer emits the documented response. Errors use the documented envelope and codes. Deprecation and sunset metadata line up with the lifecycle policy, and the contract artifacts match the runtime behavior.
That chain holding is what keeps API evolution controlled engineering work. When it breaks, the version labels stop meaning anything, and your consumers have to work out what changed from the errors they get back.