Service Communication

Every microservice boundary is a communication decision. REST, gRPC, GraphQL, queues — none of them is “the right answer.” The right answer is whichever survives your traffic, your team, and your failure modes.

Medium · 30 min read

Why Communication Patterns Matter

Why Service Communication Matters

The Problem: In a monolith, “calling another module” is a function call — nanoseconds, no failure modes you didn’t already model. The moment you split into services, every internal call becomes a network round-trip with its own latency, its own failure modes, and its own contract version skew.

The Solution: Pick the protocol intentionally per call. REST when humans and partners read your API. gRPC when latency and strict typing matter. GraphQL when many client shapes hit the same data. Async messaging when the caller doesn’t need (or shouldn’t need) a synchronous answer.

Real Impact: The wrong choice metastasizes. A REST call where an event would have been right turns into a sync cascade that takes the whole site down when one downstream stalls. An event where a REST call would have been right turns into mysterious eventual-consistency bugs that QA can never reproduce.

Real-World Analogy

Think about how a restaurant kitchen actually communicates:

  • Server → line cook = a synchronous call. The server stands at the pass and waits for the plate.
  • Server → bar = a queued message. The drink ticket goes up; the server walks away and comes back.
  • “Fire table 12” over the headset = a broadcast event. Everyone who needs to act on it hears it.
  • Hot line shouting back “heard!” = an explicit acknowledgement. Without it, the order is lost.

A microservice mesh is the same: every interaction is one of these shapes. Picking the wrong shape doesn’t just feel awkward — it produces real, expensive failures during a Saturday-night rush.

This tutorial walks through the four communication styles that cover essentially everything you’ll build: REST, gRPC, GraphQL, and asynchronous messaging. For each one we’ll cover when to reach for it, what it costs, and the production gotchas that hurt the most.

Synchronous vs Asynchronous

Before the protocol choice, the bigger choice: do you need an answer now, or do you need the work to eventually happen? That’s the only question that matters at this layer.

Aspect | Synchronous (REST, gRPC, GraphQL) | Asynchronous (Queues, Streams, Events)
Coupling | Caller and callee both have to be up | Producer publishes; consumer can be down or slow
Latency contract | End-to-end latency = sum of every hop | Best-effort; consumer drains the queue at its pace
Failure mode | One slow downstream blocks the whole chain | Producer keeps emitting; consumer backlog grows
Backpressure | Manual: you build it (timeouts, breakers) | Free: the queue is the buffer
Debugging | One trace, one stack: easy | Distributed trace + correlation IDs required
Right for | User-facing reads, anything the caller needs an answer to | Background work, fan-out, pipelines, event sourcing

Reach for synchronous when

  • The caller cannot continue without the answer (login, search, cart price).
  • The data is small and the latency is tight.
  • You need to surface a clean error to the end user (“card declined”).
  • The interaction is naturally one-shot — not a pipeline.

Reach for asynchronous when

  • The work is “fire and forget” from the caller’s point of view (send email, write to analytics).
  • The producer’s rate and the consumer’s rate are different.
  • Many services need to react to the same business fact.
  • Failure of the consumer must not take down the producer.

The most common mistake is treating these as a strict dichotomy. Most production paths are both: a synchronous request returns “accepted, here’s your tracking ID,” and the heavy work happens asynchronously behind it.
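The "accepted, here's your tracking ID" path can be sketched with nothing but the standard library. This is a minimal illustration, not a production design: the in-process queue stands in for a real broker, and submit_order, fulfil, and the status dictionary are hypothetical names invented for the example.

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # stand-in for a real broker (assumption for this sketch)
status = {}            # tracking_id -> "accepted" | "done" | "failed"

def fulfil(order: dict) -> None:
    """Hypothetical heavy work; raises on invalid input."""
    if order.get("quantity", 0) <= 0:
        raise ValueError("quantity must be positive")

def submit_order(order: dict):
    """Synchronous half: validate cheaply, enqueue, answer immediately."""
    if "sku" not in order:
        return {"title": "sku is required"}, 400
    tracking_id = str(uuid.uuid4())
    status[tracking_id] = "accepted"
    jobs.put((tracking_id, order))
    return {"tracking_id": tracking_id, "status": "accepted"}, 202

def worker() -> None:
    """Asynchronous half: drain the queue at the worker's own pace."""
    while True:
        tracking_id, order = jobs.get()
        try:
            fulfil(order)
            status[tracking_id] = "done"
        except Exception:
            status[tracking_id] = "failed"
        finally:
            jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
```

The caller gets a 202-style answer in the time it takes to enqueue, and polls (or is notified) with the tracking ID later.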

REST: The Workhorse

Why REST Wins by Default

The Problem: Every team has different languages, different tooling, and external partners who’ll never read your .proto file.

The Solution: REST is the lowest-common-denominator protocol that every HTTP client on Earth speaks. Plain JSON over HTTP/1.1 or HTTP/2. Every CDN, every proxy, every browser, every curl works. The cost is verbosity and weak typing — usually a fair trade.

REST is what you should choose when in doubt. It is also what you should choose when partners or third-party clients will use the API — OpenAPI is the lingua franca for documentation, mock servers, and SDK generation.

Resource design that actually scales

# Flask example: a small but well-shaped REST resource.
from flask import Flask, jsonify, request, abort

app = Flask(__name__)

products = {
    1: {"id": 1, "name": "Laptop", "price": 999.99, "stock": 50},
    2: {"id": 2, "name": "Mouse",  "price": 29.99,  "stock": 200},
}

@app.route("/v1/products", methods=["GET"])
def list_products():
    # Pagination — never return unbounded lists.
    limit  = min(int(request.args.get("limit", 50)), 200)
    cursor = int(request.args.get("cursor", 0))
    items  = [p for p in products.values() if p["id"] > cursor][:limit]
    next_cursor = items[-1]["id"] if items else None
    return jsonify({"items": items, "next_cursor": next_cursor})

@app.route("/v1/products/<int:product_id>", methods=["GET"])
def get_product(product_id):
    product = products.get(product_id)
    if not product:
        # RFC 7807 problem details: see the error handling section below.
        response = jsonify({
            "type": "https://errors.example.com/not-found",
            "title": "Product not found",
            "status": 404,
            "instance": f"/v1/products/{product_id}",
        })
        response.status_code = 404
        # Problem details use their own media type rather than plain application/json.
        response.headers["Content-Type"] = "application/problem+json"
        return response
    return jsonify(product)

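# In-memory stand-in for a shared store (Redis, a DB table) keyed by Idempotency-Key.
idempotency_store = {}

@app.route("/v1/products", methods=["POST"])
def create_product():
    # Idempotency key makes POST safe to retry.
    idem = request.headers.get("Idempotency-Key")
    if not idem:
        abort(400, "Idempotency-Key header required")
    if idem in idempotency_store:
        body, status = idempotency_store[idem]
        return jsonify(body), status   # replay the exact same response

    data = request.get_json()
    new_id = max(products, default=0) + 1
    # Server-assigned id goes last so the client cannot override it.
    products[new_id] = {**data, "id": new_id}
    # Store plain data (not a Response object) so each replay builds a fresh response.
    idempotency_store[idem] = (dict(products[new_id]), 201)
    return jsonify(products[new_id]), 201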

Errors are types, not just status codes

HTTP status codes carry the category (4xx vs 5xx, retryable vs not). The body carries the specifics: which field, which constraint, what to do next. RFC 7807 (application/problem+json), since superseded by RFC 9457 with the same body shape, is the de facto standard:

{
    "type": "https://errors.example.com/insufficient-stock",
    "title": "Insufficient stock",
    "status": 409,
    "detail": "Requested 50 units of SKU-1234; only 12 available.",
    "instance": "/v1/orders/abc-123",
    "available_stock": 12
}

Stripe’s API is the canonical example to imitate: every error has a stable type, a human-readable message, and structured fields the client can branch on without parsing English.

The OpenAPI contract is your real API

Hand-written API docs lie within the week. An OpenAPI (formerly Swagger) spec is the only documentation that stays honest, because every other artifact — mock servers, SDKs, contract tests, gateway routing — is generated from it. Treat the spec as code: review it in PRs, version it, and break the build when handlers don’t match it.
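One way to "break the build when handlers don't match" is a small CI check that diffs the operations declared in the spec against the routes the service registers. The sketch below assumes the spec has already been loaded into a dictionary and that the service's routes have been normalized to OpenAPI-style paths; diff_routes and both sample inputs are hypothetical.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}

def diff_routes(spec: dict, implemented: set):
    """Return (undocumented, unimplemented) operations as sorted lists.

    spec        -- an OpenAPI document already parsed into a dict
    implemented -- set of (METHOD, path) pairs the service actually serves
    """
    declared = {
        (method.upper(), path)
        for path, operations in spec.get("paths", {}).items()
        for method in operations
        if method.lower() in HTTP_METHODS   # skip keys like "parameters"
    }
    return sorted(implemented - declared), sorted(declared - implemented)

# Hypothetical inputs for illustration.
spec = {
    "openapi": "3.0.3",
    "paths": {
        "/v1/products": {"get": {}, "post": {}},
        "/v1/products/{product_id}": {"get": {}, "parameters": []},
    },
}
implemented = {
    ("GET", "/v1/products"),
    ("GET", "/v1/products/{product_id}"),
    ("DELETE", "/v1/products/{product_id}"),
}

undocumented, unimplemented = diff_routes(spec, implemented)
```

A CI job would fail when either list is non-empty. A fuller check would also compare request and response schemas, which this sketch does not attempt.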

gRPC and Protobuf

Why gRPC for Internal Calls

The Problem: JSON over HTTP/1.1 is fine for one call to your server. It is wasteful for a service mesh making millions of internal calls per second — serialization is slow, payloads are large, and there’s no native streaming.

The Solution: gRPC ships binary Protobuf payloads over HTTP/2 multiplexed streams, with code generation in 11+ languages. The contract is the .proto file — the server and client are both generated from it, so you can’t accidentally drift.

gRPC pays off when you have many internal services calling each other a lot. The wins are concrete: smaller payloads (binary encoding, no field names), lower CPU (fast codegen-based serialization), real streaming (server, client, and bidirectional), and a typed contract that catches mismatches at compile time instead of in production.

Feature | REST + JSON | gRPC + Protobuf
Transport | HTTP/1.1 (mostly) | HTTP/2 streams
Payload | JSON text | Protobuf binary
Schema | Optional (OpenAPI) | Mandatory (.proto)
Streaming | Server-sent events / hacks | Native: server, client, and bidirectional
Browser | Native | gRPC-Web proxy required
Debug-with-curl | Yes | No (need grpcurl)
Throughput (typical) | Baseline | Often cited as 5–10x baseline; measure on your own workload

The contract: a .proto file

// product.proto — this file IS the API.
syntax = "proto3";

package ecommerce.v1;

service ProductService {
    rpc GetProduct (ProductRequest) returns (ProductResponse);
    rpc ListProducts (ListProductsRequest) returns (stream ProductResponse);
    rpc CreateProduct (CreateProductRequest) returns (ProductResponse);
}

message ProductRequest {
    int32 product_id = 1;
}

message ListProductsRequest {
    int32 page_size = 1;
    string page_token = 2;
}

message CreateProductRequest {
    string name = 1;
    double price = 2;
    int32 stock = 3;
}

message ProductResponse {
    int32 id = 1;
    string name = 2;
    double price = 3;
    int32 stock = 4;
}

From this one file you generate server stubs and client libraries in Go, Python, Java, Kotlin, Swift, TypeScript, C#, Rust — whatever you need. The fields are wire-tagged by number (= 1, = 2), which is what makes Protobuf forward- and backward-compatible: never reuse a field number, never change its type, and your old clients keep working forever.
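To see why field numbers (and not field names) carry compatibility, it helps to look at the wire format. The sketch below hand-encodes only non-negative integer fields (wire type 0, varint) and is not a replacement for generated code; decode_known and the field maps are hypothetical helpers for the illustration.

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def encode_int_field(field_number: int, value: int) -> bytes:
    # Tag = (field_number << 3) | wire_type; wire type 0 means varint.
    return encode_varint((field_number << 3) | 0) + encode_varint(value)

def decode_varint(buf: bytes, pos: int):
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def decode_known(buf: bytes, known: dict) -> dict:
    """Decode varint fields, silently skipping field numbers the reader doesn't know."""
    out, pos = {}, 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(buf, pos)
        if field_number in known:
            out[known[field_number]] = value
    return out

# A newer writer adds field 9; an older reader only knows fields 1 and 4.
message = encode_int_field(1, 7) + encode_int_field(4, 50) + encode_int_field(9, 1)
old_reader_view = decode_known(message, {1: "id", 4: "stock"})
```

The old reader never sees a field name, only numbers, so an unknown number is skipped rather than being an error. Reusing a number for a different meaning would make old readers misinterpret the bytes, which is why the rule is never to reuse one.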

Server in Python

import grpc
from concurrent import futures
import product_pb2, product_pb2_grpc

class ProductServicer(product_pb2_grpc.ProductServiceServicer):
    def __init__(self):
        self.products = {1: {"id": 1, "name": "Laptop", "price": 999.99, "stock": 50}}

    def GetProduct(self, request, context):
        product = self.products.get(request.product_id)
        if not product:
            context.set_code(grpc.StatusCode.NOT_FOUND)
            context.set_details("Product not found")
            return product_pb2.ProductResponse()
        return product_pb2.ProductResponse(**product)

    def ListProducts(self, request, context):
        # Server-side streaming — yield each product as it’s available.
        for p in self.products.values():
            yield product_pb2.ProductResponse(**p)

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    product_pb2_grpc.add_ProductServiceServicer_to_server(ProductServicer(), server)
    # Plaintext port: acceptable for local development only.
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    serve()

gRPC has sharp edges

  • Browser pain: Browsers cannot speak gRPC directly — you need a gRPC-Web proxy (Envoy, grpc-web). For public APIs, this alone is usually a deal-breaker.
  • Debugging tools: Plain curl won’t work, and general-purpose HTTP tools need gRPC-specific support. Use grpcurl or BloomRPC, but the discoverability tax is real.
  • Load balancing: A gRPC client keeps one long-lived HTTP/2 connection and multiplexes every call over it. A naive L4 load balancer balances connections, not requests, so it pins all of that client’s traffic to one backend. You need an L7 LB (Envoy, Linkerd) or client-side load balancing.
  • Status codes: gRPC has its own status enum (OK, NOT_FOUND, UNAVAILABLE, etc.) — not HTTP codes. Map them carefully when bridging to REST gateways.

GraphQL

Why GraphQL Exists

The Problem: A mobile screen needs some fields from the user, some from their orders, and some from the product catalog. With REST that’s 3 round-trips and over-fetching every time. With many client apps (iOS, Android, web), the per-screen endpoint sprawl gets out of hand.

The Solution: GraphQL exposes one schema across many backing services. The client writes a query that describes the exact shape of the response it wants — one request, no over-fetching, no under-fetching.

GraphQL solves the over-fetching and under-fetching problem that REST has when many client form factors share the same backend. It is also a strong fit for API aggregation — one query that pulls together data from N microservices — via federation (Apollo Federation, GraphQL Mesh).

The schema

type Product {
    id: ID!
    name: String!
    price: Float!
    stock: Int!
    reviews: [Review!]!
}

type Review {
    id: ID!
    rating: Int!
    comment: String
    user: User!
}

type User {
    id: ID!
    name: String!
    email: String!
}

type Query {
    product(id: ID!): Product
    products(limit: Int = 20): [Product!]!
}

type Mutation {
    createProduct(name: String!, price: Float!, stock: Int!): Product!
}

The query and the response have the same shape

# Client query — ask for exactly the fields you need.
query {
    product(id: "1") {
        name
        price
        reviews {
            rating
            user { name }
        }
    }
}

# Response — mirror image of the query.
{
    "data": {
        "product": {
            "name": "Laptop",
            "price": 999.99,
            "reviews": [
                {"rating": 5, "user": {"name": "Jane"}}
            ]
        }
    }
}

GraphQL’s sharp edges

  • The N+1 problem. A naive resolver for Product.reviews fires one DB query per product. Use DataLoader (or your stack’s equivalent) to batch within a single request.
  • Caching. HTTP caches don’t help by default, because everything is a POST to /graphql. You either build query-aware caching at the gateway (for example, persisted queries sent as GET requests that a CDN can cache) or live with cache misses.
  • Query cost. A malicious or careless client can ask for deeply nested data and DoS your DB. Enforce query depth limits, field-cost analysis, and timeouts.
  • Schema is a single source of truth and a single point of failure. Federation helps, but the discipline overhead is real — teams must agree on entity ownership.
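The query-depth limit mentioned in the list above can be approximated in a few lines. This is a deliberately naive sketch: it counts brace nesting in the raw query text, so braces inside string literals or input objects would be miscounted. A real server would walk the parsed query AST instead. The names query_depth and enforce_depth, and the default limit, are hypothetical.

```python
def query_depth(query: str) -> int:
    """Maximum nesting of selection sets, estimated from brace depth."""
    depth = max_depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return max_depth

def enforce_depth(query: str, limit: int = 5) -> int:
    """Reject a query before execution if it nests deeper than the limit."""
    depth = query_depth(query)
    if depth > limit:
        raise ValueError(f"query depth {depth} exceeds limit {limit}")
    return depth

sample = """
query {
    product(id: "1") {
        name
        price
        reviews {
            rating
            user { name }
        }
    }
}
"""
sample_depth = enforce_depth(sample)
```

The check runs before any resolver fires, so an abusive query costs a string scan rather than a database round-trip.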

If you have one or two clients hitting one backend, GraphQL is overkill. If you have a dozen client apps consuming a hundred microservices, GraphQL (federated) is often the only sane way to keep the API surface coherent.

Asynchronous Messaging

Why Async at All

The Problem: Synchronous calls fail synchronously. If three services downstream of your checkout each take 200 ms, your checkout takes 600 ms minimum — and one of them being down means checkout is down.

The Solution: Push work that doesn’t need an immediate answer onto a queue or event stream. The producer is decoupled from consumer health, traffic spikes are absorbed, and adding a new consumer doesn’t require touching the producer.

Two broad shapes dominate, and they’re not interchangeable:

Shape | Tools | Semantics | Use For
Message queue (work distribution) | RabbitMQ, AWS SQS, Google Pub/Sub | One message, one consumer. Acked & deleted on success. | Background jobs, email, billing, retries.
Event stream (broadcast log) | Apache Kafka, AWS Kinesis, Redpanda | One message, many consumer groups. Retained for days/weeks. | Event sourcing, analytics, fan-out, replay.

For deep coverage of broker selection, queue patterns (work queue, pub/sub, routing, topics), and consumer group semantics, see Messaging Patterns. For event-driven architecture as a system shape (event sourcing, CQRS, sagas), see Event-Driven Architecture. The summary below is the part you need to choose between sync and async at the call-site level.

RabbitMQ producer & consumer (the “work queue” shape)

import pika, json

# Producer — publish a durable message and walk away.
conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.queue_declare(queue="order_processing", durable=True)

ch.basic_publish(
    exchange="",
    routing_key="order_processing",
    body=json.dumps({"order_id": "ORD-12345", "total": 1999.98}),
    properties=pika.BasicProperties(delivery_mode=2),  # persistent
)

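def process_order(order):
    # Placeholder for the real work; raise an exception to signal failure.
    print("processing", order["order_id"])

# Consumer: one job at a time, ack on success, requeue on failure.
def handle(ch, method, props, body):
    order = json.loads(body)
    try:
        process_order(order)
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        # requeue=True redelivers the message, so one that always fails will
        # loop forever. In production, cap attempts and route repeat failures
        # to a dead-letter queue.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

ch.basic_qos(prefetch_count=1)         # fair dispatch
ch.basic_consume(queue="order_processing", on_message_callback=handle)
ch.start_consuming()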

Kafka producer & consumer (the “event log” shape)

from kafka import KafkaProducer, KafkaConsumer
import json

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: json.dumps(v).encode(),
    key_serializer=lambda k: k.encode() if k else None,
)

# Key determines partition — ordering is per-key, not global.
producer.send(
    "order-events",
    key="ORD-12345",
    value={"event_type": "OrderCreated", "order_id": "ORD-12345", "total": 1999.98},
)
producer.flush()

# A consumer group reads the topic; many groups can read the same topic independently.
consumer = KafkaConsumer(
    "order-events",
    bootstrap_servers=["localhost:9092"],
    group_id="billing-service",
    value_deserializer=lambda m: json.loads(m.decode()),
    auto_offset_reset="earliest",
)
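def handle_event(event):
    # Placeholder handler; replace with the real billing logic.
    print(event["event_type"], event["order_id"])

for msg in consumer:
    handle_event(msg.value)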

Quick decision rule

  • One consumer group, work distribution, low retention → RabbitMQ or SQS.
  • Many consumer groups, replay, high throughput, ordered per key → Kafka.
  • You don’t know yet? Start with the simplest queue your platform offers (SQS on AWS, Pub/Sub on GCP). Migrate to Kafka the day you actually need replay.
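The "ordered per key" property in the rule above follows from how a keyed log assigns messages to partitions: the same key always lands on the same partition, and each partition is an append-only sequence. The sketch below simulates that idea in memory. It uses CRC32 as a stand-in hash, which is an assumption for illustration and not the partitioner real Kafka clients use.

```python
import zlib
from collections import defaultdict

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash for illustration; real clients ship their own partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def route(events, num_partitions: int):
    """Append each (key, event) pair to the partition its key hashes to."""
    partitions = defaultdict(list)
    for key, event in events:
        partitions[partition_for(key, num_partitions)].append((key, event))
    return partitions

events = [
    ("ORD-1", "OrderCreated"),
    ("ORD-2", "OrderCreated"),
    ("ORD-1", "OrderPaid"),
    ("ORD-2", "OrderCancelled"),
    ("ORD-1", "OrderShipped"),
]
partitions = route(events, num_partitions=3)

# Events for one order are read back in the order they were produced.
ord1 = [e for k, e in partitions[partition_for("ORD-1", 3)] if k == "ORD-1"]
```

Events for different keys may interleave or sit on different partitions, which is why there is no global order to rely on.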

Communication Reliability

Why Reliability Is a First-Class Concern

The Problem: Every network call has three failure modes: it doesn’t arrive, it arrives slowly, or it arrives twice. Naive code assumes none of those happen and is therefore wrong in production.

The Solution: Combine timeouts, retries with backoff, idempotency, and circuit breakers. None of these is optional once you have more than two services.

This section is a fast tour. For the full treatment of circuit breakers, retries, bulkheads, and chaos engineering, see Circuit Breaker & Resilience.

Timeouts: pick numbers and write them down

Every outbound call needs an explicit timeout. The default in most HTTP clients is “forever,” which is exactly the timeout that turns a slow downstream into a cascading outage.

Call type | Reasonable timeout | Why
Internal microservice (in-region) | 200 ms – 2 s | Same-DC latency is sub-ms; anything above 2 s is a sick service.
Database query | 1 s – 10 s | Most reads are <100 ms; long tail covers locks and slow scans.
External SaaS API | 5 s – 30 s | You don’t control their P99; budget for it but cap it.
Async job processing | 30 s – 5 min | Per-message visibility timeout in the queue.

The timeout for any service should be shorter than the timeout of whoever calls it. Otherwise the upstream gives up first and the work it asked for keeps running anyway — pure waste.
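One way to keep each hop's timeout shorter than its caller's is to carry a single deadline through the request and derive every outbound timeout from what is left of it. The sketch below is a minimal illustration; the Deadline class, the 50 ms safety margin, and the injectable clock are all assumptions made for the example.

```python
import time

class Deadline:
    """Tracks how much of a request's total time budget remains."""

    def __init__(self, budget_s: float, now=time.monotonic):
        self._now = now
        self._expires_at = now() + budget_s

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._now())

    def timeout_for_hop(self, default_s: float, margin_s: float = 0.05) -> float:
        """Timeout for the next outbound call: never longer than what is left.

        margin_s reserves time for this service to build its own response.
        """
        left = self.remaining() - margin_s
        if left <= 0:
            raise TimeoutError("deadline exhausted before the call was made")
        return min(default_s, left)
```

A handler would create one Deadline at the edge, for example Deadline(1.0), and pass timeout_for_hop(...) as the timeout of each downstream call, so that no downstream keeps working after the caller has already given up.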

Retry with exponential backoff and jitter

import random, time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    retryable: tuple = (TimeoutError, ConnectionError),
) -> T:
    last = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable as e:
            last = e
            if attempt == max_attempts - 1:
                break
            # Full jitter: pick a random delay in [0, exp_backoff)
            backoff = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, backoff))
    raise last

Never retry a non-idempotent POST without an idempotency key

A retried POST /charge can charge the customer twice. The client often retries because the network swallowed the response, not because the work didn’t happen. Pick one of:

  • Only retry idempotent verbs (GET, PUT, DELETE).
  • Require an Idempotency-Key header on every write so the server deduplicates.
  • Move the work to an async queue and let the broker’s at-least-once semantics push the deduplication problem onto the consumer.

Stripe’s idempotency-key model is the industry reference — it stores the response keyed by the client-supplied UUID for 24 hours, so retrying the exact same request returns the exact same response.
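The store-and-replay idea can be sketched as a small keyed cache with a time-to-live. This is an illustration of the general pattern, not Stripe's implementation: IdempotencyStore, charge, and the ledger list are hypothetical, the store lives in process memory, and it is not safe for concurrent requests carrying the same key.

```python
import time

class IdempotencyStore:
    """Remembers the response for each key until the TTL expires."""

    def __init__(self, ttl_s: float = 24 * 60 * 60, now=time.time):
        self._ttl_s = ttl_s
        self._now = now
        self._entries = {}   # key -> (stored_at, response)

    def get(self, key: str):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if self._now() - stored_at >= self._ttl_s:
            del self._entries[key]   # expired: treat as never seen
            return None
        return response

    def put(self, key: str, response: dict) -> None:
        self._entries[key] = (self._now(), response)

def charge(store: IdempotencyStore, key: str, amount_cents: int, ledger: list) -> dict:
    """Perform the side effect at most once per key; replay the stored response otherwise."""
    cached = store.get(key)
    if cached is not None:
        return cached
    ledger.append(amount_cents)   # the side effect that must not be duplicated
    response = {"status": "succeeded", "amount_cents": amount_cents, "charge_no": len(ledger)}
    store.put(key, response)
    return response
```

A production version would keep the entries in a shared store and record the key atomically with the side effect, so two concurrent retries cannot both pass the lookup.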

Circuit breakers

Wrap every outbound call to an external dependency in a circuit breaker. After a configured failure rate is exceeded the breaker trips OPEN, and subsequent calls fail immediately instead of hanging on the timeout. This is what prevents one slow downstream from eating all of your service’s threads. See Circuit Breaker & Resilience for implementation, tuning, and observability of breakers, retries, and bulkheads.
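A breaker's state machine is small enough to sketch directly. The version below is a simplification: it trips on a count of consecutive failures rather than a failure rate, it is not thread-safe, and every name in it (CircuitBreaker, CircuitOpenError, the thresholds) is hypothetical.

```python
import time

class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses to make the call."""

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after_s: float = 30.0,
                 now=time.monotonic):
        self._failure_threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._now = now
        self._failures = 0
        self._opened_at = None   # None means the breaker is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._now() - self._opened_at >= self._reset_after_s:
            return "half_open"   # allow a trial call through
        return "open"

    def call(self, fn):
        state = self.state
        if state == "open":
            raise CircuitOpenError("failing fast; dependency marked unhealthy")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures in a row, (re)opens the breaker.
            if state == "half_open" or self._failures >= self._failure_threshold:
                self._opened_at = self._now()
            raise
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the breaker is open, callers get an immediate CircuitOpenError instead of waiting out a timeout, which is what frees the threads.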

Error Handling Across Services

Why Errors Need a Schema

The Problem: “500 Internal Server Error” tells the caller nothing actionable. They retry, they fail again, they page someone. Worse: when a 4xx error becomes a 5xx (or vice versa) at a gateway, callers do the wrong thing.

The Solution: Treat errors as data. Every error has a stable type, a category, and structured fields the caller can branch on without parsing prose.

Classify errors at the source

Class | HTTP | gRPC | Caller should
Bad request from caller | 400, 422 | INVALID_ARGUMENT | Fix the request. Do not retry.
Unauthorized / forbidden | 401, 403 | UNAUTHENTICATED, PERMISSION_DENIED | Re-auth or escalate. Do not retry.
Not found | 404 | NOT_FOUND | Treat as legitimate empty result.
Conflict / business rule | 409, 422 | FAILED_PRECONDITION | Show the user; don’t retry.
Rate-limited | 429 | RESOURCE_EXHAUSTED | Backoff and retry; honor Retry-After.
Server bug | 500 | INTERNAL | Retry once; log; alert.
Dependency timeout / down | 503, 504 | UNAVAILABLE, DEADLINE_EXCEEDED | Retry with backoff; trip breaker if persistent.
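The "Caller should" column can be turned into a small client-side policy function. The sketch below encodes only the HTTP statuses from the table; treating every other status as non-retryable is a conservative assumption, and the names Action, caller_action, and retry_delay_s are hypothetical. Retry-After is honored only in its delta-seconds form, not the HTTP-date form.

```python
from enum import Enum

class Action(Enum):
    DO_NOT_RETRY = "do_not_retry"
    RETRY_ONCE = "retry_once"
    RETRY_WITH_BACKOFF = "retry_with_backoff"

def caller_action(status: int) -> Action:
    """Map an HTTP status to the retry behavior from the classification table."""
    if status in (429, 503, 504):
        return Action.RETRY_WITH_BACKOFF
    if status == 500:
        return Action.RETRY_ONCE
    # 400, 401, 403, 404, 409, 422 and anything unlisted: retrying won't help.
    return Action.DO_NOT_RETRY

def retry_delay_s(status: int, headers: dict, fallback_s: float) -> float:
    """Prefer the server's Retry-After (seconds form) over the local backoff value."""
    value = headers.get("Retry-After", "")
    if status in (429, 503) and value.isdigit():
        return float(value)
    return fallback_s
```

Keeping this mapping in one function means every client in the codebase reacts to the same status the same way.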

RFC 7807 problem details — the structured error body

# Content-Type: application/problem+json
{
    "type": "https://errors.example.com/insufficient-stock",
    "title": "Insufficient stock",
    "status": 409,
    "detail": "Requested 50 units of SKU-1234; only 12 available.",
    "instance": "/v1/orders/abc-123",
    "sku": "SKU-1234",
    "requested": 50,
    "available": 12,
    "correlation_id": "01HXYZ..."
}

Three rules for cross-service errors

  1. Don’t leak internal errors. A raw failure from your DB (driver message, stack trace, table names) shouldn’t propagate through your public API as an opaque 500. Map it to a generic 503 with a stable type so callers can react.
  2. Carry a correlation ID through every hop. Errors without a correlation ID are unsolvable. Inject X-Correlation-ID at the edge, propagate it on every outbound call, and log it on every line.
  3. Error budgets are a real budget. Your SLO defines how many errors are acceptable. Resilience patterns (retry, fallback, circuit breaker) spend that budget — track them as carefully as you track success rate.
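Rule 2 above comes down to three small operations: reuse or mint the ID at the edge, attach it to every outbound call, and stamp it on every log line. A minimal sketch follows, assuming headers are plain dictionaries; the helper names are hypothetical and no particular web framework or HTTP client is implied.

```python
import uuid

HEADER = "X-Correlation-ID"

def ensure_correlation_id(incoming_headers: dict) -> str:
    """Reuse the caller's ID if present (header names are case-insensitive), else mint one."""
    for name, value in incoming_headers.items():
        if name.lower() == HEADER.lower() and value:
            return value
    return str(uuid.uuid4())

def outbound_headers(correlation_id: str, extra=None) -> dict:
    """Headers for a downstream call, always carrying the correlation ID."""
    headers = dict(extra or {})
    headers[HEADER] = correlation_id
    return headers

def log_line(correlation_id: str, message: str) -> str:
    """Prefix every log line so one search finds the whole request path."""
    return f"correlation_id={correlation_id} {message}"
```

In a real service the ID would usually live in request-scoped context (for example a context variable) so handlers do not have to pass it by hand.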

Real-World Examples

Stripe: REST done right

Stripe’s public API is REST + JSON, with a few opinionated extensions that the industry has steadily copied, including the two this tutorial leans on: structured, typed errors and idempotency keys.

Google: gRPC at planet scale

Google built gRPC as the open-source successor to an internal RPC framework (Stubby) used for over a decade across thousands of services. In that model, service-to-service calls run over a binary RPC protocol with Protobuf contracts checked into a single monorepo. The .proto files are the API surface; SDKs in every language are generated by the same toolchain. This is why gRPC is opinionated about things like deadlines (for example, context.Deadline in Go) and metadata propagation: those are Google’s production lessons turned into protocol features.

Shopify: GraphQL Admin API

Shopify’s public Admin API is GraphQL. The reason is the surface area: tens of thousands of third-party apps, each needing different slices of merchant data — orders, customers, products, fulfillment, inventory. With REST that’s either hundreds of endpoints (each one a backwards-compat liability) or massive over-fetching. With GraphQL each app asks for exactly what it needs. Shopify enforces a query-cost limit (calculated from the query AST) so a careless app can’t blow up the database; the cost is published as part of the schema so app authors can budget against it.

Uber: a mixed RPC stack

Uber’s service mesh is a deliberate mix of protocols rather than a single standard.

The lesson is that real production stacks are polyglot at the protocol layer. There is no “one true API style” — pick the right shape per call.

Best Practices

The short list

  • Default to REST. Reach for gRPC or GraphQL when you have a concrete reason — not because it’s fashionable.
  • Every outbound call has a timeout. Without exception. The default timeout in your HTTP client library is wrong.
  • Every write has an idempotency key. Either client-supplied (REST/Stripe model) or broker-mediated (queue + dedup).
  • Errors are types, not strings. Adopt RFC 7807 (or your stack’s equivalent) and stop returning bare 500s.
  • Correlation IDs everywhere. Generate at the edge, propagate on every hop, log on every line. Without this, distributed debugging is guesswork.
  • Pick async aggressively. If the caller doesn’t need an answer, don’t make them wait. Async absorbs spikes, isolates failures, and lets you add subscribers without redeploying the producer.
  • Contract-test the boundaries. Pact, Spring Cloud Contract, or schema diffing in CI. The contract is the only artifact two teams share.
  • Bound your fan-out. A single user request that explodes into 50 internal calls is a tail-latency disaster. Aggregate at the gateway (see API Gateway) or use GraphQL field-level batching.

Common anti-patterns

Anti-pattern | Why it hurts | What to do instead
Synchronous cascade (A → B → C → D) | Latency adds; failures multiply; one slow node blocks everyone | Async after the first hop where possible; aggregate at gateway
No timeout on outbound calls | Slow downstream eats the caller’s threads, causing a cascading outage | Explicit per-call timeout shorter than upstream’s
Retrying non-idempotent POST | Duplicate orders, double charges | Idempotency keys, or move to async with broker dedup
Sharing a JSON model across teams without versioning | One team’s rename breaks every consumer | OpenAPI / Protobuf contracts, semantic versioning, contract tests
Using REST for high-throughput internal RPC | JSON parsing alone burns 10–30% CPU at scale | gRPC + Protobuf for hot internal paths
Using GraphQL on a single-client backend | Adds N+1 risk, query-cost overhead, tooling tax | Plain REST; come back when you have many clients
Treating queue messages as fire-and-forget “notifications” | Lost work, no audit trail, no replay | Durable broker, explicit acks, dead-letter queue

The single most useful sentence about service communication

Your protocol choice is a contract between the two teams that own the two services — and it will outlive both of them. Choose for the next five years of operations, not for this sprint’s feature.