gRPC for Microservices

When two services need to talk a million times a second, JSON over HTTP/1.1 is the wrong tool. gRPC gives you HTTP/2, Protocol Buffers, and four streaming patterns — the contract is the source of truth, and the wire is binary.

Why gRPC Matters

The Problem: REST + JSON is wonderful for humans — you can curl it, paste it into Postman, and read it. But for service-to-service traffic at scale, JSON is 3–10x larger on the wire than it needs to be, parsing burns CPU, every team invents its own field naming, and there is no machine-checkable contract. The cost of “readable” shows up in your latency budget and your AWS bill.

The Solution: gRPC is a schema-first RPC framework: you define services and messages in a .proto file, code-generate clients and servers in 11+ languages, and ship binary Protocol Buffers over HTTP/2. The contract is the source of truth, the wire is small, and breaking changes are caught at compile time instead of 3 AM.

Real Impact: Google has described its internal RPC infrastructure (Stubby, gRPC’s predecessor) as handling on the order of tens of billions of calls per second. Netflix, Square, Dropbox, and Cloudflare have all publicly described running core internal traffic on gRPC.

Real-World Analogy

Imagine two warehouses that need to exchange inventory updates. There are two ways:

  • JSON style: Every clerk writes a free-form note — “the blue widget”, “widget #4 (blue)”, “BLU-WID”. The other end has to guess what each clerk meant. Sometimes the guess is wrong.
  • gRPC style: Both warehouses agree on a shared dictionary — product 4271 always means “widget, blue, large”. Every message uses the dictionary. There is no “what does this field mean” conversation, ever.

That dictionary is the .proto file. Once you have a shared schema, the wire format gets smaller, parsers get faster, and ambiguity disappears.

Microservices multiply the cost of every protocol decision. A request that crosses 8 services on its way to a database pays the JSON tax 8 times. Multiply that by your QPS and the difference between “fine” and “painful” is measured in racks of servers. gRPC was built at Google specifically because at their scale the savings were unignorable, and the same math applies to anyone whose internal east-west traffic is in the millions of RPS.

What gRPC actually gives you

The Building Blocks

Why Three Layers

The Problem: RPC frameworks of the past (CORBA, SOAP, Thrift) shipped their own transport, their own format, and their own service IDL. Every layer reinvented something the network already had.

The Solution: gRPC reuses standards. HTTP/2 carries the bytes. Protocol Buffers describes the messages. The service block describes the methods. Each layer is independently swappable and well-understood.

gRPC is a stack of three things that fit together cleanly. Knowing what each layer does makes debugging dramatically easier — you can say “the framing is fine but the proto schema drifted” instead of waving at the whole stack.

| Layer | What It Does | What You Get |
|---|---|---|
| HTTP/2 | Multiplexed binary transport | Many concurrent RPCs on one TCP connection, header compression (HPACK), flow control |
| Protocol Buffers | Schema and wire format for messages | Compact varint encoding, codegen in 11+ languages, forward and backward compatible if you follow the rules |
| gRPC service IDL | service and rpc declarations | Strongly typed methods, four streaming patterns, generated stubs and skeletons |

HTTP/2 in one paragraph

HTTP/1.1 handles one request at a time per TCP connection (pipelining exists but is rarely usable in practice), so concurrency means more connections. HTTP/2 opens one TCP connection and multiplexes many independent streams over it; the concurrent-stream limit is negotiated per connection. Headers are compressed with HPACK so you don’t pay for repeating Authorization on every call. Streams have flow control, so a fast producer can’t overwhelm a slow consumer. For an internal mesh making millions of calls between the same two pods, this is the difference between “TCP setup is half my latency” and “TCP setup is invisible.”

Protocol Buffers in one paragraph

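Protobuf encodes each field as a small integer tag (the field number plus a wire type) followed by the value: a varint for most integers, bools, and enums; a fixed 4 or 8 bytes for fixed-width and floating-point types; or a length prefix plus raw bytes for strings, bytes, and nested messages. There are no field names on the wire, no quotes, no whitespace. A 1 KB JSON object can shrink to a few hundred bytes of protobuf, depending on how much of it was field names and punctuation. The schema is mandatory at compile time but invisible at runtime: the bytes carry no names, so both sides need the .proto to make sense of them. That mandatory schema is the source of half the value.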
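To make the tag-plus-value layout concrete, here is a minimal hand-rolled encoder in Python. It is a sketch, not the protobuf library: it assumes a hypothetical message with a string field numbered 1 and an integer field numbered 3, and it only handles non-negative varints and length-delimited strings.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (low 7 bits first)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


VARINT, LEN = 0, 2  # the two wire types this sketch uses


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A tag is (field_number << 3) | wire_type, itself varint-encoded."""
    return encode_varint((field_number << 3) | wire_type)


def encode_message(name: str, enthusiasm: int) -> bytes:
    """Hand-encode a hypothetical message: string name = 1, int32 enthusiasm = 3."""
    raw = name.encode("utf-8")
    return (
        encode_tag(1, LEN) + encode_varint(len(raw)) + raw
        + encode_tag(3, VARINT) + encode_varint(enthusiasm)
    )


payload = encode_message("Ada", 7)
as_json = json.dumps({"name": "Ada", "enthusiasm": 7}, separators=(",", ":"))

print(payload.hex())   # 0a034164611807
print(len(payload))    # 7
print(len(as_json))    # 29
```

Every byte in the protobuf payload is either a tag, a length, or data. The JSON version spends most of its bytes on field names and punctuation, which is where the size difference comes from.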

Protocol Buffers Crash Course

Why Field Numbers Matter

The Problem: Without a stable identifier per field, you can’t evolve a schema without breaking old clients.

The Solution: Each field gets a permanent integer tag (the field number). The name can change, the type can change in narrow ways, but the tag is sacred — that is what protobuf actually serializes.

This is the minimum proto3 you need to read and write production schemas. Save it as greeter/v1/greeter.proto:

syntax = "proto3";

package greeter.v1;

option go_package = "github.com/example/greeter/v1;greeterv1";

// Reserve numbers and names from removed fields so they can never be reused.
message Greeting {
  reserved 4, 7;
  reserved "old_field";

  string name           = 1;
  string language       = 2;   // e.g. "en", "ja"
  int32  enthusiasm     = 3;   // 1..10
  repeated string tags  = 5;   // arbitrary labels
  Mood   mood           = 6;

  oneof contact {
    string email = 8;
    string phone = 9;
  }

  optional string nickname = 10; // proto3 explicit presence
}

enum Mood {
  MOOD_UNSPECIFIED = 0;        // always reserve 0 for the default
  MOOD_HAPPY       = 1;
  MOOD_NEUTRAL     = 2;
  MOOD_GRUMPY      = 3;
}

message SayHelloRequest  { Greeting greeting = 1; }
message SayHelloResponse { string message = 1; }

service Greeter {
  // Unary
  rpc SayHello(SayHelloRequest) returns (SayHelloResponse);

  // Server streaming
  rpc SayHelloRepeatedly(SayHelloRequest) returns (stream SayHelloResponse);

  // Client streaming
  rpc SayHelloToCrowd(stream SayHelloRequest) returns (SayHelloResponse);

  // Bidirectional
  rpc ChatGreetings(stream SayHelloRequest) returns (stream SayHelloResponse);
}

Things this snippet shows you

  • Field numbers (1, 2, 3…) are the only stable identifier. Field 1 is encoded with tag 0x08 for varint types — that’s what actually goes on the wire.
  • repeated is protobuf’s name for “list of”. Lists are zero-or-more.
  • oneof means “at most one of these fields is set”: a tagged union that may also be empty. Setting one member clears the others.
  • optional in proto3 brings back “was this field explicitly set or not” — useful for partial updates.
  • enum values are integers. The 0 value is the implicit default, so it must mean “unspecified”.
  • reserved blocks future authors from re-using a removed field number or name. Always reserve when deleting.

Scalar types worth knowing

| Proto Type | Wire Encoding | When to Use |
|---|---|---|
| int32 / int64 | Varint | Default integer; cheap for small values, but any negative value costs 10 bytes |
| sint32 / sint64 | Zigzag varint | Integers that are often negative |
| fixed32 / fixed64 | Fixed 4 / 8 bytes | IDs, hashes: values that are usually large |
| bool | Varint (1 byte) | Booleans |
| string | UTF-8 length-prefixed | Text. Always UTF-8; binary goes in bytes |
| bytes | Length-prefixed | Binary blobs, embedded images, opaque tokens |
| google.protobuf.Timestamp | Well-known type | Time. Don’t roll your own; libraries already know how to convert |
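The int32 versus sint32 row is easier to believe with numbers. The sketch below computes wire sizes by hand under the standard protobuf rules (negative int32/int64 values are sign-extended to 64 bits; sint types use zigzag). The function names are made up for illustration and are not part of any library.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def varint_size(value: int) -> int:
    """Number of bytes an unsigned value needs as a varint (7 payload bits per byte)."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def int64_wire_size(n: int) -> int:
    """int32/int64: a negative value is sent as its 64-bit two's complement."""
    return varint_size(n & MASK64)


def zigzag64(n: int) -> int:
    """sint64 mapping: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ..."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def sint64_wire_size(n: int) -> int:
    return varint_size(zigzag64(n))


for n in (1, -1, 300, -300):
    print(n, int64_wire_size(n), sint64_wire_size(n))
# 1 1 1
# -1 10 1
# 300 2 2
# -300 10 2
```

Zigzag interleaves negative and positive values so that small magnitudes stay small on the wire, at the cost of one extra bit for positives.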

Never reuse a field number

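Once a .proto with field 5 meaning repeated string tags has shipped to a single client (even an old mobile app you forgot about), that number is taken forever. If you reuse field 5 for something with the same wire type, say a string coupon code, the old client’s tags silently decode into the new field as garbage. If you reuse it with a different wire type, say an int32, parsers typically file the old bytes under unknown fields and the data silently vanishes. Either way it is a corruption bug, not an error. The only safe move is reserved 5; and pick a new number.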
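What reuse does to old bytes depends on whether the wire types line up. The following is a toy decoder, not the protobuf runtime: it understands only varint and length-delimited fields, and the schema dictionaries and the coupon_code and retry_count names are hypothetical.

```python
def read_varint(buf: bytes, pos: int):
    """Read one varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf: bytes, schema: dict) -> dict:
    """Decode fields using schema = {field_number: (name, wire_type)}."""
    out, pos = {}, 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:                       # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:                     # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError("this sketch handles wire types 0 and 2 only")
        name, expected = schema.get(number, (None, None))
        if name is not None and expected == wire_type:
            out[name] = value
        # Otherwise: unknown number or mismatched wire type, so the bytes are skipped.
    return out


# An old client still sends field 5 as a tag string.
OLD_BYTES = b"\x2a\x03vip"   # tag (5 << 3) | 2, length 3, "vip"

print(decode(OLD_BYTES, {5: ("tags", 2)}))          # {'tags': b'vip'}
print(decode(OLD_BYTES, {5: ("coupon_code", 2)}))   # {'coupon_code': b'vip'}
print(decode(OLD_BYTES, {5: ("retry_count", 0)}))   # {}
```

Neither of the last two calls raises. The first misreads a tag as a coupon code; the second loses the data entirely. That is why the number has to be reserved rather than recycled.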

The Four RPC Patterns

Why More Than Just Request/Response

The Problem: Plenty of real workflows aren’t one-shot. Tailing logs, uploading a file in chunks, a multiplayer chat — each wants something different from “send a request, get a response”.

The Solution: HTTP/2 streams give you bidirectional flow for free, so gRPC exposes four patterns instead of one. Each pattern uses the same underlying machinery.

[Figure: The Four gRPC Patterns. Unary (client, server); Server stream (N responses for 1 request); Client stream (N requests, 1 final response); Bidi stream (independent send/receive on one stream). Same HTTP/2 connection, different stream framing.]

| Pattern | Shape | Use Case |
|---|---|---|
| Unary | 1 req → 1 resp | Most CRUD calls. The default. |
| Server streaming | 1 req → N resp | Tail logs, server-sent events, pagination as a stream, push updates |
| Client streaming | N req → 1 resp | Chunked uploads, telemetry batching, file ingest |
| Bidirectional | N req ↔ N resp | Chat, collaborative editing, control planes, full-duplex pipelines |

Go server implementing all four

package main

import (
    "context"
    "io"
    "log"
    "net"

    "google.golang.org/grpc"
    greeterv1 "github.com/example/greeter/v1"
)

type server struct {
    greeterv1.UnimplementedGreeterServer
}

// 1. Unary
func (s *server) SayHello(ctx context.Context, req *greeterv1.SayHelloRequest) (*greeterv1.SayHelloResponse, error) {
    return &greeterv1.SayHelloResponse{Message: "hello " + req.GetGreeting().GetName()}, nil
}

// 2. Server streaming
func (s *server) SayHelloRepeatedly(req *greeterv1.SayHelloRequest, stream greeterv1.Greeter_SayHelloRepeatedlyServer) error {
    for i := 0; i < 5; i++ {
        if err := stream.Send(&greeterv1.SayHelloResponse{Message: "hello again"}); err != nil {
            return err
        }
    }
    return nil
}

// 3. Client streaming
func (s *server) SayHelloToCrowd(stream greeterv1.Greeter_SayHelloToCrowdServer) error {
    var names []string
    for {
        in, err := stream.Recv()
        if err == io.EOF {
            return stream.SendAndClose(&greeterv1.SayHelloResponse{
                Message: "hello to " + joinNames(names),
            })
        }
        if err != nil {
            return err
        }
        names = append(names, in.GetGreeting().GetName())
    }
}

// 4. Bidirectional
func (s *server) ChatGreetings(stream greeterv1.Greeter_ChatGreetingsServer) error {
    for {
        in, err := stream.Recv()
        if err == io.EOF {
            return nil
        }
        if err != nil {
            return err
        }
        if err := stream.Send(&greeterv1.SayHelloResponse{
            Message: "echo: " + in.GetGreeting().GetName(),
        }); err != nil {
            return err
        }
    }
}

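// joinNames concatenates the collected names for the crowd greeting.
func joinNames(names []string) string {
    out := ""
    for i, n := range names {
        if i > 0 {
            out += ", "
        }
        out += n
    }
    return out
}

func main() {
    lis, err := net.Listen("tcp", ":50051")
    if err != nil {
        log.Fatalf("listen: %v", err)
    }
    s := grpc.NewServer()
    greeterv1.RegisterGreeterServer(s, &server{})
    log.Fatal(s.Serve(lis))
}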

Python client

import grpc
from greeter.v1 import greeter_pb2, greeter_pb2_grpc

def main():
    # Channels are long-lived. One per (host, port) per process.
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = greeter_pb2_grpc.GreeterStub(channel)

        # Unary
        resp = stub.SayHello(
            greeter_pb2.SayHelloRequest(
                greeting=greeter_pb2.Greeting(name="Robbie", language="en"),
            ),
            timeout=2.0,
            metadata=[("x-request-id", "abc-123")],
        )
        print(resp.message)

        # Server streaming
        for r in stub.SayHelloRepeatedly(
            greeter_pb2.SayHelloRequest(greeting=greeter_pb2.Greeting(name="Robbie"))
        ):
            print(r.message)

        # Client streaming
        def gen():
            for name in ["Ada", "Linus", "Grace"]:
                yield greeter_pb2.SayHelloRequest(
                    greeting=greeter_pb2.Greeting(name=name),
                )
        print(stub.SayHelloToCrowd(gen()).message)


if __name__ == "__main__":
    main()

Schema Evolution

Why Schema Evolution Is the Hard Part

The Problem: Real systems can’t deploy clients and servers atomically. There is always a window where v1 callers talk to v2 servers and v2 callers talk to v1 servers. Get the rules wrong and that window is silent data corruption.

The Solution: A small set of rules that protobuf was specifically designed to support — if you follow them, both sides remain compatible across many releases.

Protobuf’s wire format was designed for the “tolerant reader” principle: unknown fields are silently kept and re-emitted, missing fields take the default value, and field numbers are the only identity. That gives you a long list of safe changes:

| Change | Safe? | Notes |
|---|---|---|
| Add a new field with a new number | Yes | Old clients ignore it; new clients see default for old payloads |
| Rename a field (keep number) | Yes on the wire | Source-incompatible: consumers using the old name in code will break |
| Delete a field | Yes if you reserved it | Reserve both number AND name to prevent reuse |
| Change a field’s type | Almost never | A few narrow conversions are wire-compatible (e.g. int32 ↔ uint32); most are not |
| Reuse a deleted field number | Never | Silent corruption with old clients |
| Add a value to an enum | Yes | Old clients receive a number they don’t recognize; design code to handle unknown values |
| Move a field into a oneof | No | Wire-compatible but presence semantics change |
| Bump package greeter.v1 → greeter.v2 | Yes (full break) | Run both side-by-side; migrate callers; retire v1 |

The Tolerant Reader, in practice

Servers and clients alike should:

  • Treat unknown fields as “don’t panic, keep the bytes, re-emit on serialization”. The library does this for you.
  • Treat missing fields as the documented default. Don’t encode “the field is missing” as a magic value.
  • Treat unknown enum values as FOO_UNSPECIFIED and have a sensible fallback — never crash.
  • Don’t go looking for required. Proto3 has no required keyword on purpose. If your code requires a field, validate at the application layer with a clear error.
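The last three bullets can be sketched without any gRPC dependency. The example below assumes the decoded Greeting arrives as a plain dict, which is a simplification for illustration; the Mood numbers mirror the schema earlier in this article, and ValidationError and validate_greeting are hypothetical names.

```python
from enum import IntEnum


class Mood(IntEnum):
    UNSPECIFIED = 0
    HAPPY = 1
    NEUTRAL = 2
    GRUMPY = 3


def mood_from_wire(raw: int) -> Mood:
    """Map an enum number from the wire to a known Mood, falling back to UNSPECIFIED."""
    try:
        return Mood(raw)
    except ValueError:
        # A newer peer sent a value this build does not know about.
        return Mood.UNSPECIFIED


class ValidationError(Exception):
    """Application-layer rejection with a message a caller can act on."""


def validate_greeting(fields: dict) -> dict:
    """Apply defaults for missing fields and enforce what the application requires."""
    name = fields.get("name", "")            # a missing string reads as ""
    if not name:
        raise ValidationError("greeting.name is required")
    return {
        "name": name,
        "language": fields.get("language", ""),
        "mood": mood_from_wire(fields.get("mood", 0)),
    }


print(validate_greeting({"name": "Ada", "mood": 42}))
```

The unknown mood value 42 degrades to UNSPECIFIED instead of raising, while the missing name is rejected explicitly by application code rather than by the schema.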

Buf for schema CI

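Buf is the tool you almost certainly want for managing .proto at scale. The relevant pieces are buf lint for style and consistency, buf breaking for breaking-change checks in CI, and buf generate for codegen.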

Interceptors and Cross-Cutting Concerns

Why Interceptors Exist

The Problem: Auth, logging, metrics, retries, deadline propagation, request IDs — you don’t want any of this in your business handlers, and you certainly don’t want to copy-paste it into every method.

The Solution: Interceptors wrap every RPC at a single point. There’s a server-side flavor and a client-side flavor; you stack them like middleware.

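An interceptor sees every request and response on its way through. The four canonical uses, all of which show up in the server setup below, are tracing and metrics, deadline enforcement, authentication, and structured logging.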
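The “stack them like middleware” idea fits in a few lines of plain Python. This is a model of the pattern only, not the grpc interceptor API: requests are dicts, handlers return strings, and every name here is made up for the sketch.

```python
from typing import Callable, List

Handler = Callable[[dict], str]
Interceptor = Callable[[dict, Handler], str]


def chain(interceptors: List[Interceptor], handler: Handler) -> Handler:
    """Wrap handler so that interceptors[0] runs outermost."""
    wrapped = handler
    for interceptor in reversed(interceptors):
        def bind(icpt: Interceptor, nxt: Handler) -> Handler:
            return lambda request: icpt(request, nxt)
        wrapped = bind(interceptor, wrapped)
    return wrapped


calls: List[str] = []


def logging_interceptor(request: dict, nxt: Handler) -> str:
    calls.append("log:start")
    try:
        return nxt(request)
    finally:
        calls.append("log:end")


def auth_interceptor(request: dict, nxt: Handler) -> str:
    calls.append("auth")
    if request.get("metadata", {}).get("authorization") != "Bearer secret":
        return "UNAUTHENTICATED"          # short-circuit: the handler never runs
    return nxt(request)


def say_hello(request: dict) -> str:
    calls.append("handler")
    return "hello " + request["name"]


rpc = chain([logging_interceptor, auth_interceptor], say_hello)
print(rpc({"name": "Ada", "metadata": {"authorization": "Bearer secret"}}))  # hello Ada
print(calls)  # ['log:start', 'auth', 'handler', 'log:end']
```

Order matters: the first interceptor in the list sees the request first and the response last, so logging placed outermost also records requests that auth rejects.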

Client interceptor adding auth metadata (Go)

func AuthInterceptor(token string) grpc.UnaryClientInterceptor {
    return func(
        ctx context.Context,
        method string,
        req, reply interface{},
        cc *grpc.ClientConn,
        invoker grpc.UnaryInvoker,
        opts ...grpc.CallOption,
    ) error {
        ctx = metadata.AppendToOutgoingContext(ctx,
            "authorization", "Bearer "+token,
            "x-client-version", build.Version,
        )
        return invoker(ctx, method, req, reply, cc, opts...)
    }
}

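// insecure is google.golang.org/grpc/credentials/insecure. Plaintext keeps the
// sketch short; use TLS credentials in production, especially with bearer tokens.
conn, err := grpc.Dial("orders:50051",
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpc.WithUnaryInterceptor(AuthInterceptor(token)),
    grpc.WithStatsHandler(otelgrpc.NewClientHandler()), // OTel built-in
)
if err != nil {
    log.Fatalf("dial orders: %v", err)
}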

Server interceptor enforcing deadlines and tracing

func DeadlineInterceptor(min time.Duration) grpc.UnaryServerInterceptor {
    return func(
        ctx context.Context,
        req interface{},
        info *grpc.UnaryServerInfo,
        handler grpc.UnaryHandler,
    ) (interface{}, error) {
        deadline, ok := ctx.Deadline()
        if !ok {
            return nil, status.Error(codes.InvalidArgument, "deadline required")
        }
        if time.Until(deadline) < min {
            return nil, status.Error(codes.DeadlineExceeded, "insufficient time budget")
        }
        return handler(ctx, req)
    }
}

s := grpc.NewServer(
    grpc.StatsHandler(otelgrpc.NewServerHandler()), // trace + metrics
    grpc.ChainUnaryInterceptor(
        DeadlineInterceptor(50*time.Millisecond), // budget guard
        AuthServerInterceptor(jwtVerifier),       // authn (application-defined)
        LoggingInterceptor(logger),               // structured logs (application-defined)
    ),
)

Deadline propagation is the unsung hero

If service A is given 200 ms to respond, and it calls B which calls C, the deadline must shrink along the chain — B should not be allowed to spend more than the time A has left. gRPC does this automatically when you pass ctx through. Most distributed timeout incidents come from someone using context.Background() mid-chain and resetting the budget to infinity.
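The shrinking budget is easy to see in a simulation. The sketch below is not gRPC code: it assumes a fake clock counting integer milliseconds so the result is deterministic, and three made-up services A, B, and C that each do some work and then pass the same absolute deadline downstream.

```python
class DeadlineExceeded(Exception):
    """Raised when a service is asked to start after the deadline has passed."""


class FakeClock:
    """Deterministic clock in integer milliseconds (an assumption for testability)."""

    def __init__(self) -> None:
        self.now_ms = 0


def make_service(name, work_ms, clock, seen, downstream=None):
    def handle(deadline_ms: int) -> str:
        budget = deadline_ms - clock.now_ms
        if budget <= 0:
            raise DeadlineExceeded(f"{name} started with no budget left")
        seen.append((name, budget))
        clock.now_ms += work_ms              # simulated local work
        if downstream is None:
            return f"{name} done"
        return downstream(deadline_ms)       # same absolute deadline, smaller budget
    return handle


clock, seen = FakeClock(), []
c = make_service("C", 20, clock, seen)
b = make_service("B", 100, clock, seen, downstream=c)
a = make_service("A", 50, clock, seen, downstream=b)

print(a(clock.now_ms + 200))  # C done
print(seen)                   # [('A', 200), ('B', 150), ('C', 50)]
```

If B built a fresh deadline instead of forwarding the one it received, C would see a full budget while A had long since given up, which is the context.Background() failure mode in miniature.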

gRPC vs REST

Neither is universally better. The honest answer is “different tools, different boundaries.”

| Dimension | gRPC | REST + JSON |
|---|---|---|
| Wire size | Small (binary varint) | 3–10x larger (text) |
| CPU to encode/decode | Low | Higher (string parsing, allocation) |
| Schema | Mandatory .proto | Optional (OpenAPI, JSON Schema) |
| Language support | Excellent in 11+ languages | Universal |
| Browser support | Needs gRPC-Web or Connect | Native |
| Streaming | First-class (4 patterns) | SSE / WebSocket bolted on |
| Debuggability | Needs grpcurl, Wireshark proto plugin | curl + your eyes |
| Caching at HTTP layer | Hard (POST-shaped) | Easy (GET + ETag) |
| Public API ergonomics | Steep onboarding | Familiar to every developer |
| Internal mesh fit | Excellent | Good but pricier |

The boundary heuristic

  • Internal east-west traffic (service ↔ service): gRPC. The performance and contract benefits compound.
  • Public APIs for arbitrary developers: REST. Lower onboarding cost; integrates with everything.
  • Browser-facing APIs you own end-to-end: Connect or gRPC-Web is now realistic. JSON if you want zero ceremony.
  • Mobile clients: gRPC pays off — battery, bandwidth, and latency all improve.

Browser and Edge

Why the Browser Was the Holdout

The Problem: Browsers don’t expose raw HTTP/2 frames or trailers to JavaScript. Plain gRPC is unreachable from fetch().

The Solution: Two protocols that adapt gRPC to what browsers can actually do: gRPC-Web (the original, which needs a translating proxy such as Envoy in front of a gRPC server) and Connect (the newer protocol from Buf, which runs over plain HTTP and whose servers also speak gRPC and gRPC-Web).

Three options for hitting gRPC from a browser

TypeScript Connect client

import { createPromiseClient } from "@connectrpc/connect";
import { createConnectTransport } from "@connectrpc/connect-web";
import { Greeter } from "./gen/greeter/v1/greeter_connect";
import { Greeting } from "./gen/greeter/v1/greeter_pb";

const transport = createConnectTransport({
    baseUrl: "https://api.example.com",
    interceptors: [
        (next) => async (req) => {
            // getToken() is application-defined: it returns the current bearer token.
            req.header.set("authorization", `Bearer ${getToken()}`);
            return await next(req);
        },
    ],
});

const client = createPromiseClient(Greeter, transport);

// Unary — works in any modern browser via fetch
const res = await client.sayHello({
    greeting: new Greeting({ name: "Robbie", language: "en" }),
});
console.log(res.message);

// Server streaming — iterate over async results
for await (const reply of client.sayHelloRepeatedly({
    greeting: new Greeting({ name: "Robbie" }),
})) {
    console.log(reply.message);
}

Connect vs gRPC-Web in one line

If you’re starting a new browser-facing API today, use Connect — it works without Envoy, debugs as plain HTTP in the browser’s network tab, and stays wire-compatible with gRPC for your internal mesh.

Real-World Examples

Google built Stubby in the early 2000s as the internal RPC framework that glued together every service in the company. gRPC, open-sourced in 2015, is a redesign of that idea on open standards, with the Google-isms removed and the proto schema published. Google has described Stubby as handling on the order of tens of billions of RPCs per second internally.

Netflix migrated significant chunks of its internal communication to gRPC for the same reason it built Hystrix and Eureka before it: at Netflix scale, percent-level CPU savings translate to actual money and headroom. They contribute to the gRPC ecosystem and use it heavily for service mesh data planes.

Square rolled gRPC out across its mobile and backend teams, and built Wire, an alternative protobuf runtime optimized for Android. The driving force was mobile bandwidth: smaller payloads make the app faster on bad networks.

Dropbox wrote about their move from a homegrown RPC system to gRPC, citing polyglot language support (they use Python, Go, Rust) and the existing tooling for retries, deadlines, and observability as the killer features, not raw performance.

Buf publishes Connect, the modern alternative protocol, and runs the Buf Schema Registry. Their bet is that the future of cross-org RPC isn’t plain gRPC but a wire-compatible superset that includes the browser path natively.

Cloudflare, Lyft, Uber, and Slack all use gRPC heavily for service-to-service communication. The thread is consistent: high-QPS internal traffic where the constants matter.

Best Practices

The short list

  • Version your packages. Always foo.v1, never bare foo. Run v2 alongside v1 when you need to break things.
  • Reserve deleted fields. Both number and name. Add buf breaking to CI so you can’t forget.
  • Always have an UNSPECIFIED = 0 for enums. The zero value is the on-the-wire default, and “unspecified” is the only safe meaning.
  • Use deadlines on every call. A gRPC call with no deadline is a thread leak waiting to happen. Refuse them at the server with an interceptor.
  • One channel per upstream, not per call. Channels are expensive to set up and cheap to share. Multiplexing is the whole point of HTTP/2.
  • Don’t put business outcomes in gRPC error codes. Use OK with a domain status field, not FAILED_PRECONDITION, for “cart already checked out”. Reserve UNAVAILABLE and friends for infra.
  • Record the proto, not the JSON, in your service mesh. Linkerd, Istio, and Envoy all understand gRPC framing — let them see codes and methods, not opaque POSTs.
  • Use Buf for lint, breaking-change checks, and codegen. Hand-rolled protoc invocations rot fast.
  • Pair gRPC with the same resilience patterns as REST. Circuit breakers, retries with jitter, and bulkheads all still apply, and gRPC’s service config lets you declare retry and hedging policies instead of hand-coding them.
  • Choose the boundary deliberately. Internal: gRPC. Public: REST or Connect. Browser: Connect.
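As one concrete case from the list above, a declarative retry policy is just a JSON document handed to the channel. The sketch below builds a gRPC service config in Python; the numbers are illustrative assumptions rather than recommendations, and the channel call is left as a comment so the block runs without grpcio installed.

```python
import json


def retry_service_config(service: str, max_attempts: int = 4) -> str:
    """Build a service config JSON string with a retry policy for one service."""
    config = {
        "methodConfig": [{
            "name": [{"service": service}],     # applies to every method of the service
            "timeout": "2s",                    # default per-call deadline
            "retryPolicy": {
                "maxAttempts": max_attempts,
                "initialBackoff": "0.1s",
                "maxBackoff": "1s",
                "backoffMultiplier": 2,
                "retryableStatusCodes": ["UNAVAILABLE"],
            },
        }]
    }
    return json.dumps(config)


SERVICE_CONFIG = retry_service_config("greeter.v1.Greeter")
print(SERVICE_CONFIG)

# With grpcio installed, the config is passed as a channel option:
#   channel = grpc.insecure_channel(
#       "localhost:50051",
#       options=[("grpc.service_config", SERVICE_CONFIG)],
#   )
```

Only status codes listed in retryableStatusCodes are retried, so keeping that list to infrastructure failures such as UNAVAILABLE avoids replaying requests that failed for business reasons.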

The single most useful sentence about gRPC

If you remember one thing

The value of gRPC isn’t the binary wire format — it’s that the schema is mandatory, versioned, and shared. Once your services agree on a contract that compiles, half the production bugs that used to come from “the field changed shape” just stop happening.