Messaging Patterns

A broker is just plumbing. The pattern you choose — queue or topic, at-least-once or effectively-once, ordered or sharded — is what decides how your system behaves when the network blinks. Pick deliberately.

Medium | 25 min read

Why Messaging Patterns Matter

The Problem: Two services talking through a broker look identical on the diagram whether you used a queue or a topic, whether delivery is at-most-once or at-least-once, whether messages are ordered or not. The diagram lies. The choice you didn’t think about is the one that’s going to wake you up at 3 AM.

The Solution: Treat the messaging pattern as a first-class design decision. Decide explicitly — on paper, in the design doc — what happens when a consumer crashes mid-handler, when the broker drops a connection, when a producer writes the same message twice, when one partition lags by an hour.

Real Impact: The same Apache Kafka cluster can run a payments pipeline that must never lose a message and a click-stream pipeline where dropping 0.1% is fine. The broker doesn’t know the difference. Your config does.

Real-World Analogy

Think about how the postal service ships a package:

  • Standard mail — cheap, fast, no proof. If it’s lost you’ll never know. That’s at-most-once.
  • Registered mail — the carrier scans every hop. It might be delivered twice if a re-attempt overlaps a successful drop. That’s at-least-once.
  • Certified with return receipt — you get a signed acknowledgment. Closer to a request-reply pattern with idempotency on the recipient.
  • Mailing list — one envelope, many copies, every subscriber receives one. That’s pub/sub.
  • P.O. Box — one address, whoever shows up first picks it up. That’s a queue with competing consumers.

You don’t pay for certified mail to send a birthday card and you don’t use a postcard to send a passport. The pattern matches the consequence of failure. Same with messaging.

Every messaging system on this planet is some combination of three primitives: a producer writes, a broker stores, a consumer reads. The interesting part is what each of those steps does on the unhappy path — the producer’s ack, the broker’s replication, the consumer’s commit. Patterns are how we name the combinations.

This tutorial is about those patterns themselves — the trade-offs, the failure modes, the configuration knobs. It is not about which broker to pick (that’s in the comparison table) and it is not about event-driven architecture as a whole (that’s the next tutorial). The lens here is narrow on purpose: given that you’re using a broker, what does correct use look like?

Point-to-Point vs Publish-Subscribe

Why It’s the First Decision

The Problem: “Send the OrderCreated event somewhere” is two completely different designs depending on whether one consumer should process it or every interested service should.

The Solution: Queues for work distribution (one logical consumer, scaled out). Topics for notification (many independent consumers, each with their own offset).

The two shapes look similar in code — produce a message, consume a message — but the broker is doing something fundamentally different in each.

Figure: Point-to-Point vs Publish-Subscribe. Point-to-Point (Queue): a producer writes m1, m2, m3 to a queue; Worker A gets m1, Worker B gets m2, Worker C gets m3. Each message goes to ONE worker. Publish-Subscribe (Topic): a producer writes m1, m2, m3 to a topic; Billing, Search, and Audit each receive m1, m2, m3. Each message goes to ALL subscribers.

| Property | Point-to-Point (Queue) | Publish-Subscribe (Topic) |
| --- | --- | --- |
| Delivery | One message → one consumer in a group | One message → every subscriber |
| Scaling model | Add workers to drain faster | Add subscribers to add behaviors |
| Coupling | Producer knows the queue | Producer knows the topic; consumers come and go |
| Typical use | Background jobs, task workers | Domain events, notifications, change feeds |
| Brokers that lean this way | RabbitMQ classic queues, AWS SQS, Redis lists | Apache Kafka, AWS SNS, GCP Pub/Sub, NATS |

Kafka does both

Apache Kafka is technically a topic log, but a consumer group over a topic gives you queue semantics: each partition is consumed by exactly one member of the group. Want pub/sub on top of the same topic? Use a different consumer group ID. The pattern is encoded in how you read, not how you write — which is one of the reasons Kafka eats so many use cases that started on RabbitMQ.
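
A toy model makes the consumer-group idea concrete. This is a minimal in-memory sketch, not Kafka client code: the function names, the round-robin assignment, and the group and member names are all assumptions made for illustration. It shows one group with three members splitting the partitions (queue semantics) while a second group independently sees every message (pub/sub).

```python
def assign_partitions(num_partitions, members):
    """Round-robin assignment: each partition gets exactly one owner."""
    return {p: members[p % len(members)] for p in range(num_partitions)}


def deliver(messages, num_partitions, groups):
    """Fan messages out to every group; within a group, one owner per partition.

    messages: list of (partition, value) pairs.
    groups:   {group_id: [member, ...]}.
    Returns   {(group_id, member): [value, ...]}.
    """
    out = {}
    for group_id, members in groups.items():
        owners = assign_partitions(num_partitions, members)
        for partition, value in messages:
            out.setdefault((group_id, owners[partition]), []).append(value)
    return out


messages = [(0, "m1"), (1, "m2"), (2, "m3")]
groups = {
    "workers": ["worker-a", "worker-b", "worker-c"],  # queue semantics
    "audit": ["audit-1"],                             # separate group: sees all
}
result = deliver(messages, num_partitions=3, groups=groups)
```

Within "workers" each message is handled once; "audit" receives all three because it reads under its own group ID.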

Request-Reply over Messaging

Why You Sometimes Want Sync Over Async

The Problem: The caller actually needs the answer — a price quote, a fraud score, an enriched record — but you don’t want a tight HTTP coupling because the responder is slow, scales independently, or lives in another network.

The Solution: Send the request to a queue, include a reply-to address and a correlation ID, then wait on the reply queue for a message with that ID.

Request-reply over messaging is RPC dressed up in async clothing. It has its place — but it’s also where most messaging-as-RPC anti-patterns live.

# Python + pika (RabbitMQ): request-reply with a temporary reply queue
import uuid, time, pika, json

connection = pika.BlockingConnection(pika.ConnectionParameters("broker"))
channel = connection.channel()

# Exclusive, auto-delete reply queue scoped to this client
result = channel.queue_declare(queue="", exclusive=True)
reply_queue = result.method.queue

responses = {}

def on_reply(ch, method, props, body):
    responses[props.correlation_id] = json.loads(body)

channel.basic_consume(queue=reply_queue, on_message_callback=on_reply, auto_ack=True)

def call(payload, timeout=5.0):
    correlation_id = str(uuid.uuid4())
    channel.basic_publish(
        exchange="",
        routing_key="price.quote.requests",
        properties=pika.BasicProperties(
            reply_to=reply_queue,
            correlation_id=correlation_id,
            expiration=str(int(timeout * 1000)),  # server-side TTL on the request
        ),
        body=json.dumps(payload),
    )
    deadline = time.monotonic() + timeout
    while correlation_id not in responses and time.monotonic() < deadline:
        connection.process_data_events(time_limit=0.1)
    return responses.pop(correlation_id, None)

When NOT to do request-reply over a broker

  • Latency-sensitive paths. Adding a broker hop adds milliseconds and a third failure mode. Use HTTP or gRPC.
  • The caller can’t do anything useful with a delayed answer. If a 30-second wait kills the user experience, async-with-callback isn’t the answer — rethink the boundary.
  • Fan-out replies. “Ask 5 services and aggregate” over a broker is a scatter-gather pattern that’s painful to operate. An API gateway or service-mesh sidecar handles this better.
  • Anywhere a circuit breaker would’ve been simpler. If you’re reaching for messaging because HTTP is “too tightly coupled,” what you actually want is a circuit breaker, a timeout, and a fallback — not a queue.

Request-reply does shine when you genuinely benefit from buffering: long-running enrichment jobs, ML inference where queue depth signals when to autoscale workers, batch RPC against a slow legacy system that you don’t want to drown.

Delivery Guarantees

Why “Exactly-Once” Is a Trap

The Problem: Marketing copy on every broker promises “exactly-once delivery.” In a network where ACKs can be lost, exactly-once is provably impossible. Believing the brochure leads to designs that quietly double-charge customers.

The Solution: Pick at-least-once for anything that matters, then make every consumer idempotent. The combination is what brokers really mean when they say “effectively exactly-once.”

| Guarantee | What Actually Happens | Cost | When to Pick It |
| --- | --- | --- | --- |
| At-most-once | Producer fires and forgets; consumer ACKs before processing. Lost messages just disappear. | Lowest latency, no de-dup work | Metrics, traces, click-stream: anywhere a few % loss is acceptable |
| At-least-once | Producer retries until ACKed; consumer ACKs after processing. Duplicates are normal. | Idempotency burden on consumers | Almost everything else: payments, orders, notifications |
| Exactly-once (effective) | At-least-once on the wire + transactional or idempotent consumer. | Lower throughput, more state | Stream processing where double-counting is unacceptable (Kafka Streams, Flink) |
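
The difference between at-most-once and at-least-once comes down to where the ack sits relative to the work. The simulation below is a sketch under simplifying assumptions (one consumer, one crash, an in-memory list standing in for the broker); the function and parameter names are made up for illustration. The crash is placed at the worst moment for each mode: after the ack but before the work, or after the work but before the ack.

```python
def run(messages, ack_first, crash_on):
    """Simulate one consumer that crashes once while handling `crash_on`.

    Returns the list of side effects that actually happened, in order.
    """
    queue = list(messages)   # unacked messages, oldest first
    effects = []
    crashed = False
    while queue:
        msg = queue[0]
        crash_now = msg == crash_on and not crashed
        if ack_first:
            queue.pop(0)             # ack first, work second
            if crash_now:
                crashed = True       # died after the ack: message is gone
                continue
            effects.append(msg)
        else:
            effects.append(msg)      # work first, ack second
            if crash_now:
                crashed = True       # died before the ack: redelivered
                continue
            queue.pop(0)
    return effects


at_most_once = run(["m1", "m2", "m3"], ack_first=True, crash_on="m2")
at_least_once = run(["m1", "m2", "m3"], ack_first=False, crash_on="m2")
```

In this model, acking first loses m2 entirely, while acking last applies m2 twice. Neither is exactly-once; only the second is recoverable, and only if the handler is idempotent.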

Exactly-once is a marketing claim. Idempotency is the engineering answer.

Kafka’s “exactly-once” works only inside a Kafka-to-Kafka topology with transactions enabled. The moment you write to an external system — a database, an HTTP API, a downstream service — you are back to at-least-once and you need an idempotent consumer. Design for duplicates from day one. Use natural idempotency keys (order ID, request ID), record them, and short-circuit on a repeat.

Idempotent Kafka consumer in Python

from confluent_kafka import Consumer, KafkaError, KafkaException
import json, psycopg2

consumer = Consumer({
    "bootstrap.servers": "broker:9092",
    "group.id": "order-processor",
    "enable.auto.commit": False,           # commit ourselves, after the work
    "auto.offset.reset": "earliest",
    "isolation.level": "read_committed",
})
consumer.subscribe(["orders.created"])

db = psycopg2.connect("dbname=orders")

# charge_card() and send_confirmation() are application code, not shown here.

def handle(msg):
    event = json.loads(msg.value())
    idem_key = event["order_id"]            # natural idempotency key
    # "with db" commits when the block exits cleanly and rolls back if it
    # raises, so a failed attempt leaves no processed_orders row behind.
    with db, db.cursor() as cur:
        # INSERT ... ON CONFLICT DO NOTHING is the heart of idempotency.
        # If we've seen this order_id before, the row already exists,
        # rowcount is 0, and we skip the side effects.
        cur.execute(
            """
            INSERT INTO processed_orders (order_id, processed_at)
            VALUES (%s, NOW()) ON CONFLICT (order_id) DO NOTHING
            """,
            (idem_key,),
        )
        if cur.rowcount == 0:
            return                          # duplicate; safe to skip
        charge_card(event)
        send_confirmation(event)

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            if msg.error().code() != KafkaError._PARTITION_EOF:
                raise KafkaException(msg.error())
            continue
        handle(msg)
        consumer.commit(msg, asynchronous=False)   # commit AFTER success
finally:
    consumer.close()
    db.close()

Two settings do the heavy lifting: enable.auto.commit=False and the explicit commit(msg) after processing. With auto-commit on, offsets are committed on a timer, so a crash mid-handler can leave a message marked as consumed even though its work never finished. That message is lost. With manual commit, a crash means the message is re-read after the restart or rebalance, which is fine, because the idempotency key short-circuits the second attempt.
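
The same idea can be exercised without a broker or a Postgres instance. The sketch below swaps in the standard library's sqlite3 (an assumption made purely so the example runs anywhere) and uses a list named charges as a stand-in for the real side effect. It delivers the same order twice and shows that the effect is applied once.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_orders (order_id TEXT PRIMARY KEY)")

charges = []  # stand-in for the real side effect (e.g. charging a card)


def handle(event):
    """Apply the side effect at most once per order_id.

    Returns True if the event was processed, False if it was a duplicate.
    """
    with db:  # commit on clean exit, roll back if anything below raises
        cur = db.execute(
            "INSERT OR IGNORE INTO processed_orders (order_id) VALUES (?)",
            (event["order_id"],),
        )
        if cur.rowcount == 0:
            return False              # already seen; skip the side effect
        charges.append(event["order_id"])
        return True


first = handle({"order_id": "o-1"})
second = handle({"order_id": "o-1"})   # redelivery of the same message
third = handle({"order_id": "o-2"})
```

A test like this is cheap to keep in CI: redeliver a message and assert the side effect count did not change.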

Ordering

Why Global Ordering Doesn’t Scale

The Problem: “Process events in the order they happened” sounds like a basic requirement. Globally, across millions of events per second, it’s a single-writer bottleneck and the death of throughput.

The Solution: Order what needs to be ordered — usually per-entity, not globally — using a partition key.

Brokers that scale (Apache Kafka, Apache Pulsar, Kinesis, GCP Pub/Sub with ordering keys) all use the same trick: they shard the topic into partitions and guarantee order within a partition, not across. The partition is selected by hashing a key. Pick the key carefully and you get the order you need without giving up parallelism.

| What You Want Ordered | Partition Key | Why |
| --- | --- | --- |
| All events for one user, in order | user_id | Same user → same partition → ordered. Different users parallelize. |
| All events for one order | order_id | OrderPlaced → OrderPaid → OrderShipped land in order on one consumer. |
| Per-account ledger updates | account_id | Money math depends on order; sharding by account preserves it. |
| Per-tenant in a multi-tenant SaaS | tenant_id | One tenant’s data flows through one consumer; tenants don’t block each other. |

Ordering vs throughput is a knob, not a switch

  • 1 partition = total order, throughput capped at one consumer.
  • N partitions = order within key, throughput scales to N consumers.
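  • Repartitioning later is painful: adding partitions changes the key-to-partition mapping, so new messages for a key can land on a different partition than older ones that are still unconsumed, which breaks per-key order across the change. Pick the key (and a generous partition count) up front.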
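
Key-based partitioning is just a stable hash modulo the partition count. The sketch below uses CRC32 as a stand-in hash (an assumption; real clients use their own function, for example murmur2 in Kafka's Java client), and partition_for is a hypothetical helper name. It shows that events for one key stay together in order, and counts how many keys would map to a different partition if the count went from 6 to 12.

```python
import zlib


def partition_for(key, num_partitions):
    """Stable hash of the key, modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("order-1", "OrderPlaced"),
    ("order-2", "OrderPlaced"),
    ("order-1", "OrderPaid"),
    ("order-1", "OrderShipped"),
]

# Append each event to its partition's log, preserving arrival order.
partitions = {}
for key, event in events:
    partitions.setdefault(partition_for(key, 4), []).append((key, event))

# Everything for order-1 sits in one partition, in the order it was produced.
order_1_log = [e for k, e in partitions[partition_for("order-1", 4)] if k == "order-1"]

# Changing the partition count remaps keys: this is why repartitioning hurts.
keys = [f"user-{i}" for i in range(1000)]
moved = sum(1 for k in keys if partition_for(k, 6) != partition_for(k, 12))
```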

The hot-key problem

If 80% of your traffic is one tenant, sharding by tenant_id means 80% of the messages land on one partition. The cluster looks idle; one consumer is on fire. Hot keys are the silent killer of partition-keyed systems. Detect them with per-partition lag metrics and either split the key (tenant_id + bucket(item_id, 10)) or accept that a few keys need their own dedicated topic.
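
Splitting a hot key might look like the sketch below. The helper names, the hot_tenants set, and the CRC32 bucket function are assumptions for illustration. Only tenants flagged as hot get the composite key, so everyone else keeps plain per-tenant ordering.

```python
import zlib


def bucket(value, buckets):
    """Map a string to a stable bucket number in [0, buckets)."""
    return zlib.crc32(value.encode("utf-8")) % buckets


def partition_key(tenant_id, item_id, hot_tenants, buckets=10):
    """Return the key to publish with.

    Hot tenants are spread over `buckets` sub-keys; all others use tenant_id.
    """
    if tenant_id in hot_tenants:
        return f"{tenant_id}:{bucket(item_id, buckets)}"
    return tenant_id


hot = {"tenant-big"}
hot_keys = {partition_key("tenant-big", f"item-{i}", hot) for i in range(100)}
cold_key = partition_key("tenant-small", "item-1", hot)
```

The trade-off is explicit: for a hot tenant, order is now guaranteed per item bucket rather than per tenant. That is usually acceptable when the events that must stay ordered share an item_id.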

Dead-Letter Queues and Poison Messages

Why Every Pipeline Needs a DLQ

The Problem: A poison message — malformed JSON, a schema change, a bug that throws on one specific record — will be redelivered forever in an at-least-once system. The consumer crashes, restarts, reads the same message, crashes again. The partition stops moving. The lag chart goes vertical.

The Solution: After N failed attempts, route the message to a dead-letter queue. The pipeline keeps moving. A human (or an automated reprocessor) deals with the DLQ on their own time.

DLQs are not optional. The first time you skip one, a malformed message takes down a partition for the weekend.
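
Stripped of broker specifics, the mechanism is a retry counter and a second destination. The following is a minimal in-memory sketch; drain, parse, and the shape of the DLQ record are hypothetical choices for this example. One poison message is retried three times, dead-lettered with its failure reason, and the messages behind it still get processed.

```python
import json
from collections import deque


def drain(messages, handler, max_attempts=3):
    """Process messages in order; dead-letter any that keep failing."""
    queue = deque((msg, 0) for msg in messages)   # (payload, attempts so far)
    done, dlq = [], []
    while queue:
        msg, attempts = queue.popleft()
        try:
            handler(msg)
            done.append(msg)
        except Exception as exc:
            attempts += 1
            if attempts >= max_attempts:
                # Keep the original payload untouched; add metadata beside it.
                dlq.append({"payload": msg, "attempts": attempts, "reason": repr(exc)})
            else:
                queue.appendleft((msg, attempts))  # redeliver at the head
    return done, dlq


def parse(msg):
    json.loads(msg)   # raises on malformed JSON


done, dlq = drain(['{"id": 1}', "not json", '{"id": 2}'], parse)
```

Without the max_attempts branch, the loop would spin on "not json" forever and '{"id": 2}' would never be reached, which is the stuck-partition scenario described above.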

RabbitMQ: dead-letter exchange

# RabbitMQ topology sketch: a primary queue with a DLX wired in
# (YAML for readability; RabbitMQ's own definitions files are JSON)
exchanges:
  - name: orders
    type: direct
    durable: true
  - name: orders.dlx                   # dead-letter exchange
    type: direct
    durable: true

queues:
  - name: orders.process
    durable: true
    arguments:
      x-queue-type: quorum            # x-delivery-limit only applies to quorum queues
      x-dead-letter-exchange: "orders.dlx"
      x-dead-letter-routing-key: "failed"
      x-message-ttl: 600000           # 10 min before DLQ on TTL expiry
      x-delivery-limit: 5             # quorum-queue retry budget
  - name: orders.dlq
    durable: true

bindings:
  - source: orders
    destination: orders.process
    routing_key: "new"
  - source: orders.dlx
    destination: orders.dlq
    routing_key: "failed"

The consumer rejects with requeue=false after exhausting retries. The broker auto-routes the message to orders.dlx with the original payload plus an x-death header explaining why it died. That header is gold for debugging — it carries the original queue, the rejection reason, the count.

Kafka: retry topics, not in-place retries

Apache Kafka has no built-in DLQ. The community pattern, popularized by Uber and Confluent, is a chain of retry topics:

# Topic chain for at-least-once with bounded retries
orders.created                # primary; consumer reads here
orders.created.retry.5s       # 5s delay, then re-attempt
orders.created.retry.30s      # 30s delay
orders.created.retry.5m       # 5m delay
orders.created.dlq            # terminal; alarm on depth

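# Sketch of the consumer (confluent_kafka-style calls). process(),
# RetryableError, NonRetryableError, producer, and consumer are
# application-defined and not shown here.
RETRY_TOPICS = ["retry.5s", "retry.30s", "retry.5m"]

def handle(msg):
    # Headers arrive as a list of (key, bytes) pairs, or None if absent
    headers = dict(msg.headers() or [])
    try:
        process(msg)
    except RetryableError:
        attempt = int(headers.get("retry-attempt", b"0"))
        if attempt >= len(RETRY_TOPICS):
            producer.produce("orders.created.dlq", msg.value(), key=msg.key(),
                             headers={"failure-reason": "max-retries"})
        else:
            producer.produce(f"orders.created.{RETRY_TOPICS[attempt]}",
                             msg.value(), key=msg.key(),
                             headers={"retry-attempt": str(attempt + 1)})
    except NonRetryableError as e:
        producer.produce("orders.created.dlq", msg.value(), key=msg.key(),
                         headers={"failure-reason": str(e)})
    producer.flush()                # make sure the hand-off is delivered first
    consumer.commit(msg)            # then commit; the message moved on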
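
The retry topics only help if their consumers actually wait. Kafka delivers a message as soon as it is written, so the delay has to be enforced on the consuming side. One way to sketch that calculation is below; seconds_until_due and the RETRY_DELAYS table are hypothetical names, and the timestamps are assumed to be Unix seconds taken from the message's publish time.

```python
import time

# Delay each retry topic is meant to impose, in seconds.
RETRY_DELAYS = {
    "orders.created.retry.5s": 5.0,
    "orders.created.retry.30s": 30.0,
    "orders.created.retry.5m": 300.0,
}


def seconds_until_due(topic, published_at, now=None):
    """How long a retry consumer should hold a message before handling it.

    published_at and now are Unix timestamps in seconds. Returns 0.0 once
    the delay has already elapsed.
    """
    now = time.time() if now is None else now
    due_at = published_at + RETRY_DELAYS[topic]
    return max(0.0, due_at - now)


wait_early = seconds_until_due("orders.created.retry.5s", 100.0, now=102.0)
wait_late = seconds_until_due("orders.created.retry.5s", 100.0, now=200.0)
```

A real consumer would also need to keep polling (or pause the partition) while it waits, so that a five-minute delay does not get it evicted from the consumer group.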

AWS SQS: a DLQ is a redrive policy

# Terraform: AWS SQS queue with redrive to DLQ after 5 failed receives
resource "aws_sqs_queue" "orders_dlq" {
  name                       = "orders-dlq"
  message_retention_seconds  = 1209600      # 14 days; max
}

resource "aws_sqs_queue" "orders" {
  name                        = "orders"
  visibility_timeout_seconds  = 60           # must exceed handler P99
  message_retention_seconds   = 345600       # 4 days
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.orders_dlq.arn
    maxReceiveCount     = 5                  # 5 deliveries, then DLQ
  })
}

# CloudWatch alarm on DLQ depth — the only signal that matters
resource "aws_cloudwatch_metric_alarm" "dlq_depth" {
  alarm_name          = "orders-dlq-not-empty"
  metric_name         = "ApproximateNumberOfMessagesVisible"
  namespace           = "AWS/SQS"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 1
  threshold           = 0
  comparison_operator = "GreaterThanThreshold"
  dimensions = { QueueName = aws_sqs_queue.orders_dlq.name }
}

DLQ rules of the road

  • Alert on depth, not arrival. The first DLQ message is normal; 1,000 in 5 minutes is an incident.
  • Keep the original payload and headers. Add metadata; don’t modify the message you’ll need to reprocess.
  • Build a redrive tool. Reprocessing should be one CLI command, not a manual SQL+broker dance.
  • Set a retention policy on the DLQ. 14 days is plenty. If you haven’t triaged in 14 days, you’re not going to.
  • Distinguish retryable from non-retryable. A bad schema is non-retryable today; replaying it tomorrow won’t help. A 503 is retryable; the dependency might be back.

Backpressure and Flow Control

Why Producers Will Drown You

The Problem: Producers can almost always write faster than consumers can process. Without flow control, the broker buffer balloons, the consumer’s memory pressure climbs, GC pauses snowball, lag goes vertical, and eventually something OOMs.

The Solution: Push the back-pressure signal back to the consumer (limit prefetch), back to autoscaling (lag-based scaling), and ultimately back to the producer (rate limits, queue-depth based shedding).

Different brokers expose flow control differently, but the pattern is the same: bound the in-flight work per consumer, watch the lag, scale.

| Broker | How You Bound In-Flight Work | Lag Signal |
| --- | --- | --- |
| RabbitMQ | basic.qos(prefetch_count=N) per channel | Queue depth (messages_ready) |
| Apache Kafka | max.poll.records, max.partition.fetch.bytes | Consumer lag per partition |
| AWS SQS | Long-poll batch size, visibility timeout | ApproximateNumberOfMessagesVisible |
| NATS JetStream | MaxAckPending on the consumer | Pending count, ack floor |
| GCP Pub/Sub | flow_control.max_messages | Subscription backlog seconds |
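
Every knob listed for these brokers is a variation on a bounded buffer. The sketch below models that with the standard library only: a queue.Queue with a maxsize plays the role of the prefetch limit, and run_pipeline is a hypothetical name. When the buffer is full the producer blocks instead of piling up work, which is backpressure in its simplest form.

```python
import queue
import threading


def run_pipeline(num_messages, max_in_flight):
    """Push messages through a bounded buffer to a single consumer thread.

    Returns the processed messages and the deepest buffer depth observed.
    """
    buffer = queue.Queue(maxsize=max_in_flight)   # the bound IS the backpressure
    processed = []
    max_depth = 0

    def consumer():
        while True:
            msg = buffer.get()
            if msg is None:        # sentinel: the producer is finished
                return
            processed.append(msg)

    worker = threading.Thread(target=consumer)
    worker.start()
    for i in range(num_messages):
        buffer.put(i)              # blocks while the buffer is full
        max_depth = max(max_depth, buffer.qsize())
    buffer.put(None)
    worker.join()
    return processed, max_depth


processed, max_depth = run_pipeline(num_messages=50, max_in_flight=5)
```

However fast the producer loop runs, the buffer never holds more than max_in_flight messages, so consumer memory stays bounded.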

NATS JetStream: a durable pull consumer with bounded in-flight

// Go: a durable JetStream consumer with strict flow control
package main

import (
    "context"
    "log"
    "time"

    "github.com/nats-io/nats.go"
    "github.com/nats-io/nats.go/jetstream"
)

// handle stands in for the application's message handler (not shown).
func handle(data []byte) error { return nil }

func main() {
    nc, err := nats.Connect("nats://broker:4222")
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Drain()

    js, err := jetstream.New(nc)
    if err != nil {
        log.Fatal(err)
    }
    ctx := context.Background()

    // Stream backs the topic; defines retention and replication
    if _, err := js.CreateStream(ctx, jetstream.StreamConfig{
        Name:      "ORDERS",
        Subjects:  []string{"orders.>"},
        Replicas:  3,
        Storage:   jetstream.FileStorage,
        Retention: jetstream.WorkQueuePolicy, // queue semantics
        MaxAge:    72 * time.Hour,
    }); err != nil {
        log.Fatal(err)
    }

    cons, err := js.CreateOrUpdateConsumer(ctx, "ORDERS", jetstream.ConsumerConfig{
        Durable:       "order-processor",
        AckPolicy:     jetstream.AckExplicitPolicy,
        MaxAckPending: 100, // hard ceiling on in-flight
        AckWait:       30 * time.Second,
        // After 5 deliveries the server stops redelivering. Moving the
        // message to a DLQ stream is up to the application.
        MaxDeliver: 5,
        BackOff:    []time.Duration{1 * time.Second, 5 * time.Second, 30 * time.Second},
    })
    if err != nil {
        log.Fatal(err)
    }

    // Pull batches; never accept more than we can chew
    iter, err := cons.Messages(jetstream.PullMaxMessages(25))
    if err != nil {
        log.Fatal(err)
    }
    defer iter.Stop()
    for {
        msg, err := iter.Next()
        if err != nil {
            break
        }
        if err := handle(msg.Data()); err == nil {
            msg.Ack()
        } else {
            msg.NakWithDelay(5 * time.Second) // negative ack with backoff
        }
    }
}

Lag-based autoscaling is the right primitive

Tie consumer replicas to lag, not CPU. KEDA on Kubernetes does this out of the box for Apache Kafka, RabbitMQ, AWS SQS, NATS, and GCP Pub/Sub. The contract: scale up when lag exceeds a threshold, scale down when lag is near zero. CPU-based autoscaling lags reality — consumer CPU stays low until the GC bites, by which time you’re already on fire.
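
The scaling contract fits in a few lines. This is a simplified sketch, not KEDA's actual algorithm: desired_replicas, lag_per_replica, and the default bounds are assumptions for illustration. The target is one replica per lag_per_replica outstanding messages, clamped to a floor and a ceiling.

```python
import math


def desired_replicas(total_lag, lag_per_replica, min_replicas=1, max_replicas=10):
    """Replica count needed to keep per-replica lag at or below the target."""
    if lag_per_replica <= 0:
        raise ValueError("lag_per_replica must be positive")
    wanted = math.ceil(total_lag / lag_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


idle = desired_replicas(total_lag=0, lag_per_replica=1000)
busy = desired_replicas(total_lag=4500, lag_per_replica=1000)
capped = desired_replicas(total_lag=50000, lag_per_replica=1000)
```

For a partitioned topic, the ceiling should not exceed the partition count: each partition is read by one group member, so extra replicas would sit idle.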

Broker Comparison

Every broker is a different set of trade-offs. The table below is opinionated — defaults and typical operational profiles, not the absolute upper bounds you can tune to.

| Broker | Delivery | Ordering | Throughput | Latency | Durability / Retention | Ops Cost |
| --- | --- | --- | --- | --- | --- | --- |
| Apache Kafka | At-least-once; effectively-once with transactions | Per-partition | Very high (M+ msg/s) | ~5–50 ms | Persistent log; days to forever | High: ZooKeeper/KRaft, partitions, brokers, mirrors |
| Apache Pulsar | At-least-once + effectively-once | Per-partition or per-key | Very high | ~5–20 ms | Tiered storage; days to forever | High: BookKeeper + brokers + ZooKeeper |
| RabbitMQ | At-least-once (manual ack); at-most-once (auto) | Per-queue (single consumer) | Medium (10s–100s K msg/s) | < 5 ms | Until consumed (or TTL) | Medium: classic; quorum queues simpler |
| NATS JetStream | At-least-once; effectively-once with msg IDs | Per-stream / per-subject | High | < 1 ms (core); ~5 ms (JS) | Configurable; hours to days typical | Low: single binary, simple cluster |
| AWS SQS | At-least-once (standard); effectively-once (FIFO) | None (standard); per-group (FIFO) | Unlimited (standard); 3K msg/s (FIFO) | ~10–100 ms | Up to 14 days | Zero: fully managed |
| AWS SNS | At-least-once | None (standard); per-group (FIFO) | Very high fan-out | ~10–100 ms | Not retained; fan-out only | Zero: fully managed |
| AWS EventBridge | At-least-once | None | Medium | ~100 ms | Not retained; archive optional | Zero: fully managed |
| GCP Pub/Sub | At-least-once; exactly-once optional per-subscription | None by default; per-key with ordering enabled | Very high | ~50–200 ms | Up to 7 days | Zero: fully managed |
| Azure Service Bus | At-least-once; effectively-once via session + de-dup window | Per-session | Medium | ~10–50 ms | Days; archive via Event Hubs | Low: fully managed |
| Redis Streams | At-least-once with consumer groups | Per-stream | Very high in-memory | < 1 ms | Trim by length or time | Low: if you already run Redis |

How to actually choose

  • You need a durable log other systems can replay later — Apache Kafka or Apache Pulsar.
  • You need traditional work queues with rich routing — RabbitMQ.
  • You’re on AWS and want zero ops — AWS SQS for queues, AWS SNS or AWS EventBridge for fan-out, MSK if you genuinely need Kafka.
  • You’re on GCP — GCP Pub/Sub for everything, with ordering keys when you need it.
  • You want low-latency, low-ops, edge-friendly — NATS JetStream.
  • You already run Redis and the data fits in memory — Redis Streams is fine for many small services. Don’t make it your durable system of record.

Best Practices

The short list

  • Design for at-least-once. Make every consumer idempotent. Stop chasing exactly-once.
  • Pick the partition key in the design doc, not in the code review. It’s the single hardest thing to change later.
  • Manual commit, always, for anything that matters. Auto-commit + a crash mid-handler equals a lost message.
  • Every queue gets a DLQ. Every DLQ gets an alarm on depth.
  • Bound in-flight work. Prefetch limits, MaxAckPending, batch sizes — they’re all the same idea.
  • Monitor lag, not just CPU. Lag is the real-time signal of whether consumers are keeping up.
  • Version your message schemas. Add fields, don’t rename them. Avro/Protobuf with a registry beats raw JSON for anything long-lived.
  • Don’t use messaging as a database. Topics are change feeds, not query stores. If you need to look up a record by ID, that’s a database.
  • Tag messages with a trace ID. The first thing you’ll want during an incident is to follow one user’s event across five services.
  • Test the unhappy path. Kill the consumer mid-handler in CI. Re-deliver the same message twice in a test. If your code can’t handle it, you’ll find out in production instead.

The single most useful sentence about messaging

The broker doesn’t solve your distributed-systems problems — it relocates them. Whatever was hard about correctness over HTTP (timeouts, retries, duplicates, ordering, idempotency) is still hard over a queue. The patterns in this tutorial are how you make them tractable.