
Event-Driven Architecture for Logistics Operations


Why logistics is inherently event-driven

A shipment isn’t a database record. It’s a sequence of events: picked up, in transit, at customs, cleared, out for delivery, delivered. Or, more frequently: picked up, in transit, held at customs, documentation incomplete, held at customs (second notification), documentation provided, cleared, out for delivery, recipient absent, second attempt, delivered.

The operational reality of logistics is that exceptions aren’t exceptions. They’re the normal path. In a typical logistics operator, between 15% and 30% of shipments experience at least one incident: customs delays, incorrect addresses, damaged goods, vehicle breakdowns, port closures. Each incident triggers a cascade of actions: notify the client, recalculate deadlines, reassign routes, generate alternative documentation.

Classic request-response architecture isn’t designed for this. A REST API that says “give me the shipment status” returns a static snapshot. It doesn’t tell you what just changed, and it doesn’t automatically trigger the necessary actions. For that, you need an architecture where events flow and systems react.

Anatomy of a logistics event

A well-designed logistics event captures not just what happened, but enough context for any downstream system to act without querying additional sources:

{
  "eventId": "evt_7f3a8b2c",
  "eventType": "shipment.customs.held",
  "timestamp": "2025-05-22T14:32:00Z",
  "source": "customs-gateway",
  "shipmentId": "SHP-2025-0847",
  "data": {
    "customsOffice": "ES002801",
    "holdReason": "DOCUMENTATION_INCOMPLETE",
    "missingDocuments": ["T1", "DUA"],
    "estimatedClearanceDelay": "PT48H",
    "currentLocation": {
      "port": "ESALG",
      "warehouse": "ALGECIRAS-ZAL-3"
    }
  },
  "metadata": {
    "clientId": "CLT-1234",
    "serviceLevel": "express",
    "originalETA": "2025-05-24T09:00:00Z",
    "correlationId": "flow_abc123"
  }
}

Several principles at work here:

Immutable and self-contained events. The event carries all information needed for processing. A consumer shouldn’t need to call the shipments API to find out which documents are missing. If the event includes it, processing is local, fast, and resilient to failures in other services.

Strict typing. shipment.customs.held is specific, not shipment.updated with a generic status field. Consumers subscribe to the events they care about instead of subscribing to everything and filtering.

Correlation ID. Connects all events within the same business flow. When you need to reconstruct a shipment’s complete history, the correlation ID is the thread.

Business metadata. serviceLevel: "express" isn’t a technical detail; it’s information that determines response priority when an exception occurs. An express shipment held at customs triggers a different flow (more urgent, more expensive) than a standard one.
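The strict-typing principle can be sketched as a small in-process dispatcher: consumers register for the exact event types they care about, and an event fans out only to those handlers. This is a minimal illustration, not a specific broker API; the names (EventBus, subscribe, publish) are assumptions.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process sketch of type-based subscription (illustrative only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        # Only handlers registered for this exact type run; nobody receives
        # a generic "shipment.updated" and filters afterwards.
        for handler in self._handlers[event["eventType"]]:
            handler(event)


bus = EventBus()
held_events = []
bus.subscribe("shipment.customs.held", held_events.append)

bus.publish({"eventType": "shipment.customs.held", "shipmentId": "SHP-2025-0847"})
bus.publish({"eventType": "shipment.delivered", "shipmentId": "SHP-2025-0001"})
```

With a real broker the same idea maps to one subscription per event type (or per topic, as in the next section), but the contract is identical: specific types in, targeted reactions out.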

Event topology for logistics

Event organization into topics (Kafka) or channels follows the business domain structure:

shipments.lifecycle    -> created, picked-up, in-transit, delivered
shipments.customs      -> submitted, held, cleared, rejected
shipments.exceptions   -> delay, damage, address-invalid, lost
shipments.financial    -> invoiced, payment-received, disputed
vehicles.tracking      -> position-update, eta-recalculated
warehouse.inventory    -> received, stored, picked, dispatched

Each topic groups events from the same domain. Consumers subscribe to relevant topics. A client notification system subscribes to shipments.lifecycle and shipments.exceptions. The financial system subscribes to shipments.financial. The TMS subscribes to vehicles.tracking.

The temptation is to create a single shipments topic with everything. That works when you have 3 consumers. It stops working when you have 15 and each processes 10% of events, discarding the rest. Separating by domain reduces processing load and simplifies topic ownership.
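The routing from event type to domain topic can be expressed as a longest-prefix lookup. The event-type prefixes below are assumptions about a naming convention; only the topic names come from the topology above.

```python
# Hypothetical mapping from event-type prefix to domain topic. Topic names
# mirror the topology above; the prefixes themselves are an assumed convention.
TOPIC_BY_PREFIX = {
    "shipment.customs": "shipments.customs",
    "shipment.exception": "shipments.exceptions",
    "shipment.financial": "shipments.financial",
    "shipment": "shipments.lifecycle",
    "vehicle": "vehicles.tracking",
    "warehouse": "warehouse.inventory",
}


def topic_for(event_type: str) -> str:
    """Pick the most specific prefix that matches the event type."""
    for prefix in sorted(TOPIC_BY_PREFIX, key=len, reverse=True):
        if event_type == prefix or event_type.startswith(prefix + "."):
            return TOPIC_BY_PREFIX[prefix]
    raise ValueError(f"No topic for event type: {event_type}")
```

So `topic_for("shipment.customs.held")` resolves to `shipments.customs`, while a plain `shipment.delivered` falls through to `shipments.lifecycle`.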

Automated exception handling

This is where event-driven architecture shines in logistics. Exceptions aren’t just passive events that get logged; they’re triggers that activate automatic compensation flows.

Pattern: Saga for customs incidents

When a shipment is held at customs, a saga (sequence of steps with compensation) activates:

  1. Event: shipment.customs.held with reason DOCUMENTATION_INCOMPLETE.
  2. Action: The documentation service checks which documents are on file and which are missing. If they exist in the system but weren’t sent, it resends them automatically.
  3. Action: The notification service alerts the client with the missing documents and a deadline.
  4. Action: The planning service recalculates the ETA based on estimated delay and updates all chained shipments.
  5. Timeout: If the client doesn’t provide documents within 24 hours, escalate to a human agent.
  6. Compensation: If customs definitively rejects, activate return flow or temporary bonded warehouse storage.

Each step is an event that triggers the next. If a step fails, the saga has a defined compensation path. Without this orchestration, every customs incident requires manual intervention: someone checks email, calls the client, sends documents, updates the system. Multiplied by 50 daily incidents, that’s 2-3 full-time people just for customs management.
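The steps above can be sketched as a saga skeleton. Service names and method signatures here are assumptions; a production saga would persist its state and react to broker events rather than run in-process.

```python
# Illustrative saga skeleton for the customs flow described above.
class CustomsSaga:
    def __init__(self, resend_documents, notify_client, recalculate_eta,
                 escalate_to_human, start_return_flow):
        # Each collaborator is an injected callable standing in for a service.
        self.resend_documents = resend_documents
        self.notify_client = notify_client
        self.recalculate_eta = recalculate_eta
        self.escalate_to_human = escalate_to_human
        self.start_return_flow = start_return_flow

    def on_customs_held(self, event):
        missing = event["data"]["missingDocuments"]
        # Step 2: resend what is already on file; returns the docs it could send.
        resent = self.resend_documents(event["shipmentId"], missing)
        still_missing = [d for d in missing if d not in resent]
        if still_missing:
            # Step 3: ask the client only for what is genuinely missing.
            self.notify_client(event["shipmentId"], still_missing)
        # Step 4: propagate the estimated delay to planning.
        self.recalculate_eta(event["shipmentId"],
                             event["data"]["estimatedClearanceDelay"])

    def on_timeout(self, shipment_id):
        # Step 5: 24h without documents -> hand over to a human agent.
        self.escalate_to_human(shipment_id)

    def on_customs_rejected(self, shipment_id):
        # Step 6, compensation: return flow or bonded-warehouse storage.
        self.start_return_flow(shipment_id)


calls = []
saga = CustomsSaga(
    resend_documents=lambda sid, docs: (calls.append(("resend", list(docs))) or ["T1"]),
    notify_client=lambda sid, docs: calls.append(("notify", docs)),
    recalculate_eta=lambda sid, delay: calls.append(("eta", delay)),
    escalate_to_human=lambda sid: calls.append(("escalate", sid)),
    start_return_flow=lambda sid: calls.append(("return", sid)),
)
saga.on_customs_held({
    "shipmentId": "SHP-2025-0847",
    "data": {"missingDocuments": ["T1", "DUA"],
             "estimatedClearanceDelay": "PT48H"},
})
```

In this run, T1 was on file and resent automatically; only the DUA reaches the client notification, and planning gets the PT48H delay regardless.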

Automatic re-routing

Re-routing is a use case where latency matters:

Event: vehicle.breakdown
  -> Route service: recalculate routes for affected shipments
  -> Fleet service: assign nearest alternative vehicle
  -> Notification service: alert recipients about ETA change
  -> Financial service: calculate additional cost
All in < 5 minutes from the initial event.

In a request-response system, this flow requires a human operator to detect the problem, query multiple systems, make decisions, and update manually. With events, the flow executes automatically. The human operator reviews and validates but doesn’t execute.

Legacy TMS and WMS integration

The elephant in the room. Most logistics companies operate with a TMS (Transportation Management System) and/or WMS (Warehouse Management System) that’s 10-20 years old, doesn’t expose events, and runs on nightly batch processes.

You have to coexist with them. Replacing them is a multi-million, multi-year project. The strategy: Change Data Capture (CDC) with Debezium to convert database changes in the legacy system into events.

CDC with Debezium

Debezium connects to the TMS/WMS database (MySQL, PostgreSQL, Oracle, SQL Server), monitors the transaction log, and publishes each change as an event to Kafka. No modifications to the legacy system required.

TMS (Oracle) -> Debezium -> Kafka topic: tms.shipments.changes
  -> Transformer: map legacy structure to domain events
  -> Kafka topic: shipments.lifecycle

The transformer is key. Legacy TMS data won’t match your domain event structure. A field STATUS_CD = 'DLV' in the SHIPMENTS table needs to map to a shipment.delivered event with the correct structure. This mapping is system-specific and is where integration effort concentrates.
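The transformer can be as simple as a lookup table plus field renaming. The column names and status codes other than 'DLV' below are assumptions about a hypothetical TMS schema, shown only to make the mapping concrete.

```python
# Sketch of the legacy-to-domain transformer. 'DLV' comes from the text;
# the other codes and the column names are assumed for illustration.
LEGACY_STATUS_TO_EVENT = {
    "DLV": "shipment.delivered",
    "PKD": "shipment.picked-up",
    "TRN": "shipment.in-transit",
}


def transform(row: dict) -> dict:
    """Map a CDC change row from the legacy SHIPMENTS table to a domain event."""
    event_type = LEGACY_STATUS_TO_EVENT.get(row["STATUS_CD"])
    if event_type is None:
        # Unmapped codes should go to a dead letter queue, not be dropped silently.
        raise ValueError(f"Unmapped legacy status: {row['STATUS_CD']}")
    return {
        "eventType": event_type,
        "shipmentId": row["SHIPMENT_NO"],
        "timestamp": row["LAST_UPD_TS"],
        "source": "tms-cdc",
    }


event = transform({"STATUS_CD": "DLV",
                   "SHIPMENT_NO": "SHP-2025-0847",
                   "LAST_UPD_TS": "2025-05-24T11:20:00Z"})
```

The real effort is rarely the lookup itself but cataloguing every status code the legacy system can emit, including the undocumented ones.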

Real-world CDC problems in logistics:

Duplicate events: Debezium guarantees at-least-once delivery. Your consumer needs to be idempotent. If a shipment.delivered event arrives twice, it must not generate two invoices.
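Idempotency usually means deduplicating on eventId before any side effect. A minimal sketch, assuming an in-memory set; in production the processed-ID set would live in a durable store (a database table or Redis), not in process memory.

```python
# Idempotent consumer sketch: a duplicate shipment.delivered event must
# not produce a second invoice.
class IdempotentInvoicer:
    def __init__(self):
        self._processed = set()
        self.invoices = []

    def handle(self, event):
        if event["eventId"] in self._processed:
            return  # duplicate delivery from at-least-once semantics: ignore
        self._processed.add(event["eventId"])
        if event["eventType"] == "shipment.delivered":
            self.invoices.append(event["shipmentId"])


invoicer = IdempotentInvoicer()
delivered = {"eventId": "evt_1", "eventType": "shipment.delivered",
             "shipmentId": "SHP-2025-0847"}
invoicer.handle(delivered)
invoicer.handle(delivered)  # redelivered by the broker
```

Note that the check-then-record step must itself be atomic in a durable implementation, otherwise two concurrent consumers can both pass the check.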

Incomplete data: The legacy TMS stores information differently from how you need it. The event may require enrichment from other sources (client master data, tariffs) before it’s useful. This is solved with an enrichment service that queries additional sources and completes the event.

Latency vs consistency: CDC from the transaction log adds seconds of latency. If you need immediate consistency (the invoice must be generated at the exact moment of delivery), you need direct signaling from the operational process, not CDC.

APIs as an intermediate gateway

For legacy systems that don’t allow transaction log access (some proprietary TMS platforms), the alternative is building an API layer over the legacy system:

  1. An API wrapper exposing TMS data via REST.
  2. A polling service querying the API every N seconds for changes.
  3. The polling service publishes detected changes as events to Kafka.

Less elegant than CDC and higher latency (depends on polling interval). But it works when CDC isn’t viable. The critical aspect: the polling service must be resilient — maintaining a cursor of the last processed change and recovering from failures without losing or duplicating events.
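The cursor discipline described above can be sketched as a single polling cycle. The fetch_changes_since function and the changeId field are assumptions about a hypothetical API wrapper, not a real interface.

```python
# Polling-gateway sketch: advance the cursor only after each change is
# published, so a crash mid-cycle resumes without losing or duplicating.
def poll_once(fetch_changes_since, publish, cursor):
    """Run one polling cycle; returns the new cursor to persist."""
    for change in fetch_changes_since(cursor):
        publish(change)
        cursor = change["changeId"]  # advance only after a successful publish
    return cursor


# Simulated wrapper API over the legacy TMS (illustrative data).
changes = [
    {"changeId": 1, "shipmentId": "SHP-1", "STATUS_CD": "TRN"},
    {"changeId": 2, "shipmentId": "SHP-2", "STATUS_CD": "DLV"},
]


def fetch_changes_since(cursor):
    last = cursor or 0
    return [c for c in changes if c["changeId"] > last]


published = []
cursor = poll_once(fetch_changes_since, published.append, None)
cursor = poll_once(fetch_changes_since, published.append, cursor)  # idempotent re-poll
```

The second cycle publishes nothing because the cursor already points past both changes; that is exactly the restart behavior the text calls for.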

Processing patterns

Event Sourcing for traceability

In logistics, traceability isn’t a nice-to-have; it’s a legal requirement (especially for hazardous goods, food products, and international trade). Event sourcing stores shipment state as a sequence of events instead of a mutable record:

SHP-2025-0847:
  [1] shipment.created         (2025-05-20T10:00:00Z)
  [2] shipment.picked-up       (2025-05-20T14:30:00Z)
  [3] shipment.in-transit      (2025-05-20T15:00:00Z)
  [4] shipment.customs.held    (2025-05-22T14:32:00Z)
  [5] shipment.customs.cleared (2025-05-23T09:15:00Z)
  [6] shipment.delivered       (2025-05-24T11:20:00Z)

Current state is derived by replaying events. Any auditor (or client) can see the complete shipment history. When someone asks “why was this shipment delayed?”, the answer is in the event sequence, not in a delay_reason field that someone may or may not have filled in manually.
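Replaying is a fold over the event sequence. A minimal sketch, assuming the current status is simply the type of the last event; real projections usually build richer state.

```python
# Event-sourcing replay sketch: derive current state from the event log.
def replay(events):
    state = {"status": None, "history": []}
    for event in events:
        # Assumed convention: status is the event type minus its prefix.
        state["status"] = event["eventType"].removeprefix("shipment.")
        state["history"].append((event["eventType"], event["timestamp"]))
    return state


state = replay([
    {"eventType": "shipment.created", "timestamp": "2025-05-20T10:00:00Z"},
    {"eventType": "shipment.customs.held", "timestamp": "2025-05-22T14:32:00Z"},
    {"eventType": "shipment.delivered", "timestamp": "2025-05-24T11:20:00Z"},
])
```

The history list is the audit trail: the answer to "why was this shipment delayed?" is the customs.held entry with its timestamp, not a free-text field.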

CQRS for operational queries

The CQRS (Command Query Responsibility Segregation) pattern separates writes (events) from reads (materialized views). It’s particularly useful in logistics because read and write patterns differ dramatically:

  • Write: One event each time something happens. High frequency, low complexity.
  • Read: An operator wants all in-transit shipments, filtered by route, sorted by ETA, with delay alerts. High complexity, low latency tolerance.

The materialized view (a read-optimized table, updated asynchronously by events) serves operational queries without impacting the event pipeline’s performance.
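A materialized view in miniature: events mutate a read-optimized structure, and operational queries hit that structure instead of the event log. This in-memory sketch stands in for an asynchronously updated database table.

```python
# CQRS read-model sketch: an "in transit" view kept current by events.
class InTransitView:
    def __init__(self):
        self._rows = {}  # shipmentId -> read-optimized row

    def apply(self, event):
        sid = event["shipmentId"]
        if event["eventType"] == "shipment.in-transit":
            self._rows[sid] = {"shipmentId": sid,
                               "eta": event["metadata"]["originalETA"]}
        elif event["eventType"] == "shipment.delivered":
            self._rows.pop(sid, None)  # delivered shipments leave the view

    def in_transit_sorted_by_eta(self):
        # The operator query: all in-transit shipments, sorted by ETA.
        return sorted(self._rows.values(), key=lambda r: r["eta"])


view = InTransitView()
view.apply({"eventType": "shipment.in-transit", "shipmentId": "SHP-2",
            "metadata": {"originalETA": "2025-05-25T09:00:00Z"}})
view.apply({"eventType": "shipment.in-transit", "shipmentId": "SHP-1",
            "metadata": {"originalETA": "2025-05-24T09:00:00Z"}})
view.apply({"eventType": "shipment.in-transit", "shipmentId": "SHP-3",
            "metadata": {"originalETA": "2025-05-26T09:00:00Z"}})
view.apply({"eventType": "shipment.delivered", "shipmentId": "SHP-3"})
```

Filtering by route or attaching delay alerts would be further columns on the same rows; the point is that the expensive query shape is precomputed by the event stream.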

Broker selection

Apache Kafka is the de facto standard for event streaming in logistics. Reasons: high throughput (millions of events/second), configurable retention (events are stored, they don’t disappear after consumption), ordering guarantees within a partition, and a mature ecosystem (Kafka Connect, Kafka Streams, ksqlDB).

Amazon Kinesis / Google Pub/Sub / Azure Event Hubs: Cloud-native alternatives that reduce operational burden. Less flexible than Kafka but easier to operate if you’re already on a cloud provider.

RabbitMQ: Better as a message broker (one consumer processes each message) than as an event broker (multiple consumers process the same event). If your case is more “send tasks to workers” than “publish events for whoever wants to listen,” RabbitMQ is simpler.

For logistics operations with multiple systems consuming the same events, Kafka is the right choice. The operational investment is justified by the flexibility and performance.

Metrics and observability

An event-driven architecture without observability is a black box. Critical metrics:

  • Consumer lag: How many events are unprocessed per consumer. If lag grows, the consumer can’t keep up with production. This is the most important metric.
  • End-to-end latency: Time from event production to last consumer processing. For client notifications, this should be < 30 seconds.
  • Error rate per topic: Events failing during processing. A rate > 1% indicates a systemic problem.
  • Dead letter queue size: Events that failed repeatedly and were sent to the error queue. Each event in the DLQ is an unresolved operational incident.

Tools: Kafka UI (or Confluent Control Center) for cluster monitoring. Prometheus + Grafana for application metrics. Alerts on consumer lag and DLQ size are the two that deserve waking someone at 3am.
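Consumer lag itself is simple arithmetic: per partition, the log-end offset minus the consumer group's committed offset, summed across partitions. A sketch of the number you would alert on (real deployments read these offsets from the broker's admin API):

```python
# Consumer-lag sketch: log-end offset minus committed offset, per partition.
def total_consumer_lag(end_offsets: dict, committed_offsets: dict) -> int:
    return sum(
        end_offsets[p] - committed_offsets.get(p, 0)
        for p in end_offsets
    )


# Illustrative offsets: partition 0 is 50 events behind, partition 1 caught up.
lag = total_consumer_lag({0: 1200, 1: 980}, {0: 1150, 1: 980})
```

A lag that is nonzero but stable is usually fine; a lag that grows monotonically means the consumer is permanently slower than the producer, which is the condition worth paging on.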

Start small

You don’t need to migrate all your operations to event-driven at once. The pattern that works:

  1. Choose one flow: Client tracking notifications. Visible, immediate value, and doesn’t require modifying core systems.
  2. Implement CDC on the existing TMS to capture shipment status changes.
  3. Publish events to Kafka.
  4. Build the notification service as an event consumer.
  5. Measure: Notification latency, delivery rate, reduction in call center volume.
  6. Expand: Add automated exception flows, re-routing, invoicing.

The first flow’s value is proving the architecture works and quantifying impact. With real data, it’s much easier to justify the investment to expand to subsequent flows.

If real-time shipment tracking is your primary use case, we detail the architecture in our article on real-time shipment tracking. Logistics doesn’t need to be time-consuming and manual. The events are happening; the question is whether your systems are listening or ignoring them.

About the author

abemon engineering

Engineering team

Multidisciplinary engineering, data and AI team headquartered in the Canary Islands. We build, deploy and operate custom software solutions for companies at any scale.