Falsehoods Software Developers Believe About Event-Driven Systems

When building a distributed system, a common design pattern is to follow the event-driven approach. Event-driven systems can range from a simple in-memory queue to a serverless AWS Lambda function with a preceding queue, or even connected Kafka clusters. When reviewing code that implements an event-driven architecture, I see common mistakes that cause toil or even operational incidents once deployed to production.

Here is an unordered list of misconceptions developers hold about event-driven architectures. Use it as a checklist for design and code review.

Message ordering

Events will arrive in order
Events will arrive in order, even with a single consumer
Events will arrive in order, even if specified by the producer contract
Events will arrive in order, even with days between messages
Events can always be ordered
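
One way a consumer can tolerate out-of-order arrival is to compare a per-entity version number and discard anything older than what it already holds. The sketch below is only an illustration: the `Event` fields and the `VersionedStore` class are hypothetical names, and it assumes the producer attaches a monotonically increasing `version` and that each event carries the full state (last-writer-wins). When events are deltas, or no such counter exists, this approach does not apply, which is the point of the last item above.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    entity_id: str
    version: int   # assumption: per-entity counter assigned by the producer
    payload: dict  # assumption: full state snapshot, not a delta


@dataclass
class VersionedStore:
    # entity_id -> (version, payload) of the newest event applied so far
    state: dict = field(default_factory=dict)

    def apply(self, event: Event) -> bool:
        """Apply the event only if it is newer than the stored version."""
        current = self.state.get(event.entity_id)
        if current is not None and current[0] >= event.version:
            return False  # stale or duplicate event: ignore it
        self.state[event.entity_id] = (event.version, event.payload)
        return True


store = VersionedStore()
store.apply(Event("order-1", 2, {"status": "shipped"}))  # arrives first
store.apply(Event("order-1", 1, {"status": "created"}))  # arrives late, ignored
```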

Message duplication

Events won't be duplicated
Events won't be duplicated, in at-most-once delivery queues
Events won't be duplicated, even if specified by producer contract
Events won't be duplicated, even with de-duplication upon arrival
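
To see why de-duplication upon arrival is not a guarantee, consider a de-duplicator that remembers event IDs for a bounded window, which is a common way to keep its memory finite. This is a minimal in-memory sketch; `TtlDeduplicator` and its parameters are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```python
class TtlDeduplicator:
    """Remembers event IDs for `ttl_seconds`, then forgets them."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self.first_seen = {}  # event_id -> time the ID was first seen

    def is_duplicate(self, event_id: str, now: float) -> bool:
        # Forget IDs that have fallen out of the de-duplication window.
        self.first_seen = {
            eid: seen_at
            for eid, seen_at in self.first_seen.items()
            if now - seen_at < self.ttl_seconds
        }
        if event_id in self.first_seen:
            return True
        self.first_seen[event_id] = now
        return False


dedup = TtlDeduplicator(ttl_seconds=300)
first = dedup.is_duplicate("evt-1", now=0)     # new event
second = dedup.is_duplicate("evt-1", now=10)   # caught inside the window
third = dedup.is_duplicate("evt-1", now=301)   # window expired: slips through
```

A redelivery that arrives after the window has closed is treated as new, so the handler behind the de-duplicator still has to be safe to run twice.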

Idempotency

Adding an idempotency key ensures idempotency
Equal idempotency keys mean identical payloads
Event timestamp is a valid idempotency key
Writing idempotent code is easy
Maintaining and improving idempotent code is easy
Idempotency can be solved via adding a distributed lock and an idempotency key
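
The sketch below illustrates two of the items above: a key on its own does nothing unless the handler stores and checks it, and two events with the same key may still carry different payloads. All names here (`IdempotentHandler`, `IdempotencyConflict`, the `amount` field) are hypothetical, and the store is an in-memory dict. A real implementation would also need the check and the write to be atomic, and would have to decide what happens when the process crashes between the side effect and the write, which this sketch does not address.

```python
import hashlib
import json


class IdempotencyConflict(Exception):
    """Same idempotency key, different payload."""


class IdempotentHandler:
    def __init__(self):
        self.results = {}      # key -> (payload fingerprint, stored result)
        self.side_effects = 0  # counts how often the real work ran

    @staticmethod
    def fingerprint(payload: dict) -> str:
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def handle(self, key: str, payload: dict) -> dict:
        fp = self.fingerprint(payload)
        if key in self.results:
            stored_fp, stored_result = self.results[key]
            if stored_fp != fp:
                # Equal keys did not mean identical payloads.
                raise IdempotencyConflict(key)
            return stored_result  # replay: return the earlier result
        self.side_effects += 1    # stand-in for the real side effect
        result = {"status": "processed", "amount": payload["amount"]}
        self.results[key] = (fp, result)
        return result
```

Calling `handle("k1", {"amount": 10})` twice runs the side effect once and returns the same result; calling it again with `{"amount": 99}` raises `IdempotencyConflict` instead of silently picking one payload.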

Load management

Low TPS systems are not subject to backlog
Low TPS systems are not subject to throttling
Processing timeout of XX seconds is sufficient
Processing timeout of XX minutes is sufficient
Processing timeout of XX hours is sufficient
Upstream dependencies' timeouts are properly configured
Retry policy is properly configured
Event processing time is constant and performance will remain consistent under load
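
A quick back-of-the-envelope calculation shows why low throughput does not rule out a backlog: what matters is the arrival rate relative to processing capacity. The function below is a rough sketch with hypothetical parameter names. It assumes a constant arrival rate and a constant processing time, which the last item above says you should not rely on in production.

```python
def backlog_after(seconds, arrivals_per_second, processing_seconds, consumers):
    """Estimate queue depth after `seconds`, starting from an empty queue."""
    # Each consumer finishes 1 / processing_seconds events per second.
    capacity_per_second = consumers / processing_seconds
    growth_per_second = arrivals_per_second - capacity_per_second
    # The queue cannot go below empty when capacity exceeds arrivals.
    return max(0.0, growth_per_second * seconds)


# 2 events/s, 2 s per event, 3 consumers: capacity is only 1.5 events/s.
depth = backlog_after(3600, arrivals_per_second=2,
                      processing_seconds=2.0, consumers=3)
```

With these made-up numbers the system falls behind by 0.5 events per second, or 1,800 queued events after one hour, even though 2 TPS hardly counts as high load.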

Producer contract

Event producer can be trusted to always produce valid events
Event producer can be trusted to generally produce valid events
Event producer can be trusted to produce non-conflicting events
Event producer can be trusted to not overload the consumer
Event producer can be trusted to send messages on time
Event producer can be trusted to never fail
Event producer can be trusted to rarely fail
Event producer can be trusted, even if it is an internal process
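
In practice, not trusting the producer usually means validating every event at the consumer boundary, including events from internal processes. The following is a minimal sketch; the field names in `REQUIRED_FIELDS` describe an assumed event shape rather than any standard, and a real system might use a schema registry or a validation library instead.

```python
from datetime import datetime

# Assumption: the event shape this hypothetical consumer expects.
REQUIRED_FIELDS = {
    "event_id": str,
    "type": str,
    "occurred_at": str,
    "payload": dict,
}


def validate_event(raw) -> list:
    """Return a list of problems; an empty list means the event is usable."""
    if not isinstance(raw, dict):
        return ["event is not an object"]
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in raw:
            errors.append(f"missing field: {name}")
        elif not isinstance(raw[name], expected_type):
            errors.append(f"wrong type for field: {name}")
    occurred_at = raw.get("occurred_at")
    if isinstance(occurred_at, str):
        try:
            datetime.fromisoformat(occurred_at)
        except ValueError:
            errors.append("occurred_at is not an ISO 8601 timestamp")
    return errors
```

Returning every problem at once, rather than raising on the first one, tends to make rejected events easier to triage later.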

Consumer contract

Event consumer is simple enough to never fail
Event consumer does not need a scaling strategy
Event consumer downstream dependencies support idempotent calls
Events will never be dropped

Dead letter queues

Dead letter queues are not necessary
Dead letter queue is properly configured
Only a handful of events will end up in the dead letter queue
Even if there are many events in the DLQ, there are only a few representative error categories
At least it will be easy to sort out the different error categories
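
A first pass at sorting a dead letter queue is often to normalize the error text and count occurrences. The sketch below assumes each dead-lettered message is a dict with an `error` string, which is a hypothetical shape, and the digit-masking rule is deliberately crude. Real DLQs tend to need several rounds of such rules before the categories settle.

```python
import re
from collections import Counter


def error_category(error_message: str) -> str:
    """Mask numbers so errors that differ only by an ID or duration group together."""
    return re.sub(r"\d+", "<n>", error_message).strip().lower()


def summarize_dlq(messages) -> Counter:
    """Count dead-lettered messages per normalized error category."""
    return Counter(error_category(m["error"]) for m in messages)


dlq = [
    {"error": "Timeout after 30s"},
    {"error": "Timeout after 45s"},
    {"error": "Unknown customer 1234"},
]
summary = summarize_dlq(dlq)
```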

Recovery

System does not need manual recovery
Manual recovery won't require modifying events
Manual recovery won't require the producer to regenerate messages
Manual recovery won't coincide with another system failure
Manual recovery will be completed within minutes
Manual recovery will be completed within hours
Manual recovery will be completed within days
Pending manual recovery, events can simply stay in the queue

Architecture

Orchestrated architectures are better and simpler
Choreographed architectures are better and simpler
Unbounded queues are better than bounded queues
Event-driven architectures are simpler to reason about
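
On the bounded versus unbounded question, a bounded queue forces the producer to decide what happens when the consumer falls behind, rather than letting memory grow until something fails. Here is a minimal in-process sketch using Python's standard `queue.Queue`; `try_publish` is a hypothetical helper, and whether a rejected event should be dropped, retried, or pushed back upstream depends on the system.

```python
import queue


def try_publish(q: queue.Queue, event) -> bool:
    """Enqueue without blocking; return False when the queue is full."""
    try:
        q.put_nowait(event)
        return True
    except queue.Full:
        return False  # the caller must handle the backpressure signal


bounded = queue.Queue(maxsize=2)
accepted = [try_publish(bounded, n) for n in range(3)]
# The third publish is rejected instead of growing the queue.
```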

[list to be updated]