Event-driven architectures create resilient and fault-tolerant software

I do find event-driven architectures to be the pinnacle of resilience and fault tolerance in software. In fact, that’s pretty much how the real world works: for every action, there’s a reaction.

In synchronous communication, we have an orchestration of interactions between services. The orchestrator needs to know every single service and the order in which they should be called. That means high coupling: consumers and producers of information constantly need to know about each other. Adding or removing services, or upgrading their interfaces, is hard, so the owning teams need to constantly sync with each other to know what breaking changes to prepare for. The upside is a strong guarantee: when the action finishes, the data has been updated.
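To make that coupling concrete, here’s a minimal TypeScript sketch of an orchestrated flow. Every service name and call here is a hypothetical stand-in for a real HTTP or gRPC client:

```typescript
// Minimal sketch of synchronous orchestration for a hypothetical order flow.
// The orchestrator must know every service and the exact call order;
// if any step fails, the whole operation fails.

interface Order { id: string; amount: number }

// Hypothetical service clients; real ones would be network calls.
const inventory = {
  reserve: async (order: Order) => console.log(`reserved stock for ${order.id}`),
};
const payments = {
  charge: async (order: Order) => console.log(`charged ${order.amount} for ${order.id}`),
};
const shipping = {
  schedule: async (order: Order) => console.log(`scheduled shipping for ${order.id}`),
};

// The orchestrator: one place that couples to all three services.
async function placeOrder(order: Order): Promise<void> {
  await inventory.reserve(order); // must happen first
  await payments.charge(order);   // the orchestrator knows this comes second
  await shipping.schedule(order); // ...and this comes last
  // When this resolves, all downstream data is guaranteed to be updated.
}

placeOrder({ id: "order-42", amount: 99 }).catch(console.error);
```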

In asynchronous communication, we have a choreography of services reacting to each other’s events. There’s no orchestrator and no single point of failure (theoretically; realistically, there’s the event broker). Adding or removing services can be done at any time, without disrupting other services. Data will eventually be consistent, but there’s no guarantee of exactly when.
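Here’s the same hypothetical order flow as a choreography, using Node’s built-in EventEmitter as a stand-in for a real broker like Kafka or RabbitMQ. Notice that no service knows about any other, only about event names:

```typescript
// Minimal sketch of choreography. An in-process emitter stands in for the
// event broker; services subscribe to events, not to each other, so any
// of them can be added or removed without touching the rest.

import { EventEmitter } from "node:events";

const broker = new EventEmitter(); // the one shared dependency

interface OrderPlaced { orderId: string; amount: number }

// Inventory reacts to OrderPlaced and announces its own event.
broker.on("OrderPlaced", (e: OrderPlaced) => {
  console.log(`inventory: reserving stock for ${e.orderId}`);
  broker.emit("StockReserved", e);
});

// Payments reacts to StockReserved; shipping reacts to PaymentCaptured.
broker.on("StockReserved", (e: OrderPlaced) => {
  console.log(`payments: charging ${e.amount} for ${e.orderId}`);
  broker.emit("PaymentCaptured", e);
});
broker.on("PaymentCaptured", (e: OrderPlaced) => {
  console.log(`shipping: scheduling delivery for ${e.orderId}`);
});

// The producer just announces what happened; it has no idea who listens.
broker.emit("OrderPlaced", { orderId: "order-42", amount: 99 });
```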

And here’s the kicker: complex systems will most likely use a mix of both, and that’s healthy. Some processes can be asynchronous and eventually consistent, but other systems need to do things now and do them well, or everything must fail. And even on success, there need to be idempotency guarantees, because brokers with at-least-once delivery can hand the same event to a consumer more than once.
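A common way to get that idempotency guarantee is to deduplicate by event id on the consumer side. This is a rough sketch assuming every event carries a unique id; a real implementation would persist the processed ids in a database rather than in memory:

```typescript
// Minimal sketch of an idempotent consumer, assuming each event has a
// unique id. Redeliveries are absorbed by remembering which ids were
// already processed.

interface PaymentEvent { id: string; orderId: string; amount: number }

const processed = new Set<string>(); // stand-in for a durable dedup store

function handlePayment(event: PaymentEvent): void {
  if (processed.has(event.id)) {
    console.log(`skipping duplicate event ${event.id}`);
    return; // charging twice would be a real bug, not just stale data
  }
  processed.add(event.id);
  console.log(`charging ${event.amount} for ${event.orderId}`);
}

const event: PaymentEvent = { id: "evt-1", orderId: "order-42", amount: 99 };
handlePayment(event); // charges
handlePayment(event); // redelivered: safely ignored
```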