← Back to Blog

Distributed Transactions Are a Lie

Why two-phase commit fails in production distributed systems, and what engineers actually use instead: sagas, the outbox pattern, and eventual consistency.

At some point in the evolution of every microservices system, someone realizes they have a problem. They need to do two things atomically. Maybe it's creating an order and deducting inventory. Maybe it's charging a payment and provisioning an account. Maybe it's writing to a database and publishing an event. Two operations, two services, one logical unit of work. And someone, usually with the best intentions, says: we need a distributed transaction.

The idea is appealing because it maps onto something we already understand. Database transactions work. You begin, you write, you commit. Either everything happens or nothing does. ACID guarantees. Rollback on failure. Clean. Predictable. Safe.

Distributed transactions promise the same semantics across multiple services or databases. The pitch is that you get atomicity without redesigning your system. But the more I've worked with these systems in production, the more I've come to believe that distributed transactions don't actually solve the problem. They relocate it, obscure it, and make it someone else's problem until it isn't.

What Two-Phase Commit Actually Does

Two-phase commit (2PC) is the classic protocol for distributed transactions. It involves a coordinator and a set of participants. The coordinator drives two phases: first, it asks all participants to prepare, locking resources and indicating whether they can commit. Second, based on the responses, it tells all participants to either commit or abort.

On paper, this gives you atomicity. If any participant says it can't prepare, everyone aborts. If all participants say they're ready, everyone commits. No partial writes.

In practice, three things happen that make this unworkable at scale.

First, the protocol is blocking. During the prepare phase, participants hold locks on the resources they've tentatively modified. They hold those locks until they hear back from the coordinator with a commit or abort decision. If the coordinator crashes between the two phases, after participants have prepared but before they've been told to commit, the participants are stuck. They can't commit because they haven't received the decision. They can't abort because the coordinator might have decided to commit before crashing. They hold their locks indefinitely, waiting for a coordinator that may never return.

This isn't a theoretical failure mode. Coordinators crash. Networks partition. In a system that runs continuously, every failure mode that can happen will happen. The question is always when, not if.

Second, the latency is multiplicative. A distributed transaction requires at least two round trips to every participant before anything commits. In a monolith with a single database, a transaction commits in milliseconds. Add three services across different availability zones with a coordinator in the middle, and now you're waiting on network round trips multiplied by phases multiplied by participants. Under load, with retries, with coordinator elections after a failure, this latency compounds quickly. I've seen systems where a 2PC-backed write path added 800ms to operations that users expected to feel instant.

Third, the correctness guarantees are weaker than they look. 2PC guarantees atomicity if the coordinator is available. It says nothing about what happens when the coordinator crashes mid-protocol. Some implementations try to handle coordinator recovery, but those implementations are famously hard to get right. The original paper describing 2PC was published in 1978. Databases have had decades to implement it carefully, with full control over the underlying storage and recovery log. Implementing it correctly across independently deployed services with independent failure domains is a different problem entirely.

The Saga Pattern

When you give up on distributed transactions, you need a way to coordinate multi-step operations that can fail partway through. The saga pattern is the answer most teams eventually land on, and it's worth understanding it clearly because it's not magic. It just moves the complexity somewhere explicit.

A saga is a sequence of local transactions, each of which updates a single service and publishes an event or message to trigger the next step. If a step fails, the saga executes compensating transactions to undo the work done by previous steps.

Let's say you're placing an order. In a traditional distributed transaction, you'd lock inventory and charge the payment in one breath. The order record is created and the whole thing commits atomically. With a saga, it looks like this:

Step one: create the order in PENDING state. Step two: reserve inventory. If that fails, cancel the order. Step three: charge the payment. If that fails, release the inventory reservation and cancel the order. Step four: confirm the order and send the confirmation email.

Each step is a local transaction. Each step has a corresponding compensating transaction that can undo it. The saga either runs to completion or runs compensations back to the beginning.

This is honest about what's actually happening. There is no global lock. Between step two and step three, inventory is reserved but the order isn't paid for. That inconsistency is real and exists for however long step three takes. The system is temporarily inconsistent. Sagas don't hide this. They embrace it and make it manageable.

There are two ways to implement sagas. Choreography, where each service publishes events and other services react to them, with no central coordinator. And orchestration, where a central saga orchestrator tells each service what to do and handles failures and compensations explicitly.

Choreography scales well and has no single point of failure, but it's hard to follow. The saga logic is distributed across multiple services. To understand what happens when a payment fails, you have to read the code in the payment service and the inventory service while keeping the order service's flow in your head. Debugging a failed saga means correlating events across multiple services and event streams.

Orchestration is easier to reason about since the saga logic lives in one place, but the orchestrator becomes a dependency. I've built both, and for anything complex I reach for orchestration first. The centralized visibility is worth the added coordination. When a saga fails in production at 2am, I want to look at one place, not five.

Compensating Transactions Are Not Rollbacks

This is the part people underestimate when they first adopt sagas. Compensating transactions are not the same as database rollbacks. A rollback is instantaneous: it undoes uncommitted changes before they're visible. A compensating transaction executes after a step has already committed and been visible to the rest of the system.

What that means is: between the time a step commits and the time its compensation runs, the world has seen the intermediate state. Other systems may have read it. External services may have acted on it. An email may have been sent. A webhook may have fired. You can't un-send an email with a compensating transaction.

In practice, compensating transactions are business operations, not technical rollbacks. "Release the inventory reservation" is a business action with its own edge cases. What if the inventory was already consumed by someone else between the reservation and the release? What if the service is down when you try to compensate? You need to handle all of this, which means compensating transactions need to be idempotent, retryable, and durable.

This is why saga implementations need persistent state. If your orchestrator crashes while running compensations, it needs to resume where it left off when it comes back. If it retries a compensation step, the step needs to handle being called twice without causing a double-refund or a double-release. This is real engineering work. It's not as simple as catching an exception and calling an undo method.

I've seen teams implement sagas with in-memory state, where the saga runs as a single HTTP request chain, and if anything crashes, the state is gone. This doesn't work. Sagas need to be persisted so they can survive crashes, deployments, and network failures. The state machine that represents "where is this saga in its sequence of steps" needs to live in a database, and every state transition needs to be durable before the next step is triggered.

Eventual Consistency Is Not Eventual Incorrectness

The deeper shift that sagas require is accepting eventual consistency. The system will be temporarily inconsistent between steps. That's the tradeoff. And a lot of engineers who come from relational database backgrounds treat this as unacceptable.

The thing is, many systems that people think require strong consistency actually don't. They just haven't thought carefully about what they actually need. Strong consistency means every read sees the most recent write. Eventual consistency means all reads will eventually see all writes, but there's a window of time where they might not.

For most business operations, the window matters more than the guarantee. An e-commerce order that takes 200ms to fully propagate across three services is fine. A financial transaction that leaves a ledger in an inconsistent state for 50ms while two records update is fine if no other operation can read that inconsistent intermediate state. The question is not "is there a consistency window" but "can anything go wrong during that window."

When I review systems that claim to need distributed transactions, I usually find one of three things. First, the operations are genuinely independent and the perceived coupling is artificial, meaning you don't actually need them in the same transaction. Second, atomicity is asked for, but the correctness requirement is actually satisfied with idempotency and compensations, no locks needed. Third, the operations genuinely need to be atomic, which usually means they shouldn't be in separate services.

That third case is worth examining. If two operations genuinely cannot tolerate any inconsistency window between them, that's a signal that they might belong in the same service sharing the same database. The unit of strong consistency in a distributed system is the individual service and its database. If you need strong consistency between two things, co-locate them.

The Outbox Pattern for the Transition Points

There's one specific problem that sagas don't directly solve: the gap between writing to a database and publishing an event. After step one of a saga commits locally, something needs to trigger step two. Usually that means publishing a message. But publishing a message is a separate I/O operation and it can fail independently of the database write. If you commit to the database and then fail to publish the message, your saga is stuck.

The transactional outbox pattern solves this. Instead of publishing directly to a message broker, you write the message to an outbox table in the same database transaction as the business operation. A separate process, either a poller or a CDC connector like Debezium, reads from the outbox table and publishes to the actual broker.

The database transaction guarantees that either both the business write and the outbox write happen, or neither does. Publishing from outbox to broker can fail and retry independently without affecting the business data. Eventually the message lands, exactly once from the perspective of the broker.

This pattern does mean at-least-once delivery. The outbox poller might publish a message, crash before marking it as published, and publish it again when it restarts. Consumers need to handle duplicate messages idempotently. But idempotent consumers are something you should be building anyway in any distributed system. If your system can't handle receiving the same message twice, you have a fragility problem that's separate from transactions.

What I Actually Do Now

When I'm designing a system that needs multi-step consistency, I follow a simple sequence.

First, I ask whether the operations actually need to be in the same transaction or whether they only feel like they should be. A lot of perceived atomicity requirements dissolve under scrutiny. Creating a user and sending a welcome email don't need to be atomic. The email can fail and be retried independently. The user is created. That's the source of truth.

Second, if the operations genuinely need coordinated execution, orchestrated sagas with a persistent state machine are where I go. I'm not proud of the complexity this adds, but I'm honest about it. The complexity doesn't come from choosing sagas. It was always there, hidden inside the assumption that distributed transactions would work.

Third, the outbox pattern goes at every service boundary where a database write needs to trigger an event. I treat it as a standard part of the integration pattern, not an optimization to add later.

Fourth, I design consumers to be idempotent from the start. Every message handler checks whether it has already processed a given message before doing work. This is cheap to implement and eliminates an entire class of bugs that appear at the worst possible moments: during retries after failures, during deployments, during the incidents you're already scrambling to resolve.

And fifth, when I find two operations that genuinely cannot tolerate any consistency window between them, I put them in the same service. That's not a failure of microservices. That's recognizing that service boundaries should follow consistency boundaries, not just organizational ones.

Distributed transactions sound like they give you the best of both worlds: distributed systems with transactional guarantees. What they actually give you is a coordinator that can block your entire system during failures, latency that compounds with every participant, and a false sense of safety that discourages you from thinking clearly about what your system actually needs.

Sagas are harder to think about. Eventual consistency requires more careful design. Compensations require real engineering effort. But they're honest about what distributed systems are: a collection of independent parts that can fail independently, and they let you build systems that behave predictably when they do.

Share
X LinkedIn HN
UI

Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.

👁 0 9 min read

Comments (0)