
Why Your Distributed Lock Doesn't Lock

Distributed locks don't provide mutual exclusion. Fencing tokens, GC pauses, clock drift, and why the lock you wrote is actually a polite hint at best.

The lock that wasn't

A team I worked with had a Redis distributed lock guarding a billing job. The job processed customer invoices, debited their Stripe accounts, and marked each invoice as paid. Only one worker should run at a time. They used SET NX with a 60-second TTL. Standard pattern from every Redis tutorial.

One Tuesday in March, a customer was charged twice for the same invoice. The team checked the logs. Two workers had held the lock at the same time. The Redis console confirmed only one lock had been set. Both workers had it. Both worked on the same invoice. Both billed the customer.

This was not a Redis bug. Redis behaved correctly. The lock fired correctly. What the lock does not provide is mutual exclusion in the way the team assumed. No distributed lock does.

What a lock is supposed to do

A mutex inside a single process guarantees mutual exclusion. When you hold the mutex, no one else holds it. The runtime and the OS enforce that. If your code panics, most runtimes release the mutex as the stack unwinds. If your code holds the mutex for an hour, no one else gets in for an hour. The guarantee is hard.

A distributed lock tries to offer the same guarantee across machines. It cannot. The reason is that the process holding the lock can be paused, partitioned, or lied to without knowing it is paused, partitioned, or being lied to. By the time it resumes and tries to act on the lock, the lock may have expired and been handed to someone else. The process does not know. It still thinks it has the lock.

The GC pause that hands the lock to two processes

Here is the canonical failure, originally described by Martin Kleppmann.

Process A acquires a lock with a 60-second TTL. It reads the resource it is supposed to mutate. Then, before writing, A's JVM enters a stop-the-world GC pause. The pause lasts 90 seconds. Meanwhile, Redis (or whatever lock service) sees the lock has expired and starts handing it out again. Process B acquires the lock, reads the resource, writes a new value. A's GC pause ends. A does its write. Two writes. Two processes that both believed they held the lock.

Replace "JVM GC pause" with "OS scheduler pause," "VM live migration," "container paused during health check failure," or "kernel page fault hitting a slow disk." All produce the same outcome.
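The sequence can be replayed deterministically with a fake clock. This is a minimal sketch, not a Redis client: `FakeClock` and `TtlLock` are hypothetical in-memory stand-ins for a lock key set with NX and a TTL, and the "resource" is a plain list that accepts every write.

```python
class FakeClock:
    """Manually advanced clock so the race is deterministic."""
    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class TtlLock:
    """In-memory stand-in for a key set with NX and a TTL (assumption, not Redis)."""
    def __init__(self, clock):
        self.clock = clock
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, owner, ttl):
        # Succeeds only if nobody holds the key or the previous TTL has elapsed.
        if self.owner is None or self.clock.now >= self.expires_at:
            self.owner = owner
            self.expires_at = self.clock.now + ttl
            return True
        return False


def replay_gc_pause():
    clock = FakeClock()
    lock = TtlLock(clock)
    writes = []  # the unprotected resource: it accepts every write

    assert lock.acquire("A", ttl=60)  # A gets the lock
    clock.advance(90)                 # A is frozen for 90 s; the TTL lapses
    assert lock.acquire("B", ttl=60)  # B gets the "same" lock
    writes.append("B")                # B writes while holding the lock
    writes.append("A")                # A resumes and writes, unaware
    return writes


print(replay_gc_pause())  # ['B', 'A']
```

Nothing in the lock misbehaves here. Both `acquire` calls are legitimate, and the resource still receives two writes.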

Redlock does not save you

Redis's Redlock algorithm tries to harden the single-Redis case by acquiring the lock from a majority of independent Redis nodes. The pitch: even if one Redis dies, the lock is safe. That pitch is true for the failure mode it addresses (a single Redis crashing), and false for the failure mode that actually matters.

The Kleppmann critique, summarized: Redlock relies on bounded clock drift and bounded request latency to be safe. Neither assumption holds under real network conditions. The algorithm assumes time progresses the same way on every node. Time does not.

Antirez (Redis's author) wrote a thoughtful response. Read both posts if you want the full picture. The short version: Redlock is no worse than other distributed locks, and none of them solves the underlying problem.

Fencing tokens: the part nobody implements

The fix Kleppmann proposes is fencing tokens. Each time a process acquires the lock, the lock service hands back a monotonically increasing number (a token). When the process performs the write, it includes the token. The resource being protected, not the lock, checks the token against the highest token it has seen. Older tokens are rejected.

Walk through the GC pause again with fencing tokens. Process A acquires the lock, gets token 42. It pauses. Meanwhile, Process B acquires the lock, gets token 43, performs the write, and the resource records "highest token seen: 43." When A wakes up and tries its write with token 42, the resource rejects it because 42 < 43.

Two things have to be true for this to work: the lock service has to issue increasing tokens, and the resource (the database, the file, the API) has to enforce the token. The second part is the one nobody implements. It requires the storage layer to know about the lock, which means the lock cannot be opaque to the resource.
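A minimal sketch of both halves, assuming an in-memory lock service and resource. `TokenIssuer`, `FencedResource`, and `StaleTokenError` are hypothetical names for illustration; in a real system the comparison and the write would have to happen atomically inside the storage layer.

```python
import itertools


class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already seen."""


class TokenIssuer:
    """Hypothetical lock service: every grant returns a larger token."""
    def __init__(self, start=42):
        self._counter = itertools.count(start)

    def grant(self):
        return next(self._counter)


class FencedResource:
    """Hypothetical storage layer that enforces the token itself."""
    def __init__(self):
        self.highest_token = -1
        self.value = None

    def write(self, token, value):
        # Reject anything older than the newest token this resource has seen.
        if token < self.highest_token:
            raise StaleTokenError(f"token {token} < {self.highest_token}")
        self.highest_token = token
        self.value = value


issuer = TokenIssuer()
resource = FencedResource()

token_a = issuer.grant()  # A acquires the lock (42), then pauses
token_b = issuer.grant()  # the lock expires; B acquires it (43)
resource.write(token_b, "written by B")

stale_rejected = False
try:
    resource.write(token_a, "written by A")  # A wakes up and tries its write
except StaleTokenError:
    stale_rejected = True
```

The lock service never learns that A was paused. The rejection happens entirely at the resource, which is why the resource has to participate.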

Most distributed lock implementations in the wild do not have fencing tokens. SET NX in Redis does not. A simple Postgres row lock does not. The thing you call "a distributed lock" in your codebase almost certainly does not. Which means it does not actually provide mutual exclusion.

Postgres advisory locks: better, with a caveat

Postgres advisory locks (pg_advisory_lock) are better than Redis SETNX for one reason: they are tied to a session. If the process holding the lock dies, the connection drops, and Postgres releases the lock. There is no TTL race, because there is no TTL. The lock lives as long as the connection lives.

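The caveat is that the lock follows the connection, not the process. A GC pause does not close the connection, so the lock stays held while your process is frozen. The race comes back when the connection is severed without the process noticing: a network partition, a keepalive timeout, or a failover. Postgres releases the lock, another worker takes it, and the first process resumes still believing it holds it. Same race as before, just a different layer.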

For most use cases where the work is short and the process is well-behaved, Postgres advisory locks are the right answer. They are simple, they live in the database you already run, and the failure modes are bounded. If you are reaching for Redis SETNX, reach for pg_advisory_xact_lock instead. It is the transaction-scoped variant: Postgres releases it automatically at commit or rollback, so it fills the same role with better failure semantics.
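One practical detail: the single-argument form of pg_advisory_xact_lock takes a bigint key, not a name. A small sketch of one way to derive a stable key from a human-readable lock name follows. The hashing scheme, the `advisory_lock_key` helper, and the "billing-job" name are assumptions for illustration; the `%s` placeholder is the style psycopg uses, and other drivers differ.

```python
import hashlib
import struct


def advisory_lock_key(name: str) -> int:
    """Map a lock name to a signed 64-bit integer, the range of a Postgres bigint."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # ">q" is a big-endian signed 64-bit integer; use the first 8 hash bytes.
    (key,) = struct.unpack(">q", digest[:8])
    return key


# Statement to run inside the same transaction as the protected work,
# so the lock is released when that transaction commits or rolls back.
LOCK_SQL = "SELECT pg_advisory_xact_lock(%s)"

key = advisory_lock_key("billing-job")  # hypothetical lock name
```

Hashing means two different names could in principle collide on the same key. For a handful of job names that risk is negligible, but a fixed table of integer constants avoids it entirely.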

ZooKeeper, etcd, Consul: less wrong, not right

Consensus-based systems (ZooKeeper, etcd, Consul) provide locks with better semantics than Redis. They handle leader election, session timeouts, and ordering correctly. Ephemeral nodes give you session-tied locks similar to Postgres advisory locks.

What they still do not solve is the process-pause problem. A node that pauses during a GC, or gets partitioned and then heals, can still hold a stale lock that the system thinks it has handed off. Consensus does not fix this. The application has to.

What these systems give you is a set of primitives (session IDs, version numbers, monotonically increasing revisions) that you can use to build fencing yourself. The catch is that you have to know to do it.

Lock vs lease: same idea, different vocabulary

The honest framing is that distributed "locks" are leases. A lease is a time-bounded reservation. It expires. You do not own anything across the expiration boundary. If you act on data after your lease expires, you are doing so without the lease's protection. The naming "lock" implies the OS-style guarantee. The semantics are nothing like an OS lock.

If you internalize "lease, not lock," a lot of patterns fall out automatically. You do not assume the lease still holds when your code resumes from a long operation. You re-check. You build retry semantics that assume the lease may have lapsed. You write idempotent operations at the resource level.
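The re-check can be as small as a deadline on the holder's own monotonic clock. This is a sketch under assumptions: the `Lease` class, the injectable `clock`, and the 5-second safety margin are all hypothetical, and the check only narrows the window. A pause can still land between the check and the write, which is why the resource has to be idempotent as well.

```python
import time


class Lease:
    """A time-bounded reservation, measured on the holder's monotonic clock."""
    def __init__(self, ttl, clock=time.monotonic, safety_margin=5.0):
        self._clock = clock
        self._deadline = clock() + ttl
        self._margin = safety_margin

    def remaining(self):
        return self._deadline - self._clock()

    def probably_held(self):
        # "Probably": this is a local estimate, not a guarantee.
        return self.remaining() > self._margin


# Drive the lease with a fake clock to show the re-check after a long operation.
fake_now = [0.0]
lease = Lease(ttl=60, clock=lambda: fake_now[0])

held_at_start = lease.probably_held()      # 60 s left
fake_now[0] = 58.0                         # a long operation (or a pause) ends
held_after_long_op = lease.probably_held() # 2 s left, under the 5 s margin
```

When `probably_held()` returns false, the safe move is to stop and re-acquire rather than finish the write.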

The pattern that actually works

The shortest path to correctness in distributed-lock-heavy systems is to stop relying on the lock for correctness. Use the lock as a performance hint, not a safety boundary. Then make the resource itself the boundary.

For the billing-job war story at the top: the fix was a uniqueness constraint in the database, a partial unique index on invoice_id covering rows where status = 'paid'. Writing 'paid' for an already-paid invoice fails outright, regardless of how many workers think they hold the lock. Workers can still race and occasionally waste cycles; the lock is there to keep the system from doing twice the work most of the time. The constraint is what prevents the double-charge.
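A runnable sketch of that kind of constraint, using SQLite from the Python standard library as a stand-in for Postgres (both accept a partial unique index written this way). The `payments` table, its columns, and the `mark_paid` helper are hypothetical; the team's actual schema is not shown in this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    " invoice_id INTEGER NOT NULL,"
    " worker TEXT NOT NULL,"
    " status TEXT NOT NULL)"
)
# At most one 'paid' row per invoice, no matter who writes it.
conn.execute(
    "CREATE UNIQUE INDEX one_paid_row_per_invoice"
    " ON payments (invoice_id) WHERE status = 'paid'"
)


def mark_paid(conn, invoice_id, worker):
    """Return True if this worker recorded the payment, False if it lost the race."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO payments (invoice_id, worker, status)"
                " VALUES (?, ?, 'paid')",
                (invoice_id, worker),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = mark_paid(conn, 1001, "worker-a")   # succeeds
second = mark_paid(conn, 1001, "worker-b")  # rejected by the index
```

The second worker may have held a perfectly valid-looking lock. It loses anyway, because the database is the one making the decision.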

This is the pattern: a soft lock for performance, an idempotent resource for correctness. The soft lock can be Redis SETNX. Idempotent resources come in many shapes: a unique constraint, an upsert, a token check, a state-machine transition rule, or a transactional CAS. The lock alone is unsafe. The idempotent resource alone is safe but wasteful. Together they are correct and efficient.

What I reach for now

By default, Postgres advisory locks. Free, transactional, no TTL race, the failure semantics match the database's failure semantics. If the work I am protecting touches Postgres anyway (which is usually true), I do not need a second system.

For coordination across services that do not share a database, etcd or Consul. The session-ephemeral primitive is the closest thing to a real lock you can get.

For Redis specifically: only when latency matters more than correctness, and only paired with an idempotent resource. Redis SETNX is fine as a performance optimization for jobs that are already safe to run twice. It is dangerous as a correctness boundary for jobs that are not.

And the final guidance, which is older than any of these systems: design the operation so it does not need a lock at all. Idempotent writes. Conditional updates. Compare-and-swap. If you can avoid the question "who holds the lock right now," you avoid every failure mode this post lists. That is the engineering work distributed systems actually reward.
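As one illustration of a conditional update, here is a compare-and-swap on a version column, again using SQLite from the standard library as a stand-in. The `invoices` table, the `version` column, and the `compare_and_swap` helper are assumptions for the sketch, not a schema from this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    " id INTEGER PRIMARY KEY,"
    " status TEXT NOT NULL,"
    " version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO invoices VALUES (1, 'open', 0)")
conn.commit()


def compare_and_swap(conn, invoice_id, expected_version, new_status):
    """Apply the update only if nobody changed the row since it was read."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE invoices SET status = ?, version = version + 1"
            " WHERE id = ? AND version = ?",
            (new_status, invoice_id, expected_version),
        )
    return cur.rowcount == 1


# Both workers read version 0, then both try to write.
won = compare_and_swap(conn, 1, 0, "paid")   # matches version 0, bumps it to 1
lost = compare_and_swap(conn, 1, 0, "paid")  # version is now 1, no row matches
```

No lock appears anywhere in this flow. The loser finds out from the row count and can re-read or give up.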


Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.
