It happened around 2:30am. Our worker queue started backing up. Every batch was failing with the same error: deadlock detected, transaction aborted. The code had been running for two years. Nothing had changed. We were on call, half asleep, staring at logs that all said the same thing. The fix turned out to be eleven characters: ORDER BY id in a SELECT statement buried inside a 200-line transaction.
Deadlocks are one of those database problems that look exotic until they happen, and obvious in hindsight. Not really about concurrency in the abstract. About lock order. Two transactions that need the same locks in different orders will eventually deadlock. The database notices, picks one to kill, and your application sees an exception. If you are lucky, the retry succeeds. If you are not, you have built a system where deadlocks are the steady state.
What a Deadlock Actually Is
A deadlock is a cycle in the wait-for graph. Transaction A holds lock X and wants lock Y. Transaction B holds lock Y and wants lock X. Neither can proceed. Neither will release. Without intervention, both would wait forever.
The database does intervene. PostgreSQL detects deadlocks and aborts one of the transactions, picking whichever it considers the cheaper victim. The other continues. Your application gets a specific error code (40P01, the SQLSTATE for deadlock_detected) and has to decide what to do.
Most people get this part wrong. A deadlock error is not a corruption. Not a bug in the database. It is the database doing exactly what it should: refusing to wait forever. The bug is almost always on your side, in the order you acquire locks.
The Pattern: Lock Order Disagreement
The classic deadlock looks like this:
-- Transaction A
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- Transaction B (running at the same time)
UPDATE accounts SET balance = balance - 50 WHERE id = 2;
UPDATE accounts SET balance = balance + 50 WHERE id = 1;A locks row 1, wants row 2. B locks row 2, wants row 1. Cycle. Deadlock. One of them dies.
This pattern is everywhere. Anywhere you touch multiple rows in a transaction, and the order is determined by something other than a fixed sort, you have a potential deadlock. Money transfers between accounts. Inventory updates across products. Permission grants across users. The application logic looks completely benign in code review. It looks fine in tests. It deadlocks in production under load, sometimes once a week, sometimes once an hour, almost never in a way you can reproduce on demand.
The Less Obvious Pattern: Indexes and Gap Locks
The other category of deadlock comes from locks you did not realize you were taking. Foreign key checks in PostgreSQL acquire share locks on referenced rows. SELECT FOR UPDATE on a parent row blocks any concurrent insert that references it. UPDATE on a row that has a unique index can block on the index, not just the row. Predicate locks at SERIALIZABLE isolation can lock ranges you never explicitly named.
This means a transaction that only mentions one table can deadlock against a transaction that does not mention that table at all, because they are both locking on the same parent row through a foreign key relationship. I have seen this happen with users updating their profile while a background job updates their account record, both transactions implicitly touching a shared parent row through different paths.
You cannot reason about deadlocks just by looking at your queries. You have to know what locks PostgreSQL is actually taking, which means reading the locking documentation and the chapter on transaction isolation. The docs are dense but worth the time. The single most useful thing in there is the table mapping each statement type to the lock mode it acquires. If you do not internalize that mapping, your deadlock investigations are guesswork.
Just Retry It (Yes, Really)
The first instinct when you see a deadlock error is to redesign your locking strategy. Sometimes that is the right answer. Most of the time, it is not. The realistic answer for most applications is: catch the deadlock exception, retry the transaction with a small backoff, and move on.
@Retryable(
retryFor = DeadlockLoserDataAccessException.class,
maxAttempts = 3,
backoff = @Backoff(delay = 50, multiplier = 2.0)
)
public void transferFunds(Long fromId, Long toId, BigDecimal amount) {
accountRepository.debit(fromId, amount);
accountRepository.credit(toId, amount);
}Two reasons this works. First, deadlocks are usually rare. If you are getting more than a percent or two of your transactions deadlocking, you have a structural problem and retries will not save you. If you are getting one in a thousand, retries are nearly free. Second, the retry runs against a slightly different state of the database, with different concurrent transactions in flight. The exact conditions that caused the deadlock are usually gone by the time the retry runs.
Hiding deadlocks completely is the mistake. You should retry, but you should also log every deadlock with enough context (the involved tables, the queries, the transaction IDs) to spot patterns. A handful of deadlocks per day is fine. A spike to thousands per hour means something changed and you need to know.
The Order-By-Id Rule
For deadlocks caused by lock order disagreements, there is a simple rule that fixes most of them: lock things in a deterministic order. Almost always, that means by primary key, ascending.
-- Bad: order depends on which id was passed first
UPDATE accounts SET balance = balance - 100 WHERE id = ?;
UPDATE accounts SET balance = balance + 100 WHERE id = ?;
-- Good: lock the smaller id first, no matter the direction of transfer
SELECT * FROM accounts WHERE id IN (?, ?) ORDER BY id FOR UPDATE;
-- Then do the math in app code and update both rowsThis is the eleven-character fix from the opening. The deadlock at 2:30am was not really in our application code. It was in a SELECT FOR UPDATE inside a transaction, returning rows in whatever order the planner chose. Adding ORDER BY id forced every concurrent transaction to take its locks in the same order, and the deadlock disappeared. That ORDER BY had been missing for two years. The deadlock only started appearing when traffic crossed a threshold.
The same principle applies to bulk updates. If you are updating a batch of rows, sort them by primary key in the application before you start the transaction. Every concurrent batch will then acquire locks in the same order, and lock-order deadlocks become statistically impossible.
SKIP LOCKED for Queues
If you are using your database as a job queue (which an existing post on this site argues you should not, but everyone does anyway), the deadlock pattern shifts. Multiple workers trying to claim the next job will fight for the same rows. The naive implementation deadlocks under load.
Fix it with FOR UPDATE SKIP LOCKED, available in PostgreSQL since 9.5:
SELECT * FROM jobs
WHERE status = 'pending'
ORDER BY created_at
LIMIT 1
FOR UPDATE SKIP LOCKED;Each worker picks up a different unlocked row. No contention, no deadlocks, no fighting. This single feature is the reason a lot of people get away with using their database as a queue at moderate scale. Without it, you are inventing your own locking, and you will get it wrong.
What You Cannot Fix in Code
Some deadlocks come from the locking model itself and there is nothing your application can do about them. The most common in PostgreSQL is at SERIALIZABLE isolation, where predicate locks turn read-write conflicts into deadlock-style serialization failures (40001, not 40P01, but the retry pattern is identical). If you are seeing a steady stream of these and your queries look correct, the answer is usually dropping to READ COMMITTED, which has its own tradeoffs (the isolation levels post on this site goes deep on those).
Another thing often out of your control is the deadlock detector's choice of victim. Most databases pick whichever transaction has done less work, but the heuristic is rough. If you have a long-running transaction that keeps getting picked as the victim, splitting it into smaller transactions will help. Not really a deadlock fix. A transaction design fix that happens to reduce the cost of deadlocks when they do occur.
Deadlocks are one of those problems where the right mental model is "this will happen, design for it" instead of "make this never happen." You cannot make them never happen unless you serialize all writes, which kills performance. You can make them rare, recoverable, and visible. That is the actual goal.
Comments (0)