← Back to Blog

Database Isolation Levels Are a Contract, Not a Dial

Isolation levels define which anomalies you tolerate, not how much correctness you get. The SQL standard and what databases implement diverged decades ago.

We shipped a feature that gave users unique discount codes. Each code was supposed to be claimed exactly once. Two users claimed the same code within the first hour.

The claim logic was wrapped in a transaction. That should have been enough, I thought. A coworker suggested changing the isolation level to SERIALIZABLE. One word in the connection config. Redeploy. Done.

Within a day, our error dashboard was full of exceptions I had never seen before. could not serialize access due to concurrent update. The application was crashing at random, mostly during checkout, which is the worst possible place. I had fixed the double-claim bug and introduced a worse one. The problem was that I did not understand what isolation levels actually do, and changing one without understanding the tradeoff is how you trade a rare bug for a common one.

What READ COMMITTED Actually Guarantees

Most databases default to READ COMMITTED. PostgreSQL does. Oracle does. SQL Server does. Most developers write their code against this default without thinking about it.

The guarantee READ COMMITTED provides is small. It says that within your transaction, you will not see data from another transaction that has not yet committed. That is it. Everything else is fair game. If another transaction commits while yours is running, you might see their changes. If you read the same row twice in the same transaction, you might get different values. If you run the same query twice, new rows might appear between the runs.

This is fine for most operations. Read a user, update a field, commit. The transaction is short and the anomalies rarely matter. But the moment you write a transaction that makes a decision based on reading data, then writes something based on that decision, READ COMMITTED is not protecting you the way you think it is.

That was the bug. We read the discount code, checked that no one had claimed it, then inserted a claim record. Two transactions did this concurrently. Both read the code as unclaimed. Both inserted a claim. Both committed. The database never saw a conflict because the isolation level does not check for one.

The Anomalies

The SQL standard defines isolation by the anomalies it prevents, not by the mechanisms. Three anomalies are named in the standard.

A dirty read is when you read data from another transaction before it has committed. If that transaction rolls back, you read data that never officially existed. No sane default allows this anymore, but READ UNCOMMITTED does, and it still exists in some database engines as an option. Do not use it.

A non-repeatable read is when you read the same row twice in the same transaction and get different values because another transaction updated and committed between your reads. READ COMMITTED allows this. REPEATABLE READ does not.

Run the same query twice in the same transaction and the second run returns rows that were not there the first time, because another transaction inserted rows that match your WHERE clause. That is a phantom read. REPEATABLE READ in the SQL standard allows this. SERIALIZABLE does not.

There is a fourth anomaly the standard forgot: write skew. Two transactions read the same data, make decisions based on it, then write different rows. Neither conflict is detected by any isolation level weaker than SERIALIZABLE. The discount code bug was exactly this. Each transaction read that the code was unclaimed, and each transaction wrote a different row (a different claim record). No isolation level below SERIALIZABLE catches it.

The Standard Is Not What Your Database Implements

Here is where it gets worse. The SQL standard defined isolation levels decades ago. Databases have diverged from it since.

PostgreSQL's REPEATABLE READ is not actually repeatable read. It is snapshot isolation. Snapshot isolation prevents non-repeatable reads and phantom reads, which is stronger than what the standard requires. But it does not prevent write skew. So PostgreSQL's REPEATABLE READ is stronger than the standard in some ways and still weak in the ways that usually matter.

MySQL's InnoDB default is REPEATABLE READ, but the implementation is different from PostgreSQL's. MySQL uses gap locks to prevent phantom reads at REPEATABLE READ, which PostgreSQL does not need because of snapshot isolation. The practical consequence is that MySQL transactions can deadlock in scenarios where PostgreSQL transactions just retry.

Oracle does not have true REPEATABLE READ at all. It has READ COMMITTED and SERIALIZABLE, and that is it. If your code runs SQL that works on Oracle and you test it on PostgreSQL or MySQL, the concurrency behavior is not identical.

SQL Server's SERIALIZABLE uses range locks, which can cause contention that PostgreSQL's predicate locking does not. PostgreSQL implements serializable snapshot isolation, which is an optimistic approach: the database lets transactions run, then detects conflicts at commit time and aborts one. That is where the serialization errors I saw came from. The database was doing its job correctly.

Same SQL, same isolation level name, different behavior on every database. Names are a compatibility layer that lets you write portable SQL. Actual concurrency semantics? Not portable.

The Hidden Cost of SERIALIZABLE

Changing the isolation level to SERIALIZABLE seemed like a one-line fix. It was not.

SERIALIZABLE gives you the guarantee that concurrent transactions behave as if they ran one at a time, in some order. This is what most developers intuitively want. The cost is that the database can no longer silently resolve conflicts. When two transactions would produce an inconsistent result if both committed, the database aborts one. Your application has to retry.

This is not a bug. It is part of the contract. SERIALIZABLE says: I will make your transactions behave as if they were serial, but I will sometimes reject transactions that cannot be made serial. You must handle the rejection.

In practice this means wrapping every serializable transaction in retry logic. Something like:

for attempt in range(MAX_RETRIES):
    try:
        with db.transaction(isolation='SERIALIZABLE'):
            run_business_logic()
        return
    except SerializationError:
        continue

The application also needs to tolerate the possibility that a transaction gets aborted and retried multiple times, which means it must be idempotent. Side effects inside the transaction (sending emails, calling external APIs) are not rolled back by a SERIALIZABLE retry. Those side effects will execute once per attempt. You need to move them outside the transaction or use an outbox pattern.

This was the part I missed. I bumped the isolation level without adding retry logic or auditing the code for non-idempotent operations inside transactions. The result was that every time two users hit checkout simultaneously, one got a cryptic error and the order failed entirely. I had replaced a rare double-claim with a common checkout failure.

What to Actually Pick

For most operations, READ COMMITTED is correct. Read a record, update a field, commit. The default works. There is no reason to change it.

For operations where you read a row, make a decision, and write based on that decision, you need to decide how the write is protected. The simplest option is SELECT FOR UPDATE, which takes a row lock and holds it for the duration of the transaction. Other transactions that try to read the same row for update will block until you commit. This is the right tool for things like inventory decrement, discount code claims, account balance updates. It is explicit, obvious to the reader, and works correctly at READ COMMITTED.

For operations that span multiple rows where the invariant is about relationships between rows (the classic "two doctors cannot both go off-call at the same time" example), SERIALIZABLE is the right tool. But only if you have retry logic and your transactions are idempotent. Otherwise you are trading one bug for another.

REPEATABLE READ is rarely the right choice. It is stronger than READ COMMITTED but weaker than SERIALIZABLE, and in PostgreSQL it gives you snapshot isolation but not write skew protection. If you need the guarantees it provides, you usually need SERIALIZABLE. If you do not, READ COMMITTED is faster and simpler.

READ UNCOMMITTED should never be used. It allows dirty reads, which means you might read data from a transaction that later rolls back. Your application will make decisions based on state that officially never existed. There is no legitimate reason to use this level.

The Closer

Isolation levels are not a dial you turn up for more correctness. They are a contract between you and the database about which anomalies you will tolerate, and what the database is permitted to do in response when concurrency threatens to produce an anomaly you will not tolerate.

READ COMMITTED says: I will tolerate non-repeatable reads and phantoms, so do not slow me down looking for them. SERIALIZABLE says: I will not tolerate anomalies, so reject my transaction if you cannot guarantee one. Each is correct for different work. Neither is a default you should change without understanding the code that runs under it.

The discount code bug got fixed by adding SELECT ... FOR UPDATE on the code row before the claim insert. Isolation level went back to READ COMMITTED. Serialization errors disappeared. Double-claims stopped happening. Three lines, not one word, but they were the right three lines.

When you find yourself about to change the isolation level, stop and ask what anomaly you are actually trying to prevent. Name it. Then ask whether an explicit lock or a different transaction structure would prevent the same anomaly without changing the default behavior for every other transaction in your application. Usually it would. The isolation level is a blunt tool. Use the sharp ones first.

Share
X LinkedIn HN
UI

Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.

👁 0 7 min read

Comments (0)