Somewhere in your schema there is a table with id UUID PRIMARY KEY DEFAULT gen_random_uuid(). It was a reasonable choice. The service generates IDs on its own without round-tripping to a sequence, the keys are globally unique so you can merge data from anywhere, and nobody can guess the next one by adding one. Then the table grew, and writes got slower, and the cause never showed up in a slow query log because no individual query was slow.
The problem is not UUIDs. The problem is that gen_random_uuid() returns a fully random version 4 UUID, and a random value is the worst possible thing to use as the key Postgres orders an index by.
What random costs in a B-tree
A Postgres primary key is a B-tree. The B-tree keeps its entries sorted by key. When you insert a row, Postgres finds the leaf page where that key belongs and writes the entry there.
With a sequential key like bigint from an identity column, every new key is larger than the last. Every insert lands on the same rightmost page until it fills, then a fresh page is appended. The hot part of the index is one page wide. It stays in memory. Inserts are close to an append.
With a random key, the next value is equally likely to belong anywhere in the index. Insert a row and Postgres jumps to a random leaf page, reads it in if it is not already cached, writes the entry, and moves on. The next insert jumps somewhere else. The hot part of the index is the entire index.
That single difference is where the cost comes from, and it shows up in several places at once.
The actual line items
Cache misses on write. To insert into a random leaf page, that page has to be in shared_buffers. While the index fits in cache, this is free. Once the index outgrows cache, every insert has a real chance of pulling a cold page off disk before it can write. A sequential key never has this problem because it only ever touches the one page it is already sitting on.
Page splits everywhere. Random inserts fill leaf pages unevenly and trigger splits scattered across the whole tree. A sequential key splits only at the right edge, predictably, and packs pages near full. Random keys leave pages half empty, so the same number of rows needs more pages, which means a larger index that fits in cache even less well. The problem compounds.
WAL volume. The first time a page is touched after a checkpoint, Postgres writes the entire page to the WAL, not just the row. This is the full-page write. Random inserts touch far more distinct pages between checkpoints than sequential inserts do, so they generate far more full-page writes. Your WAL grows faster, your replicas have more to apply, and replication lag tracks it.
Sixteen bytes, repeated. A UUID is 16 bytes against 8 for a bigint. That cost is not just the primary key column. Every foreign key that points at this table carries 16 bytes per row, and every secondary index on this table includes the 16-byte primary key in its leaf entries. The width multiplies across the schema.
Why it looks fine until it doesn't
Here is the trap. On a table small enough that its primary key index fits entirely in shared_buffers, none of this is visible. Every leaf page is already in memory, so the random jumps cost nothing, and insert throughput for a random key looks identical to a sequential one. The benchmark on your laptop passes. So does staging.
The cliff is the moment the index stops fitting in cache. A sequential key does not care, because it only needs one page resident. A random key needs all of it resident and starts paying disk latency on a growing fraction of inserts. Throughput for the random key bends downward while the sequential key stays flat. The table did not change. It crossed a size threshold, and the access pattern that was always inefficient finally started costing money.
This is why the regression never lands in a slow query log. No single insert is slow enough to flag. The whole write path is a little slower, all the time, and it gets worse as you grow.
UUIDv7 keeps the upside and drops the tax
UUIDv7 is a version of the UUID format whose leading bits are a Unix millisecond timestamp, followed by random bits. Two IDs generated in the same millisecond differ in their random tail. Two IDs generated seconds apart sort in time order.
That ordering is the whole point. A UUIDv7 generated now is larger than one generated a minute ago, so inserts land near the right edge of the B-tree, the same way a sequence does. You get back insert locality, the hot page stays hot, page splits move back to the right edge, and full-page writes drop. The benefits you wanted from a UUID are untouched. The value is still globally unique, still generatable on any node without coordination, still safe to merge across systems.
Before Postgres 18 you got this from an extension such as pg_uuidv7, or by generating the ID in the application. Most modern UUID libraries can already produce a v7. Postgres 18 ships it in core:
-- Postgres 18, built in
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT uuidv7(),
customer_id BIGINT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);Swapping gen_random_uuid() for uuidv7() on a new table is a one-line change with no downside for most workloads.
What it does not fix
UUIDv7 is not free, and it is not always the right answer.
It is still 16 bytes. The width cost across foreign keys and secondary indexes is exactly the same as v4. If your reason for reaching for UUIDs was anything other than distributed generation, a bigint identity column is smaller and faster on every axis.
It leaks a timestamp. Anyone holding a UUIDv7 can read the millisecond it was created. For an order ID that is usually harmless. For a value you expose publicly where creation time is sensitive, it is a real information disclosure, and you should think about it before exposing the ID.
It is more predictable than v4. The random tail is still wide enough that you cannot guess a specific ID, but a UUID was never a security token. If you are relying on unguessable IDs for access control, that bug exists with v4 too. Fix the authorization, do not lean on the ID format.
Migrating an existing v4 column is real work. Changing the default only affects new rows. Converting the historical data means rewriting the column and every foreign key that references it, which on a large table is a full rewrite under lock unless you stage it carefully. The easy win is on new tables. Existing ones are a migration project.
The decision
If a single writer owns the table and you do not need to generate IDs anywhere but the database, use a bigint identity column. It is 8 bytes, it is sequential by nature, and it has none of the problems above.
If you need to generate IDs in the application, on multiple nodes, or on a client before the row reaches the database, use UUIDv7. You keep distributed generation and you keep insert locality.
The one combination to stop reaching for is a random version 4 UUID as the primary key of a table you expect to grow. It works on the small table where you tested it, and it bills you later, on the production table, in a currency that never appears in a query plan.
Comments (0)