← Back to Blog

The Composite Index Nobody Can Read

Your composite index covers every column but EXPLAIN still shows a full scan. Column order, not column presence, determines whether PostgreSQL uses the index.

I added a composite index on three columns. The query was still slow. I checked the index existed. I checked the columns matched the WHERE clause. I checked the statistics were fresh. The query was still slow.

EXPLAIN ANALYZE showed a sequential scan. The planner ignored my index entirely. I had built it. I had paid the storage. I had paid the write overhead on every INSERT and UPDATE. And the query that was supposed to benefit was reading the whole table.

The cause was column order. I had created an index on (status, created_at, user_id). The query was WHERE user_id = ? AND status = 'active'. It was not filtering on the leading column with a useful pattern, so the index was useless. The fix was to drop it and create (user_id, status) instead, which the planner found and used immediately.

This is the trap with composite indexes. They look like they cover the columns. The columns are right there in the index. But the order is everything, and getting it wrong means you have paid the cost of an index that nobody uses.

What a Composite Index Actually Is

A composite index is a single B-tree sorted on multiple columns, left to right. If you index (a, b, c), the database stores the rows sorted first by a, then within each value of a sorted by b, then within each (a, b) pair sorted by c. It is the same way you might sort a contact list by last name, then first name, then middle initial. The hierarchy of sort keys is what makes the index useful.

This sort order is everything. It dictates which queries the index can answer and which queries it cannot. Column ordering is not interchangeable. Data is laid out on disk in a specific way, and the planner can only use the index in ways that respect that layout.

The Leftmost Prefix Rule

The leftmost prefix rule says: a composite index can answer queries that filter on a contiguous prefix of its columns starting from the left. An index on (a, b, c) can be used for queries on (a), (a, b), or (a, b, c). It cannot be used for queries on (b), (c), (b, c), or (c, a).

The reason is mechanical. To binary-search a B-tree, you need a starting point. The first column's sort order gives you that starting point. Without filtering on the first column, every entry in the tree is a candidate, and you might as well scan the whole thing.

Here is a concrete example. Suppose you index (user_id, created_at) on an orders table.

CREATE INDEX idx_orders_user_date ON orders (user_id, created_at);

Run WHERE user_id = 1234 and the index gets used instantly. The planner jumps to the section of the tree where user_id = 1234 and reads the matching rows. Fast.

Add a date predicate, WHERE user_id = 1234 AND created_at > '2026-01-01', and the index gets used even better. After jumping to user_id = 1234, entries within that section are sorted by created_at, so the planner can binary-search again to find the date range. Very fast.

But WHERE created_at > '2026-01-01' on its own cannot use this index at all. The index is sorted by user_id first, so dates are scattered throughout the tree. There is no efficient way to find rows from a specific date range without scanning the whole thing. The planner falls back to a sequential scan or finds a different index.

This is the rule that catches everyone. You see the index, you see the column you are filtering on, and you assume the index applies. It does not, because the column you care about is buried inside the sort hierarchy with no way to access it directly.

Why (a, b, c) Is Not the Same as (c, b, a)

This is a corollary of the leftmost prefix rule, but it surprises people enough to deserve its own section.

Two composite indexes with the same columns in different orders are different indexes. They cover different queries. They cannot substitute for each other.

(user_id, status, created_at) answers queries on user_id, on (user_id, status), or on (user_id, status, created_at).

(created_at, status, user_id) answers queries on created_at, on (created_at, status), or on (created_at, status, user_id).

These are completely different sets. A query on just created_at can use the second index but not the first. A query on just user_id can use the first but not the second. Same three columns, ordered differently, answering different questions.

So you do not get to choose column order based on what feels logical or alphabetical. Order has to match how you actually query the table. If you mostly filter by user, lead with user. If you mostly filter by date range, lead with date.

Range Conditions Stop the Prefix

There is a subtler rule that catches even people who understand the leftmost prefix. A range condition uses the index up through the range column, but everything after that column is filtered manually after rows come back from the index.

Consider an index on (user_id, status, created_at).

WHERE user_id = 1234 AND status = 'active' AND created_at > '2026-01-01' is the best case. Equality on user_id, equality on status, range on created_at. The index handles all three.

WHERE user_id = 1234 AND status > 'a' AND created_at > '2026-01-01' is different. The range on status means everything past status cannot use the index. Within the section for user_id = 1234, the index is sorted by status, so the range filter scans through it efficiently. But the created_at filter cannot benefit from index ordering anymore, because rows for user_id = 1234 AND status > 'a' are not contiguously sorted by created_at. The database fetches all of them and applies the date filter row by row.

This means the prefix rule is really about contiguous equality predicates. As soon as you hit a range, the rest of the columns cease to be useful for index lookups. The planner uses the index up to and including the first range, and post-filters from there.

The practical consequence: equality predicates should come before range predicates in your column order. If the column you filter with a range is the leading column, you stop being able to use later columns at all. Always put equality first.

How to Choose Column Order

A few rules of thumb that work for most cases.

First, equality before range. Whatever you filter with = or IN should come before whatever you filter with >, <, BETWEEN, or LIKE 'prefix%'.

Second, ORDER BY at the end. If your query both filters and sorts, the sort columns should match the trailing columns of your index. An index on (user_id, created_at) not only filters by user but lets you grab the most recent orders without a separate sort step.

Third, most selective first when both are equality. If both user_id and status are equality predicates, lead with whichever filters more rows. user_id is usually highly selective; status is usually not, since most rows have one of a small handful of values. Lead with user_id.

Fourth, your access patterns dictate everything. If 90 percent of your queries filter by date and the other 10 percent filter by user, you might be better off with two indexes: (created_at, ...) for the common case and (user_id, ...) for the rare case. One index does not have to serve every query.

The mistake people make is creating one big composite index covering every column they ever filter on, in alphabetical order or whatever order seems neat. Composite indexes are tools shaped by your specific queries. They are not a generic solution.

Reading EXPLAIN

EXPLAIN tells you exactly what the planner is doing with your index. There are two key things to look for in PostgreSQL output.

Index Cond: means the planner used the index to filter. The condition listed is what the index was able to evaluate during the scan.

Filter: means the planner returned rows from the index (or a sequential scan) and then applied additional filtering in memory after the fetch.

If your query has a Filter line where you expected an Index Cond, the index is not doing what you thought. Either the column order is wrong, or a range condition stopped the prefix, or the planner decided a sequential scan was cheaper for this case.

EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 1234 AND status = 'active' AND created_at > '2026-01-01';

A good plan looks like this:

Index Scan using idx_orders_user_status_date on orders
  Index Cond: (user_id = 1234 AND status = 'active' AND created_at > '2026-01-01')

Everything is in Index Cond. The index handled the entire WHERE clause.

A bad plan looks like this:

Index Scan using idx_orders_user_id on orders
  Index Cond: (user_id = 1234)
  Filter: (status = 'active' AND created_at > '2026-01-01')

The index only filtered on user_id. The other conditions are applied after the fact. If you have ten thousand orders for this user, all of them get fetched and then filtered in memory.

This is the difference between a query that runs in microseconds and a query that runs in seconds, on the same table, with what looks like the same WHERE clause.

What to Take Away

A composite index with the wrong column order is the worst kind of index. It looks right, it covers the columns, it shows up in pg_indexes. But the planner ignores it. You pay the storage cost. You pay the write overhead on every INSERT, UPDATE, and DELETE. The query you built it for is still slow.

The cost of a wrong index is worse than the cost of no index. With no index, at least you know to look. With a useless index, you think the problem is solved and move on, while every query continues to scan the table and the index sits there as expensive ballast.

The fix is always the same: read your query, look at the WHERE clause, work out what order the predicates need to be in. Equality first, range second, ORDER BY columns last. Drop the useless index. Build the right one. Verify with EXPLAIN that the new index is actually used.

The query I started this post with got fixed in five minutes once I understood what was happening. The composite index on (status, created_at, user_id) was useless to me. Reordering to (user_id, status) made the planner pick it up immediately. Same data, same columns, different order, completely different outcome. The leftmost prefix rule will catch you exactly once before you remember it forever.

Share
X LinkedIn HN
UI

Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.

👁 0 8 min read

Comments (0)