← Back to Tutorials

Database Indexes Explained: A PostgreSQL Walkthrough With EXPLAIN ANALYZE

B-tree, composite, partial indexes proved with EXPLAIN ANALYZE on 500k Postgres rows. Plus the three ways EXPLAIN ANALYZE lies to you. Tests in the repo.

An index is a sorted copy of one or more columns that lets the database jump straight to the rows it needs instead of reading every row in the table. That is the whole concept. Everything else - B-trees, composite indexes, leftmost prefix rules - is detail on top of that one idea.

This tutorial proves every claim with EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) against a real Postgres instance with 500,000 seeded rows. All code is in the companion repo: github.com/umur/database-indexes-explained. Clone it, run mvn verify, and watch the assertions confirm what each index does.

How the Tests Prove Anything

Every test runs the same shape:

  1. Execute EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT ... against Testcontainers Postgres.
  2. Parse the JSON plan tree.
  3. Assert on the top-level node type (Seq Scan vs Index Scan vs Bitmap Index Scan) and on buffer hit/read counts.

The seed loads 500,000 orders in a single generate_series insert. user_id is uniformly distributed across 50,000 users. status is heavily skewed: 5% active, 75% completed, 20% cancelled. That skew matters - it is what makes partial indexes interesting later.

INSERT INTO orders (user_id, email, status, amount, created_at, deleted_at)
SELECT
    (i % 50000) + 1,
    'user' || i || '@example.com',
    CASE WHEN i % 20 = 0 THEN 'active'
         WHEN i %  5 = 0 THEN 'cancelled'
         ELSE 'completed' END,
    (random() * 1000)::numeric(10,2),
    NOW() - (random() * 365 * INTERVAL '1 day'),
    CASE WHEN i % 10 = 0 THEN NOW() - (random() * 30 * INTERVAL '1 day') ELSE NULL END
FROM generate_series(1, 500000) AS i;
ANALYZE orders;

The ANALYZE at the end is not optional. Postgres won't pick a sane plan without fresh statistics, and that is its own lesson which we'll come back to.

The Baseline: Seq Scan Dominates

No indexes yet. Find every order for a single user:

SELECT id, email, status, amount
FROM orders
WHERE user_id = 42;

Plan node:

Seq Scan on orders
  Filter: (user_id = 42)
  Rows Removed by Filter: 499990
  Buffers: shared hit=4 read=4163
  Execution Time: 38.412 ms

Every page in the table got read to filter ten matching rows. 4,167 buffer reads on a 500k-row table. That is the worst case, and the default any time you query a column the database has no index on.

Test: BaselineSeqScanIT asserts the top node is Seq Scan and that Rows Removed by Filter is greater than 100,000.

B-tree: The Default and 95% of What You Need

When you write CREATE INDEX without specifying a type, you get a B-tree. It handles equality and range queries, sorts, and BETWEEN filters.

CREATE INDEX idx_orders_user_id ON orders (user_id);

Same query, new plan:

Index Scan using idx_orders_user_id on orders
  Index Cond: (user_id = 42)
  Buffers: shared hit=4
  Execution Time: 0.087 ms

From 4,167 buffer reads to 4. From 38 ms to 0.087 ms. The same query, two plans, three orders of magnitude difference.

Test: BTreeIndexIT creates the index, reruns the query, asserts the top node is Index Scan and that buffers.shared.read is zero on the second run.

Composite Indexes and the Leftmost Prefix Rule

A composite index covers multiple columns. The order of those columns is everything.

CREATE INDEX idx_orders_user_status ON orders (user_id, status);

Think of it as a phone book sorted by last name first, then first name. You can find every "Smith," or "Smith, John," but you can't efficiently find every "John" regardless of last name. The index can serve any query that starts at the leftmost column and moves right. It cannot skip columns.

This composite works for:

  • WHERE user_id = 42 - uses the index
  • WHERE user_id = 42 AND status = 'active' - uses the index

It does not work for:

  • WHERE status = 'active' - falls back to Seq Scan

The second case is the one that catches everyone. You have an index that mentions status, and you assume queries on status use it. They don't.

Tests: CompositeIndexIT proves the first two queries use the index; LeftmostPrefixIT proves the third one falls back to Seq Scan.

Partial Indexes

The seed data is 95% non-active orders. Most queries the application runs only care about active ones. Indexing all 500,000 rows when you only ever query 25,000 of them is wasteful. A partial index restricts the index to a subset:

CREATE INDEX idx_orders_active_created
ON orders (created_at)
WHERE status = 'active';

Now every query that matches the predicate gets a tiny, fast index:

SELECT id, created_at
FROM orders
WHERE status = 'active'
ORDER BY created_at DESC
LIMIT 50;
Index Scan Backward using idx_orders_active_created on orders
  Index Cond: (status = 'active')  -- enforced by the predicate
  Buffers: shared hit=8
  Execution Time: 0.241 ms

The index file itself is roughly 5% of the size a full index would be. Less to scan, less to maintain on writes, less to keep in cache. The same trick applies to soft deletes: WHERE deleted_at IS NULL on a table where most rows are deleted lets you index only the live ones.

Test: PartialIndexIT creates the partial index, asserts the active-orders query uses it, and asserts that querying for status = 'completed' with the same index does not use it.

The Pitfalls Everyone Hits

An index exists. The query is still slow. EXPLAIN ANALYZE shows a Seq Scan. Every developer has been here. There are three common reasons.

Functions on the indexed column. The index stores raw values, not transformed ones.

-- Index on email won't be used here
SELECT * FROM orders WHERE LOWER(email) = '[email protected]';

Either store the column already lowercased, or create an index on the expression itself: CREATE INDEX ... ON orders (LOWER(email)).

Type mismatches. If your column is BIGINT and your query passes a string, the planner has to cast every value before it can compare. The index stores the raw type, so the cast disqualifies it.

-- Won't use the integer index on user_id
SELECT * FROM orders WHERE user_id = '42';

The fix is mundane: pass the right type from your driver. Make sure your ORM doesn't quietly stringify integers.

Low selectivity. If a query would match 80% of the table, an index scan is slower than a sequential read. The planner knows this and ignores the index. This is correct behavior. WHERE status = 'completed' on this seed matches 75% of rows, so even with an index on status, the planner picks Seq Scan. That is also why a plain index on a low-cardinality column like is_active rarely helps - a partial index almost always does.

Test: IndexPitfallsIT covers the function-on-column and low-selectivity cases. Each one is a passing assertion that the plan is, in fact, Seq Scan despite an index existing.

Three Ways EXPLAIN ANALYZE Lies to You

I deployed a query once that I had verified with EXPLAIN ANALYZE three times. Two milliseconds every time. The endpoint timed out forty minutes after the deploy. I ran EXPLAIN ANALYZE in production. Two milliseconds. The query was simultaneously fast and eight seconds slow, and both were true.

Lie one: row estimates are fiction. The planner doesn't know how many rows your query will return. It guesses based on statistics from the last ANALYZE run. When statistics are stale, the guess can be off by orders of magnitude, and a wrong guess gives you a wrong plan.

Nested Loop  (cost=0.43..1842.50 rows=52 width=128)
               (actual rows=487231 loops=1)

The planner thought 52 rows. The result was 487,231. A Nested Loop join is fast when the inner side is small and catastrophic when it isn't. The fix: read the gap between estimated and actual at every node. Anything off by more than a factor of ten is a smell. Anything off by a factor of a thousand is a problem you cannot fix with another index.

Lie two: the timing is cached. When you run EXPLAIN ANALYZE, data gets pulled into Postgres' shared buffers and the OS page cache. Run it again to read the plan, and the data is already warm. The cold disk reads that real users hit are invisible.

Seq Scan on orders
  Buffers: shared hit=1240 read=37181
  Execution Time: 4823.112 ms

The read=37181 is the truth. Each page is roughly 8KB, so this is about 290MB pulled from disk. The first request paid that cost. The second one didn't. Always run EXPLAIN (ANALYZE, BUFFERS). If you see mostly read on a cold table, the wall-clock time you measured is optimistic.

Lie three: cost is not time. The numbers in the cost field are an abstract unit. They aren't milliseconds. The planner uses them to compare plans against each other based on assumptions like random_page_cost = 4.0 - which made sense for spinning disks, and is wrong for SSDs.

On modern SSD storage, random I/O is barely more expensive than sequential. The default tells the planner to overweight any plan that does random reads, which means it picks Seq Scan when an Index Scan would be faster. Tuning random_page_cost down to 1.1 on SSD is one of the highest-impact Postgres changes you can make, and almost nobody does it.

Quick Reference

Index TypeWhat it coversUse when
B-tree (default)Equality, range, ORDER BY, BETWEEN95% of cases
CompositeMulti-column, leftmost prefix onlyQueries always filter by the same prefix
UniqueEquality + uniqueness constraintEmail, username, external IDs
PartialSubset matching a WHERE predicateOne status dominates, soft deletes
ExpressionResult of a function on a columnYou can't avoid LOWER(), ::date, etc.

The Rule of Thumb

Index columns you actually filter, sort, or join on. Don't speculatively index. Every index costs disk space and slows every write that touches it. Add indexes after you see the slow query, not before. Then verify with EXPLAIN (ANALYZE, BUFFERS) that the planner actually picked it up - the index existing isn't the same as the index being used.

And remember the three lies. Cached timing flatters every plan. Stale statistics turn good plans into bad ones overnight. Cost numbers reflect the planner's assumptions about your hardware, which it has never actually seen. EXPLAIN is a flashlight in a dark room. What's in the beam is what you see. Nothing else.

The companion repo at github.com/umur/database-indexes-explained has a passing test for every claim above. Run mvn verify against Docker and the assertions tell you exactly what each index does, with parsed plan output you can read.

Share
X LinkedIn HN
UI

Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.