← Back to Blog

Database Indexes Explained for People Who Keep Forgetting

You've read about database indexes before. You've forgotten most of it. Here's the practical guide you'll actually remember, with PostgreSQL examples.

You've Googled "how do database indexes work" at least three times this year. I know because I have too. Every time, I read an explanation and nod along, feel like I understand it, then forget the specifics within a month. Then a query slows down, someone asks me why, and I'm back on Google. It's a cycle. This is the post I'm writing so we can both stop doing that. Bookmark it. We'll be back.

The One-Sentence Version

An index is a sorted copy of specific columns that lets the database jump straight to the rows it needs instead of reading every single row in the table.

Think of it like the index at the back of a textbook. If you want to find every mention of "B-tree," you don't read the entire book cover to cover. You flip to the index, find "B-tree," and it gives you page numbers. A database index does the same thing. Without it, the database reads every row. With it, the database looks up the value in the index and follows a pointer to the matching rows. That's the whole concept.

When You Need an Index

The theory is nice, but in practice, you don't think about indexes when you're reading docs. You think about them when something is slow. Here are the moments when you should reach for one:

  • A query in your application logs or APM is taking longer than it should.
  • You have a WHERE clause on a table with more than a few thousand rows.
  • You're joining on a column that isn't a primary key.
  • You're doing ORDER BY on a large result set.
  • You're filtering by status, type, or date range on a table that keeps growing.

The rough rule of thumb: if you're searching or sorting on a column regularly, it probably needs an index.

Here's the kind of query that should trigger the thought:

SELECT id, email, created_at
FROM users
WHERE status = 'active'
ORDER BY created_at DESC
LIMIT 50;

If the users table has 500,000 rows and there's no index on status or created_at, the database scans all 500,000 rows, filters them, sorts them, then returns 50. That's a lot of wasted work. An index fixes this.

The Types That Actually Matter

PostgreSQL supports a bunch of index types. You'll use maybe three or four of them for 95% of your work. Let's skip the exotic ones and cover what you'll actually reach for.

B-tree (the default)

When you write CREATE INDEX without specifying a type, you get a B-tree. It stores data in a sorted tree structure. Lookups, range queries, and sorting are all fast because the data is already ordered.

B-tree indexes work well for:

  • Equality checks: WHERE email = '[email protected]'
  • Range queries: WHERE created_at > '2026-01-01'
  • Sorting: ORDER BY created_at DESC
  • BETWEEN, <, >, <=, >=

This covers the vast majority of queries you'll write. The syntax is simple:

CREATE INDEX idx_users_email ON users (email);

That's it. The database now maintains a sorted structure on the email column. Lookups by email go from scanning the entire table to a quick tree traversal.

Composite Indexes

A composite index covers multiple columns. This is where things get interesting, and where most people (myself included) keep getting tripped up.

CREATE INDEX idx_orders_user_status ON orders (user_id, status);

This index is useful for queries that filter by user_id alone, or by user_id AND status together. But here's the thing everyone forgets: column order matters, and it follows a left-to-right rule.

An index on (user_id, status) works for:

  • WHERE user_id = 42 (uses the index)
  • WHERE user_id = 42 AND status = 'shipped' (uses the index)

It does NOT work for:

  • WHERE status = 'shipped' (cannot use the index)

Think of it like a phone book sorted by last name, then first name. You can look up everyone named "Smith." You can look up "Smith, John." But you can't efficiently look up everyone named "John" regardless of last name. Sorting starts from the left.

This is called the leftmost prefix rule. Any query that starts from the leftmost column and moves right gets to use the index. It can't skip columns. For queries that filter on status alone, you need a separate index on status.

When designing composite indexes, look at your actual query patterns. If you run WHERE user_id = ? AND status = ? a lot, put user_id first (assuming it's more selective). If you also run WHERE user_id = ? without the status filter, the same index covers both cases.

Unique Indexes

Unique indexes do two things: they speed up lookups and they enforce uniqueness at the database level.

CREATE UNIQUE INDEX idx_users_email_unique ON users (email);

You probably already use unique indexes without thinking about it. Primary keys are unique indexes. When you add a UNIQUE constraint to a column in PostgreSQL, it creates a unique index behind the scenes automatically.

Use these for columns where duplicates would break your application: email addresses, usernames, API keys, external IDs. Let the database enforce this rule so you don't have to rely on application-level checks that can fail under concurrent writes.

Partial Indexes

This is the one most people don't know about, and it's surprisingly useful. A partial index only covers rows that match a condition.

CREATE INDEX idx_orders_active ON orders (created_at)
WHERE status = 'active';

Say your orders table has 10 million rows. 95% of them have a status of completed or cancelled. You almost never query those. Your application constantly queries for active orders. A full index on status would include all 10 million rows. A partial index with WHERE status = 'active' only includes the 500,000 active rows. It's smaller, faster to search, and faster to maintain.

The query planner in PostgreSQL is smart enough to use this index when your query includes WHERE status = 'active'. You don't need to do anything special in your SQL. Just write the query as you normally would, and the planner picks the right index.

Partial indexes are great for tables with a status column where you mostly care about one or two statuses. They're also useful for soft deletes. If you have a deleted_at column and you always filter for WHERE deleted_at IS NULL, a partial index on that condition keeps the index small and fast.

The Mistakes Everyone Makes

I've made all of these. Some of them more than once.

Indexing Every Column

Indexes aren't free. Every index takes up disk space, and every INSERT, UPDATE, or DELETE has to update all relevant indexes. If you have eight indexes on a table, every write operation does eight extra maintenance tasks. On a high-write table, this adds up fast.

Only index columns you actually filter, sort, or join on. Don't speculatively add indexes "just in case." Look at your queries first, then add indexes to support them.

Getting Composite Index Order Wrong

I covered this above, but it's common enough to repeat. If your queries filter by created_at but your composite index starts with status, and you're not filtering by status in that query, the index won't be used. Match your index column order to your query patterns. The leftmost prefix rule is easy to understand and easy to forget. I've forgotten it at least twice this year.

Not Checking If the Index Is Actually Used

You created an index. You feel good about it. The query is still slow. What happened?

Several things can cause the query planner to ignore your index:

  • Small tables: If the table has a few hundred rows, a sequential scan is faster than an index lookup. The planner knows this.
  • Type mismatches: Your column is an integer, but your query passes a string. The planner can't use the index because it needs to cast every value.
  • Functions on indexed columns: WHERE LOWER(email) = '[email protected]' won't use an index on email. The index stores the original values, not the lowercased versions. You'd need an index on LOWER(email) instead.
  • Low selectivity: If the query matches 80% of the table, the planner decides a full scan is cheaper than bouncing between the index and the table. It's usually right.

Always verify with EXPLAIN. Don't assume the index is working just because it exists.

Indexing Low-Cardinality Columns Alone

An index on a boolean column like is_active is almost never useful on its own. The column has two possible values. If half the table is true and half is false, the index doesn't narrow things down enough for the planner to bother using it. A sequential scan is about the same speed.

The exception is partial indexes. An index on WHERE is_active = true is useful if only 5% of rows are active. Tiny index, and the planner will use it.

Forgetting About Index Maintenance

PostgreSQL uses MVCC (Multi-Version Concurrency Control), which means updates and deletes don't remove old row versions immediately. They create dead tuples. Your indexes still point to these dead tuples until VACUUM cleans them up.

On tables with heavy updates and deletes, indexes can bloat over time. They get bigger and slower even though the table size stays roughly the same. PostgreSQL runs autovacuum by default, but on very busy tables, the default settings might not keep up. If you notice an index growing without the table growing, look into VACUUM settings and consider running REINDEX during a maintenance window.

How to Check If Your Index Works

This is the part you'll actually copy-paste into your terminal. Use EXPLAIN ANALYZE to see how PostgreSQL executes your query.

EXPLAIN ANALYZE
SELECT id, email, created_at
FROM users
WHERE status = 'active'
ORDER BY created_at DESC
LIMIT 50;

Without an index, you'll see something like:

Limit  (cost=25432.10..25432.23 rows=50 width=52) (actual time=487.219..487.238 rows=50 loops=1)
  ->  Sort  (cost=25432.10..25934.72 rows=201048 width=52) (actual time=487.217..487.228 rows=50 loops=1)
        Sort Key: created_at DESC
        Sort Method: top-N heapsort  Memory: 32kB
        ->  Seq Scan on users  (cost=0.00..18834.00 rows=201048 width=52) (actual time=0.017..312.684 rows=198472 loops=1)
              Filter: (status = 'active')
              Rows Removed by Filter: 301528
Planning Time: 0.095 ms
Execution Time: 487.312 ms

See that Seq Scan? That means the database read every row in the table. 487ms for a query that should be instant.

Now add an index:

CREATE INDEX idx_users_status_created ON users (status, created_at DESC);

Run the same EXPLAIN ANALYZE again:

Limit  (cost=0.42..2.89 rows=50 width=52) (actual time=0.031..0.118 rows=50 loops=1)
  ->  Index Scan using idx_users_status_created on users  (cost=0.42..9938.47 rows=201048 width=52) (actual time=0.029..0.108 rows=50 loops=1)
        Index Cond: (status = 'active')
Planning Time: 0.182 ms
Execution Time: 0.147 ms

Index Scan instead of Seq Scan. Execution time dropped from 487ms to 0.15ms. That's over 3000x faster. The database went straight to the matching rows using the index instead of reading the whole table.

Here's what to look for in the output:

  • Seq Scan: Full table scan. On a large table with a WHERE clause, this usually means you're missing an index.
  • Index Scan: The database used an index to find the rows. This is what you want.
  • Bitmap Index Scan: The database used the index to build a bitmap of matching rows, then fetched them. Also good. PostgreSQL uses this when an index scan would hit too many rows to fetch one by one.
  • actual time: The real execution time. This is the number you care about.

For more detail, you can use:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT ... ;

The BUFFERS option shows how many disk pages were read. Useful when you want to understand I/O impact, not just timing.

The quick diagnostic: if you see Seq Scan on a table with tens of thousands of rows and a WHERE clause that should be selective, you probably need an index. Run EXPLAIN ANALYZE to confirm. Add the index. Re-run EXPLAIN ANALYZE and verify the planner switched to an index scan.

That's the Whole Thing

Bookmark this page. You'll be back in six months when you forget the leftmost prefix rule. I'll probably be rereading it too.

Share
X LinkedIn HN
UI

Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.

👁 0 8 min read

Comments (0)