← Back to Blog

The Postgres Index That Never Gets Used

Postgres indexes accumulate. Every perf push adds one. Almost no team removes any. How to find the unused ones and why their write cost is the bigger problem.

Every team that has run a Postgres performance push has added indexes. Almost no team has removed any. The result, on every long-running cluster I've ever inherited, is a database carrying a tax it does not collect. Indexes from a sprint nobody documented. Duplicates of each other in different column orders. The one a contractor added in 2022 to fix a ticket whose number nobody can find.

The cleanup is one query and one Friday. Teams skip it because the cost of an unused index is invisible until it isn't.

What an unused index actually costs

Disk is where every team starts, but it is not the main cost. The actual line items, in rough order of how much they matter:

Write amplification. Every INSERT, UPDATE, and DELETE updates every index that includes the affected columns. A table with six indexes on the hot path takes six index updates per write, plus the heap write. A table with eleven indexes takes eleven. The math is linear. Drop three unused indexes on a write-heavy table and the write path gets meaningfully faster.

VACUUM work. Autovacuum has to walk every index when reclaiming dead tuples. More indexes equals longer vacuum equals more bloat catching up between runs. The cluster that complains about autovacuum lag often has an index problem masquerading as a vacuum problem.

Shared buffers. Index pages compete with data pages for cache. Cold index pages still get pulled into memory during writes and during autovacuum. The unused index is evicting a heap page that the planner would actually use.

WAL volume. Every index update produces a WAL record. Replication lag scales with WAL volume. Every replica is paying for the unused index in extra apply work.

Planner overhead. Postgres considers every applicable index when planning a query. The marginal cost per query is small. At a hundred thousand queries per second across many tables, the marginal cost adds up.

Disk. Less important than people think, but a 50 GB table with 100 GB of indexes is not unusual. The index does not have to be loaded to occupy that space.

How to find them

Start with the catalog view pg_stat_user_indexes. Its idx_scan column counts how many times the planner picked the index for a query, summed since the last stats reset (often since the cluster started).

SELECT
  schemaname,
  relname              AS table_name,
  indexrelname         AS index_name,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

This returns every index with zero scans, biggest first. Start with the big ones, because their write cost is the worst.

What not to drop

idx_scan = 0 does not mean "safe to drop." Four cases burn people:

Primary key indexes. They enforce row uniqueness. They may also be used rarely by the planner because most queries hit a different index. You cannot drop them without changing the table's structure.

UNIQUE constraint indexes. Same reason. The constraint depends on them.

Foreign key supporting indexes. When you DELETE or UPDATE a parent row, Postgres uses the child's index to enforce the foreign key. The lookup is not counted as an idx_scan in older Postgres versions, so the index can look idle while doing real constraint work.

Indexes used only by rare jobs. A monthly billing run. A quarterly reconciliation report. A DR drill. If your stats window is shorter than the rare job's interval, the index looks unused. It is not.

The filter that handles the first two:

SELECT
  s.schemaname, s.relname, s.indexrelname, s.idx_scan,
  pg_size_pretty(pg_relation_size(s.indexrelid)) AS size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
  AND NOT i.indisprimary
ORDER BY pg_relation_size(s.indexrelid) DESC;

The third and fourth cases need human judgment. Search the schema for foreign key relationships involving the columns the index covers. Search the code and the cron schedule for the rare jobs. The cleanup is not a SQL problem at this step. It is a code-archaeology problem.

The duplicate index trap

A composite index on (a, b, c) can serve queries that filter on a, on (a, b), and on (a, b, c). A second index on (a, b) is a duplicate. Postgres will not warn you when you create it. The planner happily picks one or the other; the writes pay for both.

Finding them takes one query:

SELECT pg_size_pretty(SUM(pg_relation_size(idx))::bigint) AS size,
       (array_agg(idx))[1] AS idx1,
       (array_agg(idx))[2] AS idx2
FROM (
  SELECT indexrelid::regclass AS idx,
         (indrelid::text || E'\n' || indclass::text || E'\n' ||
          indkey::text || E'\n' || COALESCE(indexprs::text, '') || E'\n' ||
          COALESCE(indpred::text, '')) AS key
  FROM pg_index
) sub
GROUP BY key
HAVING COUNT(*) > 1;

Run this once per cluster. Its output is your "pick one and drop the other" list. Pick the more specific one (longer column list) and drop the prefix.

The workflow that does not break anything

The mistake is reading pg_stat_user_indexes once and dropping the zeros. The right flow is slower and boring:

Reset stats at a known moment: SELECT pg_stat_reset();. (For a specific table use pg_stat_reset_single_table_counters(oid).) Note the moment. Let it run for at least one full business cycle. For most apps, a week. For systems with monthly batch jobs, a month. For systems with quarterly reports, a quarter.

Run the filtered candidate query. For each candidate, search application code, ORM repositories, and the job scheduler for column combinations that would use the index. False zeros happen when stats reset between uses.

Drop one at a time. Wait a week between drops. Monitor pg_stat_statements for plan changes on the affected table. If a query gets slower, recreate the index; the recreate takes a one-time hit, and the wrong drop is reversible.

The discipline that matters: drop slowly, and write down what you dropped. Six months later, when somebody asks "why is this report slow," your audit log is the difference between five minutes of investigation and an afternoon.

Replicas and DR

pg_stat_user_indexes only reports for the database it runs on. If you have read replicas serving production traffic, an index that looks unused on the primary may be heavily used on a replica running different queries. Run the audit on every node in the topology before dropping anything.

The failover question is sharper. If you fail over to a replica that runs a reporting workload, the indexes you dropped on the primary are gone from the replica too (DDL replicates). The right question before dropping is "where is this index used across the whole cluster, and what happens during failover?" The answer involves talking to whoever runs the replica's workload.

What it actually saves

Concrete numbers from a Postgres 16 cluster I cleaned up earlier this year. Heavy write workload, sustained at about 8,000 inserts per second across the day. An audit table with 14 indexes, of which 4 had been touched in the previous 30 days.

Dropping the 10 unused indexes, one per week over six weeks:

Average write latency on the audit table:  12 ms  -> 6 ms
WAL volume on the table:                  320 GB -> 180 GB / day
Autovacuum duration:                       28 min -> 9 min
Replica lag p95:                          200 ms -> 60 ms
Disk used by table + indexes:             240 GB -> 165 GB

The disk savings were a footnote. The real win was the write path getting twice as fast at the tail, and the replica lag dropping below the SLO with comfortable headroom.

Why this keeps happening

Indexes accumulate because the marginal cost of adding one is invisible at the moment of adding. The slow query gets faster. No dashboard shows write amplification. Replica lag bumps on insert-heavy paths show up two months later and get blamed on holiday traffic, a new feature, a recent dependency upgrade.

Make the discipline calendar-based. Every index added during a performance push gets a calendar entry to re-evaluate in 30 days. If idx_scan is still zero or trivially low, drop it. If it is in active use, keep it and update the documentation. Track each one. Re-evaluate. Drop on a schedule, not by accident.

Indexes are not a free-storage problem. They are debt with interest, paid on every write. The database is patient about that debt until the day it isn't, which is usually the day a write-heavy job pushes the cluster to the edge of its replica lag SLO. By then the cleanup is a panic. Do it now, before you need to.

Share
X LinkedIn HN
UI

Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.

👁 0 6 min read

Comments (0)