Preface
I didn’t set out to write a book about Postgres. I set out to figure out why a service that flew through ten thousand requests per second on Monday was paging me at three in the morning on Wednesday. Same code. Same hardware. Same query plan, even. The only thing that had changed was the database. Or rather, the database had changed without telling me. Vacuum had fallen behind. Bloat had crept in. The planner’s idea of the world no longer matched the world. By the time I understood what had happened, I’d learned more about Postgres than I’d thought a backend engineer needed to know. It turns out we all do.
That’s the story behind this book. Not theory. Not a tour of every Postgres feature. A focused build-up of the mental model that separates expert Postgres users from everyone else, organized around a single idea you can hold in one hand: MVCC changes everything.
What This Book Is
A practical, opinionated guide to PostgreSQL for engineers who already use it. Twenty-eight chapters that build on each other, each anchored to a real cinetrack workload, each example written in pure SQL. The book covers the parts most teams discover the hard way: how MVCC actually works, how the planner thinks, why connections are expensive, what vacuum is doing in the background, why your replica falls behind, and what to do about all of it.
The goal is not to teach you everything Postgres can do. That’s the documentation, and it’s longer. The goal is to teach you the model that makes the documentation useful, the handful of patterns that pay back every week, and the operational habits that keep your database from waking you up.
Who This Book Is For
Senior backend engineers, tech leads, and SREs who use Postgres daily. You know how to write a join, you’ve run EXPLAIN, and you’ve been on the wrong end of a deadlock at least once. You don’t need to be a Postgres internals expert. You do need to be willing to read a query plan honestly, even when it disagrees with your hunch.
This is not an introduction. If you’ve never written a SQL query, please put this book down and pick up something gentler.
How to Use This Book
Part I builds the foundation: MVCC, storage, WAL, processes and memory, indexes from first principles. Read it in order. The rest of the book stops making sense if you skip Part I.
Part II is querying like an expert. Part III is concurrency and the production patterns that follow from MVCC. Part IV operates Postgres at scale: replication, HA, backups, tuning, capacity, and the runbooks that close the book.
Every example uses pure SQL against the cinetrack dataset, the same movie-and-reviews schema used in the author’s other books. There is no application framework code. The book is the database side. Framework concerns belong elsewhere.
A Note on Postgres Versions
Examples target Postgres 17 by default. Behavior that differs in 15 or 16 is called out in callouts. Where Postgres 18 introduces something stable enough to mention, it appears as a forward-looking note, not a default.
The Code
Each chapter has a companion project in the db-book-example repository. The base setup is a Postgres-only docker-compose.yml with the cinetrack schema and a MovieLens 25M load. Chapters that need extra parts (pgbouncer, replicas, Patroni, pgBackRest) add their own containers in their snapshots.
git clone https://github.com/umur/db-book-example
cd db-book-example/chapter-01
docker-compose up -d
psql -h localhost -U cinetrack -d cinetrackReproducibility is the contract. Pinned versions. Pinned data. If you can’t rerun the queries in this book against the seed, the book isn’t doing its job.
Acknowledgments
The bugs are mine. The good ideas are everyone else’s, often without attribution because I forgot where I first heard them. If you recognize one of yours, please write to me; I would like to credit you in the next edition.
1 The Postgres Way

1.1 Overview
“One size does not fit all.”
Michael Stonebraker, One Size Fits All: An Idea Whose Time Has Come and Gone (2005)
A team I know shipped what they called a “small Postgres upgrade.” The schema was the same. The indexes were the same. The traffic was the same. They added one column and pushed it on a Tuesday afternoon. By Wednesday morning the read replicas were thirty minutes behind, the primary was burning IOPS like firewood, and one query that used to take 12ms was timing out at 30 seconds. Nothing in the application had changed. Postgres had done what Postgres always does: kept old row versions around until vacuum could prove nobody needed them, written every modification to the WAL, and rebuilt the planner’s view of the world from statistics that no longer matched reality.
They didn’t have a Postgres bug. They had a Postgres model gap. The database was behaving exactly as designed, and the design assumed they understood it. They didn’t. Most teams don’t.
We’re not going to tour features. We’re going to look at the two architectural decisions that make Postgres act the way it does, the philosophy that makes it the database people keep choosing, and the cinetrack sandbox that the rest of the book runs on.
By the end of this chapter, you’ll understand:
- What “expert Postgres” means and why it’s a different skill than “Postgres user”
- The two big architectural bets (MVCC and process-per-connection) that shape every chapter ahead
- Why “Postgres has X” is true for almost any X, and what extensibility costs you
- How this book is structured, and how to use the cinetrack sandbox you’ll write queries against
- Which mental models to drop on day one if you came from MySQL, Oracle, or SQLite
1.2 What makes Postgres different
Most engineers learn one relational database first and assume the rest are dialects. They aren’t. The lineage of MySQL, Oracle, SQL Server, SQLite, and Postgres splits at architectural decisions made decades ago, and those decisions still leak through every query you write. If you carry MySQL instincts into Postgres, you’ll get bitten. If you carry Oracle instincts into Postgres, you’ll get bitten in different places. If you carry SQLite instincts, you’ll get slower than you need to be.
1.2.1 Postgres is not MySQL with more types
MySQL teams reach for Postgres expecting a familiar engine with a richer column catalog. They get something else. MySQL’s InnoDB stores rows by clustering them around the primary key, which makes primary-key reads fast and secondary-index reads a hop slower. Postgres has no clustered index. Every table is a heap. The primary key is a B-tree pointing into the heap, identical in shape to any other index. A heap is a pile of rows in no particular order on disk. Think of it as a warehouse where every box has a coordinate (ctid), and indexes are forklifts that know which coordinate to drive to.
That single difference cascades. In MySQL, a covering query on a secondary index still has to follow pointers if the index isn’t covering, but it has the primary key clustered nearby. In Postgres, every index lookup may force a heap visit unless the visibility map says it can skip one. The visibility map is a small bitmap per relation that records which pages contain only tuples visible to all transactions. Index-only scans in Postgres are conditional, not free.
The bigger gap is concurrency. Postgres uses snapshot-based MVCC at every isolation level, including READ COMMITTED and REPEATABLE READ; readers see a consistent version of the world without taking range locks. MySQL’s InnoDB at its default REPEATABLE READ reaches for gap locks on index ranges to prevent phantoms, which changes how concurrent inserts and updates contend. Postgres still lets lost updates happen under READ COMMITTED, the same way Oracle and InnoDB do; you reach for SERIALIZABLE or explicit row locking when you need to rule them out. The MySQL playbook for “how to handle concurrent updates” does not transfer.
1.2.2 Postgres is not Oracle with worse tooling
Oracle veterans show up looking for the equivalents of materialized view logs, autonomous transactions, and MERGE with WHEN NOT MATCHED BY SOURCE. Some of those map. Some don’t. Postgres has materialized views but they don’t refresh incrementally on their own. It has MERGE (since 15) but historically you reached for INSERT ... ON CONFLICT, the upsert, which has better semantics for many cases anyway.
The deeper Oracle expectation that breaks is the undo log. Oracle keeps old versions in a separate tablespace and lets vacuum-like cleanup happen invisibly. Postgres keeps old versions in the table itself, marked dead, and waits for a process called autovacuum to walk through and reclaim them. Autovacuum is a background worker that scans tables, marks dead tuples reusable, and updates statistics. The reader who has tuned Oracle for twenty years will, on Postgres, feel the operational tax of MVCC sooner than expected.
1.2.3 Postgres is not SQLite with a network port
SQLite is a beautiful single-process database where the whole file is the database. It uses a write-ahead log and a single writer at a time, and it scales to a remarkable amount of work for a process-embedded engine. People use it as a mental model for “small Postgres,” and that model leads them to write code that doesn’t pool connections, doesn’t think about concurrency, and runs INSERT in a loop without a transaction.
Postgres is multi-process and multi-connection. Every connection is its own OS process. Every transaction sees a consistent snapshot of the data, isolated from other in-flight transactions. You can have a hundred writers in flight if you can afford the connections, and the answer to “can I afford the connections” is almost always “fewer than you think.”
1.2.4 The “Postgres has X” pattern
Ask a Postgres veteran whether the database supports something. The answer is almost always yes, possibly through an extension. JSON queries with indexes? jsonb plus GIN. Time-series? pg_partman or TimescaleDB. Geographic data? PostGIS. Vector search? pgvector. Full-text search? Built in. Custom types, operators, and index methods? All first-class.
This is not an accident. Postgres ships with an extension API that lets new types, operators, index access methods, and procedural languages plug in as first-class citizens. When someone says “Postgres can do that,” they’re describing the database’s culture, which is to absorb features the way a sponge absorbs water.
1.2.5 What Postgres does not do well
The honest version of “Postgres has X” comes with a list of things Postgres does badly, and pretending otherwise will get you in trouble:
- High-throughput OLTP at extreme write rates with single-node writes is hard. The WAL is one log. The primary is one process tree. There is no native multi-master, despite a decade of half-attempts. If your sustained write rate is in the tens of thousands of write transactions per second on a single primary, depending on hardware and tuning, you’re at the edge of what one Postgres node does, and you should know that going in.
- Cross-shard transactions are not a thing in core Postgres. Citus exists, FDW-based sharding exists, and both make trade-offs.
- Connection scaling is a known weakness. Each connection is a process; thousands of connections is thousands of processes. pgbouncer is mandatory in any production deployment of meaningful scale, and that requirement tells you something about the architecture.
- Schema-on-read flexibility the way some NoSQL stores do it is not idiomatic. JSONB gets you most of the way there, but the planner still works best on typed columns.
A book that pretends Postgres is the right answer to every problem is a marketing pamphlet. This isn’t. We’ll call out the cliffs as we walk past them.
1.3 The MVCC bet
Most database design comes down to a question: when two transactions touch the same row at the same time, who waits? POSTGRES had a no-overwrite store as far back as the mid-80s. Snapshot-based MVCC, multi-version concurrency control as we know it today, arrived in version 6.3 in 1997, with the write-ahead log following in 7.1 in 2001. The combination of those two changes is what shapes every line of source code since. Readers don’t block writers. Writers don’t block readers. Each transaction sees a consistent snapshot of the database that nobody else can disturb. The price is that the database keeps multiple versions of every row long enough for every active reader to find the version it needs.
That’s a sentence to sit with. Multiple versions of every row. Not occasionally, as a side effect. Always, by design.
1.3.1 What MVCC actually is
Picture a row in a Postgres table the way you’d picture a parking spot. When you UPDATE the row, Postgres doesn’t repaint the spot. It draws a new spot next to it, parks the new version there, and puts a sign on the old spot saying “expired as of transaction 12345.” The old spot stays there until every transaction that started before 12345 has either committed or died. Only then can vacuum come along, paint over the sign, and reuse the asphalt.
In Postgres terms: an UPDATE is an INSERT of a new tuple plus a flag on the old one saying “deleted by transaction X.” The old tuple is dead but visible to anyone whose snapshot started before X committed. The new tuple is alive but invisible to anyone whose snapshot started before X committed. Each tuple carries a small header with two transaction IDs: xmin (the transaction that created it) and xmax (the transaction that deleted or updated-out-of-existence it). The combination of those IDs and a snapshot tells the database which version any given reader sees.
You can see this directly. Every Postgres table exposes the system columns to a query that knows to ask:
SELECT xmin, xmax, ctid, id, body
FROM reviews
LIMIT 5;xmin is the inserting transaction. xmax is usually zero on a row that has never been touched by a writer or locker; SELECT FOR UPDATE and SELECT FOR SHARE can stamp xmax on a still-live row as a lock marker. When multiple sessions hold a lock on the same row at once, xmax actually stores a MultiXact ID, a small structure that lists all the locking transactions. ctid is the physical location of this version: a (page, offset) pair that points into the heap.
1.3.2 Why this is the bet that defines everything
MVCC is the foundation. Once you accept that every row can have multiple physical versions, the rest of Postgres becomes inevitable:
- Indexes have to point at versions, not rows. Each index entry references a
ctid. If a row gets a new version, that’s a new heap entry, and every relevant index has to learn about it. This is why heavy update workloads bloat indexes as much as they bloat tables. - Vacuum is mandatory. Old versions don’t clean themselves up. A process has to scan the heap, identify dead tuples that no transaction can still see, and mark their space reusable. Without vacuum, the database fills with corpses. Autovacuum exists to keep this from being your job.
- Replication is built on the WAL of versions, not the WAL of statements. Because each modification creates a physical tuple version, the write-ahead log captures physical changes, and replicas replay them as physical changes. This is why Postgres replication is fast and identical-byte-for-byte, and why logical replication needs a separate decoder to translate physical changes back into row-level events.
- Long transactions are toxic. A transaction that started two hours ago is still holding a snapshot. Vacuum cannot reclaim any tuple version that transaction might still need to see. A single long-running query on a busy table can prevent autovacuum from making progress, with bloat scaling with the duration of the open snapshot.
- Locking changes shape. Readers don’t take row locks for normal reads. Writers don’t block readers. The contention surface narrows to writer-writer conflicts and to the explicit locks you ask for.
Once you internalize MVCC, every “weird” Postgres behavior makes sense. Why does my table grow even though the row count is flat? Bloat. Why is my replica falling behind? WAL volume from version churn. Why did my UPDATE on a table with twelve indexes take 30ms instead of 3ms? Every index had to learn about the new version. Why is count(*) slower than you’d expect on a churning table? Even with an index-only scan, the visibility map has to certify every page as all-visible to skip the heap, and a write-heavy table keeps invalidating those bits.
1.3.3 The cost of the bet, stated honestly
MVCC isn’t free. The cost shows up in three places:
- Space. Old versions occupy disk until vacuum reclaims them. A table with a steady update rate runs perpetually larger than the live row count suggests.
- CPU. Vacuum is real work. Visibility checks during reads are real work. Multi-version row headers are bigger than single-version row headers would be.
- Operational complexity. Autovacuum tuning is a real skill. Long-running transactions are a real risk. The transaction-ID counter wrapping around is a real thing that has taken real production databases offline.
Important
The single most common production failure mode I see in junior Postgres deployments is “vacuum couldn’t keep up.” Not deadlocks, not slow queries, not bad indexes. Vacuum lag. If you remember one operational rule from this book before you finish Chapter 1, remember that autovacuum is not optional, and the default settings are tuned for a database from 2008.
1.3.4 Other databases bet differently
Oracle uses MVCC too, but stores old versions in undo segments rather than inline in the table. SQL Server has a switchable model. MySQL InnoDB uses MVCC but the implementation differs in important ways that make MySQL feel different at the seams. The point isn’t that Postgres is unique in using MVCC; it’s that Postgres’s flavor of MVCC, with old versions inline in the heap and a vacuum process to clean them, is the architectural decision that drives the rest of the design.
Postgres treats row versions as first-class objects on disk and lets background work clean up later, in exchange for keeping reads and writes out of each other’s way at runtime. Every chapter that follows is downstream of that choice.
1.4 Process per connection
Postgres makes a second architectural bet that’s almost as consequential as MVCC, and almost nobody discovers it until it bites them. Every database connection is its own operating-system process. Not a thread. Not a fiber. A fork(), a full address space, a place in the kernel’s process table.
When you open a connection from your application, here is what actually happens. Your TCP packet arrives at the postmaster, a long-running parent process that does nothing except accept new connections. The postmaster fork()s a child immediately and hands the socket off; that child, your backend, is the one that runs ClientAuthentication and decides whether to let you in. Once you’re authenticated, the backend owns your session, parses your queries, executes them, and eventually exits when you close the connection. The postmaster never serves your query. It hands you off to a child and goes back to listening.
That model is older than Linux. Postgres inherited it from POSTGRES95, which inherited it from the Berkeley research code that predates the modern multi-threaded database. It has stuck around because it’s simple, durable under load, and isolates failures: a backend that crashes does not take down the database. The postmaster restarts everything else and life goes on.
Simple, durable, and expensive.
1.4.1 The cost of a connection
Spinning up a connection is not free. A new backend process needs:
- An OS process and its kernel data structures (file descriptor table, signal mask, page tables)
- Roughly 5 to 10 megabytes of resident memory (RSS) before it does any real work, much of which is shared via copy-on-write with the postmaster; unique private memory per backend is typically a few MB
- A piece of shared-memory state to participate in the buffer cache and lock manager
- A
PGPROCslot in shared memory’sProcArray, whose size is fixed at server start bymax_connectionsplus auxiliary slots
Multiply that by your connection count. A thousand idle Postgres connections is on the order of 10 GB of process footprint, plus a thousand entries in the OS run queue. The kernel’s context-switch overhead between thousands of mostly-idle processes is real and measurable. I’ve personally watched a Postgres server with 4,000 idle connections spend 40% of its CPU on context switches before any query ran.
Compare this to MySQL or SQL Server, where connections are threads inside a single process and idle connections cost a kilobyte and a stack page. Or to a stateless service behind a connection pool, where 10,000 client connections fan into 50 server connections. The Postgres model assumes you do not have 10,000 backends. It cannot handle 10,000 backends. It is built for tens to low hundreds of active backends, and the gap between client connections and server backends is filled by something external.
1.4.2 Why this matters before you write a single query
Most teams discover the connection problem during a traffic spike. The application’s connection pool was sized assuming connections are cheap. Postgres is happily serving 200 backends, then 400, then 800, and somewhere around 1,500 the box runs out of memory or max_connections and the application starts seeing “remaining connection slots are reserved” errors.
The fix is not to raise max_connections. The fix is connection pooling. pgbouncer is the de facto solution: a tiny, fast process that sits between your application and Postgres, multiplexing many client connections onto a small number of database backends.
A few numbers worth keeping in your head:
- A working Postgres backend sits around 5 to 10 MB of resident memory before it does anything useful.
- Active backends should fit under your CPU count times some small factor.
- Client connections can be in the thousands. Server backends should be in the dozens.
The mismatch is what pgbouncer fixes. Pretending the mismatch doesn’t exist is what page-3 incidents are made of.
1.4.3 Per-backend memory adds up
The backend’s baseline cost is the floor. Each backend also reserves memory for its own work:
work_memfor sorts and hash operations within a querytemp_buffersfor temporary tablesmaintenance_work_memfor index builds and vacuum
A work_mem of 64 MB sounds modest. Multiply by 200 active backends running multi-stage queries, where each query node can use work_mem independently, and you can theoretically commit gigabytes of memory beyond what shared_buffers reserves. The point now is that “I’ll just bump work_mem” is one of those settings that looks innocent and explodes under concurrency.
-- A live snapshot of every backend the server is currently running.
-- One row per process. The `state` column tells you whether the backend
-- is doing work or holding open an idle connection.
SELECT pid, usename, application_name, state, query_start
FROM pg_stat_activity
WHERE backend_type = 'client backend'
ORDER BY backend_start;Run that on any production database. You’ll see the shape of your real connection load: how many backends are active, how many are idle in transaction (the worst kind, because they hold snapshots and prevent vacuum), and how many are idle.
Warning
“Idle in transaction” is the most dangerous backend state in Postgres. It means a connection has an open transaction, has stopped doing work, and is still holding a snapshot. Every old row version those snapshots can see is locked away from vacuum. Set idle_in_transaction_session_timeout to a sensible value in production. The default of zero means forever, and forever is a long time.
1.4.4 What this enables
The process model is not all cost. Process isolation is one of the reasons Postgres is famously hard to crash. A backend that hits a bug, runs out of memory, or trips an assertion dies on its own. The postmaster notices, terminates the affected children to clean shared state, and brings the system back. Other databases that share more state across worker threads can fail in louder, more correlated ways.
The model also makes background workers a clean concept. Autovacuum workers, the WAL writer, the checkpointer, and logical replication apply workers are all separate processes with well-defined lifetimes. You can see them in ps aux | grep postgres on any running server, and that maps cleanly to the pg_stat_activity view.
The process model is a deliberate trade. Postgres bought isolation and crash safety; it sold cheap connections. The book treats this as a fact about the world, not a defect to apologize for. We design around it.
1.5 Extensibility as philosophy
I once joined a team whose “simple” Postgres install pulled in 14 extensions. Ten were essential. Two were holding the upgrade hostage. That tension, between Postgres’s ability to absorb anything and the operational tax of doing so, comes straight from the original 1986 POSTGRES research paper at Berkeley, which argued that databases should be extensible by users, not just by the vendor. Stonebraker’s bet was that nobody could anticipate every type, operator, or index method a real domain would need, so the database should let people add them without recompiling the engine. Forty years later, that bet is what gives Postgres its character, and its sharpest edges.
You don’t see this from the outside as a design philosophy. You see it as a feature catalog that keeps growing. Geographic types. Time-series partitions. Vector embeddings. Full-text search. JSON path queries. B-tree, Hash, GIN, GiST, SP-GiST, BRIN, and Bloom indexes (Bloom ships as a contrib extension). Foreign tables that read from MySQL, Redis, S3, or another Postgres. Custom procedural languages so you can write functions in Python, Perl, or JavaScript. Almost none of those started in the core engine. They started as extensions, became popular, and either stayed as extensions or got absorbed into core.
1.5.1 The four extension surfaces
Extensibility in Postgres has a specific technical meaning. There are four places where an extension can plug into the database without forking it:
- Types.
CREATE TYPElets you define a new column type with its own storage layout, input/output functions, and operators. PostGIS addsgeometry.pgvectoraddsvector. JSON started life as a type before becoming a default. - Operators and functions.
CREATE OPERATORandCREATE FUNCTIONextend SQL itself. The&&overlap operator on ranges. The<->distance operator on vectors. None of those are core SQL; they’re all defined operators that the planner treats as first-class. - Index access methods. This is the one that surprises people. Postgres lets extensions register entirely new index implementations through the index access method API. Bloom filters, GiST for spatial data, GIN for inverted indexes. The planner uses them through the same machinery as the built-in B-tree.
- Procedural languages.
PL/pgSQLis the default language for stored procedures, butPL/Python,PL/Perl,PL/V8(JavaScript), and others slot in through the procedural language API. Functions in any of them are first-class to the planner.
Together those four surfaces explain why someone can build PostGIS or TimescaleDB or pgvector without changing one line of Postgres core. The core ships with hooks. The community fills them.
-- Loading an extension is one statement.
-- Most extensions are header-and-shared-object packages installed via the OS,
-- then enabled per-database with CREATE EXTENSION.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS btree_gin;pg_stat_statements records query latency by query template. pg_trgm adds trigram-based similarity matching. btree_gin lets you build GIN indexes on scalar types. Three lines. Three superpowers.
1.5.2 The trade-off nobody talks about
Extensibility is not free, and the book will not pretend otherwise.
The first cost is operational surface area. Every extension is a shared library loaded into the Postgres backend process. A buggy extension can crash the database. A poorly-written extension can leak memory or hold locks longer than expected. RDS and Cloud SQL deliberately restrict which extensions you can install, because the cloud provider doesn’t want a third-party shared object running in their fleet. If you’re targeting RDS, check the supported extensions list before you architect around one.
The second cost is upgrade friction. Most major Postgres upgrades require a corresponding update to every extension. If your stack depends on PostGIS, your upgrade path is gated by PostGIS releasing a compatible version. The cinetrack stack in this book uses only extensions that ship with the core distribution and a small set of well-maintained ones.
The third cost is mental load. The same problem can be solved three ways, and “what’s idiomatic” is harder to answer than in databases that ship with one solution. Full-text search is a great example: built-in tsvector plus GIN, the pg_trgm extension, or an external Elasticsearch. All three work. Two of them are in the book.
1.5.3 What you should install on every cinetrack database
These three earn their keep on almost any production Postgres:
pg_stat_statements: aggregated query latency and call counts per query template. The single most useful extension.pg_trgm: trigram similarity for fuzzy text search andLIKE '%foo%'acceleration.btree_ginorbtree_gist: lets you build composite indexes that mix scalar columns with arrays, ranges, or JSON. Helpful for surprising queries.
If you’re targeting a self-hosted server, also install pg_repack for online table reorgs. On managed services, check what’s available before you depend on it.
Tip
You can ask the running database which extensions are loaded versus available, with one query:
SELECT
e.extname,
e.extversion AS installed,
a.default_version AS available
FROM pg_extension e
JOIN pg_available_extensions a ON a.name = e.extname;1.5.4 Extensibility shapes how the book is structured
The book treats core Postgres and the most-installed extensions as one stack. JSONB and GIN are inseparable. Trigram search is covered. Monitoring without pg_stat_statements is monitoring with one eye closed. The book never asks you to install something obscure. It assumes you’ll install the boring, battle-tested set.
The flip side: the book does not cover every cool extension. pgvector is out of scope. PostGIS gets mentioned but isn’t a dependency. The line is drawn at “extensions every working Postgres engineer should know.”
1.6 How to read this book
The book has four parts. They build on each other in a specific order, and skipping ahead is more dangerous in this book than in most.
1.6.1 What each part assumes
Part I: Foundations. Chapters 1 through 6. MVCC, storage, the WAL, processes and memory, indexes from first principles. This is the model the rest of the book runs on. Skip Part I and the chapters on vacuum, replication, and concurrency stop making sense. Read it in order.
Part II: Querying like an expert. Chapters 7 through 14. The planner and statistics, reading EXPLAIN, B-tree strategy, specialty indexes (GIN, GiST, BRIN, hash), joins and rewrites, JSONB at scale, full-text search, and schema design with partitioning. This part assumes Part I.
Part III: Concurrency and production patterns. Chapters 15 through 21. Isolation, locking, the patterns expert teams reach for instinctively (SKIP LOCKED, advisory locks, optimistic concurrency, idempotency), Postgres-as-a-queue and the outbox, vacuum and bloat, connection pooling, monitoring, and zero-downtime migrations. The book stops being about “how Postgres works” and starts being about “how do I keep this thing running.”
Part IV: Replication, HA, and operations. Chapters 22 through 28. Streaming replication, logical replication, Patroni for HA, pgBackRest for backups, tuning, capacity planning, and the runbook collection that closes the book. Part IV reuses MVCC vocabulary directly.
The book ends on a runbook chapter, because at that point you have a working operational vocabulary and a model that lets you reason about the failures you haven’t met yet.
1.6.2 How the chapters are structured
Every chapter is the same shape. Hero image, overview, substantive subchapters that each end on a hook into the next, a common-mistakes section, and a summary whose last bullet previews the next chapter. Code is interleaved with prose, never separate. Every code snippet is pure SQL. No Spring, no JDBC, no JPA, no Hibernate. If your application uses an ORM, the book is the database side of that conversation.
1.6.3 The cinetrack sandbox
The whole book runs against one example schema. Cinetrack is a movie-review and watchlist application: users rate movies, write reviews, build watchlists, and trigger view events. It’s the same domain used in the author’s other books, so the queries here can sit alongside the application-level chapters in the Spring Boot books.
Every chapter has a self-contained code project in the db-book-example repository under chapter-NN/. Each project includes a docker-compose.yml for Postgres (and, in later chapters, pgbouncer, replicas, Patroni, or pgBackRest), an init.sql, a seed.sql, and a queries.sql of annotated examples.
The chapter-1 sandbox is the smallest version: about a hundred movies, fifty users, five hundred ratings, fifty reviews. Subsequent chapters extend it. Chapter 10 needs a hundred million view_events rows for BRIN demos; Chapter 14 needs partitioned tables; Chapter 26 needs a WAL archive. Each chapter’s README.md says what’s new.
git clone https://github.com/umur/db-book-example
cd db-book-example/chapter-01
docker compose up -d
psql -h localhost -U cinetrack -d cinetrack -f init.sql
psql -h localhost -U cinetrack -d cinetrack -f seed.sqlThree commands, under a minute on a fresh Docker image. You’re now sitting at a psql prompt in front of a working cinetrack database, and every query in this chapter runs from here.
1.6.4 Your first MVCC peek
Run this:
-- Look at the system columns Postgres exposes on every row.
-- xmin: transaction that inserted this version.
-- xmax: transaction that deleted/updated this version (0 if alive).
-- ctid: physical location, formatted as (page, offset).
SELECT xmin, xmax, ctid, id, user_id, movie_id
FROM reviews
LIMIT 5;You’ll see xmax = 0 on every row, because no review has been deleted or updated. Now update one:
UPDATE reviews SET body = body || ' (edited)' WHERE id = 1;
SELECT xmin, xmax, ctid, id
FROM reviews
WHERE id = 1;The xmin is now your update transaction. The ctid changes too: a new tuple is always written at a new physical location. Postgres has an optimization called HOT (Heap-Only Tuple) that, on a small update with free space on the same page, skips writing new index entries; the index keeps pointing at the old line-pointer slot, which redirects to the new tuple. Either way, the row was not edited in place. The old version is still there, marked with the deleting xmax, and vacuum will reclaim it later.
1.6.5 How to use the code companion
A few habits make the companion repo earn its keep.
Rule one: actually run the queries. A book on Postgres that you read without running queries is a book on something else. Every snippet in the chapter prose is in queries.sql, copy-paste-runnable, with comments that match the chapter’s narration.
Rule two: read the EXPLAIN output. Most production Postgres mistakes are made by people who don’t read query plans. Run EXPLAIN on every query the book asks you to run, even if we haven’t formally introduced it yet. You’ll start absorbing the shape.
Rule three: break things. The cinetrack sandbox is yours. Drop indexes. Insert ten million rows. Hold a transaction open and watch vacuum stop working. The book will ask you to do all of these eventually. The faster you start treating the database as something to experiment on, the faster expert Postgres becomes a reflex.
Note
The cinetrack code lives in a public companion repo. Each chapter snapshot is independent. You won’t accidentally contaminate later chapters by experimenting in earlier ones.
1.7 Common Mistakes
Treating Postgres as MySQL with more types. The schema migrates fine. The mental model doesn’t. Postgres has no clustered index, every table is a heap, and concurrent writes resolve through MVCC rather than gap locks. The query you tuned for InnoDB is not the query you should run on Postgres.
Assuming connections are cheap. They aren’t. Each connection is an OS process with an RSS baseline of 5 to 10 MB (much of which is shared via copy-on-write; unique private memory is typically a few MB). A pool sized for “lots of headroom” will eat a server before it serves traffic. Production Postgres always has a connection pooler in front of it.
Leaving long transactions running. A transaction that’s been open for hours is preventing vacuum from reclaiming any tuple version those snapshots can see. The table grows. The replicas fall behind. Set
idle_in_transaction_session_timeout. Setstatement_timeout. Stop trusting clients to close their work.Skipping autovacuum tuning. The default autovacuum settings are tuned for a database from the early 2010s. On any modern workload with sustained write traffic, the defaults are too conservative. The fix is per-table tuning. The mistake is assuming “Postgres handles vacuum automatically” means “I don’t need to think about it.”
Running raw
psqlagainst an unpooled production database. It’s a one-line tool that will happily open a backend that survives until you exit. People leave them intmuxsessions, forget about them, and find them weeks later still holding connection slots. Use a pooler, or usepsql --no-psqlrc -c 'SELECT ...'and exit immediately. The expert pattern is to set timeouts on the connection itself:PGOPTIONS="-c statement_timeout=30s -c idle_in_transaction_session_timeout=60s" psql --no-psqlrc -c 'SELECT ...'. Never leave a long-running interactive session connected directly to the primary.Believing “Postgres has X” means X is free. It usually does. It usually isn’t. Every extension is a shared object running inside your backend; every feature has an operational cost. Pick the smallest set of extensions that solves your problem. Audit them on every upgrade. Postgres being able to do anything is the same property that lets you accidentally architect yourself into a corner the upgrade path doesn’t fit through.
1.7.1 Summary
- Postgres is not MySQL with more types or Oracle with worse tooling. Different lineage, different concurrency model, different physical layout. The skills you brought from another database transfer about half the time and bite you the other half.
- MVCC is the bet that defines everything. Every row can have multiple versions on disk. Readers and writers don’t block each other. The price is dead-tuple cleanup, the operational tax of vacuum, and a planner that has to think about visibility on every read.
- Process per connection is the other defining bet. Each connection is an OS process. Connections are expensive. Production Postgres uses pgbouncer.
- Extensibility is a culture, not a feature. Custom types, operators, index methods, and procedural languages plug in through documented APIs. “Postgres has X” is almost always true; the cost is operational surface area and upgrade gravity.
- The book is one path, not a reference. Read Part I in order. The cinetrack sandbox is yours to break. Run every query. Read every plan.