JSONB operators, GIN vs expression indexes, and the cost of JSONB over typed columns - all proved with EXPLAIN ANALYZE on 200k Postgres rows. Tests in the repo.

JSONB looks like the path of least resistance. You have a payload with a few fixed fields and a flexible blob of metadata. You drop the blob into a jsonb column, ship it, and move on. Six months later you are paginating that column. Then indexing it. Then updating it. The abstractions you assumed are all leaking.

This tutorial proves every claim about JSONB with EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) against a real Postgres 16 instance with 200,000 seeded rows split across two parallel tables (one JSONB, one typed columns). All code is in the companion repo: github.com/umur/postgres-jsonb-deep-dive. Clone it, run mvn verify, and watch all 23 assertions confirm what each section here claims.

How the Tests Prove Anything

Every test runs the same shape:

Execute EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT ... against Testcontainers Postgres.
Parse the JSON plan tree.
Walk the plan and assert on whether any node uses an index, which index it picked, and how many shared buffers were touched.

The walker matters. Postgres parallelizes wide sequential scans into Gather > Seq Scan, so the top-level node is "Gather" even though the work underneath is a Seq Scan. Asserting only against the top node gives false positives. ExplainResult.usesIndex() recurses through the entire plan tree, looking for any Index Scan, Index Only Scan, or Bitmap Index Scan node.

The seed loads 200,000 products. Each row has structured columns (id, sku, category, price) plus a JSONB attributes blob. A second table, products_columnar, holds the same data flattened into typed columns. Brand has 7 distinct values (low cardinality, useful for GIN containment tests). Model has 1,000 distinct values (high cardinality, selective enough for B-tree expression indexes to be picked over a sequential scan).

The Operator Cheat Sheet

Four JSONB operators do most of the work. They look similar and behave very differently.

-- -> key returns the value at the key as jsonb.
SELECT attributes -> 'specs' FROM products LIMIT 1;
-- {"weight_kg": 2.5, "wifi": true, "battery_h": 8}

-- ->> key returns the value at the key as text.
SELECT attributes ->> 'brand' FROM products LIMIT 1;
-- 'acme'

-- @> structural containment: true when the right side is a subset of the left.
SELECT * FROM products WHERE attributes @> '{"brand":"acme"}';

-- ? key existence (escaped as ?? in JDBC).
SELECT * FROM products WHERE attributes ?? 'specs';

The choice between -> and ->> is not cosmetic. -> stays in JSONB, so you can chain into nested objects. ->> coerces to text, which is what you want when comparing with TEXT operators. The wrong choice usually shows up as a query that returns nothing or a query that does the wrong comparison.

Without a JSONB-aware index, every one of these triggers a sequential scan. The 200k-row table reads more than 1,000 buffers per query, every time. The whole point of GIN and expression indexes is to avoid that.

GIN vs Expression Index

Postgres has two ways to index JSONB. Each one serves a different shape of query.

GIN with default ops

CREATE INDEX products_attributes_gin
ON products USING GIN (attributes);

This is the general-purpose JSONB index. It supports @>, ?, ?|, and ?&. After creating it, WHERE attributes @> '{"brand":"acme"}' stops being a Seq Scan and becomes a Bitmap Index Scan over the GIN. The test JsonbGinIndexIT.containmentUsesGinIndex confirms this with an assertion that walks the plan tree.

GIN with jsonb_path_ops

CREATE INDEX products_attributes_path_gin
ON products USING GIN (attributes jsonb_path_ops);

This variant is smaller and faster on disk. The tradeoff: it only supports @>. If you query with ? or ?&, this index cannot help. When the planner chooses an index for key existence, it chooses the default GIN, never jsonb_path_ops.

B-tree expression index

CREATE INDEX products_model_expr_idx
ON products ((attributes ->> 'model'));

This is the trick most people miss. A B-tree on a JSONB extraction expression makes WHERE attributes ->> 'model' = 'MDL-0042' fast. GIN does not help with that query at all. The test textExtractionCannotUseGin creates a default GIN, then verifies that text equality through ->> still does a Seq Scan. The expression index, by contrast, gets the planner to choose an Index Scan and read fewer than 1,000 buffers for the same query.

One caveat: the expression index only helps when the key is selective enough. If you index (attributes ->> 'brand') and there are only 7 distinct brands, each filter returns 14 percent of the table - too many rows for an Index Scan to be cheaper than just streaming sequentially. The planner will correctly ignore your index. This is not a bug. Selectivity matters more than index existence.

JSONB vs Typed Columns

The interesting question is not "does JSONB work" - it does. The question is what you give up by choosing JSONB over typed columns. The two-table seed sets up an apples-to-apples comparison on identical data.

-- Indexed lookup, JSONB
SELECT * FROM products
WHERE attributes ->> 'model' = 'MDL-0042';

-- Indexed lookup, columnar
SELECT * FROM products_columnar
WHERE model = 'MDL-0042';

Each plan picks up its respective index. Where they differ is row width. The columnar table has narrow typed rows. The JSONB table has every row carrying the full attributes blob, which weighs in at a few hundred bytes per row. SELECT * reads the matched rows from each, but the JSONB table physically transports more data per row, and the Plan Width in EXPLAIN reflects this. The test selectStarShowsTheWidthCost asserts that JSONB Plan Width is strictly greater than columnar Plan Width on the same logical query.

The flip side is multi-key containment. WHERE attributes @> '{"brand":"acme","color":"red"}' hits the GIN index in a single lookup. The columnar equivalent needs separate indexes on brand and color (or a composite) and the planner decides whether to use them. JSONB containment is dense in expressive power; typed columns are dense in storage efficiency.

Updating JSONB Without Losing Writes

JSONB rows are not edited in place. Postgres rewrites the whole row on every update, which means three rules for updating JSONB safely.

Use jsonb_set for path-targeted updates

UPDATE products
SET attributes = jsonb_set(attributes, '{color}', '"red"')
WHERE id = 42;

This replaces the value at color and leaves every other field intact. The test jsonbSetReplacesField confirms that an update to color does not touch brand. For nested fields, the path is multi-segment:

UPDATE products
SET attributes = jsonb_set(attributes, '{specs,wifi}', 'true')
WHERE id = 42;

Use || to merge objects

UPDATE products
SET attributes = attributes || '{"color":"green","new_field":"x"}'::jsonb
WHERE id = 42;

The right side wins on key collisions. New keys get added. This is the easy way to apply a partial patch.

Avoid app-side read-modify-write

The dangerous pattern: read the JSONB into your app, mutate the parsed object, write the whole thing back. JsonbUpdateIT.appSideReadModifyWriteLosesData reproduces the failure:

Transaction A reads attributes from the row.
Transaction B updates in_stock in the same row via jsonb_set.
Transaction A writes its modified version back, which still contains the original in_stock value.
B's update is silently overwritten.

The fix is to do the update on the database side with jsonb_set or ||, not in app code. Concurrent updates to different fields can both succeed when the database does the merging. They cannot when the application does.

The Pitfalls That Bite

Five JSONB pitfalls that show up over and over in production.

1. ->> returns text, not the underlying type

Sorting by attributes ->> 'rating' sorts as text. The string '9' is greater than '10'. To get numeric ordering you have to cast:

ORDER BY (attributes ->> 'rating')::numeric DESC

Same for comparisons. WHERE attributes ->> 'rating' > '4' compares strings. WHERE (attributes ->> 'rating')::numeric > 4 compares numbers. The two queries are not equivalent.

2. GIN cannot serve text equality

This is the trap that costs the most time. You have a GIN index on attributes. You write WHERE attributes ->> 'brand' = 'acme'. You assume the GIN helps. It does not. Text equality through ->> bypasses GIN entirely. Either rewrite the query as containment (attributes @> '{"brand":"acme"}') or add a B-tree expression index for the specific key.

3. Containment is type-sensitive

-- These two are NOT the same.
SELECT * FROM products WHERE attributes @> '{"specs":{"wifi":true}}';
SELECT * FROM products WHERE attributes @> '{"specs":{"wifi":"true"}}';

The first matches the seed because wifi is stored as a boolean. The second matches nothing. JSONB containment compares values structurally, including their type. The test containmentIsTypeSensitive asserts both behaviors.

4. Missing keys return NULL

attributes -> 'missing_key' returns SQL NULL. attributes ->> 'missing_key' also returns NULL. WHERE attributes ->> 'missing' = 'something' excludes the row, which is usually what you want. WHERE attributes ->> 'missing' IS NULL matches both genuinely missing keys and keys whose value is JSON null. If the difference matters, use attributes ?? 'key' for explicit existence checks.

5. The ? operator collides with JDBC parameters

Postgres uses ? for key existence. JDBC uses ? for positional parameters. From a JdbcTemplate, escape it as ??:

jdbc.queryForObject(
    "SELECT COUNT(*) FROM products WHERE attributes ?? 'specs'",
    Long.class);

The Postgres JDBC driver translates ?? back to ? on the wire. This applies to ?| and ?& too. Forget it once and you get No value specified for parameter 1.

When to Reach for JSONB

JSONB is the right answer when the schema is genuinely sparse, the keys are dynamic, or the structure is decided by clients (incoming webhook payloads, external API responses, user-defined fields). Stable fields that never vary belong in typed columns. The clearest sign you should not have used JSONB: every query in your codebase extracts the same three keys with ->>. Promote them. Index them. Save the JSONB for the actually variable parts.

The companion repo proves all of this with EXPLAIN ANALYZE assertions. Clone it, run mvn verify, change the queries, watch the plans. The fastest way to learn JSONB is to run the same query against indexed and unindexed tables and see exactly what the planner does.

PostgreSQL JSONB Deep Dive: Operators, Indexes, and the Tradeoffs Nobody Told You About