← Back to Blog

The NULL Trap

NULL is a three-valued logic system that corrupts aggregations, breaks NOT IN subqueries, and hides rows from WHERE clauses. It is not just a missing value.

We had a report that counted active users by country. It ran every Monday morning and fed into a dashboard the product team used to make decisions. Nobody questioned it. The numbers moved in the right direction week over week, which is usually enough to convince everyone that a report is working correctly.

Three months in, someone noticed that the total across all countries did not match the total from a different query that just counted all active users. The difference was not small. We were undercounting by about twelve percent.

The cause was a single NULL. Users who had not set a country were being silently excluded from every bucket. They did not land in an "unknown" bucket. They did not cause an error. They just disappeared. The query ran, returned numbers, and those numbers were wrong in a way that took three months to notice.

That is what NULL does. It does not fail loudly. It participates in your query and produces a result, and that result is subtly incorrect in ways that pass every eyeball test.

NULL in Aggregations

The most common version of the NULL trap is in aggregate functions. COUNT(*) counts every row. COUNT(column) counts only rows where that column is not NULL. Two different numbers, and SQL will not warn you that you picked the wrong one.

The same applies to SUM, AVG, MIN, and MAX. All of them silently skip NULL values. If you are computing average order value and some orders have a NULL discount applied instead of a zero discount, your average is computed over a smaller denominator than you think. The result looks plausible. It is wrong.

Consider this query:

SELECT AVG(discount_percent) FROM orders WHERE status = 'completed';

If ten percent of completed orders have NULL in discount_percent because no discount was applied, this query computes the average over the ninety percent of orders that have a value. It does not compute the average over all orders. You wanted zero to mean no discount. The database treats NULL as unknown and excludes it entirely.

The fix is explicit: use COALESCE to replace NULL with a sensible default before aggregating.

SELECT AVG(COALESCE(discount_percent, 0)) FROM orders WHERE status = 'completed';

Now NULL becomes zero, which is what you meant. The average is correct. But you have to know to write it that way, and the query without COALESCE produces no error to tell you that you should.

NULL in NOT IN Subqueries

This one is worse because it does not just skew numbers. It returns zero rows.

Suppose you want to find users who have not placed an order:

SELECT id FROM users WHERE id NOT IN (SELECT user_id FROM orders);

This query looks correct. It will return zero rows if orders.user_id contains a single NULL anywhere in the table. Not fewer rows. Zero.

Here is why. NOT IN expands to a series of inequality checks. For each user ID, the database checks whether it is not equal to each value in the subquery. If the subquery contains NULL, the check becomes id != NULL, which evaluates to NULL in SQL's three-valued logic, not TRUE or FALSE. NULL means unknown. Since the result is unknown, the row cannot be confirmed as matching, so it is excluded.

One NULL in a subquery poisons the entire NOT IN. You get zero results instead of the rows you expected. No error, no warning, no obvious indication that anything went wrong.

The fix is to use NOT EXISTS instead:

SELECT id FROM users u WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);

Or filter out NULLs explicitly in the subquery:

SELECT id FROM users WHERE id NOT IN (
  SELECT user_id FROM orders WHERE user_id IS NOT NULL
);

Both work. I prefer NOT EXISTS because it makes the intent explicit and handles NULLs correctly by design. Using NOT IN with a subquery against a table where the column could be NULL is a bug waiting to ship.

NULL in Comparisons

The third trap is in WHERE clauses. NULL does not behave like a value in comparisons. It behaves like an unknown.

WHERE status != 'inactive' does not return rows where status is NULL. It returns rows where status is a known value that is not 'inactive'. Rows with NULL status are not confirmed as not-inactive because NULL means we do not know what the status is. The comparison evaluates to NULL, which is neither TRUE nor FALSE, and the row is excluded.

This matters more than it sounds. If you are writing an UPDATE or DELETE with a filter like WHERE column != 'value', you are not touching rows where that column is NULL. If you are writing a report that excludes certain categories, rows with NULL category are also excluded, silently.

UPDATE users SET active = false WHERE role != 'admin';

This does not deactivate users whose role is NULL. They are invisible to the WHERE clause. If that is not what you intended, you need to be explicit:

UPDATE users SET active = false WHERE role != 'admin' OR role IS NULL;

The same applies to equality. WHERE column = NULL never matches anything. NULL is not equal to NULL. You must use IS NULL, not = NULL. Most developers know this one, but the inequality version catches people far more often.

What NULL Actually Is

The problem with NULL is that it means too many things. It can mean the value is unknown, the value does not apply, the value was not collected, or the row was created before the column existed. SQL treats all of those the same way, which is the source of most of the confusion.

The three-valued logic that NULL introduces is mathematically consistent. The problem is that it does not match how most people reason about data. When a developer writes WHERE country != 'US', they are thinking in two values: either the country is the US or it is not. SQL adds a third option: we do not know. And the default behavior is to exclude the unknowns from results.

The practical advice is not to eliminate NULL from your schema. Sometimes NULL is genuinely the right representation for an unknown or inapplicable value. The advice is to be deliberate about it. When you write an aggregate, decide whether NULLs should be excluded or replaced with a default. When you write a NOT IN, use NOT EXISTS instead. When you write an inequality filter, decide whether NULL rows should be included or excluded and say so explicitly.

Our report got fixed with a two-line change. We added an ELSE 'Unknown' branch to the CASE expression that was bucketing users by country, so NULL landed somewhere visible instead of disappearing. The twelve percent showed up, the dashboard reflected reality, and the product team stopped making decisions based on a number that was quietly wrong.

Per SQL's rules, the query had been correct the entire time. It just was not doing what we thought it was doing. That is the NULL trap. No error, no warning, wrong answer.

Share
X LinkedIn HN
UI

Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.

👁 0 5 min read

Comments (0)