
// Posted by Umur Inan

// Category Practices

// Posted on May 8, 2026

The Test That Passes When Prod Is Broken

Your unit tests are mostly testing your mocks. When the mock drifts from the real dependency, tests pass and prod breaks. The fix is harder than it looks.


Our entire test suite was green. 4,200 tests, all passing. The deploy went out at 11 AM. By 11:15, customer support was getting calls. The new payment integration was charging users, marking the order as "pending," and never marking it as "paid." Webhooks from the payment provider were being received, parsed, and silently dropped because of a field rename. Unit tests had not caught it. Integration tests had not caught it. Production caught it, the way production always catches things, with money on the line and an executive on Slack.

The tests were green because we had carefully mocked every dependency. Our webhook handler test passed a fake payload that looked like what the documentation showed. The payment client test verified that we called the right method with the right arguments. Order service tests verified that when the payment client returned "paid," the order was marked paid. Every individual unit was tested. The composition of those units against a real provider was not.

Mocks Are Mirrors

A mock test verifies that your code calls a dependency the way you think it does. It does not verify that the dependency does what you think it does. The dependency is not in the test. There is only your assumption about what the dependency does, the code under test, and a layer of stubs sitting between them returning whatever you wrote in the test setup. If your assumption is wrong, the test passes anyway.
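Here is a minimal sketch of that mirror, using Python's standard `unittest.mock`. The names `mark_order_if_paid` and `get_status`, and the `"paid"` status string, are made up for illustration; they are not from any real payment SDK.

```python
from unittest.mock import Mock


def mark_order_if_paid(payment_client, order):
    """Mark the order paid when the client reports a 'paid' status."""
    result = payment_client.get_status(order["id"])
    if result["status"] == "paid":
        order["state"] = "paid"
    return order


def test_marks_order_paid():
    # The stub returns exactly what we typed here. The real provider
    # is nowhere in this test, only our belief about it.
    client = Mock()
    client.get_status.return_value = {"status": "paid"}

    order = mark_order_if_paid(client, {"id": "o-1", "state": "pending"})

    assert order["state"] == "paid"
    client.get_status.assert_called_once_with("o-1")


test_marks_order_paid()
```

The test stays green whether the real client answers `"paid"`, `"PAID"`, or `"succeeded"`. Nothing in it can tell the difference.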

This is fine for a lot of code. A function that calculates a discount based on a price and a coupon does not need a real database. Mocking the price fetch and the coupon fetch lets you test the calculation logic without booting Postgres. The mocks here are not lying because the test is not asking them to. It is verifying the math.
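A sketch of the honest case, with hypothetical `price_for` and `percent_off` lookups standing in for the two fetches. The only thing being asserted is arithmetic, so it does not matter where the numbers come from.

```python
from decimal import Decimal
from unittest.mock import Mock


def discounted_price(price_source, coupon_source, sku, code):
    """Apply a percentage coupon to a price; never go below zero."""
    price = price_source.price_for(sku)
    percent_off = coupon_source.percent_off(code)
    discount = price * percent_off / Decimal(100)
    return max(price - discount, Decimal("0.00"))


# Stubs supply inputs; the test is about the calculation, not the lookups.
prices = Mock()
prices.price_for.return_value = Decimal("80.00")
coupons = Mock()
coupons.percent_off.return_value = Decimal("25")

result = discounted_price(prices, coupons, sku="sku-1", code="SPRING")
assert result == Decimal("60.00")
```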

Trouble starts when the test is verifying integration, not logic. A test that says "when the payment provider returns success, the order is marked paid" is making a claim about what the payment provider returns. If the test mocks the payment provider, the test is not verifying that claim. The test is verifying that the test setup is internally consistent.

The Drift Problem

The most insidious failure mode is mock drift. You write a mock that returns what the real service returned at the time you wrote the test. Six months later, the real service changes. They renamed a field. They added a required header. They started returning a different status code on a specific edge case. Your code handles this incorrectly, but your mock still returns the old shape. Your tests pass. Production fails.
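To make the drift concrete, here is a small sketch. The field names (`status` renamed to `payment_status`) are invented for the example; the shape of the failure is the point, not the specific rename.

```python
def handle_webhook(payload, orders):
    """Mark an order paid when the event says so; ignore everything else."""
    # Written against the old payload shape, which used "status".
    if payload.get("status") == "paid":
        orders[payload["order_id"]] = "paid"
        return True
    return False  # anything unrecognized is silently dropped


# Fixture copied from the documentation when the test was written.
OLD_FIXTURE = {"order_id": "o-1", "status": "paid"}
# What the provider sends after the (hypothetical) rename.
CURRENT_REAL_PAYLOAD = {"order_id": "o-1", "payment_status": "paid"}

orders = {"o-1": "pending"}
fixture_handled = handle_webhook(OLD_FIXTURE, dict(orders))  # test world
real_handled = handle_webhook(CURRENT_REAL_PAYLOAD, orders)  # production

assert fixture_handled is True
assert real_handled is False
assert orders["o-1"] == "pending"
```

No exception, no log line, no failing test. The order simply never leaves "pending".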

This is not a hypothetical. Every API client I have maintained has had at least one bug that traced back to a mock that no longer matched reality. The mock was right at the time it was written. Nobody touches a passing test. Reality drifted. The test went on passing.

A mock written in 2024 is testing the API as it existed in 2024. If the API changed in 2025, your test is not testing the current system. It is testing a fossil.

The Mock-Everything Anti-Pattern

There is a school of testing that holds every test should be a unit test, and every unit test should mock all of its dependencies. Pure isolation. Fast tests. Easy to debug failures because each test exercises one unit.

This works for code that genuinely is a unit. It does not work for code whose actual job is to compose other things. A controller calls a service. The service calls a repository. The repository calls the database. If you mock the database in the repository test, the repository in the service test, and the service in the controller test, where is the test that verifies the database actually does what you think? Nowhere. You are testing the contracts you wrote about the database, not the database.

Sometimes the right answer is fewer mocks. Run the actual database in your test. Stand up the actual HTTP server. Wire real implementations together and test the composition. The tests are slower. They are also testing the system instead of testing your model of the system, which is a different and more useful thing.
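A rough sketch of what "run the actual database" looks like in a test. To keep it self-contained it uses an in-memory SQLite database from the Python standard library; that is an assumption of the example, and in a real suite you would want the same engine you run in production. `OrderRepository` and its schema are hypothetical.

```python
import sqlite3


class OrderRepository:
    def __init__(self, conn):
        self.conn = conn

    def create_schema(self):
        self.conn.execute(
            "CREATE TABLE orders ("
            " id TEXT PRIMARY KEY,"
            " state TEXT NOT NULL CHECK (state IN ('pending', 'paid')))"
        )

    def add(self, order_id):
        self.conn.execute(
            "INSERT INTO orders (id, state) VALUES (?, 'pending')", (order_id,)
        )

    def mark_paid(self, order_id):
        cur = self.conn.execute(
            "UPDATE orders SET state = 'paid' WHERE id = ?", (order_id,)
        )
        return cur.rowcount == 1  # False when no such order exists

    def state_of(self, order_id):
        row = self.conn.execute(
            "SELECT state FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return row[0] if row else None


def test_repository_against_real_engine():
    conn = sqlite3.connect(":memory:")
    try:
        repo = OrderRepository(conn)
        repo.create_schema()
        repo.add("o-1")
        assert repo.mark_paid("o-1") is True
        assert repo.state_of("o-1") == "paid"
        assert repo.mark_paid("missing") is False
        # The engine, not a stub, enforces the primary key.
        try:
            repo.add("o-1")
        except sqlite3.IntegrityError:
            pass
        else:
            raise AssertionError("duplicate id should be rejected")
    finally:
        conn.close()


test_repository_against_real_engine()
```

The SQL is actually parsed and executed here. A typo in a column name or a violated constraint fails the test, which a mocked repository would never notice.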

Contract Tests for Service Boundaries

For dependencies you cannot stand up in a test (a service owned by another team, a third-party API whose provider will cooperate), the right answer is contract testing. The basic idea: the consumer (your code) writes down what it expects from the provider (the API). The provider runs those expectations against its real implementation as part of its own CI. When the provider changes in a way that breaks the consumer's expectations, the provider's CI fails. The drift is caught before deploy, not in production.

Tools like Pact have made this easier. The pattern works without a tool too. The discipline is what matters: the consumer's mocks are not just fixtures in your codebase. They are claims that need to be verified against reality, periodically and automatically.
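Without a tool, the pattern might look something like the sketch below. This is not how Pact works internally; it is a hand-rolled illustration, and `CONSUMER_CONTRACT`, `contract_violations`, and `provider_build_event` are all hypothetical names.

```python
# Consumer side: the fields our code reads, and the types it assumes.
CONSUMER_CONTRACT = {
    "order_id": str,
    "status": str,
    "amount_cents": int,
}


def contract_violations(contract, response):
    """Return readable mismatches; an empty list means the contract holds."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    return problems


def provider_build_event(order_id):
    # Stand-in for the provider's real implementation, after a rename.
    return {"order_id": order_id, "payment_status": "paid", "amount_cents": 1999}


# Provider side: this check runs in the provider's CI, against real code.
violations = contract_violations(CONSUMER_CONTRACT, provider_build_event("o-1"))
assert violations == ["missing field: status"]
```

Extra fields in the response are deliberately ignored. The contract only covers what the consumer actually reads, so the provider stays free to add things.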

For external APIs you cannot run contract tests against (most third parties), the next best thing is record-and-replay testing. Make a real call once, save the response, and replay it in subsequent tests. When the API changes, re-record. This still drifts, but at least the recording is grounded in reality at some specific point in time, and the re-record step gives you a chance to notice changes.
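A bare-bones sketch of the idea, assuming JSON responses and a file per recorded call. `fetch_with_cassette` is a hypothetical helper, and `fake_live_call` stands in for the one real HTTP request.

```python
import json
import tempfile
from pathlib import Path


def fetch_with_cassette(cassette_path, fetch_live, record=False):
    """Replay a saved response, or call the live API and save the result.

    A live call happens when no recording exists yet, or when record=True.
    """
    path = Path(cassette_path)
    if path.exists() and not record:
        return json.loads(path.read_text())
    response = fetch_live()
    path.write_text(json.dumps(response, indent=2, sort_keys=True))
    return response


calls = []


def fake_live_call():
    # Stand-in for one real request to the provider.
    calls.append(1)
    return {"order_id": "o-1", "status": "paid"}


with tempfile.TemporaryDirectory() as tmp:
    cassette = Path(tmp) / "get_order.json"
    first = fetch_with_cassette(cassette, fake_live_call)   # records
    second = fetch_with_cassette(cassette, fake_live_call)  # replays
    third = fetch_with_cassette(cassette, fake_live_call, record=True)

assert first == second == third
assert len(calls) == 2  # the replay never touched the "network"
```

If the recordings are committed to the repository, a re-record shows up as a diff in code review. That diff is where a renamed field gets noticed.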

Integration Tests Are Not a Slowdown Tax

The argument against integration tests is that they are slow. This is true. A test that boots Postgres takes longer than a test that mocks Postgres. Whole testing philosophies have been built around minimizing this slowdown.

Most of those philosophies optimize for the wrong thing. The cost of a 30-second integration test running on every PR is small. The cost of a missed bug in production is large. If your test suite takes two minutes longer because half of it talks to a real database, you have made a good trade.

The real cost of integration tests is not the runtime. It is the discipline required to keep them stable. Flaky tests against real systems become noise that everyone ignores, which then becomes the excuse to mock more. The fix is not "mock more." The fix is to engineer the integration tests to be reliable: deterministic data, clean teardowns, isolation between tests, no shared mutable state. This is harder than mocking. It is also the only way to actually verify your code works against the systems it depends on.
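One way to get those properties, sketched with an in-memory SQLite database so the example runs anywhere. The fixture name `fresh_database` and the seed rows are assumptions of the example; the same structure applies to a real engine.

```python
import sqlite3
from contextlib import contextmanager

SEED_ORDERS = [("o-1", "pending"), ("o-2", "paid")]  # fixed, never random


@contextmanager
def fresh_database():
    """Give each test its own database with known contents."""
    conn = sqlite3.connect(":memory:")  # private to this one test
    try:
        conn.execute(
            "CREATE TABLE orders (id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?)", SEED_ORDERS)
        conn.commit()
        yield conn
    finally:
        conn.close()  # teardown runs even when the test fails


def count_pending(conn):
    query = "SELECT COUNT(*) FROM orders WHERE state = 'pending'"
    return conn.execute(query).fetchone()[0]


def test_marking_paid_reduces_pending():
    with fresh_database() as conn:
        conn.execute("UPDATE orders SET state = 'paid' WHERE id = 'o-1'")
        assert count_pending(conn) == 0


def test_seed_has_one_pending():
    with fresh_database() as conn:
        assert count_pending(conn) == 1


# Either order gives the same result, because nothing is shared.
test_marking_paid_reduces_pending()
test_seed_has_one_pending()
```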

When Mocks Are the Right Tool

Mocks are the right tool when:

The code under test has logic that does not depend on the mocked thing's actual behavior.
The mocked thing has a simple, stable contract, and you are replacing it only to make the test deterministic (a system clock, a UUID generator).
The cost of running the real dependency is prohibitive (a third-party API that bills per call, hardware you do not have in CI).
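The clock case is the easiest one to show. In this sketch, `is_expired` is a hypothetical function that takes its clock as a parameter, so the test can freeze time without patching anything.

```python
from datetime import datetime, timedelta, timezone


def is_expired(issued_at, ttl, now):
    """A token is expired once `ttl` has elapsed; `now` is a callable clock."""
    return now() - issued_at >= ttl


FIXED_NOW = datetime(2026, 5, 8, 11, 0, tzinfo=timezone.utc)


def frozen_clock():
    # Test double for the system clock: always returns the same instant.
    return FIXED_NOW


ttl = timedelta(minutes=15)
assert is_expired(FIXED_NOW - timedelta(minutes=14), ttl, frozen_clock) is False
assert is_expired(FIXED_NOW - timedelta(minutes=15), ttl, frozen_clock) is True
```

Production code would pass something like `lambda: datetime.now(timezone.utc)`. Nobody is going to change what "now" means, so this stand-in cannot drift.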

Mocks are the wrong tool when:

The code under test is integrating with the mocked thing.
The mocked thing is something whose behavior could change without you knowing (any external API).
You are testing "when X returns Y, my code does Z" for an X that you cannot control.

A mock is a stand-in for the real thing. When you care about the real thing's behavior, the stand-in cannot tell you the truth.

The Audit That Will Change Your Mind

Look at your test suite. For each mocked dependency, ask: what would happen to my tests if the real dependency returned something different from what this mock returns? Would the tests still pass? Often, yes. That is the failure mode that ships bugs.
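Part of this audit can be mechanized. The sketch below compares the shape of a mock's canned payload against a sample captured from the real dependency. The helper names and both payloads are invented for illustration, and it only looks at field paths and types, not values.

```python
def shape_of(value, prefix=""):
    """Flatten a JSON-like value into a {dotted.path: type name} mapping."""
    if isinstance(value, dict):
        shape = {}
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            shape.update(shape_of(child, path))
        return shape
    return {prefix: type(value).__name__}


def drift(mock_payload, real_payload):
    """Report where the mock's shape and the real sample's shape disagree."""
    mock_shape = shape_of(mock_payload)
    real_shape = shape_of(real_payload)
    shared = set(mock_shape) & set(real_shape)
    return {
        "only_in_mock": sorted(set(mock_shape) - set(real_shape)),
        "only_in_real": sorted(set(real_shape) - set(mock_shape)),
        "type_changed": sorted(
            p for p in shared if mock_shape[p] != real_shape[p]
        ),
    }


mock_payload = {"order_id": "o-1", "status": "paid", "amount": {"value": 1999}}
real_sample = {
    "order_id": "o-1",
    "payment_status": "paid",
    "amount": {"value": "19.99"},
}

report = drift(mock_payload, real_sample)
assert report == {
    "only_in_mock": ["status"],
    "only_in_real": ["payment_status"],
    "type_changed": ["amount.value"],
}
```

Anything listed under `only_in_mock` is a field your tests rely on that the real dependency no longer sends.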

Then ask: when this dependency last changed in production, did any test fail? If no tests failed, your tests are not exercising the integration. They are exercising your assumptions about it, and those assumptions predate the change.

The goal is not zero mocks. The goal is honest tests. A test that says "I called the function and got back what I told the mock to return" is a tautology dressed as verification. The tests worth keeping are the ones that fail when the system fails.

The deploy that broke our payment integration was caught by customers who had been charged for orders that were still stuck in "pending." A contract test landed the next day. A real integration test against the provider's sandbox landed the week after. Six months later, the provider changed the webhook payload format again. This time we knew within an hour, before any customer noticed. The tests had finally started telling us the truth.

Testing · Engineering Practices

Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.
