
Nobody Reads a 1,000-Line Diff

The math on PR review quality is brutal. Past 400 lines, defect detection collapses, approvals get rubber-stamped, and your review process is theatre.

The PR was 1,847 lines. It touched 23 files. The author had been working on it for nine days. It had three approvals already by the time I got to it. I scanned the description. I clicked through five or six files, made some inline comments about naming, looked at the test count (it had tests, which was reassuring), and approved it.

It shipped. Three weeks later we found a bug in some validation logic that had been broken since the day the PR landed. Nobody had caught it in review. Nobody could have caught it in review. By approving that PR, I had not actually reviewed it. I had skimmed it and signed my name.

This happens constantly. Good engineers. Teams with strong code review cultures. Past a certain size, code review stops being a quality gate and starts being a ritual. The math is not subtle. The math is brutal. And almost nobody talks about it honestly.

What the Math Actually Says

The well-known code review study SmartBear ran at Cisco found that defect detection rates drop sharply once a review crosses about 200 lines of code. Past 400 lines, you are mostly missing things. Past 1,000, the reviewer is functionally guessing. SmartBear's later research pointed the same way, and the shape of the curve is consistent across every team I have ever seen study it.

The reason is simple. A reviewer reading a diff is holding three things in their head: the existing codebase, the proposed change, and the implications of the change rippling through both. That third thing is where bugs live, and it is the most expensive to compute. Each new file in the diff adds load. Each new function adds load. After about thirty minutes of focused review, comprehension drops. After an hour, retention drops. By file ten of a twenty-three-file diff, the reviewer is no longer building a mental model. They are pattern-matching for things that look wrong, which is why they catch naming issues but miss logic bugs.
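Here is one rough way to see why the third item dominates. It rests on a simplifying assumption that the study did not measure: any two changed files might interact. Under that assumption, the number of pairs a reviewer has to consider grows quadratically with the file count rather than linearly.

```latex
\text{pairs}(n) = \binom{n}{2} = \frac{n(n-1)}{2}
\qquad
\text{pairs}(5) = \frac{5 \cdot 4}{2} = 10
\qquad
\text{pairs}(23) = \frac{23 \cdot 22}{2} = 253
```

Most pairs in a real diff do not interact, so this overstates the load. It still shows why the twenty-third file costs more attention than the fifth. Under this model the fifth file adds 4 new pairs, while the twenty-third adds 22.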

None of this is the reviewer's fault. It is just how human attention works on code. You can train yourself to read longer diffs more carefully, and senior engineers do, but the curve still bends down. Nobody is exempt.

The Objection Nobody Says Out Loud

When you tell a team to keep PRs small, the official pushback is always something like "some changes are large by design" or "splitting this would make it harder to understand." Those are real concerns sometimes. But they are not the actual objection.

The actual objection is that small PRs feel like a tax. You spend more time on the overhead: opening the PR, waiting for review, addressing comments, rebasing, merging. The ratio of work-on-PRs to work-on-code goes up. You ship the same amount of code in more pieces, with more friction. It feels slower. Honest engineers will tell you this if you ask them directly.

The feeling is real. It is also wrong about the timeline that matters.

Why Small PRs Feel Slow but Ship Faster

The intuition: each PR cycle takes time. Open, review, address comments, approve, merge. If each cycle is 24 hours, three small PRs are 72 hours. One big PR is 24 hours. So batching wins.

This math is wrong in two places. First, big PRs sit in review longer because reviewers procrastinate. A 1,500-line diff is something nobody wants to start on Monday morning. So it sits. Then it sits some more. The reviewer eventually does it under time pressure, which is the worst possible context for catching subtle issues. The cycle time on a big PR is rarely 24 hours. It is usually three or four days, sometimes a week.

Second, the cycle time is not the only cost. The cost that gets ignored is everything that happens after merge. The bug from a poorly reviewed PR does not show up in the cycle time number. It shows up later, as a production incident, a customer complaint, a hotfix that takes the team three days to ship. If you actually counted the time from "PR opened" to "the change is correctly running in production with no follow-up needed," small PRs win consistently.
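To make the comparison concrete, here is the arithmetic as a small Python sketch. The numbers are the illustrative ones from this section, not measurements: 24-hour cycles, a big PR that waits three or four days, and a three-day hotfix. The function names are made up for the example. The sketch also assumes the small PRs need no follow-up and that the big PR is one where the bug actually slipped through. Not every big PR ships a bug, so read this as the bad case rather than the average.

```python
HOURS_PER_DAY = 24

def serial_cycle_hours(cycle_hours: int, pr_count: int) -> int:
    """The batching intuition: each PR costs one full review cycle, back to back."""
    return cycle_hours * pr_count

def opened_to_done_hours(review_hours: int, followup_hours: int = 0) -> int:
    """From 'PR opened' to 'correctly running in production, no follow-up needed'."""
    return review_hours + followup_hours

# The intuition: three small PRs vs. one big PR at 24 hours per cycle.
naive_small = serial_cycle_hours(24, 3)  # 72
naive_big = serial_cycle_hours(24, 1)    # 24

# The fuller picture for a big PR that let a bug through:
# 3 to 4 days waiting in review, plus a 3-day hotfix after merge.
big_best = opened_to_done_hours(3 * HOURS_PER_DAY, followup_hours=3 * HOURS_PER_DAY)   # 144
big_worst = opened_to_done_hours(4 * HOURS_PER_DAY, followup_hours=3 * HOURS_PER_DAY)  # 168

# Small PRs, assuming their reviews caught the problem: no follow-up.
small = opened_to_done_hours(naive_small)  # 72

print(f"naive: small={naive_small}h big={naive_big}h")
print(f"opened-to-done: small={small}h big={big_best}-{big_worst}h")
```

The naive view favors batching by 72 hours to 24. Once the wait in review and the follow-up are counted, the ordering flips.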

The Cultural Tell

The strongest signal of a team's actual review culture is how senior engineers respond to small PRs. The good ones approve them quickly with one or two genuine comments. They treat each PR as a chance to catch something or improve something. They do not treat the act of reviewing as overhead.

The bad ones push back: "Why didn't you include the database migration in this PR? Now I have to review another one tomorrow." Or, "This should have been combined with the API change." The framing is wrong. They are treating each PR as a cost. The correct framing is that each additional PR is another chance to catch a problem before it merges. Splitting is not generating more work. Splitting is generating more review opportunities.

If your senior engineers complain about small PRs, your review process is broken in a specific way. The complaints will be phrased as efficiency concerns, but they are actually about reviewer comfort. The reviewer wants to see all the context in one place, even if seeing all of it means missing most of it.

How to Actually Split

The hard part is not deciding to split. It is finding the right split lines. A few patterns that work:

Refactor-first PRs. If your change requires moving code around or extracting a function, do that as its own PR. Behavior unchanged. Reviewer can confirm in five minutes that no logic changed. Then the actual feature PR is small because all the structural change is already merged.
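The following small Python sketch shows what a refactor-only PR can look like. The order-validation functions and their rules are invented for illustration. The point is the shape of the diff. The checks move into named helpers and no rule changes. A short characterization check then pins the old and new versions to the same answers, which is exactly what the reviewer is confirming.

```python
# Hypothetical "before": validation rules written inline in one function.
def validate_order_before(order: dict) -> list[str]:
    errors = []
    if not (1 <= order.get("quantity", 0) <= 100):
        errors.append("quantity out of range")
    if "@" not in order.get("email", ""):
        errors.append("invalid email")
    return errors

# Refactor-only PR: each check is extracted into a named helper.
# No rule changes; errors come back in the same order as before.
def _quantity_errors(order: dict) -> list[str]:
    return [] if 1 <= order.get("quantity", 0) <= 100 else ["quantity out of range"]

def _email_errors(order: dict) -> list[str]:
    return [] if "@" in order.get("email", "") else ["invalid email"]

def validate_order_after(order: dict) -> list[str]:
    return _quantity_errors(order) + _email_errors(order)

# Characterization check: old and new agree on ordinary and edge-case inputs.
SAMPLES = [
    {"quantity": 1, "email": "a@b.c"},
    {"quantity": 100, "email": "a@b.c"},
    {"quantity": 0, "email": "a@b.c"},
    {"quantity": 101, "email": "nope"},
    {},
]
for sample in SAMPLES:
    assert validate_order_before(sample) == validate_order_after(sample), sample
```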

Stacked PRs. Open PR 2 against PR 1's branch while PR 1 is still in review. Most platforms support this directly now; on GitHub, for example, you set PR 2's base branch to PR 1's branch. The cost is that PR 2 has to be rebased after PR 1 merges, which takes thirty seconds. The benefit is that you can keep working without batching.

Feature flags. Ship the code paths in small PRs, all behind a disabled flag. None of it runs in production until you flip the flag. Each PR is independently reviewable because each one is incomplete on purpose.

The one-sentence test. Every PR should answer "what does this change?" in one sentence. If you cannot, you are doing too much in one PR. "Adds caching to the user service" is a sentence. "Adds caching, refactors the user service, and updates the auth middleware" is three PRs.
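If you want a nudge in CI, a crude version of the one-sentence test can be automated. This is a hypothetical heuristic, not a standard tool. It counts the comma- or "and"-separated clauses in a PR title, so it will flag "Adds caching and logging" as two changes even when that is one. Treat a count above one as a prompt to look again, not as a gate.

```python
import re

# Split on commas (optionally followed by "and") or on a standalone "and".
_CLAUSE_SEPARATOR = re.compile(r",\s*(?:and\s+)?|\s+and\s+", re.IGNORECASE)

def count_changes(title: str) -> int:
    """Rough count of the separate changes a PR title describes."""
    parts = _CLAUSE_SEPARATOR.split(title.strip())
    return sum(1 for part in parts if part.strip())

for title in [
    "Adds caching to the user service",
    "Adds caching, refactors the user service, and updates the auth middleware",
]:
    n = count_changes(title)
    verdict = "ok" if n == 1 else f"{n} changes, consider splitting"
    print(f"{verdict}: {title}")
```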

What to Do Tomorrow

Look at your last ten PRs. How many were over 400 lines? Of those, how many got more than two substantive review comments (not nits, not naming, actual logic comments)? That ratio is your review process's quality signal. Most teams that audit this honestly are surprised by how bad it looks.
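The audit is simple enough to do by hand, but here it is as a Python sketch. The `PullRequest` record and the ten entries below are invented for illustration. In practice you would fill them in from your own last ten PRs, counting only logic comments as substantive.

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    lines_changed: int
    substantive_comments: int  # logic comments only: no nits, no naming

def audit(prs: list[PullRequest], size_limit: int = 400,
          min_comments: int = 2) -> tuple[int, int]:
    """Return (PRs over size_limit lines, how many of those got more than
    min_comments substantive review comments)."""
    large = [pr for pr in prs if pr.lines_changed > size_limit]
    reviewed = [pr for pr in large if pr.substantive_comments > min_comments]
    return len(large), len(reviewed)

# Invented example data: (lines changed, substantive comments).
last_ten = [PullRequest(n, c) for n, c in [
    (120, 1), (850, 0), (1847, 1), (45, 0), (410, 3),
    (300, 2), (620, 1), (90, 0), (1200, 2), (250, 1),
]]

large, reviewed = audit(last_ten)
print(f"{large} PRs over 400 lines; {reviewed} got real review")
```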

The goal is not "every PR under 400 lines." Some changes really are coupled by design. The goal is that no PR gets approved after a skim. Splitting is the lever you have to pull most of the time, but the underlying discipline is reviewer attention. If your reviewers are skimming 1,000-line diffs and signing off, the size is the symptom, not the disease.

The next time you are about to approve a PR you have not really read, do not. Either read it for real or push back and ask the author to split it. Neither option is comfortable. Either beats what you are doing now.


Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.

5 min read
