← Back to Blog

The Hibernate Query You Didn't Write

Hibernate writes SQL you never see. Three repository lines execute a 2,100-character query that is usually worse than anything you would have written by hand.

It was a Friday afternoon and the dashboard was slow. One endpoint was taking eight seconds to respond. I pulled up the code and found the repository call:

return userRepository.findByStatusAndDepartment(Status.ACTIVE, dept);

Three lines. Clean. Readable. I would have written this myself.

Then I turned on hibernate.show_sql and watched what actually hit the database. A 2,100 character query with six joins, a cross product from my @OneToMany collections, and a WHERE clause that started with LOWER(email), guaranteeing that no index would ever be used. Running eight hundred times a minute. I had not written that SQL. I would not have written that SQL. But Spring Data JPA had, and I had shipped it.

This is the thing about Hibernate and Spring Data. The Java you write is not the SQL that runs. Most of the time you never look at the SQL, because the abstraction is the whole point. The tradeoff is that your database is being hit with queries you did not review, did not design, and often would have rejected in code review if someone had submitted them by hand.

The SELECT * Tax

Your @Entity has forty fields. A dozen of them are JSON columns, large TEXT fields, or eagerly-fetched associations. You write:

User user = userRepository.findById(id).orElseThrow();
return user.getName() + " " + user.getEmail();

You used two fields. Hibernate fetched forty. Plus every eagerly-fetched relation. Plus their relations. On a wide table this is hundreds of kilobytes of wasted IO and network per call, multiplied by however many times per second the endpoint is hit. The handler looks innocent. What runs against the database is anything but.

The fix is projections. Spring Data supports both interface-based and DTO-based projections, and neither gets used as often as it should. Define a projection type with only the fields you need:

public interface UserNameEmail {
    String getName();
    String getEmail();
}

public interface UserRepository extends JpaRepository<User, Long> {
    UserNameEmail findProjectedById(Long id);
}

Hibernate generates SELECT u.name, u.email FROM users u WHERE u.id = ?. Two columns. No joins. A fraction of the work. This is not premature optimization. It is avoiding deliberate waste. If your handler needs two fields, fetching forty is slow for no reason.

The Method Name That Kills Your Index

Spring Data's derived queries are magic until they generate SQL you would never accept from a human. My favorite example is one of the most common patterns in any Spring app:

User findByEmailIgnoreCase(String email);

Looks clean. You want case-insensitive email matching. What Hibernate generates is this:

SELECT * FROM users WHERE LOWER(email) = LOWER(?);

The LOWER(email) on the left side of the comparison means your index on email is useless. The database has to compute LOWER for every row in the table to evaluate the predicate. On a million-row table, every lookup becomes a sequential scan. Your email uniqueness check, which should be a millisecond index lookup, starts taking half a second.

The fix is a functional index or a normalized representation. PostgreSQL supports the first directly:

CREATE INDEX idx_users_email_lower ON users (LOWER(email));

Now the index is on the expression instead of the raw column, and the query can use it. The alternative is to store emails in a consistent case in the first place and drop the IgnoreCase suffix entirely:

User findByEmail(String email);

// In your service
userRepository.findByEmail(email.toLowerCase());

Faster, uses a plain index, and keeps the invariant in the data layer instead of relying on every caller to remember it.

The broader lesson is that convenient method names generate SQL with function calls on columns. findByCreatedDateAfter can generate CASTs. findByNameContaining generates LIKE '%?%', which cannot use a standard btree index on either side. Every function call on an indexed column is an index killed. Spring Data will not warn you. You find out when the query gets slow.

Cartesian Explosion and MultipleBagFetchException

This one has burned every Spring Boot developer at least once. You have a user with orders and with roles:

@Entity
public class User {
    @OneToMany(fetch = FetchType.LAZY)
    private List<Order> orders;

    @OneToMany(fetch = FetchType.LAZY)
    private List<Role> roles;
}

You want both collections eagerly fetched in one query. Natural approach:

@Query("SELECT u FROM User u LEFT JOIN FETCH u.orders LEFT JOIN FETCH u.roles WHERE u.id = :id")
User findWithOrdersAndRoles(@Param("id") Long id);

Hibernate throws MultipleBagFetchException. The error exists because JOIN FETCH-ing two collections produces a cross product. If the user has 100 orders and 50 roles, the database returns 5,000 rows. Hibernate deduplicates in memory, but the database still generates, serializes, and transfers every one of those 5,000 rows across the wire. The feature is disabled because it is wrong in every case that hits it.

The workaround most developers reach for is EntityGraph:

@EntityGraph(attributePaths = {"orders", "roles"})
User findById(Long id);

Under the hood Hibernate issues separate queries for each collection instead of joining them all in one. You avoid the cartesian explosion at the cost of multiple round trips. You are choosing between one catastrophic query and three reasonable ones.

The deeper issue is that eagerly fetching multiple collections is almost never what you actually want. In most cases the right answer is to fetch the user, then separately fetch the specific orders or roles the use case needs. The ORM makes it too easy to write Java code that generates SQL the database was never designed to run efficiently. user.getOrders().size() in a loop is a hundred queries. JOIN FETCH u.orders on a user with a thousand orders is a thousand-row result for one logical record. Every ergonomic Java pattern has a corresponding SQL pathology.

Pageable With Large Offsets

Spring Data's Pageable abstraction looks clean:

Page<Order> findByStatus(Status status, Pageable pageable);

The generated SQL for page 1,000 with size 20 is this:

SELECT * FROM orders WHERE status = ? ORDER BY id LIMIT 20 OFFSET 20000;

LIMIT 20 OFFSET 20000 means the database retrieves 20,020 rows, throws away the first 20,000, and returns the last 20. On a large table with thousands of pages, response time grows linearly with page number. Page 1 is fast. Page 1000 is slow. Page 10000 times out. Plus Spring Data will run a SELECT COUNT(*) to populate the Page metadata, which on a large filtered table is its own full table scan.

Spring Data gives you the tool that scales. It does not default to it. You have to know that offset pagination is broken at scale and switch to a cursor-based query yourself:

List<Order> findByStatusAndIdGreaterThanOrderById(
    Status status, Long lastId, Pageable pageable
);

The query becomes WHERE status = ? AND id > ? ORDER BY id LIMIT 20. The database uses the index on id to jump to the right position. Constant time regardless of page number. No wasted rows, no count query.

The Dialect Lie

Hibernate's selling point is database portability. SQL it generates runs on PostgreSQL, MySQL, Oracle, and SQL Server. That tradeoff is exactly what portability always means: SQL targets the lowest common denominator. It runs everywhere. It is not fast anywhere.

PostgreSQL has features Hibernate will not generate for you. Partial indexes. Lateral joins. JSONB operators. Array containment. Full-text search with tsvector. Window functions beyond the basics. All of them invisible to the ORM because they do not exist in every database Hibernate supports.

If you want to use them you drop to @Query(nativeQuery = true) and write the SQL by hand. Doing that breaks the portability the ORM was selling, which turns out to be fine because most applications never switch databases. The abstraction was serving a future that never comes.

Bulk operations have the same story. Hibernate's batch inserts work, but they are slower than PostgreSQL's COPY. Hibernate's batch updates work, but they are slower than a single UPDATE with a CASE expression. The ORM can generate correct SQL for bulk operations. It cannot generate the fastest SQL because the fastest SQL is database-specific.

What to Actually Do

Turn on SQL logging in your development environment. In application.properties:

spring.jpa.show-sql=true
spring.jpa.properties.hibernate.format_sql=true
logging.level.org.hibernate.SQL=DEBUG
logging.level.org.hibernate.orm.jdbc.bind=TRACE

Now every query Hibernate generates shows up in your logs, formatted and with parameters bound. The first time you run the app with this on, you will see five queries where you expected one. You will see selects fetching columns you did not know existed. You will see WHERE clauses with function calls that guarantee a full table scan.

Then take the ten slowest queries and run EXPLAIN ANALYZE on them. Most of them will have obvious fixes: a projection instead of fetching the full entity, an EntityGraph where there was implicit lazy loading, a cursor query instead of a Pageable, a native query where you need a PostgreSQL-specific feature, a functional index where a method name generated a function call.

Hibernate and Spring Data are not the problem. Trusting them without looking at what they do is the problem. Those three lines of repository code you wrote do not execute. A 2,100-character query does. Until you read the SQL, you have no idea what your application is actually doing to the database.

The fix for our Friday afternoon incident ended up being four changes: a projection on the list endpoint, a case-normalized email column with a plain index, two @Query annotations to replace derived queries that were generating unnecessary joins, and a switch from Pageable to cursor pagination on the one endpoint that had blown past 10,000 pages. Response time dropped from eight seconds to eighty milliseconds. None of those changes were Hibernate's fault. All of them were things Hibernate had been quietly doing to us because we had not looked.

Share
X LinkedIn HN
UI

Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.

👁 0 7 min read

Comments (0)