Free sample. This is the Preface and Chapter 1 of Hibernate & Spring Data JPA in Depth by Umur Inan. The full book is available below.

Preface

Why This Book Exists

There is a version of this book that starts with “What is JPA?” and ends with a working CRUD application. This is not that book.

If you have been writing Spring Boot applications for a year or more, you already know how to define an entity, declare a repository, and call findById. What you may not know is why that findById sometimes returns a cached instance from two requests ago, why your integration tests pass while production silently corrupts data under concurrent load, or why enabling one extra association suddenly makes your endpoint take twelve seconds instead of forty milliseconds.

Those are the questions this book answers.

What This Book Assumes

You have read — or have the equivalent experience of — a practical Spring Boot book. You know @RestController, @Service, @Repository, @Transactional, and @Entity. You have written JPA entities, defined Spring Data repositories, and run a Spring Boot application against a real database.

What you have not done — or have done without fully understanding — is any of the following:

  • Traced a request through the persistence context from the moment a method is called to the moment a SQL statement executes
  • Deliberately triggered a LazyInitializationException and understood exactly why it happened
  • Configured the second-level cache and measured whether it helped
  • Debugged a deadlock in production and found the two code paths that created it
  • Written a custom Hibernate UserType for a domain-specific Java type
  • Set up row-level security using @Filter and understood why it can silently fail

This book is for the engineer who wants to understand all of those things deeply enough to make good decisions, not just follow tutorials.

What You Will Build

CinéTrack grows up in this book. The application introduced in the Spring Boot book — movies, users, watchlists, reviews — gains a more complex domain in these pages: a subscription model, media file associations, a full audit trail, a multi-tenant architecture for the enterprise tier, and full-text search.

Every chapter extends CinéTrack in a way that creates a genuine need for the concept being taught. Optimistic locking appears when two users try to edit the same review simultaneously. The second-level cache appears when the read-heavy movie catalog starts to stress the database. Hibernate Search appears when users complain that PostgreSQL ILIKE cannot rank results by relevance.

The companion code is at code/chapter-NN/ — one standalone Maven project per chapter, each runnable on its own.

What This Book Is Not

It is not a Hibernate reference manual. Vlad Mihalcea’s work and the official Hibernate documentation cover the API surface exhaustively. This book teaches judgment: when to use which tool, what breaks when you choose wrong, and how to diagnose problems that only appear under production load.

It is not a database book. SQL, indexes, query planning, and vacuum are not covered here except where they directly affect how you should configure Hibernate.

It is not a microservices book. CinéTrack remains a monolith throughout. The patterns here apply directly to microservices as well, but introducing service boundaries would obscure the persistence concepts.

How to Use This Book

Read Part I before anything else. The persistence context, transaction model, and Hibernate 7 internals are the mental model everything else depends on. Engineers who skip to the performance chapters and then wonder why the fixes don’t work have usually skipped Part I.

After Part I, the parts are largely independent. If your immediate problem is N+1 queries, go to Part V. If you need to implement multi-tenancy next week, go to Part VI. The cross-references will send you back to earlier chapters when a concept needs foundation.

The code examples use Java 21, Spring Boot 4, Hibernate 7, and PostgreSQL 16. The Maven POMs in the companion repository are complete and runnable.

Acknowledgments

The bugs are mine. The good ideas are everyone else’s, often without attribution because I forgot where I first heard them. If you recognize one of yours, please write to me; I would like to credit you in the next edition.


Umur Inan
New York, 2026


1 JPA, Hibernate 7, and the Contract

1.1 Overview

“Object/relational mapping is the Vietnam of computer science.” – Ted Neward, 2006

I had been using Spring Data JPA for two years before I truly understood what it was doing. Repositories, @Entity annotations, queries running, data persisting. Everything worked. Then a production service started throwing LazyInitializationException at 3am, and I realized I had no idea why. The stack trace pointed into Hibernate internals I’d never bothered to read. My abstractions had held until they didn’t.

The mistake was mine. I had learned the API without learning the contract beneath it. JPA is a specification with a precise lifecycle model for entities. Hibernate is an implementation of that specification, with years of additional machinery layered on top. Spring Boot is a configuration harness that wires them together. Three distinct layers, each making decisions that affect your application in ways you can’t anticipate without knowing what those decisions are.

This chapter strips away the automation. We’ll look at what JPA actually guarantees, what Hibernate adds beyond that, how the SessionFactory comes to exist, what the ServiceRegistry holds, and how Spring Boot 4.0.5 automates the whole sequence. We’ll also introduce CinéTrack, the domain model that runs through every chapter of this book.

By the end of this chapter, you’ll understand:

  • The boundary between the JPA specification and Hibernate’s extensions, and why that boundary matters for everyday decisions
  • How EntityManagerFactory and SessionFactory relate in Hibernate 7 and why creation cost is not trivial
  • What happens during the Hibernate boot sequence and what Spring Boot automates at each stage
  • How ServiceRegistry works and where to plug in custom behavior
  • What Hibernate 6 and 7 changed: the SQM query translator and the new type system (both from Hibernate 6), and Jakarta Persistence 3.2 (new in Hibernate 7)
  • The CinéTrack entity model that every subsequent chapter builds on

1.2 The JPA Spec and What Hibernate Adds

Jakarta Persistence API (JPA) is a specification. It defines a contract: how entities are mapped to tables, how a persistence context manages object identity, how JPQL queries are written, how transactions demarcate units of work. What it does not define is how any of that is implemented. That’s the implementation’s problem.

Hibernate is one implementation. It satisfies the JPA contract and then keeps going.

1.2.1 What the Spec Actually Guarantees

JPA specifies four things you can rely on regardless of which provider you’re running.

Entity lifecycle. An entity is always in one of four states: transient (not associated with any persistence context; the spec calls this state “new”), managed (tracked by the current EntityManager), detached (was managed, now isn’t), or removed (scheduled for deletion). The spec defines how entities move between these states and what operations are valid in each. If you call merge() on a detached entity, the spec tells you what to expect. If you call remove() on a detached entity, the spec tells you what exception you’ll get (IllegalArgumentException); on a transient entity, the call is simply ignored.
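The transitions are easier to hold in your head as a small state machine. What follows is a toy model, not provider code: ToyState and ToyLifecycle are names invented for this sketch, and cascades, flush timing, and detach() are left out. It encodes only the outcomes the spec assigns to persist(), remove(), and merge().

```java
/** The four entity states, using this chapter's vocabulary. */
enum ToyState { TRANSIENT, MANAGED, DETACHED, REMOVED }

/** Toy model of the spec's state transitions (hypothetical helper, not a JPA API). */
final class ToyLifecycle {
    private ToyLifecycle() {}

    /** persist(): transient and removed instances become managed. */
    static ToyState persist(ToyState state) {
        return switch (state) {
            case TRANSIENT, REMOVED, MANAGED -> ToyState.MANAGED;
            // A real provider raises EntityExistsException or another
            // PersistenceException here, possibly only at flush time.
            case DETACHED -> throw new IllegalStateException(
                    "persist() on a detached instance");
        };
    }

    /** remove(): only a managed instance is actually scheduled for deletion. */
    static ToyState remove(ToyState state) {
        return switch (state) {
            case MANAGED -> ToyState.REMOVED;
            case TRANSIENT, REMOVED -> state; // ignored by the operation
            case DETACHED -> throw new IllegalArgumentException(
                    "remove() on a detached instance");
        };
    }

    /** merge(): returns the state of the instance that merge() hands back. */
    static ToyState merge(ToyState state) {
        return switch (state) {
            case TRANSIENT, DETACHED, MANAGED -> ToyState.MANAGED;
            case REMOVED -> throw new IllegalArgumentException(
                    "merge() on a removed instance");
        };
    }
}
```

Note that merge() never makes the instance you passed in managed; it returns a managed instance, which is why the model describes the return value rather than the argument.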

Identity within a persistence context. Within a single EntityManager, every entity is identified by its primary key. Load the same row twice and you get the same Java object reference. This is guaranteed by the spec. Hibernate calls it the first-level cache, but the spec just calls it “persistence context identity.”
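The identity guarantee amounts to a map keyed by entity type and primary key, scoped to one persistence context. The sketch below is a minimal illustration under that assumption; ToyPersistenceContext and ToyMovie are hypothetical names, and the loader function stands in for the SQL a real provider would run.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical entity used only by this sketch. */
record ToyMovie(Object id) {}

/** Toy identity map: one instance per (type, id) within one context. */
final class ToyPersistenceContext {
    private record Key(Class<?> type, Object id) {}

    private final Map<Key, Object> managed = new HashMap<>();
    private int loads;

    /** Returns the already-managed instance if there is one, else loads it once. */
    <T> T find(Class<T> type, Object id, Function<Object, T> loader) {
        Key key = new Key(type, id);
        Object existing = managed.get(key);
        if (existing != null) {
            return type.cast(existing);
        }
        T loaded = loader.apply(id); // stands in for "select ... where id = ?"
        loads++;
        managed.put(key, loaded);
        return loaded;
    }

    /** Number of times the loader was actually invoked. */
    int loads() {
        return loads;
    }
}
```

Two find() calls on the same context return the same reference and invoke the loader once. Two different contexts each load their own instance, which is why identity comparisons across EntityManagers are unsafe.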

JPQL. The Java Persistence Query Language is part of the spec. Any JPQL query you write should work on any compliant provider. The syntax covers SELECT, UPDATE, DELETE, JOIN, WHERE, aggregate functions, and subqueries. Stick to JPQL and your queries are portable.

Transaction integration. JPA specifies how EntityManager participates in JTA (Java Transaction API) and resource-local transactions. Spring’s @Transactional works because it talks to this contract.

That’s it. Nothing about second-level caching. Nothing about batch fetching strategies. Nothing about schema generation details or dialect-specific types. The spec is deliberately narrow.

1.2.2 What Hibernate Adds

Hibernate’s extensions are where the interesting decisions live.

HQL beyond JPQL. Hibernate Query Language is a superset of JPQL. On top of the standard syntax, HQL adds window functions, common table expressions, LIMIT and OFFSET clauses, and the rest of the SQM expression set introduced in Hibernate 6 and expanded in 7. (JOIN FETCH and TREAT are often mistaken for extensions, but both are standard JPQL.) When you write a query that works in Hibernate but breaks if you swap to EclipseLink, you’re using an HQL extension.

Batch fetching. The spec defines no strategy for loading collections or associations in batches. Hibernate’s @BatchSize and hibernate.default_batch_fetch_size are Hibernate-specific. They’re also one of the most effective tools for avoiding N+1 problems, which is why you’ll see them throughout this book.
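The payoff is easy to estimate with back-of-the-envelope arithmetic. The helper below is a sketch, not a Hibernate API; BatchFetchMath is a made-up name. It assumes one query loads the parent rows and that every parent’s lazy collection is then touched exactly once in the same session.

```java
/** Rough query-count estimates for lazy collection loading (illustrative only). */
final class BatchFetchMath {
    private BatchFetchMath() {}

    /** Classic N+1: one query for the parents, then one per parent. */
    static int queriesWithoutBatching(int parents) {
        return 1 + parents;
    }

    /** With batching: one query for the parents, then one per batch of parents. */
    static int queriesWithBatching(int parents, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int batches = (parents + batchSize - 1) / batchSize; // ceiling division
        return 1 + batches;
    }
}
```

For 100 parents, that is 101 queries without batching and 5 with a batch size of 25. The real count depends on access order and on what is already loaded in the session, so treat these numbers as an upper-bound estimate.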

Second-level cache. JPA has a @Cacheable annotation and a Cache interface, but the actual eviction policies, region factories, and provider integrations (Ehcache, Redis via JCache) are all outside the spec. Hibernate ships with its own cache SPI and integrates with any JSR-107-compliant provider.

Envers. Automatic entity auditing. Annotate an entity with @Audited and Hibernate creates a shadow table that records every change with a revision number. Zero lines of audit code in your application. The spec has no equivalent.

Filters and interceptors. @FilterDef and @Filter let you attach reusable WHERE clauses to queries at the session level, activated or deactivated at runtime. Useful for soft deletes, tenant isolation, and data visibility rules.

@Entity
@FilterDef(                                                              // (1)
    name = "activeOnly",
    parameters = @ParamDef(name = "active", type = Boolean.class)
)
@Filter(name = "activeOnly", condition = "active = :active")             // (2)
public class Movie {
    @Id
    private Long id;
    private boolean active;
    // ...
}

(1) @FilterDef declares the filter and its parameters at the entity level. No JPA equivalent exists.

(2) @Filter attaches the SQL fragment. Activated per-session with session.enableFilter("activeOnly").setParameter("active", true).
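Because activation is per-session, a filter that nobody enabled simply is not there. The toy model below makes that visible. ToyFilterSession is a hypothetical stand-in for a Hibernate Session, the SQL strings are simplified, and nothing here reflects how Hibernate actually renders queries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy session that appends the conditions of enabled filters to a fixed query. */
final class ToyFilterSession {
    // Filter definitions known to this toy model: name -> SQL condition.
    private static final Map<String, String> DEFINED =
            Map.of("activeOnly", "active = :active");

    // Filters enabled on THIS session only, in activation order.
    private final Map<String, String> enabled = new LinkedHashMap<>();

    void enableFilter(String name) {
        String condition = DEFINED.get(name);
        if (condition == null) {
            throw new IllegalArgumentException("No such filter: " + name);
        }
        enabled.put(name, condition);
    }

    void disableFilter(String name) {
        enabled.remove(name);
    }

    /** Renders the movie query, with a WHERE clause only if filters are enabled. */
    String selectMovies() {
        String sql = "select * from movie";
        if (enabled.isEmpty()) {
            return sql;
        }
        return sql + " where " + String.join(" and ", enabled.values());
    }
}
```

A fresh ToyFilterSession renders the unfiltered query. That is the failure mode to watch for in real code: every new session starts with all filters disabled, so any code path that forgets to enable the filter sees every row.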

Multitenancy. Hibernate supports database-per-tenant, schema-per-tenant, and discriminator-per-tenant strategies natively. The spec is silent on multitenancy.

1.2.3 Why the Distinction Matters

If you program only to the JPA API, you’re writing portable code. You could theoretically swap providers. In practice, almost no team does this, which means portability costs you features for no real benefit.

My opinion: don’t pay the portability tax. Use Hibernate as Hibernate. Use @BatchSize, use Envers, use HQL extensions when they solve the problem better than JPQL would. Document your Hibernate-specific choices in comments so the next engineer isn’t confused, but don’t hobble yourself with artificial constraints.

The spec matters for one thing: understanding the baseline guarantees. When something behaves unexpectedly, the first question is whether you’re relying on a Hibernate extension that has edge cases the spec doesn’t protect you from. Knowing where the spec ends and Hibernate begins helps you ask the right question.

Tip

Spring Data JPA’s JpaRepository methods are written against the JPA spec. When you need Hibernate-specific behavior, drop down to EntityManager via @PersistenceContext or use a custom repository implementation. Don’t fight the abstraction; just know when to step past it.

Note

Jakarta Persistence 3.2, shipped with Hibernate 7, expanded the spec to include features that were previously Hibernate-only: @NamedEntityGraph improvements, Union query support, and a richer criteria API. The line between spec and extension moves slowly, but it does move.

The spec tells us what any JPA provider must do. Hibernate tells us what this provider actually does, which is quite a bit more. But before we can use any of it, we need to understand the objects at the center of Hibernate’s runtime: SessionFactory and EntityManagerFactory. They turn out to be the same object.

1.4 EntityManagerFactory and SessionFactory

The JPA spec defines EntityManagerFactory. Hibernate defines SessionFactory. In Hibernate 7, SessionFactory extends EntityManagerFactory. They are the same object.

SessionFactory sf = entityManagerFactory.unwrap(SessionFactory.class); // (1)
EntityManagerFactory emf = sessionFactory;                              // (2)

(1) unwrap() is the JPA-standard way to get the Hibernate-specific interface. No casting, no reflection needed.

(2) The assignment compiles because the SessionFactory interface extends EntityManagerFactory. They are assignment-compatible.

You’ll see both names in code, documentation, and Stack Overflow answers from different eras. Don’t let the two names confuse you into thinking they have different lifecycles or different creation costs.

1.4.1 What the Factory Holds

SessionFactory is the heavyweight object in a Hibernate application. Built once at startup, lives for the life of the application. What it holds explains why creation is expensive.

Entity metadata. Every @Entity class has been scanned, its annotations parsed, its field types resolved, its associations mapped. The result is a runtime metamodel, with one EntityPersister per entity, that Hibernate uses to generate SQL, navigate associations, and track dirty state. This scan happens at startup, not per-query.

SQL templates. Hibernate pre-generates the SQL for standard CRUD operations per entity: insert, select-by-id, update, delete. These are dialect-specific strings, cached and reused for every operation. Generating them once at boot means no string building per request.
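To make “pre-generated” concrete, here is a sketch of the idea: build the four CRUD strings once from mapping metadata and keep them. ToyEntityMapping and SqlTemplates are invented for this illustration. Hibernate’s real generator is dialect-aware and handles versioning, inheritance, and much more.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical, drastically simplified mapping metadata for one entity. */
record ToyEntityMapping(String table, String idColumn, List<String> columns) {}

/** Builds the four CRUD statements once; callers reuse the strings afterwards. */
final class SqlTemplates {
    final String insert;
    final String selectById;
    final String update;
    final String delete;

    SqlTemplates(ToyEntityMapping m) {
        String cols = String.join(", ", m.columns());
        String marks = m.columns().stream()
                .map(c -> "?")
                .collect(Collectors.joining(", "));
        String sets = m.columns().stream()
                .map(c -> c + " = ?")
                .collect(Collectors.joining(", "));
        String byId = " where " + m.idColumn() + " = ?";

        insert = "insert into " + m.table()
                + " (" + m.idColumn() + ", " + cols + ") values (?, " + marks + ")";
        selectById = "select " + m.idColumn() + ", " + cols + " from " + m.table() + byId;
        update = "update " + m.table() + " set " + sets + byId;
        delete = "delete from " + m.table() + byId;
    }
}
```

The point is the lifecycle, not the string building: the constructor runs once per entity at boot, and every later operation only binds parameters to an existing template.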

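Connection pool. The factory reaches the JDBC connection pool through its ConnectionProvider. In Spring Boot that provider delegates to the DataSource bean (HikariCP by default), so the pool itself is configured under spring.datasource.hikari.* rather than inside Hibernate. A Session borrows a connection lazily, when it first needs one, and has released it by the time it closes.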

Second-level cache. If you configure a second-level cache, the cache region manager lives in the factory. All sessions share it. First-level cache is per-session; second-level cache is per-factory.

Type registry. Hibernate 6 rewrote the type system around JavaType and JdbcType. The factory holds the TypeConfiguration that maps Java types to JDBC types to SQL types. Custom type mappings you register are stored here.
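The per-session versus per-factory cache scopes described above can be modeled in a few lines. This is a toy under loud assumptions: ToyFactory and ToySession are invented names, values are plain strings, and there is no eviction or invalidation. It also caches the value itself, whereas Hibernate’s second-level cache stores disassembled entity state, not entity instances.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy factory: owns the cache that every session shares. */
final class ToyFactory {
    final Map<Long, String> secondLevel = new ConcurrentHashMap<>();
    final AtomicInteger databaseReads = new AtomicInteger();

    ToySession openSession() {
        return new ToySession(this);
    }

    /** Stands in for a real SELECT; counts how often the "database" is hit. */
    String readFromDatabase(long id) {
        databaseReads.incrementAndGet();
        return "movie-" + id;
    }
}

/** Toy session: owns a private first-level cache. */
final class ToySession {
    private final ToyFactory factory;
    private final Map<Long, String> firstLevel = new HashMap<>();

    ToySession(ToyFactory factory) {
        this.factory = factory;
    }

    String find(long id) {
        String local = firstLevel.get(id);
        if (local != null) {
            return local; // first-level hit: this session already has it
        }
        // Second-level lookup; falls through to the database only on a miss.
        String value = factory.secondLevel.computeIfAbsent(id, factory::readFromDatabase);
        firstLevel.put(id, value);
        return value;
    }
}
```

Open two sessions on one factory and call find(1L) on both: the database is read once. The second session misses its own first-level cache but hits the shared one.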

1.4.2 Creation Cost

Building a SessionFactory is not fast. On a real application with 50-100 entities, expect 2-5 seconds of startup time attributed to Hibernate. That cost comes from:

  • Classpath scanning for @Entity classes
  • Annotation processing and metadata graph construction
  • SQL template generation per entity per dialect
  • Connection pool initialization and connection pre-warming
  • Second-level cache region setup
  • Schema validation or generation (if configured)

Warning

Never create a SessionFactory per-request, per-user, or per-tenant (unless you’re doing database-per-tenant multitenancy with a factory cache). One factory, application scope, created once. Any other pattern will exhaust memory and connections.

1.4.3 Thread Safety

SessionFactory is fully thread-safe. All threads in your application share one instance. This is by design.

Session (and EntityManager) is not thread-safe. One session per thread, per request, per unit of work. Spring manages session lifecycle through @Transactional. The session opens when the transaction starts, flushes on commit, and closes once the transaction has committed or rolled back.

@Service
@RequiredArgsConstructor
public class MovieService {

    private final EntityManagerFactory emf; // (1)

    public Movie findById(Long id) {
        try (EntityManager em = emf.createEntityManager()) { // (2)
            return em.find(Movie.class, id);                 // (3)
        }                                                    // (4)
    }
}

(1) One factory instance, injected, shared across all callers.

(2) Open a new EntityManager (= Session) for this unit of work. Opening is cheap: it creates a context object, and the JDBC connection is taken from the pool only when the session first needs one.

(3) find() by primary key uses first-level cache on this session and, if configured, second-level cache.

(4) Try-with-resources closes the EntityManager, returning the JDBC connection to the pool.

In practice, Spring handles this for you via @Transactional. You rarely write the try-with-resources pattern directly. But knowing it happens under the hood is what makes LazyInitializationException predictable rather than mysterious: the session is closed, the proxy can’t load the association, you get the exception.
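That failure is mechanical enough to reproduce without Hibernate. In the sketch below, ToyLazySession and ToyLazyRef are hypothetical stand-ins for a Session and a lazy proxy. The real exception is LazyInitializationException; the toy throws IllegalStateException to stay dependency-free.

```java
import java.util.function.Supplier;

/** Toy session: only tracks whether it is still open. */
final class ToyLazySession implements AutoCloseable {
    private boolean open = true;

    /** Hands out a lazy reference bound to this session. */
    <T> ToyLazyRef<T> lazy(Supplier<T> loader) {
        return new ToyLazyRef<>(this, loader);
    }

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false;
    }
}

/** Toy lazy proxy: loads on first access, and only while its session is open. */
final class ToyLazyRef<T> {
    private final ToyLazySession session;
    private final Supplier<T> loader;
    private T value;
    private boolean initialized;

    ToyLazyRef(ToyLazySession session, Supplier<T> loader) {
        this.session = session;
        this.loader = loader;
    }

    T get() {
        if (!initialized) {
            if (!session.isOpen()) {
                // Stand-in for Hibernate's LazyInitializationException.
                throw new IllegalStateException(
                        "could not initialize proxy: session is closed");
            }
            value = loader.get();
            initialized = true;
        }
        return value;
    }
}
```

Two behaviors fall out of the model. A reference first touched after close() throws. A reference touched while the session was open stays readable afterwards, which is why initializing the association inside the transaction fixes the problem.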

Tip

SessionFactory.getStatistics() returns a Statistics object with cache hit and miss counts, query execution counts, and connection counts. Enable it with hibernate.generate_statistics=true. Expose the metrics via Micrometer in production. The numbers will surprise you the first time you look.

Important

If Hibernate startup time is becoming a CI bottleneck, check spring.jpa.hibernate.ddl-auto. Setting it to validate forces schema comparison against the live database on every startup. Set it to none and let Flyway handle schema management separately. The difference is measurable, especially against remote databases.

The SessionFactory exists. But how does it come to exist? Hibernate’s boot sequence is a multi-stage pipeline, and understanding each stage is the key to understanding what Spring Boot automates and where you can intervene.

1.5 The Hibernate Boot Process

When you use Spring Boot, you never call Hibernate’s boot API directly. Spring Boot calls it for you. But knowing what it calls, and in what order, is the difference between being able to customize Hibernate’s behavior and being stuck with whatever auto-configuration decided.

The boot sequence has five stages. Each stage produces an object the next stage consumes.

1.5.1 Stage 1: BootstrapServiceRegistry

Before Hibernate can read your configuration or scan your entities, it needs some foundational services: a classloader, a way to discover integrators, and a registry of named strategies. BootstrapServiceRegistry holds these. It’s the root of the registry hierarchy.

You rarely build it directly. StandardServiceRegistryBuilder creates it implicitly. The only reason to touch it is if you need to register a custom ClassLoader (OSGi, for example) or a custom Integrator.

1.5.2 Stage 2: StandardServiceRegistry

StandardServiceRegistry holds the runtime services: dialect, connection provider, transaction coordinator, second-level cache region factory. It’s built from configuration properties and the BootstrapServiceRegistry.

StandardServiceRegistry registry = new StandardServiceRegistryBuilder()
    .applySetting(AvailableSettings.DIALECT, PostgreSQLDialect.class.getName()) // (1)
    .applySetting(AvailableSettings.HBM2DDL_AUTO, "validate")                   // (2)
    .build();                                                                    // (3)

(1) AvailableSettings collects Hibernate’s configuration keys as String constants. Always prefer it over string literals.

(2) Schema validation mode. Hibernate compares the mapped model against the database schema at startup.

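(3) build() assembles the registry. Most services are initiated lazily, on first lookup, so a standalone connection pool usually starts when Hibernate first asks for a JDBC connection rather than at this line.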

1.5.3 Stage 3: MetadataSources

MetadataSources is where you tell Hibernate about your domain model. Annotated classes, persistence.xml entries, legacy hbm.xml files: all feed into MetadataSources.

MetadataSources sources = new MetadataSources(registry); // (1)
sources.addAnnotatedClass(Movie.class);                   // (2)
sources.addAnnotatedClass(AppUser.class);
sources.addAnnotatedClass(WatchLog.class);

(1) MetadataSources takes the ServiceRegistry because some services (dialect resolution) are needed to interpret metadata correctly.

(2) Explicitly registering entity classes. In Spring Boot, LocalContainerEntityManagerFactoryBean does this via package scanning, configured with @EntityScan or defaulting to the base package of your @SpringBootApplication.

1.5.4 Stage 4: MetadataBuilder and Metadata

MetadataSources.getMetadataBuilder() returns a MetadataBuilder for configuring naming strategies, type contributors, and implicit mapping rules. Call build() and you get Metadata.

Metadata metadata = sources.getMetadataBuilder()
    .applyPhysicalNamingStrategy(                                               // (1)
        new CamelCaseToUnderscoresNamingStrategy())
    .applyImplicitNamingStrategy(                                               // (2)
        ImplicitNamingStrategyJpaCompliantImpl.INSTANCE)
    .build();                                                                   // (3)

(1) PhysicalNamingStrategy transforms the logical name to the actual database identifier. Spring Boot 4’s default is CamelCaseToUnderscoresNamingStrategy: watchLog becomes watch_log.

(2) ImplicitNamingStrategy controls what name is chosen when no explicit @Column or @Table name is provided.

(3) This is where annotation parsing happens. Hibernate processes every @Entity, resolves every @ManyToOne and @OneToMany, validates every association, and builds the complete mapping graph. Metadata is an immutable snapshot of your entire domain model.
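The watchLog to watch_log transformation is simple enough to sketch. The function below is a simplified approximation written for this example, not the code of CamelCaseToUnderscoresNamingStrategy; the shipped strategy treats runs of capitals and digits differently. SnakeCase is a hypothetical helper name.

```java
import java.util.Locale;

/** Simplified camelCase-to-snake_case conversion (illustrative approximation). */
final class SnakeCase {
    private SnakeCase() {}

    static String toSnakeCase(String logicalName) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < logicalName.length(); i++) {
            char c = logicalName.charAt(i);
            if (c == '.') {
                out.append('_'); // dots become underscores
                continue;
            }
            // Insert an underscore where a lowercase letter is followed by an uppercase one.
            boolean boundary = i > 0
                    && Character.isUpperCase(c)
                    && Character.isLowerCase(logicalName.charAt(i - 1));
            if (boundary) {
                out.append('_');
            }
            out.append(c);
        }
        return out.toString().toLowerCase(Locale.ROOT);
    }
}
```

Names that contain acronyms, such as a field ending in ID, are where this simplified rule and the real strategy can disagree, so check the generated DDL rather than trusting intuition.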

1.5.5 Stage 5: SessionFactoryBuilder and SessionFactory

SessionFactory sessionFactory = metadata.getSessionFactoryBuilder()
    .applyStatisticsSupport(true)         // (1)
    .applySecondLevelCacheSupport(true)   // (2)
    .build();                             // (3)

(1) Enables Hibernate’s built-in statistics collection.

(2) Activates second-level cache. Requires a cache provider (Caffeine, Ehcache, JCache) on the classpath.

(3) SQL template generation, connection pool initialization, cache region setup, and type registry finalization all happen here. This is the expensive step; every earlier stage is cheap by comparison.

1.5.6 What Spring Boot 4.0.5 Automates

HibernateJpaAutoConfiguration drives the sequence. Here’s how each stage maps to Spring Boot components:

Boot Stage | Spring Boot Component
--- | ---
BootstrapServiceRegistry | Built internally by LocalContainerEntityManagerFactoryBean
StandardServiceRegistry | HibernatePropertiesCustomizerComposite feeds settings from application.yaml
MetadataSources | Package scanning via @SpringBootApplication base package
MetadataBuilder config | HibernatePropertiesCustomizer beans and spring.jpa.properties.*
SessionFactory creation | LocalContainerEntityManagerFactoryBean.afterPropertiesSet()

The configuration surface in application.yaml:

spring:
  jpa:
    hibernate:
      ddl-auto: validate              # (1)
    properties:
      hibernate:
        dialect: org.hibernate.dialect.PostgreSQLDialect # (2)
        default_batch_fetch_size: 25  # (3)
        generate_statistics: true     # (4)
    open-in-view: false               # (5)

(1) Schema validation at startup. Mismatches fail fast rather than at query time.

(2) PostgreSQL dialect. Hibernate can resolve the dialect from JDBC connection metadata at startup, but explicit is better for traceability.

(3) Global @BatchSize equivalent. Every collection and association uses batch-of-25 loading unless overridden locally.

(4) Hibernate statistics. Expensive in production; use with sampling or only in staging.

(5) Turns off Open Session in View. You want this off. Chapter 5 explains the full reasoning.

Tip

Any spring.jpa.properties.hibernate.* key maps directly to a Hibernate configuration property. If the Hibernate documentation lists a property as hibernate.jdbc.batch_size, add it as spring.jpa.properties.hibernate.jdbc.batch_size and Spring Boot passes it through unchanged.
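The pass-through rule is just a prefix strip. The sketch below models it with a plain map; JpaPropertyPassThrough is a name made up for this illustration and is not a Spring Boot class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Models the spring.jpa.properties.* pass-through as a prefix strip. */
final class JpaPropertyPassThrough {
    private static final String PREFIX = "spring.jpa.properties.";

    private JpaPropertyPassThrough() {}

    /** Keeps only prefixed keys and returns them with the prefix removed. */
    static Map<String, String> toHibernateProperties(Map<String, String> springProperties) {
        Map<String, String> hibernateProperties = new LinkedHashMap<>();
        springProperties.forEach((key, value) -> {
            if (key.startsWith(PREFIX)) {
                hibernateProperties.put(key.substring(PREFIX.length()), value);
            }
        });
        return hibernateProperties;
    }
}
```

Keys outside the prefix, such as spring.jpa.open-in-view, are Spring Boot’s own settings and never reach Hibernate in this form.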

Important

LocalContainerEntityManagerFactoryBean.afterPropertiesSet() runs synchronously on the main thread during startup. If schema validation fails (wrong column type, missing table, unexpected column), the application refuses to start. This is the correct behavior. Fail at startup, not at 2am when the first query hits.

Stage 2 of the boot sequence builds a StandardServiceRegistry. That registry is not just a configuration bag: it’s a full service locator with a specific architecture. Understanding it gives you hooks to customize Hibernate’s behavior in ways that spring.jpa.properties alone can’t reach.

1.6 ServiceRegistry Internals

Hibernate’s ServiceRegistry is a service locator. It holds named services that Hibernate components depend on, provides them on request, and manages their lifecycle. Think of it as Hibernate’s own dependency injection framework, predating Spring’s influence on the codebase by years.

There are two registries, and they form a parent-child hierarchy.

1.6.1 BootstrapServiceRegistry

BootstrapServiceRegistry is the root. It holds three services that exist solely to bootstrap everything else.

ClassLoaderService handles all classloading within Hibernate. It aggregates multiple ClassLoader instances (the application classloader, Hibernate’s classloader, any extras you specify) and presents a unified view. If you’ve ever wondered how Hibernate finds your @Entity classes in a custom classloader context like OSGi or a multi-module build, this service is the answer.

IntegratorService manages Integrator implementations. An Integrator receives the Metadata and SessionFactory after they’re built and can wire in additional behavior. Spring uses this. Envers uses this. Any library that needs to hook into Hibernate’s bootstrap lifecycle registers an Integrator.

StrategySelector is a registry of named strategy implementations. When you write hibernate.cache.region.factory_class=jcache, Hibernate uses StrategySelector to resolve the short name jcache to a fully qualified class name. You can register your own short names here.

BootstrapServiceRegistry bootstrapRegistry = new BootstrapServiceRegistryBuilder()
    .applyIntegrator(new MyAuditIntegrator()) // (1)
    .applyClassLoader(isolatedClassLoader)    // (2)
    .build();

(1) Register a custom Integrator. It will receive Metadata and the SessionFactory after boot completes, letting you inspect or augment the mapping model.

(2) Add an extra classloader. Hibernate’s ClassLoaderService will delegate to it when standard classloading fails.

1.6.2 StandardServiceRegistry

StandardServiceRegistry is built on top of BootstrapServiceRegistry and holds the runtime services Hibernate needs during normal operation.

JdbcEnvironment encapsulates database-level metadata: product name, server version, identifier quoting character. Hibernate uses this to generate correct SQL for the target database.

ConnectionProvider manages JDBC connections. In a Spring Boot application, the default implementation wraps the DataSource bean, which is HikariCP unless you configure otherwise. Replaceable with a custom implementation to use any connection source: JNDI, a custom pool, or a test double that returns an in-memory H2 connection.

RegionFactory is the second-level cache service. JCacheRegionFactory, or any custom region factory, lives here.

TransactionCoordinatorBuilder coordinates transaction participation. This is how Hibernate knows whether it’s running inside JTA or a resource-local transaction. Spring’s transaction management talks to this service.

SchemaManagementTool handles hbm2ddl operations: create, validate, update, drop. Schema generation calls route through here.

1.6.3 Replacing a Service

The most common reason to touch ServiceRegistry directly is replacing a service with a custom implementation. The canonical example: swapping ConnectionProvider to use a DataSource from JNDI, a cloud-managed connection service, or a test double.

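import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;
import org.hibernate.engine.jdbc.connections.spi.ConnectionProvider;
import org.hibernate.service.UnknownUnwrapTypeException;

public class BoundDataSourceConnectionProvider        // (1)
        implements ConnectionProvider {

    private final DataSource dataSource;

    public BoundDataSourceConnectionProvider(DataSource ds) {
        this.dataSource = ds;
    }

    @Override
    public Connection getConnection() throws SQLException {
        return dataSource.getConnection();
    }

    @Override
    public void closeConnection(Connection conn) throws SQLException {
        conn.close();
    }

    @Override
    public boolean supportsAggressiveRelease() {
        return false;
    }

    // ConnectionProvider inherits the next two methods from Wrapped.
    @Override
    public boolean isUnwrappableAs(Class<?> unwrapType) {
        return unwrapType.isInstance(this) || unwrapType.isInstance(dataSource);
    }

    @Override
    public <T> T unwrap(Class<T> unwrapType) {
        if (unwrapType.isInstance(this)) {
            return unwrapType.cast(this);
        }
        if (unwrapType.isInstance(dataSource)) {
            return unwrapType.cast(dataSource);
        }
        throw new UnknownUnwrapTypeException(unwrapType);
    }
}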

// Registration:
StandardServiceRegistry registry = new StandardServiceRegistryBuilder()
    .addService(ConnectionProvider.class,              // (2)
        new BoundDataSourceConnectionProvider(myDataSource))
    .build();

(1) Implement the ConnectionProvider SPI. Hibernate calls getConnection() every time it needs a JDBC connection.

(2) addService() overrides the default service. Hibernate’s service lookup will return your instance from this point.
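The parent-child lookup and the override behavior can be modeled in a few lines of plain Java. Everything in this sketch is hypothetical: ToyServiceRegistry and the two toy service interfaces exist only to show the shape of the lookup. Hibernate’s real registries also handle lazy initiation, lifecycle callbacks, and dependency injection between services.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical service roles for the sketch. */
interface ToyClassLoaderService { String describe(); }
interface ToyConnectionProvider { String describe(); }

/** Toy service locator with an optional parent registry. */
final class ToyServiceRegistry {
    private final ToyServiceRegistry parent;
    private final Map<Class<?>, Object> services = new HashMap<>();

    ToyServiceRegistry(ToyServiceRegistry parent) {
        this.parent = parent;
    }

    /** Registers a service for a role; a later call for the same role replaces it. */
    <T> ToyServiceRegistry addService(Class<T> role, T implementation) {
        services.put(role, implementation);
        return this;
    }

    /** Looks locally first, then asks the parent; returns null if nobody has it. */
    <T> T getService(Class<T> role) {
        Object local = services.get(role);
        if (local != null) {
            return role.cast(local);
        }
        return parent == null ? null : parent.getService(role);
    }
}
```

A child registry answers for its own services and falls back to the parent for bootstrap services, while the parent never sees what the child registered. Registering a second implementation for the same role replaces the first, which is the override behavior described in (2).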

1.6.4 The Spring Boot Extension Point

In a Spring Boot application, you rarely touch StandardServiceRegistryBuilder directly. The extension point is HibernatePropertiesCustomizer:

@Bean
public HibernatePropertiesCustomizer hibernatePropertiesCustomizer() {
    return properties -> {
        properties.put(                                  // (1)
            AvailableSettings.STATEMENT_BATCH_SIZE, 50);
        properties.put(
            AvailableSettings.USE_SECOND_LEVEL_CACHE, true);
        properties.put(
            AvailableSettings.CACHE_REGION_FACTORY,
            JCacheRegionFactory.class);
    };
}

(1) Settings added here go directly into the StandardServiceRegistryBuilder. They override spring.jpa.properties.* from application.yaml. Useful when configuration values are determined programmatically rather than statically.
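The precedence rule in (1) comes down to ordering: start from the YAML-derived map, then let each customizer mutate it. The sketch below models that with java.util.function.Consumer standing in for HibernatePropertiesCustomizer. PropertyPrecedence is a hypothetical helper, not Spring Boot code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Models "customizers run after application.yaml" as an ordered fold over a map. */
final class PropertyPrecedence {
    private PropertyPrecedence() {}

    static Map<String, Object> resolve(
            Map<String, Object> fromYaml,
            List<Consumer<Map<String, Object>>> customizers) {
        // Copy first so the YAML-derived input is never mutated.
        Map<String, Object> merged = new LinkedHashMap<>(fromYaml);
        // Each customizer sees, and may overwrite, everything applied before it.
        customizers.forEach(customizer -> customizer.accept(merged));
        return merged;
    }
}
```

If the YAML sets hibernate.jdbc.batch_size to 25 and a customizer puts 50, the resolved value is 50. With several customizers, the last one to write a key wins, so keep overlapping customizers to a minimum.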

For deeper customization (replacing services, registering integrators), implement EntityManagerFactoryBuilderCustomizer. Spring Boot calls it with the full EntityManagerFactoryBuilder before LocalContainerEntityManagerFactoryBean finalizes the factory, giving you access to the full Hibernate boot API.

Note

Services in StandardServiceRegistry are singletons within their registry. If you register a stateful custom service, that state is shared across every session and thread. Design custom services to be thread-safe or stateless.

Warning

If you call StandardServiceRegistryBuilder.build() directly in a Spring Boot application, you’ll end up with two separate StandardServiceRegistry instances: yours and Spring Boot’s. They won’t share anything. Configuration, connection pool, second-level cache: all duplicated. Always customize through the Spring Boot extension points.

You now know how Hibernate builds itself and where the extension seams are. The next question is what Spring Boot 4.0.5 specifically configures in this pipeline, which properties map to which Hibernate settings, and how to override the defaults without fighting the framework.

1.7 Spring Boot 4.0.5 Auto-Configuration

Spring Boot’s JPA auto-configuration is not magic. It’s a specific sequence of @Bean definitions and @ConditionalOn* guards, all wired together in HibernateJpaAutoConfiguration. Understanding what it does makes overriding it deliberate rather than accidental.

1.7.1 The Auto-Configuration Chain

HibernateJpaAutoConfiguration extends JpaBaseConfiguration. The chain works like this:

  1. DataSourceAutoConfiguration creates a DataSource bean (HikariCP by default).
  2. HibernateJpaAutoConfiguration detects the DataSource and a Hibernate implementation on the classpath.
  3. It creates a LocalContainerEntityManagerFactoryBean, which drives the entire Hibernate boot sequence.
  4. JpaTransactionManagerAutoConfiguration creates a JpaTransactionManager backed by the EntityManagerFactory.
  5. Spring Data JPA’s JpaRepositoriesAutoConfiguration scans for interfaces that extend Repository (the @Repository annotation is not required) and generates proxy implementations.

Each step is conditional. Remove Hibernate from the classpath and step 2 doesn’t fire. Remove Spring Data JPA and step 5 doesn’t fire. The conditions are explicit in the source code if you need to read them.

1.7.2 Key Properties and Their Hibernate Mappings

Every spring.jpa.properties.hibernate.* entry is passed verbatim to StandardServiceRegistryBuilder. The mapping is direct: no translation, no transformation.

spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/cinetrack  # (1)
    username: cinetrack
    password: secret
    hikari:
      maximum-pool-size: 20                          # (2)
      minimum-idle: 5
      connection-timeout: 30000
  jpa:
    hibernate:
      ddl-auto: validate                             # (3)
    show-sql: false                                  # (4)
    open-in-view: false                              # (5)
    properties:
      hibernate:
        dialect: org.hibernate.dialect.PostgreSQLDialect # (6)
        default_batch_fetch_size: 25                 # (7)
        jdbc.batch_size: 50                          # (8)
        order_inserts: true                          # (9)
        order_updates: true                          # (10)
        generate_statistics: false                   # (11)
        format_sql: true                             # (12)

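(1) HikariCP uses this URL to create connections. Hibernate can detect PostgreSQLDialect on its own, from the JDBC connection metadata rather than from the URL string, so the explicit dialect in (6) is optional. It documents the choice.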

(2) Pool size. For a typical web application, 10-20 is a reasonable starting point. Keep the total across all application instances below your PostgreSQL max_connections, minus the connections used by other services.

(3) validate checks the schema at startup without modifying anything. Use this in production. Use create-drop only in tests.

(4) show-sql logs SQL to stdout. Convenient during development, never in production. It bypasses SLF4J and can’t be controlled by your logging configuration.

(5) Open Session in View. Off. Always. Chapter 5 covers this in depth.

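(6) Explicit dialect. Auto-detection works but adds a JDBC roundtrip at startup to query the database version. Be aware that Hibernate 6 and later log a startup warning when a built-in dialect is configured explicitly, because detection is the preferred path.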

(7) Default batch fetch size for all collections and associations. Without this, every @OneToMany traversal is a potential N+1. With it, Hibernate initializes up to 25 uninitialized collections or proxies of the same type in a single query.

(8) JDBC batch size for bulk inserts and updates. Hibernate accumulates this many statements before flushing them to the database in a single round trip.

(9) Reorders insert statements by entity type before batching. Required for effective JDBC batching when inserting multiple entity types in one transaction.

(10) Same for updates. Without this, Hibernate issues updates in the order entities were dirtied, which breaks batch grouping.

(11) Statistics collection. Off in production unless you’re sampling. The overhead is low but not zero.

(12) Formats SQL across multiple lines. Only meaningful when logging SQL. Combine with show-sql: true during debugging.
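The effect of settings (7) through (10) can be estimated with simple arithmetic. The sketch below is a simplified model written for illustration, not Hibernate's own algorithm. It assumes that batch fetching loads up to default_batch_fetch_size collections per query, and that a JDBC batch ends whenever the target table changes or jdbc.batch_size statements have accumulated. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model for illustration; not Hibernate's implementation.
final class BatchingEstimates {

    // SELECTs needed to load `owners` parents and then touch each parent's
    // lazy collection: one for the parents, plus one per group of up to
    // `batchFetchSize` collections (a size of 1 is the classic N+1).
    static int selectCount(int owners, int batchFetchSize) {
        int perQuery = Math.max(1, batchFetchSize);
        int collectionQueries = (owners + perQuery - 1) / perQuery; // ceiling division
        return 1 + collectionQueries;
    }

    // JDBC batches sent during a flush: in this model a batch is closed
    // when the table changes or when the batch is full.
    static int jdbcBatchCount(List<String> tablesInFlushOrder, int jdbcBatchSize) {
        int batches = 0;
        int inCurrent = 0;
        String currentTable = null;
        for (String table : tablesInFlushOrder) {
            if (!table.equals(currentTable) || inCurrent == jdbcBatchSize) {
                batches++;
                currentTable = table;
                inCurrent = 0;
            }
            inCurrent++;
        }
        return batches;
    }

    // What order_inserts achieves in this model: statements grouped by table.
    static List<String> groupedByTable(List<String> tablesInFlushOrder) {
        List<String> grouped = new ArrayList<>(tablesInFlushOrder);
        Collections.sort(grouped); // same-table statements become adjacent
        return grouped;
    }
}
```

Under these assumptions, 100 movies cost 101 SELECTs with a batch fetch size of 1 and 5 with a size of 25. Six inserts that alternate between movie and watch_log form six single-statement batches in their original order and two batches once grouped.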

1.7.3 Customizing the EntityManagerFactory

For programmatic customization beyond application.yaml, Spring Boot provides two hooks.

HibernatePropertiesCustomizer gets the raw properties map before they’re passed to Hibernate:

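@Configuration
public class HibernateConfig {

    @Bean
    public HibernatePropertiesCustomizer hibernatePropertiesCustomizer(
            CacheManager cacheManager) {                         // (1)
        // Assumes cacheManager is a javax.cache.CacheManager (JCache),
        // not Spring's org.springframework.cache.CacheManager.
        return properties -> {
            properties.put(
                AvailableSettings.CACHE_REGION_FACTORY,
                new JCacheRegionFactory());                      // (2)
            // Hand Hibernate the injected manager so the region factory
            // does not create a second one.
            // ConfigSettings is org.hibernate.cache.jcache.ConfigSettings.
            properties.put(
                ConfigSettings.CACHE_MANAGER, cacheManager);
            properties.put(
                AvailableSettings.USE_SECOND_LEVEL_CACHE, true);
            properties.put(
                AvailableSettings.USE_QUERY_CACHE, true);
        };
    }
}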

(1) Spring injects other beans into the customizer. You can wire in a CacheManager, a DataSource, or any other Spring-managed object.

(2) Constructing the RegionFactory directly gives you control over its configuration. Alternatively, use a string name that Hibernate’s StrategySelector resolves.

For replacing the EntityManagerFactory bean entirely, define your own LocalContainerEntityManagerFactoryBean:

@Bean
@Primary
public LocalContainerEntityManagerFactoryBean entityManagerFactory(
        DataSource dataSource,
        JpaVendorAdapter vendorAdapter) {

    LocalContainerEntityManagerFactoryBean factory =
        new LocalContainerEntityManagerFactoryBean();           // (1)
    factory.setDataSource(dataSource);
    factory.setPackagesToScan("dev.cinetrack.domain");          // (2)
    factory.setJpaVendorAdapter(vendorAdapter);
    factory.setJpaProperties(hibernateProperties());            // (3)
    return factory;
}

private Properties hibernateProperties() {
    Properties props = new Properties();
    props.setProperty(
        AvailableSettings.HBM2DDL_AUTO, "validate");
    props.setProperty(
        AvailableSettings.DEFAULT_BATCH_FETCH_SIZE, "25");
    return props;
}

(1) Creating the factory bean manually gives you full control over every setting.

(2) Package scanning. Multiple packages are supported: setPackagesToScan takes String varargs, so pass each package as its own argument.

(3) Properties fed directly into StandardServiceRegistryBuilder. Equivalent to spring.jpa.properties.* but defined in code.

Warning

If you define your own LocalContainerEntityManagerFactoryBean bean, Spring Boot’s auto-configuration backs off entirely due to @ConditionalOnMissingBean(LocalContainerEntityManagerFactoryBean.class). You take full responsibility for the configuration. Don’t forget to set the JpaVendorAdapter or Hibernate won’t be used as the provider.

Tip

To verify what Hibernate configuration is actually in effect (auto-configured values, your overrides, Hibernate defaults), add a startup bean, for example an ApplicationRunner, that receives the EntityManagerFactory and logs the map returned by getProperties(). It prints the complete resolved configuration, not just what you explicitly set.

Important

spring.jpa.show-sql=true uses System.out.println internally. It cannot be routed through Logback or Log4j2. For production-controllable SQL logging, set logging.level.org.hibernate.SQL=DEBUG and logging.level.org.hibernate.orm.jdbc.bind=TRACE instead. The latter logs bind parameter values, which show-sql does not.

Spring Boot’s auto-configuration is the foundation. What you build on top of it depends on knowing what Hibernate 7 specifically gives you that older versions didn’t. The internals changed substantially between Hibernate 5 and 7, and some of those changes affect how you write queries and map types.

1.8 Hibernate 7 Highlights

If you’ve been using Hibernate 5 or early 6, Hibernate 7 is a different codebase in several important areas. The changes aren’t surface-level API tweaks. They go into the query engine, the type system, and the specification compliance layer. Knowing what changed tells you which patterns from older books and Stack Overflow answers are still valid and which ones you should discard.

1.8.1 SQM: The New Query Translator

Hibernate 6 replaced the old HQL parser with SQM (Semantic Query Model). Hibernate 7 completes that transition. The old parser is gone.

SQM is an AST-based query translator. When you write an HQL or JPQL query, Hibernate parses it into an SQM tree, validates it against the entity model, and then translates it to SQL. The old parser translated queries directly to SQL strings with limited semantic understanding. SQM knows the full structure of your query before it generates a single SQL character.
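The difference between translating strings and translating a tree is easier to see in miniature. The following toy translator is an illustration only: its records, maps, and method names are invented for this sketch and are not Hibernate's SQM classes. It handles a single query shape, validates the tree against a stand-in entity model, and only then renders SQL.

```java
import java.util.Map;

// Toy model for illustration only; these are not Hibernate's SQM classes.
final class ToyQueryTranslator {

    // A tiny tree for: SELECT a FROM Entity a WHERE a.attribute <op> :param
    record Predicate(String attribute, String operator, String parameterName) {}
    record SelectQuery(String entityName, String alias, Predicate where) {}

    // Stand-in for the entity model: entity -> table, attribute -> column.
    private static final Map<String, String> TABLES = Map.of("Movie", "movie");
    private static final Map<String, Map<String, String>> COLUMNS =
        Map.of("Movie", Map.of("title", "title", "releaseYear", "release_year"));

    // Step 1: check the whole tree against the model before any SQL exists.
    static void validate(SelectQuery query) {
        Map<String, String> columns = COLUMNS.get(query.entityName());
        if (columns == null) {
            throw new IllegalArgumentException("Unknown entity: " + query.entityName());
        }
        if (!columns.containsKey(query.where().attribute())) {
            throw new IllegalArgumentException(
                "Unknown attribute: " + query.entityName() + "." + query.where().attribute());
        }
    }

    // Step 2: render SQL from the validated tree.
    static String toSql(SelectQuery query) {
        validate(query);
        String alias = query.alias();
        String column = COLUMNS.get(query.entityName()).get(query.where().attribute());
        return "select " + alias + ".* from " + TABLES.get(query.entityName()) + " " + alias
            + " where " + alias + "." + column + " " + query.where().operator() + " ?";
    }
}
```

A query on Movie.releaseYear renders to SQL against release_year, while a query on an attribute the model does not know is rejected before any SQL text is produced. That ordering, validate first and generate second, is the property the paragraph above attributes to SQM.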

What this means in practice:

Better implicit joins. SQM generates correct SQL for path expressions across associations without requiring explicit JOIN clauses in every case. SELECT m FROM Movie m WHERE m.director.nationality = :nat generates an implicit join that the old parser handled inconsistently.

Union and intersection queries. SQM supports UNION, INTERSECT, and EXCEPT in HQL. The old parser did not.

// UNION works in HQL on Hibernate 6 and 7. Hibernate 5's HQL had no set operators.
String union = """
    SELECT m FROM Movie m WHERE m.releaseYear > 2020
    UNION
    SELECT m FROM Movie m WHERE m.rating > 8.5
    """;

TypedQuery<Movie> query = em.createQuery(union, Movie.class); // (1)

(1) SQM parses the UNION, validates both branches against the Movie entity model, and generates the correct SQL UNION with proper column alignment.

Lateral joins. SQM supports LATERAL joins, which allow a derived table to reference columns from preceding tables in the FROM clause. Useful for top-N-per-group queries.

Arithmetic in ORDER BY. You can now use expressions in ORDER BY clauses that reference aggregate results from the SELECT. The old parser rejected these.

1.8.2 The New Type System

Hibernate 5’s type system was a single Type interface hierarchy that conflated Java type handling and JDBC handling in one object. Hibernate 6 split this into two separate hierarchies. Hibernate 7 finishes the work.

JavaType<T> handles everything on the Java side: equality, hashing, mutability checking, conversion between Java representations. Registered in JavaTypeRegistry.

JdbcType handles everything on the JDBC side: which java.sql.Types code to use, how to bind values, how to extract results. Registered in JdbcTypeRegistry.

A BasicType is the combination: one JavaType and one JdbcType. For String mapped to VARCHAR, Hibernate uses StringJavaType (Java side) and VarcharJdbcType (JDBC side).

This split matters when you write custom type mappings. In Hibernate 5, you implemented one UserType that handled both sides. In Hibernate 7, you implement either JavaType (if the issue is Java-side representation) or JdbcType (if the issue is JDBC-side binding/extraction), or both if needed. UserType still exists, with a reworked contract, for mappings that are easier to express in a single class.

@Entity
public class Movie {
    @Id
    private Long id;

    @JdbcTypeCode(SqlTypes.JSON)                    // (1)
    private Map<String, String> metadata;           // (2)

    @Array(length = 10)                             // (3)
    private String[] genres;
}

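(1) @JdbcTypeCode tells Hibernate which JdbcType to use. SqlTypes.JSON selects a JSON-capable JdbcType: on PostgreSQL the value lands in a native JSON column (jsonb with the current dialect) and is serialized with Jackson, or with JSON-B if that is the library on the classpath.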

(2) The Map<String, String> is the Java representation. Hibernate uses MapJavaType on the Java side.

(3) @Array maps to a PostgreSQL array column. Hibernate 7’s native array support replaces the need for custom UserType implementations for array mappings.
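To make the two-sided split concrete, here is a toy version of the idea in plain Java. The interfaces and names below are hypothetical and far smaller than Hibernate's real JavaType and JdbcType contracts. The point of the sketch is only that one Java-side description can be paired with different JDBC-side descriptions without either knowing about the other.

```java
import java.sql.Types;
import java.time.Duration;

// Toy interfaces for illustration; Hibernate's real contracts are much larger.
final class ToyTypeSplit {

    // Java side: knows the domain type, nothing about JDBC.
    interface JavaSide<T> {
        boolean areEqual(T a, T b);
        <X> X unwrap(T value, Class<X> target);
    }

    // JDBC side: which SQL type code to use and which Java class the driver receives.
    interface JdbcSide {
        int sqlTypeCode();
        Class<?> bindClass();
    }

    record SimpleJdbcSide(int sqlTypeCode, Class<?> bindClass) implements JdbcSide {}

    // The combination of one Java side and one JDBC side.
    record ToyBasicType<T>(JavaSide<T> javaSide, JdbcSide jdbcSide) {
        Object toBindValue(T value) {
            return javaSide.unwrap(value, jdbcSide.bindClass());
        }
    }

    static final JavaSide<Duration> DURATION = new JavaSide<>() {
        @Override
        public boolean areEqual(Duration a, Duration b) {
            return a.equals(b);
        }

        @Override
        public <X> X unwrap(Duration value, Class<X> target) {
            if (target == Long.class) {
                return target.cast(value.toMinutes());  // store as whole minutes
            }
            if (target == String.class) {
                return target.cast(value.toString());   // store as ISO-8601 text
            }
            throw new IllegalArgumentException("Unsupported target: " + target);
        }
    };

    static final JdbcSide BIGINT = new SimpleJdbcSide(Types.BIGINT, Long.class);
    static final JdbcSide VARCHAR = new SimpleJdbcSide(Types.VARCHAR, String.class);
}
```

Pairing DURATION with BIGINT binds a 142-minute runtime as the number 142. Pairing the same Java side with VARCHAR binds it as the text PT2H22M. Neither pairing required touching the other half.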

1.8.3 UUID Improvements

Hibernate 7 improved UUID handling to be database-aware by default. On PostgreSQL, a UUID field mapped with @Id or @Column now uses PostgreSQL’s native uuid column type and binds values via setObject(n, uuid) rather than setString(n, uuid.toString()). This matters for index performance: PostgreSQL’s native UUID type is 16 bytes; a varchar UUID is 36 bytes.

@Entity
public class AppUser {
    @Id
    @UuidGenerator(style = UuidGenerator.Style.VERSION_7) // (1)
    private UUID id;
}

(1) @UuidGenerator replaces the @GenericGenerator-based "uuid2" setup from Hibernate 5. Style.VERSION_7, added in Hibernate 7, generates version 7 UUIDs: time-ordered values that are B-tree-friendly. Style.TIME generates RFC 4122 version 1 UUIDs, which are time-based but do not sort by creation time. Style.RANDOM generates version 4 UUIDs.
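The size difference between the two storage forms is easy to check with the JDK alone. This small sketch uses a hypothetical helper class: it packs a UUID into the 16 raw bytes that a native uuid column stores and compares that with the length of the textual form a varchar column would hold.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper for illustration; not part of Hibernate.
final class UuidStorageSize {

    // The 128-bit value as two big-endian longs: 16 raw bytes.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
            .putLong(uuid.getMostSignificantBits())
            .putLong(uuid.getLeastSignificantBits())
            .array();
    }

    // Inverse of toBytes, to show that no information is lost.
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        return new UUID(buffer.getLong(), buffer.getLong());
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(toBytes(id).length);      // 16
        System.out.println(id.toString().length());  // 36
    }
}
```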

1.8.4 Jakarta Persistence 3.2

Hibernate 7 targets Jakarta Persistence 3.2, which added several features that were previously Hibernate-only:

@ManyToOne fetch defaults. The specification default for @ManyToOne is still EAGER in 3.2. Hibernate's own guidance is to declare fetch = LAZY explicitly, which is what this book does on every association.

Improved CriteriaBuilder. The 3.2 criteria API adds union(), intersect(), except(), and select() on subqueries. These map directly to SQM’s query model.

EntityManagerFactory.getSchemaManager(). The spec now defines a SchemaManager interface for programmatic schema operations. Hibernate’s implementation wraps SchemaManagementTool.

@Version on LocalDateTime. Version fields for optimistic locking now officially support LocalDateTime in addition to numeric types.

Warning

If you’re migrating from Hibernate 5, the @Type annotation changed significantly. @Type(type = "my.custom.UserType") no longer works. Replace with @Type(MyUserType.class) or, better, with the appropriate @JdbcTypeCode or @JavaType annotation. The Hibernate 6 migration guide covers all cases. Don’t skip it.

Note

SQM’s query model is exposed via org.hibernate.query.sqm.tree. If you’re writing a framework that generates HQL dynamically, you can build the SQM tree directly rather than constructing query strings. It’s lower-level but avoids string concatenation entirely. The Hibernate source code’s own tests are the best documentation for this API.

With Hibernate 7’s internals understood, we can look at the domain model this book uses to demonstrate all of it. CinéTrack is the running example, and its entity map shapes every code sample from here to the final chapter.

1.9 The CinéTrack Domain Model

Every code example in this book comes from CinéTrack, a Letterboxd-inspired film tracking application. Before we go deeper into Hibernate internals, you need a clear picture of the domain so that later examples don’t require you to hold unfamiliar context in your head while learning unfamiliar concepts.

CinéTrack lets users log films they’ve watched, write reviews, build watchlists, and subscribe for additional features. The domain is rich enough to demonstrate complex mapping scenarios but constrained enough that you can hold the whole model in one mental view.

1.9.1 Core Entities

Movie is the central entity. It represents a theatrical film with a title, release year, runtime, and a set of genre tags. A Movie has many WatchLog entries, many Review entries, and can belong to many Watchlist collections. Movies can have associated MediaFile records (posters, trailers).

Series represents a television series. It has many Episode children. The distinction between Movie, Series, and ShortFilm is handled with a table-per-hierarchy inheritance strategy: all three share a media_item table with a media_type discriminator column.

Episode belongs to a Series. It has its own runtime, air date, and episode number. Episodes can be individually logged and reviewed.

ShortFilm is a MediaItem subtype for short-form content under forty minutes. Same fields as Movie, different discriminator value.

AppUser is the user entity. Deliberately named AppUser rather than User to avoid collisions with PostgreSQL’s reserved user keyword. It has a UUID primary key, email, display name, and a @OneToMany to WatchLog.

WatchLog records a single viewing event. It links an AppUser to a Movie, Episode, or ShortFilm with a watched date, optional rewatch flag, and optional rating (0.5 to 5.0 in half-point increments). This is the highest-volume table in the application.

Review is a text review written by an AppUser about a MediaItem. It has a body, a rating, a created timestamp, and a soft-delete flag. Envers audits it.

Watchlist is a named collection of MediaItem entries owned by an AppUser. It’s a @ManyToMany association between Watchlist and MediaItem via a watchlist_item join table.

Subscription records a user’s subscription status. It has a type (FREE, PREMIUM), a start date, an optional end date, and a status (ACTIVE, EXPIRED, CANCELLED). @Enumerated(EnumType.STRING) throughout.

MediaFile stores metadata about uploaded files: poster images, trailers, subtitle files. It links to a Movie or Series via a nullable foreign key. The actual file is stored in object storage; this entity holds the S3 key, MIME type, file size, and upload timestamp.

1.9.2 Relationships at a Glance

AppUser ──── WatchLog ──── Movie
  │                   ──── Episode ──── Series
  │                   ──── ShortFilm
  │
  ├──── Review ──── MediaItem (Movie | Series | ShortFilm)
  │
  ├──── Watchlist ──── (many-to-many) ──── MediaItem
  │
  └──── Subscription

Movie ──── MediaFile (one-to-many)
Series ──── MediaFile (one-to-many)
Series ──── Episode (one-to-many)

The inheritance hierarchy:

MediaItem (abstract @Entity, single-table inheritance)
  ├── Movie
  ├── Series
  └── ShortFilm
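How a shared table resolves to different subtypes can be sketched without any persistence framework. The toy below is an assumption-laden illustration, not Hibernate code: the discriminator values MOVIE, SERIES, and SHORT_FILM, the column names, and the fromRow helper are all hypothetical. It shows one row shape and a media_type value deciding which subtype is instantiated.

```java
import java.util.Map;

// Toy model of single-table inheritance; not Hibernate code.
final class ToySingleTable {

    sealed interface MediaItem permits Movie, Series, ShortFilm {
        long id();
        String title();
    }

    record Movie(long id, String title) implements MediaItem {}
    record Series(long id, String title) implements MediaItem {}
    record ShortFilm(long id, String title) implements MediaItem {}

    // Every subtype lives in the same row shape; the discriminator column
    // alone decides which class the row becomes.
    static MediaItem fromRow(Map<String, Object> row) {
        long id = (Long) row.get("id");
        String title = (String) row.get("title");
        String discriminator = (String) row.get("media_type");
        return switch (discriminator) {
            case "MOVIE" -> new Movie(id, title);
            case "SERIES" -> new Series(id, title);
            case "SHORT_FILM" -> new ShortFilm(id, title);
            default -> throw new IllegalStateException(
                "Unknown media_type: " + discriminator);
        };
    }
}
```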

1.9.3 Why This Domain

The CinéTrack model was chosen to exercise specific Hibernate scenarios that generic to-do apps don’t cover.

WatchLog is high-volume and needs careful attention to batch inserts, bulk updates, and second-level cache invalidation. Chapter 7 is one long WatchLog performance story.

The MediaItem inheritance hierarchy gives us a realistic table-per-hierarchy mapping with a discriminator column. Chapter 4 covers the trade-offs between joined, single-table, and table-per-class strategies using this exact model.

The Review entity’s audit requirement (every edit must be traceable) introduces Envers naturally in Chapter 9, where we enable it with three annotations and zero application code.

The Watchlist many-to-many association is the canonical source of ordering problems, orphan removal edge cases, and flush-order surprises. We’ll break it in at least three different ways before fixing it correctly.

Subscription with its enum status and date range is the test bed for Hibernate 7’s improved BasicType system and JSON column support.

1.9.4 The Starting Entity

Here’s Movie as we’ll map it in Chapter 2:

@Entity
@Table(name = "movie")
public class Movie {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE,      // (1)
        generator = "movie_seq")
    @SequenceGenerator(name = "movie_seq",
        sequenceName = "movie_id_seq", allocationSize = 50)
    private Long id;

    @Column(nullable = false, length = 500)
    private String title;

    @Column(name = "release_year", nullable = false)
    private int releaseYear;

    @Column(name = "runtime_minutes")
    private Integer runtimeMinutes;

    @Enumerated(EnumType.STRING)
    @Column(nullable = false)
    private MediaType mediaType;                             // (2)

    @Column(name = "average_rating",
        precision = 3, scale = 1)
    private BigDecimal averageRating;

    @OneToMany(mappedBy = "movie",
        cascade = CascadeType.ALL,
        orphanRemoval = true,
        fetch = FetchType.LAZY)                             // (3)
    @BatchSize(size = 25)
    private List<WatchLog> watchLogs = new ArrayList<>();

    // getters, setters, or Lombok @Getter @Setter
}

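(1) Sequence-based ID generation with allocationSize = 50. Hibernate pre-allocates 50 IDs per sequence call, reducing database roundtrips on bulk inserts. 50 is also the @SequenceGenerator default, and it must match the sequence's INCREMENT BY. An allocationSize of 1, common in legacy mappings, is a performance trap.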

(2) Discriminator stored as a string. EnumType.STRING over EnumType.ORDINAL always. Ordinal breaks when you reorder the enum.

(3) FetchType.LAZY is the default for collections but writing it explicitly makes the intent clear to the next person reading this. @BatchSize(size = 25) overrides the global default_batch_fetch_size for this specific collection.
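The saving described in callout (1) can be modeled in a few lines. This is a simplified sketch of block allocation with hypothetical names. Hibernate's pooled optimizers differ in detail, and the in-memory counter merely stands in for the movie_id_seq sequence, which is assumed to increment by allocationSize.

```java
// Simplified model of block allocation; Hibernate's optimizers differ in detail.
final class SequenceAllocationSketch {

    private final int allocationSize;
    private long databaseSequence = 1; // stand-in for movie_id_seq
    private long next;                 // next ID to hand out
    private long blockEnd;             // exclusive end of the reserved block
    private int sequenceCalls;         // simulated database roundtrips

    SequenceAllocationSketch(int allocationSize) {
        this.allocationSize = allocationSize;
    }

    long nextId() {
        if (next == blockEnd) {
            // One roundtrip reserves a whole block of IDs.
            next = databaseSequence;
            databaseSequence += allocationSize;
            blockEnd = next + allocationSize;
            sequenceCalls++;
        }
        return next++;
    }

    int sequenceCalls() {
        return sequenceCalls;
    }
}
```

In this model, generating 1,000 IDs takes 20 simulated sequence calls with an allocationSize of 50 and 1,000 calls with an allocationSize of 1.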

Important

Every @OneToMany in this book is lazy. Every @ManyToOne is lazy. The default for @ManyToOne is EAGER, a spec-mandated default that Hibernate itself recommends against. We override it everywhere. If you see fetch = FetchType.EAGER in production code and it’s not on a @ManyToOne to a small, immutable reference entity, that’s a bug waiting to manifest under load.

Tip

The full CinéTrack schema, including all Flyway migrations, is in the book’s companion repository at github.com/umur/hibernate-example. Each chapter’s code is in its own directory: chapter-01/, chapter-02/, and so on.

With the domain model established, we have a concrete foundation for everything that follows. Before moving to Chapter 2’s deep dive into entity mapping, let’s close out this chapter by looking at the mistakes that most commonly derail developers who are new to Hibernate’s internals.

1.10 Common Mistakes

  1. Treating SessionFactory creation as cheap. Creating a SessionFactory inside a request handler, a test setup method that runs before every test, or a Spring @Scope("request") bean. The factory takes seconds to build. Build it once, cache it forever.

  2. Programming to the JPA spec while paying the cost of Hibernate. Using only EntityManager and JPQL, avoiding @BatchSize, Envers, and HQL extensions “for portability.” You’re already on Hibernate. Use it.

  3. Leaving open-in-view enabled. Spring Boot’s default is true. It registers an OpenEntityManagerInViewInterceptor that keeps the persistence context open for the entire HTTP request, including template rendering and serialization, so a lazy load during rendering can hold a database connection long after the transaction has ended. Under load, this exhausts your connection pool. Set spring.jpa.open-in-view=false and initialize the associations you need inside the transaction.

  4. Using @GeneratedValue with allocationSize = 1. Legacy mappings, and schemas whose sequences increment by 1, often pin allocationSize to 1, meaning one database roundtrip per insert. Use allocationSize = 50 (the @SequenceGenerator default) or higher, with a matching INCREMENT BY on the sequence, to let Hibernate pre-allocate IDs in blocks.

  5. Using EnumType.ORDINAL for @Enumerated. The ordinal is the position in the enum declaration. Reorder the enum and every existing row has the wrong value. Always use EnumType.STRING.

  6. Skipping the Hibernate 6/7 migration guide when coming from Hibernate 5. The @Type annotation changed. The type system changed. The query translator changed. Reading the migration guide takes two hours. Not reading it costs you two days of debugging.

  7. Logging SQL with spring.jpa.show-sql=true in production. This bypasses your logging framework entirely, writes to stdout, and can’t be turned off without redeployment. Use logging.level.org.hibernate.SQL=DEBUG instead.

  8. Ignoring statistics. SessionFactory.getStatistics() reports cache hit and miss counts, query execution counts and timings, and connection counts, but only when hibernate.generate_statistics is true. Most teams never enable it and fly blind on Hibernate performance. Enable it in staging at minimum.
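Mistake 5 is easy to reproduce in plain Java. An enum cannot be reordered at runtime, so this sketch models the change with two hypothetical versions of a status enum: the original, and the same enum after a SUSPENDED constant (invented for this example) is inserted in the middle. The stored values stay the same while the code that reads them changes.

```java
// Illustration only; StatusV1 and StatusV2 model one enum before and after an edit.
final class EnumStorageSketch {

    enum StatusV1 { ACTIVE, EXPIRED, CANCELLED }

    // The same enum after a hypothetical SUSPENDED constant is inserted.
    enum StatusV2 { ACTIVE, SUSPENDED, EXPIRED, CANCELLED }

    // What EnumType.ORDINAL effectively does on read: position lookup.
    static StatusV2 readOrdinal(int stored) {
        return StatusV2.values()[stored];
    }

    // What EnumType.STRING effectively does on read: name lookup.
    static StatusV2 readName(String stored) {
        return StatusV2.valueOf(stored);
    }
}
```

A row written as EXPIRED under the first version holds ordinal 1 or the name "EXPIRED". After the edit, the ordinal reads back as SUSPENDED and the name still reads back as EXPIRED.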

1.11 Summary

  • JPA is a specification, not an implementation. It guarantees entity lifecycle semantics, persistence context identity, JPQL query portability, and transaction integration. Nothing more.
  • Hibernate extends JPA with batch fetching, second-level caching, Envers auditing, session-level filters, multitenancy strategies, and HQL extensions. Don’t avoid these features for the sake of portability you’ll never exercise.
  • SessionFactory and EntityManagerFactory are the same object in Hibernate 7. One instance per application, fully thread-safe, expensive to create, cheap to use.
  • The boot sequence is MetadataSources → MetadataBuilder → Metadata → SessionFactoryBuilder → SessionFactory. Spring Boot’s LocalContainerEntityManagerFactoryBean drives this sequence automatically from your application.yaml configuration.
  • ServiceRegistry is a two-level hierarchy: BootstrapServiceRegistry for classloading and integrators, StandardServiceRegistry for runtime services. Custom services plug in via HibernatePropertiesCustomizer or EntityManagerFactoryBuilderCustomizer in Spring Boot applications.
  • Hibernate 6 introduced SQM, a full AST-based query translator that replaces the old HQL parser, and Hibernate 7 completes the transition. SQM adds UNION, INTERSECT, EXCEPT, lateral joins, and more. Hibernate 7 also completes the JavaType/JdbcType split, making custom type mappings cleaner and more precise.
  • The CinéTrack domain model covers Movie, AppUser, WatchLog, Review, Watchlist, Subscription, MediaFile, Series, Episode, and ShortFilm. Every example in this book comes from this model.
  • Next: Chapter 2 goes into entity mapping in depth: identifier strategies, basic type mappings, the new @JdbcTypeCode and @Array annotations, embedded types with @Embeddable, and the inheritance strategies available in Hibernate 7. The CinéTrack MediaItem hierarchy is our test case for all of it.