// Posted by Umur Inan

// Category Backend

// Posted on March 28, 2026

Connection Pools: The Thing You Never Think About Until Production Burns

Connection pools sit quietly until they break. Here is what happens when they fail, the warning signs to watch, and how to catch it before production burns.

I want to tell you about the worst production incident I ever caused. It was a Tuesday afternoon. Traffic was normal. No deploys in the last four hours. And then, over the span of about ninety seconds, every API endpoint in our Spring Boot service started returning 500 errors. Not some of them. All of them.

The logs were full of the same exception: SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms. Every single request was waiting for a database connection and not getting one. The pool was exhausted.

It took us forty minutes to figure out what happened. A seemingly harmless query in a rarely-used admin endpoint had started running long. Not crazy long. About eight seconds per call. But someone had kicked off a batch operation that hit that endpoint in a tight loop, and each request held a connection for the full eight seconds. Within a minute, all ten connections in the pool were occupied by slow queries. Every other request in the entire application, login, feed, notifications, everything, was stuck waiting in line behind them.

Ten connections brought down an entire service. That's the thing about connection pools. You never think about them until they ruin your afternoon.

What a Connection Pool Actually Does

If you already know this, skip ahead. But I've been surprised how many backend engineers treat the connection pool as a black box they never open.

Opening a database connection is expensive. There's a TCP handshake. TLS negotiation if you're using SSL. Authentication. Protocol negotiation. On a typical PostgreSQL setup, creating a new connection takes somewhere between 20 and 100 milliseconds. That doesn't sound like much, but if every HTTP request opens and closes a database connection, you're adding that overhead to every single request. At a few hundred requests per second, it adds up fast.

A connection pool solves this by keeping a set of connections open and reusing them. Your application borrows a connection from the pool, runs a query, and returns it. The next request borrows the same connection. No setup cost. No teardown cost. It's one of those abstractions that works so well you forget it exists.

Until it doesn't.
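The borrow-and-return cycle described above can be sketched with a toy fixed-size pool. This is a minimal illustration built on a blocking queue, not how HikariCP is implemented; the class name TinyPool and its methods are made up for this example. It shows the one behavior that matters for the rest of this post: when every resource is checked out, the next borrower waits and then fails after a timeout.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Toy fixed-size pool (hypothetical); real pools add validation, eviction, and metrics. */
final class TinyPool<T> {
    private final BlockingQueue<T> idle;

    TinyPool(List<T> resources) {
        // All resources start out idle and ready to be borrowed.
        this.idle = new ArrayBlockingQueue<>(resources.size(), false, resources);
    }

    /** Borrow a resource, waiting at most timeoutMillis for one to be returned. */
    T borrow(long timeoutMillis) {
        try {
            T resource = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (resource == null) {
                // Same situation as a pool's "connection is not available" timeout.
                throw new IllegalStateException(
                        "no resource available after " + timeoutMillis + "ms");
            }
            return resource;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
    }

    /** Return a borrowed resource so the next caller can reuse it. */
    void giveBack(T resource) {
        idle.offer(resource);
    }

    /** Number of resources currently sitting idle in the pool. */
    int idleCount() {
        return idle.size();
    }
}
```

With two resources in the pool, a third borrow fails after its timeout until one of the first two is given back.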

The Default Configuration Is Almost Never Right

Here's the thing that bit us. We were running HikariCP, which is the default connection pool in Spring Boot, with mostly default settings. The defaults are reasonable for getting started, but they make assumptions about your workload that might not hold.

The default maximum pool size in HikariCP is 10. Ten connections. For a lot of applications, that's actually fine. The math is simpler than people think. If your average query takes 5 milliseconds and you have 10 connections, you can theoretically handle 2,000 queries per second. Most CRUD applications don't come close to that.

But that math only works if your queries are consistently fast. The moment you have one slow query holding a connection for seconds instead of milliseconds, the entire equation breaks. That one connection isn't serving 200 queries per second anymore. It's serving one query every eight seconds. And you only have ten connections total.

The formula that actually matters is:

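connections_needed ≈ requests_per_second × average_connection_hold_time_in_seconds

This is Little's Law applied to the pool. Hold time is how long a request keeps the connection checked out, not how long the query itself runs. If your average query takes 5ms but some requests hold connections for 8 seconds, you need to account for that. And most people don't.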
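The arithmetic in this section can be sketched with Little's Law (average connections in use = arrival rate × average hold time). The class and method names below are made up for illustration, and the request rate of 2 per second is an assumed number, not a figure from the incident.

```java
/** Back-of-the-envelope pool arithmetic (hypothetical helper, not a library API). */
final class PoolMath {

    /** Upper bound on queries per second if each one holds a connection for holdMillis. */
    static double maxQueriesPerSecond(int poolSize, long holdMillis) {
        return poolSize * 1000.0 / holdMillis;
    }

    /** Little's Law: average connections in use = arrival rate x average hold time. */
    static long connectionsNeeded(double requestsPerSecond, long holdMillis) {
        return (long) Math.ceil(requestsPerSecond * holdMillis / 1000.0);
    }

    public static void main(String[] args) {
        System.out.println(maxQueriesPerSecond(10, 5));    // 2000.0 with 5 ms queries
        System.out.println(maxQueriesPerSecond(10, 8000)); // 1.25 with 8 s queries
        System.out.println(connectionsNeeded(2, 8000));    // 16, more than a pool of 10 has
    }
}
```

The same ten connections go from a ceiling of 2,000 queries per second to 1.25 once every query holds its connection for eight seconds.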

The Silent Killers

Pool exhaustion is dramatic. It's the fire alarm. But there are quieter ways your connection pool can hurt you that are harder to detect.

Connection leaks

This is the classic one. You borrow a connection from the pool but never return it. In raw JDBC, this happens when you forget to close a connection in a finally block. In Spring, it's less common because the framework manages the connection lifecycle, but it still happens. Transactional methods that throw an exception in a way that bypasses the transaction manager. Manual DataSource access that doesn't go through the template. Test code that opens connections without proper cleanup.

HikariCP has a leakDetectionThreshold property that logs a warning when a connection has been out of the pool for longer than a threshold. I set this to 30 seconds on every project now. It's saved me more than once.

spring.datasource.hikari.leak-detection-threshold=30000

If you see those warnings in your logs, treat them as bugs, not noise.
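For the raw JDBC case, a minimal sketch of leak-safe access, assuming a plain javax.sql.DataSource backed by a pool. The class and method names (LeakSafeQuery, countRows) are made up for illustration. try-with-resources closes the result set, the statement, and the connection in reverse order even when the query throws, and closing a pooled connection is what hands it back to the pool.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Hypothetical helper showing leak-safe JDBC access. */
final class LeakSafeQuery {

    /** Runs a single-value count query and always returns the connection to the pool. */
    static long countRows(DataSource dataSource, String sql) throws SQLException {
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            // First column of the first row, or 0 if the query returned nothing.
            return rs.next() ? rs.getLong(1) : 0L;
        }
        // No finally block needed: all three resources are closed on every exit path.
    }
}
```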

Stale connections

Connections can go stale. The database restarts. A network blip kills the TCP connection. A firewall closes idle connections after a timeout. Your pool still thinks the connection is valid, but when your application tries to use it, it fails.

HikariCP handles this reasonably well out of the box with its connection validation, but not every pool does. And if you're behind a connection proxy like PgBouncer, the behavior can be different. I've seen setups where PgBouncer evicts idle connections after 5 minutes, but the application pool holds connections for 30 minutes. Every few minutes, a request would randomly fail because it grabbed a dead connection.

The fix is making sure your pool retires connections before whatever sits between it and the database does: keep the pool's idle timeout and maximum lifetime shorter than the proxy's or firewall's limit. Sounds obvious in retrospect. It always does.
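One hedged way to express that rule in HikariCP properties, assuming the proxy or firewall drops connections after 5 minutes. That limit is an assumption for this example; substitute the real one from your infrastructure.

```properties
# Assumption: something between the app and the database drops connections after 300000 ms
# Retire every connection before that limit can be reached
spring.datasource.hikari.max-lifetime=240000
# Ping idle connections periodically so they are validated while they wait
spring.datasource.hikari.keepalive-time=120000
```

HikariCP expects keepalive-time to be lower than max-lifetime, which is why the two values are staggered.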

Long transactions

This is the one that got us. A connection is held for the entire duration of a transaction. If you have a @Transactional method that calls an external API, that connection is occupied the entire time the HTTP call is in flight. Your query might have taken 2 milliseconds, but the connection is held for 800 milliseconds while you wait for a third-party service to respond.

I've seen this pattern cause pool exhaustion in services that had plenty of capacity for their database load. The database was fine. The queries were fast. But the connections were tied up waiting on network calls that had nothing to do with the database.

The rule is simple: never do non-database I/O inside a transaction unless you absolutely have to. Read from the database. Close the transaction. Call the external service. Open a new transaction if you need to write the result back. Yes, this means you lose atomicity across the whole operation. That's a trade-off you need to think about explicitly, not one to hit by accident while holding connections hostage.
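The read, call, write sequence above can be sketched without any framework. TxRunner below is a hypothetical stand-in for whatever runs a block of work inside a transaction (Spring's TransactionTemplate plays a similar role); it is not a Spring or HikariCP type, and readCallWrite is a name invented for this example.

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical transaction helper; a stand-in, not a real framework type. */
interface TxRunner {
    <T> T inTransaction(Supplier<T> work);
}

final class SplitTransaction {

    /**
     * Reads in one short transaction, calls the remote service with no
     * transaction open, then writes the result in a second short transaction.
     */
    static <I, R> R readCallWrite(TxRunner tx,
                                  Supplier<I> readFromDb,
                                  Function<I, R> callRemote,
                                  Consumer<R> writeToDb) {
        I input = tx.inTransaction(readFromDb);  // connection held only for the read
        R result = callRemote.apply(input);      // no transaction, so no connection held
        tx.inTransaction(() -> {                 // second, separate transaction
            writeToDb.accept(result);
            return null;
        });
        return result;
    }
}
```

The two transactions are independent, so a failure in the second one does not roll back anything from the first. That is the atomicity trade-off in concrete form.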

Monitoring That Actually Helps

After our incident, we added monitoring that I now consider mandatory for any production service with a database.

Active connections. How many connections in the pool are currently in use. If this number is consistently close to your maximum, you're living on the edge. If it spikes and stays high, something is holding connections too long.

Pending requests. How many threads are waiting for a connection. This should normally be zero. If it's not zero, you're already in trouble. If it's growing, you're about to page someone.

Connection acquisition time. How long does it take to get a connection from the pool. This should be under a millisecond. If it's in the hundreds of milliseconds, the pool is under pressure. If it's hitting your connection timeout, requests are failing.

Connection usage time. How long each connection is held before being returned. This is your best indicator for spotting long-running transactions or connection leaks before they become incidents.

HikariCP exposes the connection counts through JMX and all four of these through Micrometer. If you're using Spring Boot with Actuator and Prometheus, it's about five minutes of configuration.

management.metrics.enable.hikaricp=true

Set up alerts on pending requests greater than zero and connection acquisition time above 100ms. These two metrics alone would have caught our incident before it became an outage.
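A minimal sketch of the Actuator side, assuming the micrometer-registry-prometheus dependency is on the classpath and that exposing the scrape endpoint over HTTP is acceptable in your environment. The meter names in the comments are the Micrometer names as published by HikariCP's Micrometer integration; confirm them against your own /actuator/prometheus output before writing alert rules.

```properties
# Expose the Prometheus scrape endpoint alongside health
management.endpoints.web.exposure.include=health,prometheus

# Meters to look for (Prometheus renders the dots as underscores):
#   hikaricp.connections.active   -> active connections
#   hikaricp.connections.pending  -> threads waiting for a connection
#   hikaricp.connections.acquire  -> connection acquisition time
#   hikaricp.connections.usage    -> connection usage time
```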

PgBouncer and Connection Proxies

Once you scale to multiple application instances, you'll probably end up looking at a connection proxy like PgBouncer. The idea is sound: instead of each application instance maintaining its own pool of connections to the database, they all connect to PgBouncer, which maintains a smaller pool of actual database connections and multiplexes.

This works well, but it adds another layer of pool management with its own configuration and its own failure modes. Now you have two pools. The application pool and the proxy pool. Each has its own maximum sizes, timeouts, and eviction policies. And they need to be configured in harmony.

I've seen setups where the application pool holds connections longer than PgBouncer's server idle timeout, causing random connection resets. I've seen transaction-mode PgBouncer break prepared statements because the connection you prepared the statement on isn't the connection that executes it. I've seen PgBouncer run out of connections because the application pool was sized too large.

A proxy doesn't eliminate the pool sizing problem. It moves it. You still need to understand the math. You just have two layers of math now.
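The second layer of math can be sketched the same way as the first. This assumes the proxy enforces a cap on client connections (PgBouncer's max_client_conn setting); the class and method names and the numbers in the usage note are made up for illustration.

```java
/** Hypothetical sizing check for an application pool sitting behind a proxy. */
final class TwoLayerSizing {

    /** Client connections the proxy must accept if every instance fills its pool. */
    static int worstCaseClientConnections(int appInstances, int maxPoolSizePerInstance) {
        return appInstances * maxPoolSizePerInstance;
    }

    /** True when the proxy's client connection limit covers the worst case. */
    static boolean fitsProxy(int appInstances,
                             int maxPoolSizePerInstance,
                             int proxyMaxClientConnections) {
        return worstCaseClientConnections(appInstances, maxPoolSizePerInstance)
                <= proxyMaxClientConnections;
    }
}
```

Eight instances with a pool of 10 fit under a limit of 100 client connections. Scale out to twelve instances without touching the pool size and the worst case becomes 120, which does not.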

What I Do Now

Every Spring Boot project I start now gets the same connection pool configuration on day one. Not because I've figured out the perfect settings, but because the defaults leave too many things unmonitored.

# Pool sizing (start conservative)
spring.datasource.hikari.maximum-pool-size=10
spring.datasource.hikari.minimum-idle=5

# Timeouts (fail fast rather than hang)
spring.datasource.hikari.connection-timeout=5000
spring.datasource.hikari.validation-timeout=3000

# Leak detection
spring.datasource.hikari.leak-detection-threshold=30000

# Idle timeout and max lifetime (keep both below any proxy or firewall timeout)
spring.datasource.hikari.idle-timeout=300000
spring.datasource.hikari.max-lifetime=900000

The connection timeout is the important one. The default is 30 seconds. That means a request will hang for 30 seconds waiting for a connection before failing. In most applications, if you can't get a connection in 5 seconds, something is already wrong, and the user isn't going to wait 30 seconds anyway. Fail fast. Return a useful error and let the monitoring tell you why.

I also review every @Transactional method for external calls. If a transactional method calls an HTTP endpoint, sends an email, publishes to a message queue, or does any non-database I/O, that's a code review comment. Every time. No exceptions.

The Boring Stuff Matters

Connection pools aren't exciting. Nobody builds a career talking about HikariCP configuration at conferences. There's no trending blog post about maximum-pool-size settings. It's the kind of infrastructure that only gets attention when it breaks.

But here's the thing. When it breaks, it takes everything with it. Not one endpoint. Not one feature. Everything. Because every part of your application that touches the database shares the same pool. A slow query in your admin panel can bring down your user-facing API. A connection leak in a background job can kill your checkout flow.

The pool is a shared resource, and shared resources are where the scariest production incidents live. They're single points of failure hiding in plain sight.

Spend an hour understanding your connection pool configuration. Set up the monitoring. Add the leak detection. Review your transaction boundaries. It's the most boring hour you'll spend this month, and it might save you from the worst on-call night of your year.

Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.
