
// Posted by Umur Inan

// Category Backend

// Posted on March 30, 2026

Caching Is Easy Until It Isn't

Redis, in-memory, CDN: caching feels simple until invalidation ruins your week. Here's how each caching layer bites you and what I learned the hard way.

By Umur Inan · 9 min read

There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors. You've heard the joke. I used to laugh at it. Then I spent a weekend debugging why 15% of our users were seeing someone else's profile data, and I stopped laughing.

Caching is one of those things that feels obvious. Data is slow to fetch. Store it somewhere fast. Serve the fast copy. Done. You can explain it to a non-technical person in thirty seconds. The concept is trivially simple. The implementation will humble you.

I've worked with three main caching layers over the years: in-memory caches, Redis, and CDNs. Each one solved a real problem. Each one also created problems I didn't see coming. Here's what I've learned, mostly by getting burned.

In-Memory Caching: The One That Feels Free

The first cache most developers reach for is the in-memory cache. A HashMap in Java, a dictionary in Python, a Map in JavaScript. Or something slightly fancier like Guava's LoadingCache or Caffeine in Spring Boot. You keep frequently accessed data in the application's memory. No network hop, no serialization, sub-microsecond reads. It feels like cheating.

@Bean
public CacheManager cacheManager() {
    CaffeineCacheManager manager = new CaffeineCacheManager();
    manager.setCaffeine(Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterWrite(Duration.ofMinutes(5)));
    return manager;
}

@Cacheable("users")
public UserDto getUser(Long userId) {
    return userRepository.findById(userId)
        .map(this::toDto)
        .orElseThrow();
}

Simple. Fast. And it works great until you have more than one application instance.

This is where in-memory caching first bites you. If you have three instances behind a load balancer, each instance has its own cache. User updates their profile on instance A. Instance A invalidates its cache. Instances B and C still serve the old data. The user refreshes the page, hits instance B, and sees their old profile picture. They update again. Hit instance C. Old picture again. They submit a support ticket saying the app is broken.

You can work around this with sticky sessions, routing the same user to the same instance. But sticky sessions have their own problems. Uneven load distribution. Instance restarts losing all the sessions stuck to that instance. It's a band-aid, not a fix.

The other thing that gets you with in-memory caches is memory. Obvious in retrospect, but easy to miss in practice. You start caching user profiles. Then you cache product listings. Then someone adds caching to the search results. Each entry is small, but you have 200,000 of them, and suddenly your application's heap is 2 GB larger than expected. GC pauses get longer. The application gets slower. You added caching to make things faster and made them slower. I've done this.
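One cheap guardrail is a hard cap on every in-memory cache, so its heap cost is bounded by roughly the entry cap times the average entry size (2 GB spread over 200,000 entries works out to about 10 KB per entry). The Caffeine config above already does this with maximumSize. To show what the cap does, here's a minimal plain-Java sketch built on LinkedHashMap's access-order mode. BoundedLruCache is a hypothetical name, and unlike Caffeine it has no expiry and puts all access behind a single lock.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded LRU cache: once maxEntries is exceeded, the least recently used entry is evicted.
// Plain-Java illustration only; Caffeine's maximumSize/maximumWeight is the production option.
class BoundedLruCache<K, V> {
    private final int maxEntries;
    private final Map<K, V> map;

    BoundedLruCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder = true makes iteration order least-recently-accessed first,
        // so the "eldest" entry is the LRU one
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Called after each put; returning true evicts the eldest entry
                return size() > BoundedLruCache.this.maxEntries;
            }
        };
    }

    // LinkedHashMap is not thread-safe, and in access-order mode even get() mutates it,
    // so every method takes the same lock
    synchronized V get(K key) {
        return map.get(key);
    }

    synchronized void put(K key, V value) {
        map.put(key, value);
    }

    synchronized int size() {
        return map.size();
    }
}
```

The point isn't the data structure. A cache without a ceiling grows until it becomes the thing slowing you down.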

In-memory caching works for small, read-heavy datasets that don't change often and where staleness is acceptable. Configuration data. Feature flags. Reference data like country lists or currency codes. For anything else, you need a shared cache.

Redis: The One That Solves Everything (Until It Doesn't)

Redis is the default answer to "we need a cache." It sits outside your application, so all instances share the same data. It's fast. Sub-millisecond reads over the network. It supports expiration, pub/sub, data structures. It feels like the right tool for almost everything.

And honestly, for simple key-value caching, it is. Put data in. Get data out. Set a TTL. Let it expire. This works for a surprising number of use cases.

Problems start when you need to invalidate.

The Stale Data Incident

We had a social feature where users could update their display name. That name showed up everywhere: posts, comments, profile cards, notifications. We cached the user object in Redis with a 10-minute TTL. When a user updated their name, we invalidated the cache entry. Simple.

Except we had a race condition. The sequence went like this:

1. Request A reads the user from the database (old name).
2. The user updates their name via Request B.
3. Request B invalidates the cache.
4. Request A writes the old data to the cache.

Request A started before the update but finished after the invalidation. It wrote stale data into the cache, and that stale data sat there for up to 10 minutes. The user changed their name, saw it update, then watched it revert a second later. From their perspective, the app was broken.

The fix was to add versioning to our cache-aside logic. Every cache entry carried the version number of the database row it was read from, and a cache write only went through if its version was newer than the one already cached. But this added complexity to every cache interaction. The "simple" cache wasn't simple anymore.
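Here's a minimal sketch of the version check, assuming each row carries a monotonically increasing version number. A ConcurrentHashMap stands in for Redis so the example runs on its own. Against real Redis, the compare-and-set has to be atomic on the server, for example in a Lua script. One assumption matters a lot: the update path writes the new value and version into the cache instead of only deleting the key. If it just deletes, Request A's stale write lands in an empty slot and there is nothing to compare against. VersionedCache and putIfNewer are hypothetical names.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Version-guarded cache writes. ConcurrentHashMap stands in for Redis here;
// in Redis the same check-and-set must run atomically on the server (e.g. a Lua script).
class VersionedCache<K, V> {
    // The cached value plus the database version it was read at
    private record Entry<T>(T value, long version) {}

    private final ConcurrentHashMap<K, Entry<V>> store = new ConcurrentHashMap<>();

    // Writes only if the key is absent or the incoming version is strictly newer.
    // Returns true if the write was applied.
    boolean putIfNewer(K key, V value, long version) {
        boolean[] applied = {false};
        // compute() runs the check and the write atomically for this key
        store.compute(key, (k, current) -> {
            if (current == null || version > current.version()) {
                applied[0] = true;
                return new Entry<>(value, version);
            }
            return current; // stale write: keep the newer entry
        });
        return applied[0];
    }

    Optional<V> get(K key) {
        Entry<V> entry = store.get(key);
        return entry == null ? Optional.empty() : Optional.of(entry.value());
    }
}
```

Replayed against the race above: Request B writes the renamed user through at version 2, then Request A's late write at version 1 is rejected, and the cache keeps the new name.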

The Thundering Herd

A popular item's cache expires. A thousand requests hit the endpoint at roughly the same time. The cache is empty, so all thousand requests go to the database. The database buckles under the sudden load. Queries that normally take 5ms now take 2 seconds because every connection is busy. The whole application slows down, not just the one endpoint.

This is the thundering herd problem, and it's surprisingly easy to trigger. A popular API endpoint with a 5-minute TTL will experience this every 5 minutes during peak traffic. Not a theoretical concern. A regular occurrence.

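The standard fix is cache stampede protection: only one request actually fetches from the database, and the rest wait for it to populate the cache, then read from it. In Spring, you can do this with the sync parameter. It coordinates threads inside a single application instance, so with several instances each one can still send its own query, but that's a handful of queries instead of a thousand: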

@Cacheable(value = "products", sync = true)
public ProductDto getProduct(Long productId) {
    return productRepository.findById(productId)
        .map(this::toDto)
        .orElseThrow();
}
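If you're not using Spring's cache abstraction, the same idea is often called request coalescing. Below is a minimal stdlib sketch with a hypothetical SingleFlight class. The first caller for a key registers a CompletableFuture and runs the loader, and concurrent callers for the same key wait on that future instead of querying the database themselves. It only deduplicates within one JVM.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Request coalescing: concurrent loads for the same key share a single loader call.
// Deduplicates within one JVM only; other instances still load independently.
class SingleFlight<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    V load(K key, Function<K, V> loader) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            // Someone else is already loading this key: wait for their result.
            // join() rethrows a loader failure wrapped in CompletionException.
            return existing.join();
        }
        try {
            V value = loader.apply(key);
            mine.complete(value);
            return value;
        } catch (RuntimeException | Error e) {
            // Propagate the failure to waiters too, so nobody blocks forever
            mine.completeExceptionally(e);
            throw e;
        } finally {
            // Clear the slot so the next miss triggers a fresh load
            inFlight.remove(key, mine);
        }
    }
}
```

On a cache miss, you'd route the database call through load() instead of calling the repository directly. Once the loader finishes and the result is in the cache, later requests never reach it.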

Another approach is to refresh the cache before it expires. If the TTL is 5 minutes, start a background refresh at 4 minutes. The old data is still served while the new data loads. No gap. No herd. But now you have background threads to manage and you need to handle the case where the refresh fails. Nothing is free.
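A minimal sketch of refresh-ahead, assuming a soft refresh threshold below the hard TTL (4 and 5 minutes in the example above). Past the soft threshold, the cache returns the current value and schedules one background reload per key. Past the hard TTL, it falls back to a synchronous load. If a background reload fails, the old value keeps being served until the hard TTL. RefreshAheadCache and its parameters are hypothetical, and the injected clock and executor exist mainly to make it testable. In production, Caffeine's refreshAfterWrite on a LoadingCache gives you this behavior without hand-rolling it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Refresh-ahead: past refreshAfterMillis, serve the cached value and reload it in the background;
// past ttlMillis, treat the entry as a miss and load synchronously.
class RefreshAheadCache<K, V> {
    private record Entry<T>(T value, long loadedAt) {}

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Set<K> refreshing = ConcurrentHashMap.newKeySet(); // keys with a reload in flight
    private final Function<K, V> loader;
    private final Executor executor;
    private final LongSupplier clock; // current time in millis; injectable for tests
    private final long refreshAfterMillis;
    private final long ttlMillis;

    RefreshAheadCache(Function<K, V> loader, Executor executor, LongSupplier clock,
                      long refreshAfterMillis, long ttlMillis) {
        this.loader = loader;
        this.executor = executor;
        this.clock = clock;
        this.refreshAfterMillis = refreshAfterMillis;
        this.ttlMillis = ttlMillis;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.loadedAt() >= ttlMillis) {
            // Hard miss: nothing usable cached, so load inline
            V value = loader.apply(key);
            entries.put(key, new Entry<>(value, now));
            return value;
        }
        if (now - entry.loadedAt() >= refreshAfterMillis && refreshing.add(key)) {
            // Soft expiry: only the caller that wins refreshing.add() schedules the reload
            try {
                executor.execute(() -> {
                    try {
                        entries.put(key, new Entry<>(loader.apply(key), clock.getAsLong()));
                    } catch (RuntimeException e) {
                        // Refresh failed: keep serving the old value until the hard TTL expires
                    } finally {
                        refreshing.remove(key);
                    }
                });
            } catch (RuntimeException rejected) {
                // Executor refused the task (e.g. shut down): clear the flag so a later call can retry
                refreshing.remove(key);
            }
        }
        return entry.value();
    }
}
```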

Redis Goes Down

Redis is fast and reliable, but it is still a network service. It can go down. The connection can drop. The instance can run out of memory. And when it does, you need to decide: does your application fail, or does it fall back to the database?

A lot of codebases I've seen treat a Redis failure as a hard failure. A failed Redis call throws an exception, the exception propagates up, and the user gets a 500 error. The cache was supposed to improve performance, but it became a single point of failure. Your application now has worse availability than it would have had without caching at all.

Every cache read should have a fallback. If Redis is down, go to the database. It'll be slower, but it'll work. It sounds obvious, but it means wrapping every cache interaction in error handling:

public UserDto getUser(Long userId) {
    try {
        UserDto cached = redisTemplate.opsForValue().get("user:" + userId);
        if (cached != null) return cached;
    } catch (Exception e) {
        log.warn("Redis read failed, falling back to database", e);
    }
    UserDto user = userRepository.findById(userId)
        .map(this::toDto)
        .orElseThrow();
    try {
        redisTemplate.opsForValue().set("user:" + userId, user,
            Duration.ofMinutes(10));
    } catch (Exception e) {
        log.warn("Redis write failed", e);
    }
    return user;
}

It's not pretty. But it's the difference between "the app is slow" and "the app is down."
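If the try/catch boilerplate bothers you, it can live in one place. Here's a minimal sketch of a hypothetical CacheFallback helper. Cache reads and writes are best-effort, while the database load is allowed to fail loudly because it's the source of truth. It uses java.util.logging only to stay dependency-free.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Centralizes "the cache is optional": cache failures are logged and skipped,
// database failures propagate.
final class CacheFallback {
    private static final Logger LOG = Logger.getLogger(CacheFallback.class.getName());

    private CacheFallback() {}

    static <V> V getOrLoad(Supplier<V> cacheRead, Supplier<V> dbLoad, Consumer<V> cacheWrite) {
        try {
            V cached = cacheRead.get();
            if (cached != null) {
                return cached; // cache hit
            }
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Cache read failed, falling back to database", e);
        }
        // Source of truth: if this throws, it is a real failure and should surface
        V value = dbLoad.get();
        try {
            cacheWrite.accept(value);
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Cache write failed", e);
        }
        return value;
    }
}
```

With the RedisTemplate from the example above, a call would pass `() -> redisTemplate.opsForValue().get(key)` as the read, a database lookup as the load, and `u -> redisTemplate.opsForValue().set(key, u, Duration.ofMinutes(10))` as the write. One caveat: this only helps if Redis fails fast. A hung connection costs you the full client timeout on every request, so keep that timeout short.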

CDN Caching: The One You Can't Take Back

CDN caching is a different beast because you don't control the cache. You tell Cloudflare or CloudFront to cache a response for an hour, and it does. On hundreds of edge nodes around the world. When you realize the cached response is wrong, you can't just delete a key from Redis. You have to issue an invalidation request and wait for it to propagate across a global network. That takes time. Sometimes minutes. Sometimes longer.

I once shipped a broken API response that got cached by our CDN with a 24-hour Cache-Control header. We fixed the bug within twenty minutes, but a third of our users kept seeing the broken response for hours because their nearest edge node hadn't picked up the invalidation yet. We ended up deploying the fix under a different URL path and updating the client to call the new path. Ugly, but it worked immediately.

The rules I follow for CDN caching now:

Never cache personalized data at the CDN layer. User-specific responses should have Cache-Control: private or no-store. If a CDN caches a response with one user's data and serves it to another user, that's not a bug. That's a data leak.

Start with short TTLs. You can always increase them. You can't easily shorten them for responses already cached worldwide. I start with 60 seconds for API responses and increase only after I've measured the hit rate and confirmed the data changes infrequently.

Use versioned URLs for static assets. Instead of caching app.js for a year and hoping invalidation works, serve app.v2.3.1.js or app.abc123.js with a hash in the filename. New deploy, new filename, new cache entry. The old version expires naturally. No invalidation needed. This is the one caching strategy I've never seen go wrong.

Always have a way to bust the cache in an emergency. Know your CDN's invalidation API. Have the command ready. Test it before you need it. When production is serving stale data to millions of users, you don't want to be reading the Cloudflare docs for the first time.
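The header and filename rules above are small enough to sketch directly. This is a minimal version with a hypothetical CdnPolicy class. Personalized responses get no-store, shared responses start with a short public max-age, and static assets get a content hash in the filename so they can be cached for a year with the immutable directive. The 8-character SHA-256 prefix is an arbitrary choice. Bundlers like webpack and Vite can generate hashed filenames for you.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Duration;
import java.util.HexFormat;

final class CdnPolicy {
    private CdnPolicy() {}

    // Personalized data must never be stored by a shared cache.
    // "private" alone would still allow the browser to keep a copy; "no-store" forbids that too.
    static String cacheControlFor(boolean personalized, Duration ttl) {
        if (personalized) {
            return "private, no-store";
        }
        // Shared responses: start short (e.g. 60 seconds) and raise it only after measuring
        return "public, max-age=" + ttl.toSeconds();
    }

    // Content-hashed filename: new content means a new URL, so no invalidation is needed
    static String hashedName(String baseName, String extension, byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            String shortHash = HexFormat.of().formatHex(digest).substring(0, 8);
            return baseName + "." + shortHash + "." + extension;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    // Hashed assets never change under the same URL, so they can be cached for a year
    static String cacheControlForHashedAsset() {
        return "public, max-age=31536000, immutable";
    }
}
```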

The Invalidation Problem

Every caching layer eventually comes down to the same question: when do you throw away the cached data?

TTL-based expiration is the simplest approach. Set it and forget it. The data might be stale for up to N minutes, and you accept that. For a lot of use cases, this is fine. Does it really matter if the product listing is 30 seconds old? Probably not.

But when staleness matters, like showing a user their own updated data, TTL isn't enough. You need active invalidation. And active invalidation is where caching goes from "easy" to "I need to draw a diagram to understand the state machine."

The problem is that invalidation requires you to know what to invalidate. User updates their profile? Invalidate the user cache entry. Easy. But say that profile data is also embedded in four other caches: post responses, comment responses, notification responses, and a leaderboard. Now one write operation has to invalidate entries in five different caches. Miss one and you have stale data somewhere.

Event-driven invalidation helps. User update publishes an event. Each cache consumer listens for events relevant to its data and invalidates accordingly. But now you have eventual consistency between the write and the invalidation. And you need to handle event delivery failures. And you need to make sure every new cache consumer remembers to subscribe to the right events. One missed subscription and you have a cache that never invalidates.
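A minimal in-process sketch of event-driven invalidation. The hypothetical InvalidationBus stands in for whatever broker you actually use (Kafka, Redis pub/sub, and so on). Each cache registers its own handler for a UserUpdated event, and one failing handler doesn't stop the others. Delivery guarantees, retries, and ordering are exactly what this sketch leaves out.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// In-process stand-in for a message broker. Each cache that embeds user data subscribes
// its own invalidation handler; publishing a UserUpdated event fans out to all of them.
class InvalidationBus {
    record UserUpdated(long userId) {}

    private final List<Consumer<UserUpdated>> handlers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<UserUpdated> handler) {
        handlers.add(handler);
    }

    void publish(UserUpdated event) {
        for (Consumer<UserUpdated> handler : handlers) {
            try {
                handler.accept(event);
            } catch (RuntimeException e) {
                // One broken consumer must not stop the others. A real broker would retry
                // or dead-letter here; this sketch only reports it.
                System.err.println("Invalidation handler failed for user " + event.userId() + ": " + e);
            }
        }
    }
}
```

Wiring is one subscribe call per cache, for example `bus.subscribe(e -> userCache.remove("user:" + e.userId()))`. A cache that never subscribes raises no error. Its entries simply stay stale until their TTL runs out.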

There's no clean answer here. Every approach trades complexity for correctness or performance for simplicity. The best advice I have is: start without caching. Measure. Find the actual bottleneck. Cache only that specific thing. Use the shortest TTL you can tolerate. And when someone suggests caching something "just in case it's slow later," push back. Every cache entry is a liability. It's stale data waiting to happen. Only take on that liability when the performance data justifies it.

What I Do Now

After enough invalidation incidents, I've settled on a set of rules that have kept me out of trouble. Mostly.

Cache the smallest thing possible. Don't cache an entire API response if you can cache the expensive database query result. The smaller the cached unit, the easier it is to invalidate precisely.

Always set a TTL. Even if you have active invalidation. The TTL is your safety net. If your invalidation logic has a bug, the stale data will at least expire eventually. Without a TTL, a missed invalidation means stale data forever.

Log cache hits and misses. If your hit rate is below 80%, the cache isn't helping much and the complexity isn't worth it. If your hit rate is 99.9%, you might be caching too aggressively and serving stale data more often than you think.

Treat the cache as optional. Your application should work without it. Slower, sure. But it should work. If removing your cache layer causes a complete outage, you've built a dependency, not a performance optimization.

Never cache errors. If your database query fails and you cache the error response, every subsequent request gets the error from cache instead of retrying the database. I've seen this turn a 10-second database blip into 5 minutes of errors. Negative caching has its uses, but it needs short TTLs and explicit intent.
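Three of these rules (always set a TTL, count hits and misses, never cache errors) fit in one small wrapper. Here's a minimal sketch with a hypothetical GuardedCache class. The constructor refuses to build a cache without a positive TTL. LongAdder counters feed a hit rate you can log or export. A loader exception propagates without anything being stored, so the next request retries the database. The injectable clock is only there for testing, and in production you'd pass System::currentTimeMillis.

```java
import java.time.Duration;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Applies three rules: mandatory TTL, hit/miss counters, and never caching a failed load.
class GuardedCache<K, V> {
    private record Entry<T>(T value, long expiresAt) {}

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();
    private final long ttlMillis;
    private final LongSupplier clock; // current time in millis

    GuardedCache(Duration ttl, LongSupplier clock) {
        // "Always set a TTL": there is no way to construct this cache without one
        if (ttl.isZero() || ttl.isNegative()) {
            throw new IllegalArgumentException("TTL must be positive");
        }
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry != null && now < entry.expiresAt()) {
            hits.increment();
            return entry.value();
        }
        misses.increment();
        // If the loader throws, the exception propagates and nothing is stored,
        // so the next request goes back to the database instead of replaying the error
        V value = loader.apply(key);
        entries.put(key, new Entry<>(value, now + ttlMillis));
        return value;
    }

    double hitRate() {
        long h = hits.sum();
        long total = h + misses.sum();
        return total == 0 ? 0.0 : (double) h / total;
    }
}
```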

Caching is one of those things that gets harder the more you learn about it. The first time you add a cache and see response times drop from 200ms to 5ms, it feels like magic. The tenth time, when you're debugging why one user out of thousands sees yesterday's data, it feels like a curse. Each feeling is accurate. The trick is knowing which situations call for the magic and being ready for the curse when it shows up.

Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.
