Distributed Systems

A distributed system is a system in which the failure of a computer you didn't know existed can render your own computer unusable. These posts cover the realities of replication lag, distributed transactions, retry storms, and the patterns engineers actually use instead of two-phase commit.

Backend

Posted on Jun 12, 2026

Your @Scheduled Job Is Running Five Times

@Scheduled and cron run per instance, so five replicas run your nightly job five times. Leader election, ShedLock, and why the job must still be idempotent.

Backend

Posted on May 28, 2026

Why Your Distributed Lock Doesn't Lock

Distributed locks alone don't guarantee mutual exclusion. Fencing tokens, GC pauses, clock drift, and why the lock you wrote is, at best, a polite hint.

Backend

Posted on May 24, 2026

The Kafka Consumer Group That Stopped Consuming

A Kafka consumer group can stop consuming while every metric looks healthy. Rebalance storms, max.poll.interval.ms timeouts, stuck partitions, and how to actually diagnose it.

Backend

Posted on May 15, 2026

Sagas Are Not Transactions

Sagas replace ACID transactions with compensating actions, not rollbacks. Intermediate states are visible to other services, and compensations can fail too.

Backend

Posted on May 9, 2026

Your Replica Is Lying To You

Read replicas trade staleness for throughput. Replication lag, read-your-writes consistency, and the staleness window nobody tracks are where things actually break.

Backend

Posted on Apr 26, 2026

CQRS Sounds Fancy Until You Have to Debug It

CQRS separates reads from writes but not bugs from confusion about which side caused them. Here is when the pattern helps and when it just adds complexity.

Backend

Posted on Apr 19, 2026

The Thundering Herd Problem

Cache stampedes, retry storms, reconnect floods: three failure modes with the same root cause. Synchronized behavior under load amplifies failures every time.

Backend

Posted on Apr 16, 2026

Database Partitioning: The Decision You Can't Undo

Range vs hash partitioning, hot spots, and the re-partitioning trap. Partitioning looks like a scaling win until you find out you cannot undo the choice.

Backend

Posted on Apr 14, 2026

Webhook Reliability: The Lost Art

Webhooks break predictably: duplicate events, missed deliveries, retry storms. Here is what it actually takes to build receivers that hold up in production.

Backend

Posted on Apr 8, 2026

The Outbox Pattern: Reliable Events Without Two-Phase Commit

Reliable event publishing alongside database writes is harder than it looks. The transactional outbox pattern solves it without distributed transactions.

Backend

Posted on Apr 7, 2026

Event Sourcing Sounds Better Than It Is

Event sourcing promises auditability, time travel, and decoupled systems. The operational complexity arrives later, and most teams are not ready for it.

Backend

Posted on Apr 6, 2026

Rate Limiting Is Harder Than It Looks

Token bucket, sliding window, fixed window counter: rate limiting algorithms all sound simple until you try to implement them correctly across a distributed system.

Backend

Posted on Apr 5, 2026

Distributed Transactions Are a Lie

Why two-phase commit fails in production distributed systems, and what engineers actually use instead: sagas, the outbox pattern, and eventual consistency.

Backend

Posted on Apr 8, 2026 (intermediate)

The Transactional Outbox Pattern in Spring Boot

Build the transactional outbox pattern in Spring Boot with Kafka: atomic writes, a polling relay, and working integration tests with Testcontainers.