
Your @Scheduled Job Is Running Five Times

@Scheduled and cron run per instance, so five replicas run your nightly job five times. Leader election, ShedLock, and why the job must still be idempotent.

The 2am job that fired five times

The job was boring and it worked. Every night at 2am a @Scheduled method swept the day's usage, generated invoices, and emailed each customer a receipt. It ran on our single application instance for a year without anyone thinking about it.

Then we got traffic, and one instance became five behind a load balancer. The next morning support was on fire: every customer had received five identical receipts, and the ones we billed had been charged five times. The job had not changed a single line. We had just run five copies of it, each convinced it was the only one, all waking at 2am to do the same work against the same database.

Scheduling is per process, not per cluster

This is the part that catches people, because the framework hides it so well. @Scheduled in Spring, its equivalent in any other framework, a cron entry baked into a container image: every one of them schedules work inside one process. The timer lives in that JVM and fires in that JVM. It has no concept of a cluster and no way to ask whether another instance is about to do the identical thing.

So the behavior you had on one box was never the behavior you designed. It was the behavior you got for free because there happened to be exactly one process, and that one happy accident quietly became a load-bearing assumption. The day you scaled horizontally for throughput, you also multiplied every scheduled job by your replica count, and nobody wrote a ticket for that.

The fixes people try first, and where they leak

The first instinct is to just run the job on one instance, which sounds simple until you ask which one. Pin it to a specific pod with an environment flag and that pod becomes a single point of failure for every scheduled task, and during a rolling deploy you get a window with either zero schedulers or two. Keep a row in the database that says is_running = true and you have invented a lock with a race condition in the gap between the read and the write. Each of these feels like a fix, and each one is the same bug wearing a different hat: there is still no authority deciding who runs.
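To make the is_running race concrete, here is a minimal sketch. The RunFlag class is a hypothetical in-memory stand-in for that database row, not real persistence code, and the demo replays the bad interleaving by hand instead of relying on thread timing.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class RunFlagDemo {

    // Hypothetical stand-in for a single is_running row in a shared database.
    static class RunFlag {
        private final AtomicBoolean running = new AtomicBoolean(false);

        // Racy style: reading and writing are two separate steps.
        boolean isRunning() { return running.get(); }
        void markRunning() { running.set(true); }

        // Atomic style: check and set happen as one step.
        boolean tryClaim() { return running.compareAndSet(false, true); }
    }

    // Both instances read the flag before either one writes it.
    static int racyWinners() {
        RunFlag flag = new RunFlag();
        boolean instanceASawFree = !flag.isRunning();
        boolean instanceBSawFree = !flag.isRunning();
        int winners = 0;
        if (instanceASawFree) { flag.markRunning(); winners++; }
        if (instanceBSawFree) { flag.markRunning(); winners++; }
        return winners;
    }

    // With an atomic claim, the second attempt fails.
    static int atomicWinners() {
        RunFlag flag = new RunFlag();
        int winners = 0;
        if (flag.tryClaim()) winners++;
        if (flag.tryClaim()) winners++;
        return winners;
    }

    public static void main(String[] args) {
        System.out.println("racy winners: " + racyWinners());
        System.out.println("atomic winners: " + atomicWinners());
    }
}
```

Against a real database the atomic version would usually be a conditional update that checks the affected row count. Even then, a flag left set by a crashed holder blocks every later run, so the atomic claim on its own is only part of a fix.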

Leader election and ShedLock

The real options give the cluster a way to agree. Leader election picks one instance to be in charge, and only the leader runs scheduled work; if it dies, the others elect a new one. The lighter-weight tool most Spring teams reach for is ShedLock, which wraps a scheduled method in a lock backed by a shared store, so only one instance acquires it for a given run.

// Requires @EnableSchedulerLock and a LockProvider bean in the application config.
@Scheduled(cron = "0 0 2 * * *")
@SchedulerLock(name = "nightlyBilling",
               lockAtMostFor = "15m",
               lockAtLeastFor = "1m")
public void runNightlyBilling() {
    // only one instance gets here per run
}

Five instances wake at 2am and all five try to take the nightlyBilling lock. One wins and runs. The other four see the lock held and skip this tick. The duplicate receipts stop, and for most periodic jobs this is the whole answer, in a handful of lines.

A lock is not exactly-once

Here is where I have to be honest, because I have written a whole post on why a distributed lock does not actually guarantee mutual exclusion. ShedLock leans on two timeouts. lockAtLeastFor holds the lock for a minimum time even when the job finishes early, so a fast run is not repeated by an instance whose clock is slightly behind. lockAtMostFor is a safety net: if the instance holding the lock dies without releasing it, the lock expires after that long so the job is not blocked forever. The danger sits in lockAtMostFor. Set it too short and a slow run can outlive its own lock while it is still working, at which point another instance sees an expired lock and starts a second copy.

A long garbage collection pause does the same thing without anyone dying. The holder freezes, the lock ages out, and a sibling picks it up. Now two instances are running the job you thought could only run once. The lock makes a double run rare. It does not make it impossible, and rare is not the same as never when the job moves money.
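The expiry problem is easy to replay with a toy lease lock and a hand-driven clock. This is a sketch of the general idea only, not ShedLock's implementation; the LeaseLock class and its tryAcquire method are names I made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class LeaseLockDemo {

    // Toy lease lock: a lock name maps to the time at which its lease expires.
    static class LeaseLock {
        private final Map<String, Long> lockedUntil = new HashMap<>();

        // Grants the lock only if no unexpired lease exists.
        // lockAtMostForMillis plays the role of ShedLock's lockAtMostFor.
        synchronized boolean tryAcquire(String name, long nowMillis, long lockAtMostForMillis) {
            Long until = lockedUntil.get(name);
            if (until != null && until > nowMillis) {
                return false; // someone still holds a live lease
            }
            lockedUntil.put(name, nowMillis + lockAtMostForMillis);
            return true;
        }
    }

    static final long MINUTE = 60_000L;

    // Normal case: every instance wakes at the same tick and exactly one wins.
    static int winnersAtSameTick(int instances) {
        LeaseLock lock = new LeaseLock();
        int winners = 0;
        for (int i = 0; i < instances; i++) {
            if (lock.tryAcquire("nightlyBilling", 0L, 15 * MINUTE)) winners++;
        }
        return winners;
    }

    // Edge case: instance A acquires at t=0 and is still working (or frozen
    // in a long pause) at t=16 min. The lock cannot know that, so B gets in.
    static boolean secondInstanceGetsLockAfterExpiry() {
        LeaseLock lock = new LeaseLock();
        boolean a = lock.tryAcquire("nightlyBilling", 0L, 15 * MINUTE);
        boolean bEarly = lock.tryAcquire("nightlyBilling", 5 * MINUTE, 15 * MINUTE);
        boolean bLate = lock.tryAcquire("nightlyBilling", 16 * MINUTE, 15 * MINUTE);
        return a && !bEarly && bLate;
    }

    public static void main(String[] args) {
        System.out.println("winners at 2am: " + winnersAtSameTick(5));
        System.out.println("overlap after expiry: " + secondInstanceGetsLockAfterExpiry());
    }
}
```

Nothing in the lock tells instance A that its lease is gone, which is why both copies carry on as if they were alone.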

Make the job safe to run twice

Because the lock cannot promise once, the job itself has to survive running twice without doing harm. That is idempotency, and it is the half of the solution that actually protects you. An invoice run should claim each unbilled item by id and mark it billed in the same transaction, so a second pass finds nothing left to do. A receipt send should record that the email for invoice N went out, and check that record before it sends again.
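Here is a minimal sketch of that shape. BillingStore is a hypothetical in-memory stand-in for the database, and for brevity it treats the invoice id as the same as the item id. Its claim methods stand in for something like a conditional update or a unique-key insert that only the first caller can win.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class IdempotentBillingDemo {

    // Hypothetical store; a real one would be database tables and transactions.
    static class BillingStore {
        private final Set<Long> billedItems = ConcurrentHashMap.newKeySet();
        private final Set<Long> receiptsSent = ConcurrentHashMap.newKeySet();
        final AtomicInteger charges = new AtomicInteger();
        final AtomicInteger emails = new AtomicInteger();

        // Set.add returns true only for the first caller with a given id.
        boolean claimItem(long itemId) { return billedItems.add(itemId); }
        boolean claimReceipt(long invoiceId) { return receiptsSent.add(invoiceId); }
    }

    // Safe to call any number of times: work happens only for ids not yet claimed.
    static void runNightlyBilling(BillingStore store, List<Long> itemIds) {
        for (long id : itemIds) {
            if (store.claimItem(id)) {
                store.charges.incrementAndGet(); // stand-in for charging the customer
            }
            if (store.claimReceipt(id)) {
                store.emails.incrementAndGet();  // stand-in for sending the receipt
            }
        }
    }

    public static void main(String[] args) {
        BillingStore store = new BillingStore();
        List<Long> items = List.of(1L, 2L, 3L);
        runNightlyBilling(store, items);
        runNightlyBilling(store, items); // the accidental second run
        System.out.println("charges: " + store.charges.get());
        System.out.println("emails: " + store.emails.get());
    }
}
```

The second call walks the same items and finds every one already claimed, so the counts stay where the first run left them.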

The lock and the idempotent body are not competing answers. They are two layers. The lock keeps the normal case to a single run and spares you a hundred instances doing redundant work. Idempotency catches the rare edge where the lock slips, so the worst case is wasted effort instead of a double charge. You want both, and if you can only have one, keep idempotency.

When to move scheduling out of the app

At some point the cleanest move is to stop letting the application schedule itself at all. A single external trigger, a Kubernetes CronJob or a dedicated scheduler service, fires once for the whole cluster and drops one message on a queue. Your instances become plain workers competing to process that one message, which is a problem queues already solve well, and there is no per-process timer left to multiply.
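As a rough illustration of the worker side, the sketch below uses an in-process BlockingQueue as a stand-in for a real message broker and a thread pool as a stand-in for five application instances. The job name and the runsForOneTrigger helper are made up for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SingleTriggerDemo {

    // Returns how many workers actually ran the job for one enqueued trigger.
    static int runsForOneTrigger(int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger runs = new AtomicInteger();

        // The external scheduler fires once and enqueues exactly one message.
        queue.add("nightly-billing");

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                // poll() hands each message to at most one consumer;
                // everyone else gets null and has nothing to do.
                String job = queue.poll();
                if (job != null) {
                    runs.incrementAndGet(); // stand-in for running the job
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("runs for one trigger: " + runsForOneTrigger(5));
    }
}
```

A real broker typically delivers at least once, so a message can be redelivered after a worker crash or a missed acknowledgement. The job body still has to be idempotent.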

For anything with steps, retries, and state that has to survive a crash, a durable execution engine earns its place, because hand-rolling reliable multi-step scheduling on top of cron is how you rebuild a workflow engine by accident. The point holds either way: scheduling becomes one thing the cluster does, not five things each instance does on its own.

What I actually do

For a simple periodic job inside a Spring service, I reach for ShedLock and I still write the job body to be idempotent, because the lock is a frequency control and the idempotency is the correctness guarantee. For anything heavier, I move the trigger out of the app to a single external scheduler that enqueues work for ordinary workers to pick up.

What I never do anymore is let a @Scheduled method run unguarded in a service that has more than one instance, because that is not a scheduled job. It is a scheduled job per replica, and the database does not care that you only meant to run it once. The fix is cheap. The 2am receipts were not.


Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.
