
// Posted by Umur Inan

// Category Backend

// Posted on May 4, 2026

Your Async Code Is Still Single-Threaded

Async lets one thread do more I/O. It does not let one thread do more CPU. Most async-related performance disappointments come from confusing those two.


The pitch was that switching to Node.js would give us massive scalability for free. We had a Python service handling 200 requests per second on eight instances. After the rewrite, the same workload would run on two Node instances. Async would magically make it faster. Six months later, the Node service was running on twelve instances and falling over. The same database queries were slower. The same JSON serialization was slower. Nothing about the Node version was actually better, and async had not given us anything resembling the throughput we expected.

The problem was not Node. The problem was that we had confused two different things that both get called "async," and we had built our scaling argument on top of that confusion. Once you actually look at what async does and does not do, the surprise stops being a surprise.

The Word That Means Two Different Things

"Async" gets used to mean two completely different things in the same sentence by the same engineer. The first meaning is "this code does not block the calling thread while waiting for I/O." The second meaning is "this code runs in parallel with other code." Those are not the same.

Non-blocking I/O is what async/await syntax actually does in JavaScript, Python, C#, and Rust. You hand control back to the event loop while you wait for a network response. Other tasks on the same thread can run during that wait. The event loop wakes you back up when the response arrives.

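Actual parallelism is what threads, workers, or multiple processes do. They run on different CPU cores at the same time. Async/await on its own does not do this. Not on a single thread. Not in a single-threaded event loop like Node's or asyncio's. Runtimes that schedule tasks across a thread pool, as C# and multi-threaded Rust executors do, get their parallelism from those threads, not from the await keyword. Sprinkling await in front of every function call does not use more cores.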

What Async Actually Does

When you write await fetchUser(id) in Node.js, the runtime does roughly this:

1. Start the fetch operation, handing off to the OS or to a network library.
2. Mark this function as suspended.
3. Look for other ready tasks in the event loop queue.
4. Run those until the fetch completes.
5. Resume this function with the result.

There is one thread doing all of this. The thread switches between tasks at await points. If no task is awaiting anything, the thread runs straight through. If many tasks are all awaiting I/O, the thread is idle waiting for any of them to be ready.
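The suspend and resume steps above can be observed directly. This is a minimal sketch, not a model of the real runtime internals: fakeIo, task, and runTwoTasks are hypothetical names, and a timer stands in for the network call.

```javascript
// Hypothetical stand-in for I/O: resolves after `ms` milliseconds.
function fakeIo(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

async function task(name, waitMs, log) {
    log.push(`${name}: started`);   // runs immediately, up to the first await
    await fakeIo(waitMs);           // suspend here; the thread is free for other tasks
    log.push(`${name}: resumed`);   // runs once the event loop picks this task up again
}

// Start two tasks on the same thread and record the order in which their steps run.
async function runTwoTasks() {
    const log = [];
    await Promise.all([task("A", 40, log), task("B", 5, log)]);
    return log;
}

runTwoTasks().then((log) => console.log(log.join("\n")));
```

Task B starts while A is suspended and resumes first because its wait is shorter. Nothing ran at the same time; the one thread simply moved to whichever task was ready.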

This is fast for I/O-heavy work because the thread is not blocked on slow operations. It can do work for hundreds of pending requests in the time it would take a synchronous version to wait for one. It is not fast for CPU work, because there is still only one thread running CPU work.
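A rough way to see the I/O side of that claim is to start many simulated waits at once and time them. The sketch below assumes a hypothetical fakeFetch that resolves after a timer; real network calls would add their own variance.

```javascript
// Hypothetical stand-in for a network call: resolves after `ms` milliseconds
// without occupying the thread while it waits.
function fakeFetch(id, ms = 50) {
    return new Promise((resolve) => setTimeout(() => resolve({ id }), ms));
}

// Start `count` simulated requests at once and wait for all of them.
async function runConcurrentWaits(count) {
    const started = Date.now();
    const results = await Promise.all(
        Array.from({ length: count }, (_, i) => fakeFetch(i))
    );
    return { results, elapsedMs: Date.now() - started };
}

runConcurrentWaits(200).then(({ results, elapsedMs }) => {
    // Elapsed time should sit near one 50 ms wait, not 200 of them back to back.
    console.log(`${results.length} waits finished in ${elapsedMs} ms`);
});
```

The waits overlap because none of them needs the thread while it is pending. Replace the timer with a busy loop and the overlap disappears.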

Where the Mistake Lives

The most common mistake I see is engineers who treat async as a generic performance tool without checking what their work actually is.

// Looks fine
async function processBatch(items) {
    for (const item of items) {
        await transformItem(item);
        await saveItem(item);
    }
}

If transformItem is CPU-bound (parsing, image manipulation, hashing, JSON manipulation on large payloads), you are running CPU work serially with extra overhead from the async machinery. There is no parallelism happening. The loop is doing the same thing as a synchronous loop, but slightly slower, because every await suspends the function and schedules a microtask to resume it, even when there is nothing to wait for.

If saveItem is doing real network I/O, the await at least lets the event loop run other things during the network wait. But within this batch function, items are still processed one at a time. To run them concurrently, you need:

async function processBatch(items) {
    await Promise.all(items.map(async (item) => {
        await transformItem(item);
        await saveItem(item);
    }));
}

This runs items concurrently. But concurrently is not the same as in parallel. If transformItem is CPU-bound, the parallelism is fake. The CPU work still runs on one thread, interleaved only at await points, and a transform with no await inside runs to completion before the next one starts.
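To make the difference visible, the sketch below records the order of events in the Promise.all version. The bodies of transformItem and saveItem are assumptions for illustration: a busy loop with no I/O for the transform, and a timer standing in for a network write for the save.

```javascript
const events = [];

// Hypothetical CPU-bound step: a busy loop with no I/O and no await inside.
async function transformItem(item) {
    events.push(`transform ${item} start`);
    let acc = 0;
    for (let i = 0; i < 2e6; i++) acc = (acc + i * item) % 1000003;
    events.push(`transform ${item} end`);
    return acc;
}

// Hypothetical I/O step: a timer stands in for a network write.
function saveItem(item) {
    events.push(`save ${item} start`);
    return new Promise((resolve) => setTimeout(() => {
        events.push(`save ${item} end`);
        resolve();
    }, 20));
}

async function processBatch(items) {
    await Promise.all(items.map(async (item) => {
        await transformItem(item);
        await saveItem(item);
    }));
    return events;
}
```

Calling processBatch([1, 2]) records both transforms back to back, each finishing before the next begins, and only then the two saves, which do overlap. The I/O step gained concurrency; the CPU step gained nothing.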

CPU-Bound Work Is Where It Falls Apart

The classic Node.js failure story is a service that parses a 5 MB JSON payload during request handling. Every request that hits this code path blocks the event loop for 50 to 200 milliseconds. While that happens, every other request waits. Even health checks. Even the callbacks for unrelated database calls whose results have already arrived. The whole service stops doing useful work for as long as the parse runs.
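The stall is easy to reproduce. In this sketch, blockFor is a hypothetical busy-wait standing in for the large parse, and measureTimerDelay reports how late a short timer fires when synchronous work is in its way.

```javascript
// Busy-wait for roughly `ms` milliseconds, standing in for parsing a large payload.
function blockFor(ms) {
    const end = Date.now() + ms;
    while (Date.now() < end) { /* burn CPU */ }
}

// Schedule a timer for `timerMs`, then block synchronously for `blockMs`
// and resolve with how long the timer actually took to fire.
function measureTimerDelay(timerMs, blockMs) {
    return new Promise((resolve) => {
        const scheduled = Date.now();
        setTimeout(() => resolve(Date.now() - scheduled), timerMs);
        blockFor(blockMs); // nothing else can run on this thread until this returns
    });
}

measureTimerDelay(10, 200).then((actual) => {
    console.log(`timer asked for 10 ms, fired after ${actual} ms`);
});
```

The timer cannot fire until the busy loop returns, so a 10 ms timer takes at least as long as the blocking work. A health check or a completed database callback would be stuck behind it in the same way.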

This is not a Node-specific problem. Python's asyncio has the same issue. So does Java's reactive ecosystem if you accidentally call a synchronous API inside a reactive chain.

In Node, the fix is worker threads for CPU-bound code. In Python, it is multiprocessing or a process pool; thread pools help with blocking calls but not with pure-Python CPU work, because of the GIL. In Spring WebFlux, it is scheduling blocking work on a different scheduler with subscribeOn(Schedulers.boundedElastic()). None of these are async. They are actual threads and processes, on top of the async runtime.
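For the Node case, a minimal sketch using the built-in worker_threads module might look like this. The repeated hashing is a hypothetical stand-in for real CPU-bound work, hashInWorker is an illustrative name, and the worker source is inlined with eval: true only so the example fits in one file.

```javascript
const { Worker } = require("worker_threads");

// Source for the worker, inlined so the sketch fits in one file.
// The hashing loop is a hypothetical stand-in for real CPU-bound work.
const workerSource = `
    const { parentPort, workerData } = require("worker_threads");
    const { createHash } = require("crypto");
    let digest = workerData.input;
    for (let i = 0; i < workerData.rounds; i++) {
        digest = createHash("sha256").update(digest).digest("hex");
    }
    parentPort.postMessage(digest);
`;

// Run the CPU-bound loop on a separate thread and resolve with its result.
function hashInWorker(input, rounds) {
    return new Promise((resolve, reject) => {
        const worker = new Worker(workerSource, {
            eval: true,
            workerData: { input, rounds },
        });
        worker.once("message", resolve);
        worker.once("error", reject);
        worker.once("exit", (code) => {
            if (code !== 0) reject(new Error(`worker exited with code ${code}`));
        });
    });
}

hashInWorker("payload", 100000).then((digest) => console.log(digest));
```

The main thread only awaits a message, so the event loop stays free while the hashing runs elsewhere. Starting a worker per call has a cost of its own; a long-lived pool of workers is the usual next step.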

The Spring WebFlux Trap

Spring WebFlux is a reactive framework that uses async I/O on top of Project Reactor. It is fast for the right workload. The wrong workload is where teams get burned.

I have seen teams move a Spring MVC service to WebFlux because someone read that "reactive is faster." The service was making one database call per request, using JDBC. JDBC is blocking. Wrapping the JDBC call in Mono.fromCallable() does not make it non-blocking. The call still runs synchronously and still ties up a thread until the query returns, even when it is moved to a worker thread with subscribeOn. So the "reactive" service was MVC with extra ceremony, and a more complex programming model with no measurable benefit.

Another failure mode: forgetting that a blocking call inside a reactive chain blocks an event loop thread, and there are only a handful of those, shared across all requests. One blocking JDBC call in a flatMap can make a WebFlux service slower than a regular MVC service under load. Every request assigned to that thread stalls until the JDBC call returns.

Reactive on the JVM is async I/O with backpressure. It is a real win for services that fan out to many slow downstream calls per request. It is a disaster if you do not internalize what blocks the event loop and what does not.

What Async Is Actually Good For

Async is good for workloads where threads spend most of their time waiting for I/O. The canonical case is a service that calls 5 downstream services per request and then aggregates the responses. With threads, each request holds a thread for the duration of the slowest downstream call. With async, one thread can handle many concurrent requests, switching between them whenever any one is waiting.
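The fan-out case can be sketched as follows. The service names and latencies are invented for illustration, and callService uses a timer in place of a real network call.

```javascript
// Hypothetical downstream call; the timer stands in for network latency.
function callService(name, latencyMs) {
    return new Promise((resolve) =>
        setTimeout(() => resolve({ name, latencyMs }), latencyMs));
}

// Issue all five calls at once, then aggregate once every response is in.
async function handleRequest() {
    const started = Date.now();
    const responses = await Promise.all([
        callService("users", 30),
        callService("orders", 80),
        callService("inventory", 50),
        callService("pricing", 20),
        callService("reviews", 60),
    ]);
    return {
        services: responses.map((r) => r.name),
        elapsedMs: Date.now() - started,
    };
}

handleRequest().then((result) => console.log(result));
```

The elapsed time should track the slowest call rather than the sum of all five, and the thread is free to serve other requests during the whole wait. Promise.all keeps the responses in the order the calls were listed, regardless of which one finishes first.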

The win is throughput per thread. You do not need more cores. You do not need more compute. You just stop wasting threads on I/O waits. For services that are I/O-bound at scale, this is real and valuable.

The win is not free. Async code is harder to write correctly. Stack traces are worse. Debugging is harder. Blocking calls accidentally introduced anywhere in the chain destroy the throughput gain. The benefit only shows up if the workload is genuinely I/O-bound and the scale is high enough that thread cost matters.

How to Know What You Have

Two questions to ask when you suspect async will help:

What is your service spending its time on? If you profile and see CPU work dominating, async will not help. If you see threads sitting in network reads, async might help. Profile before you switch frameworks.

Do you have many concurrent requests, or few? If you have 10 RPS and each request does one thing, threads are fine. The cost of threads is invisible at low concurrency. If you have 10,000 RPS each fanning out to multiple services, threads start to matter and async starts to pay back its complexity.

If neither of those applies, the synchronous version of your service is probably fine. Async on its own does not make the service better. The benefits are specific and narrow.
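For the first question, one rough signal in Node.js is event loop utilization, which perf_hooks exposes as performance.eventLoopUtilization(). The sketch below compares a simulated I/O wait with a simulated CPU burst; sleep, blockFor, measureUtilization, and compareWorkloads are hypothetical helpers, and a real service would sample utilization under production-like load instead.

```javascript
const { performance } = require("perf_hooks");

// Simulated I/O wait: the thread is idle until the timer fires.
function sleep(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

// Simulated CPU work: the thread is busy for roughly `ms` milliseconds.
function blockFor(ms) {
    const end = Date.now() + ms;
    while (Date.now() < end) { /* burn CPU */ }
}

// Fraction of time the event loop was active (not idle) while `work` ran.
async function measureUtilization(work) {
    const before = performance.eventLoopUtilization();
    await work();
    return performance.eventLoopUtilization(before).utilization;
}

async function compareWorkloads() {
    const ioBound = await measureUtilization(() => sleep(100));
    const cpuBound = await measureUtilization(async () => blockFor(100));
    return { ioBound, cpuBound };
}

const report = compareWorkloads();
report.then(({ ioBound, cpuBound }) => {
    console.log(`I/O-style wait: ${ioBound.toFixed(2)}, CPU-style burst: ${cpuBound.toFixed(2)}`);
});
```

A value close to 1 means the thread spent the interval executing JavaScript, which is the profile async cannot help with. A value close to 0 means it was mostly waiting, which is where async earns its keep.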

The shortest version of the rule: async lets one thread do more I/O, not more CPU. Almost every async-related performance disappointment I have seen comes from confusing those two.

Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.
