// Posted by Umur Inan

// Category Backend

// Posted on May 29, 2026

Server-Sent Events Are Back. You Should Use Them.

Server-Sent Events made a quiet comeback because of LLM streaming. SSE vs WebSocket, the HTTP/1.1 connection trap, and the cases where SSE is the right call.

Why SSE is back

Server-Sent Events shipped in HTML5 in 2011. They went almost completely ignored for a decade. WebSocket got all the attention. Every "real-time" tutorial used WebSocket. Every job ad asked about WebSocket. SSE was the forgotten cousin.

Then LLM streaming happened. The output-as-it-generates UX that ChatGPT made canonical needs a one-way stream from server to client. WebSocket can do it, but SSE is the simpler tool for that exact shape of problem. Every major AI provider's streaming endpoint ended up being SSE. OpenAI. Anthropic. Google. Mistral. All SSE.

Now every team shipping an AI feature has SSE in their stack whether they know it or not. The protocol is having its second moment. This is the version of the article I would have wanted before our team rolled its own.

What SSE actually is

SSE is plain HTTP. The client opens a long-lived GET request. The server responds with Content-Type: text/event-stream, keeps the connection open, and writes events as they happen. Each event is UTF-8 text: one or more field lines terminated by a blank line, as in data: hello\n\n. The client gets each event via the browser's EventSource API.

That is the whole protocol. There is no framing, no binary, no negotiation, no upgrade handshake. The wire format is the same plain text you would get from a normal HTTP response, except the body never ends.
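
To make the wire format concrete, here is a minimal sketch of a serializer. The function name formatEvent is my own for illustration; the field names (event, id, retry, data) are the ones the event-stream format defines.

```javascript
// Serialize one event into the text/event-stream wire format.
// `event`, `id`, and `retry` are optional; `data` is the payload.
function formatEvent({ data, event, id, retry }) {
  const lines = [];
  if (event !== undefined) lines.push(`event: ${event}`);
  if (id !== undefined) lines.push(`id: ${id}`);
  if (retry !== undefined) lines.push(`retry: ${retry}`);
  // A payload containing newlines becomes one `data:` line per line of text.
  for (const line of String(data).split('\n')) lines.push(`data: ${line}`);
  // The trailing blank line marks the end of the event.
  return lines.join('\n') + '\n\n';
}

console.log(JSON.stringify(formatEvent({ data: 'hello' })));
// prints "data: hello\n\n"
```

A multi-line payload is split into several data: lines, and the client joins them back with newlines before handing the event to your code.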

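The simplicity is the value. SSE goes through every HTTP intermediary that exists: load balancers, proxies, CDNs. You can inspect it with browser dev tools and curl. Anything that speaks HTTP speaks SSE. The only config you are likely to touch is buffering and idle timeouts, both covered in the server-side gotchas.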

SSE vs WebSocket

WebSocket is bidirectional and carries both text and binary frames. SSE is server-to-client only and text-only. Those two differences cover most of the comparison.

Where WebSocket wins: bidirectional traffic. Multiplayer games where every client sends input continuously. Collaborative editors where both sides type. Chat apps where users send messages over the same connection. WebSocket is the right tool when traffic flows both directions on the same channel.

Where SSE wins: server-to-client streams. LLM token streaming. Live dashboards. Notifications. Stock tickers. Server-rendered logs. Any case where the client kicks off the request and the server pushes results back, you reach for SSE. Sending data from client to server stays a regular POST.

The honest tradeoff is that WebSocket can do anything SSE can do. WebSocket comes with more moving parts: a custom protocol, a framing layer, ping/pong logic, and a reconnection scheme you have to write. For a one-way stream, that is a lot of code for no functional gain.

The HTTP/1.1 connection trap

Browsers limit HTTP/1.1 to 6 concurrent connections per origin. An SSE stream eats one of those six for the lifetime of the stream. If you open three SSE streams (a dashboard with three live widgets), the user has three connections left for normal browsing. Open six, the rest of the page stops loading.

This was the historical reason SSE got skipped for WebSocket. WebSocket runs over a single TCP connection that does not count against the HTTP limit.

HTTP/2 fixes this. Under HTTP/2 (and HTTP/3), all requests to the same origin multiplex over one connection. The 6-connection limit is replaced by a negotiated cap on concurrent streams, commonly around 100. SSE streams can scale to dozens per page without hurting the rest of the experience.

If your site is served over HTTP/2 (almost every site behind Cloudflare, Vercel, Netlify, or any modern CDN), the connection trap is no longer a concern. If you are still on HTTP/1.1, it is the most important thing to know.
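
A rough way to reason about the budget, as a sketch rather than a measurement. The function name remainingConnections is hypothetical, and the constant 6 is the per-origin limit described above. In a browser you can read the negotiated protocol from the Performance API's nextHopProtocol field, which reports values such as 'http/1.1', 'h2', or 'h3'.

```javascript
// Per-origin HTTP/1.1 connection limit described in the text.
const HTTP1_CONNECTION_LIMIT = 6;

// How many connections are left for normal requests once `openStreams`
// SSE streams are held open. Multiplexed protocols are treated as
// unconstrained here, which ignores the negotiated stream cap.
function remainingConnections(protocol, openStreams) {
  if (protocol !== 'http/1.1') return Infinity;
  return Math.max(0, HTTP1_CONNECTION_LIMIT - openStreams);
}

// In a browser, the protocol for the current page can be read with:
//   performance.getEntriesByType('navigation')[0]?.nextHopProtocol
console.log(remainingConnections('http/1.1', 3)); // prints 3
console.log(remainingConnections('h2', 3));       // prints Infinity
```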

Server-side gotchas

SSE is simple. Implementing SSE is full of small mistakes that cost you a day to debug. The list, from most common to least:

Flushing. The framework you use, or the middleware in front of it, will often buffer output by default. Express (once compression middleware is in the chain), Spring, FastAPI, Django, Rails: check each one. You have to call the explicit flush method after each event or the client sees nothing until the buffer fills or the connection closes. Look for res.flush() (added by Express's compression middleware), response.flushBuffer(), or the equivalent.
Proxy buffering. Nginx (by default) buffers proxied responses. Your server flushes, Nginx holds the bytes. Set X-Accel-Buffering: no in the response headers to tell Nginx to pass the bytes through. Cloudflare and other CDNs have comparable buffering settings that you have to turn off explicitly.
Idle timeouts. Load balancers, reverse proxies, and serverless functions all have an idle-connection timeout. AWS ALB defaults to 60 seconds. If your stream goes quiet for longer than that, the connection drops. Send a comment line (a line starting with :) every 30 seconds as a keepalive. It is ignored by the client but counts as traffic to the proxy.
Last-Event-ID. If the server tags events with an id: field, the client sends the last id it saw in a Last-Event-ID header on reconnect. The server should honor it and resume from the next event. If you cannot resume, at least account for the header in your design. Otherwise reconnections lose events.
CORS. Cross-origin SSE needs the same CORS headers as any other endpoint. The EventSource API does not let you set custom headers, so an Authorization: Bearer header is not an option. Credentials have to travel as a cookie or as a URL parameter. For cross-origin cookies, pass { withCredentials: true } as the second argument to the EventSource constructor and have the server answer with Access-Control-Allow-Credentials: true and an explicit (non-wildcard) Access-Control-Allow-Origin.
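
Most of the list above fits in one small handler. This is a sketch for Node's built-in http module, not a production implementation. The events array is a hypothetical in-memory log where the array index doubles as the event id; a real service would read from something durable. Plain Node writes each res.write() call out without extra buffering, so there is no flush call here; that changes if compression middleware sits in front.

```javascript
// Hypothetical in-memory event log; the array index is used as the event id.
const events = ['one', 'two', 'three'];

function sseHandler(req, res) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'X-Accel-Buffering': 'no', // ask Nginx not to buffer this response
  });

  // Resume after the last id the client saw. Node lowercases header names.
  const lastId = Number.parseInt(req.headers['last-event-id'] ?? '-1', 10);
  const start = Number.isNaN(lastId) ? 0 : lastId + 1;
  for (let id = start; id < events.length; id++) {
    res.write(`id: ${id}\ndata: ${events[id]}\n\n`);
  }

  // A comment line every 30 seconds keeps idle timeouts from firing.
  // The client ignores it, the proxy counts it as traffic.
  const keepalive = setInterval(() => res.write(': keepalive\n\n'), 30_000);
  res.on('close', () => clearInterval(keepalive));
}

// To serve it:
//   require('node:http').createServer(sseHandler).listen(3000);
```

A client that reconnects with Last-Event-ID: 0 gets events 1 and 2 and nothing it has already seen.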

Client side: EventSource and the reconnection story

The browser's EventSource API does most of the heavy lifting:

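const es = new EventSource('/stream');
es.onmessage = (e) => console.log(e.data);
// readyState is CONNECTING while the browser retries and CLOSED once it has given up.
es.onerror = () =>
  console.log(es.readyState === EventSource.CLOSED ? 'closed for good' : 'disconnected, will retry');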

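The retry is automatic. If the connection drops, the browser waits a few seconds (the exact default is up to the browser; roughly 3 seconds is typical) and reopens. If the server has been sending id: fields, the reopened request includes the Last-Event-ID header so the server can resume. The browser does not retry after a non-200 response or a wrong Content-Type. In that case the EventSource closes for good.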

You can override the retry delay by sending a retry: line from the server. retry: 10000\n\n tells the browser to wait 10 seconds. This is useful when your server is taking a deliberate break (deploys, scheduled maintenance) and you do not want a thundering herd of reconnections at three-second intervals.
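
One way to use this before a planned shutdown, as a sketch. The names drainStreams and openStreams are hypothetical, and the optional jitter is my addition to spread reconnects out rather than something the protocol requires. It assumes you keep a Set of the open response objects.

```javascript
// Tell every connected client how long to wait, then close the stream.
// `streams` is a Set of open response objects (anything with write/end).
function drainStreams(streams, retryMs, jitterMs = 0) {
  for (const res of streams) {
    // Optional random jitter so clients do not all come back at once.
    const delay = retryMs + Math.floor(Math.random() * jitterMs);
    res.write(`retry: ${delay}\n\n`);
    res.end();
  }
  streams.clear();
}

// Hypothetical wiring for a deploy:
//   process.once('SIGTERM', () => drainStreams(openStreams, 10_000, 5_000));
```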

For Node, Python, Go, and JVM stacks, library support is fine. The Anthropic, OpenAI, and Mistral SDKs all use SSE-flavored clients under the hood; building a custom one is maybe 30 lines of code if you need to.
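
For a sense of what those 30 lines look like, here is a minimal sketch of a custom client built on fetch. The names createSseParser and streamSse are hypothetical. It handles data, event, and id fields plus comments, assumes \n or \r\n line endings, and skips retry handling and reconnection entirely.

```javascript
// Incremental SSE parser: feed it decoded text chunks, get complete events back.
// Assumes \n or \r\n line endings; bare \r endings are not handled in this sketch.
function createSseParser(onEvent) {
  let buffer = '';
  let type = 'message';
  let data = [];
  let lastId;

  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // the last piece may be an incomplete line
    for (const raw of lines) {
      const line = raw.endsWith('\r') ? raw.slice(0, -1) : raw;
      if (line === '') {
        // A blank line ends the event; events without data are dropped.
        if (data.length > 0) onEvent({ type, data: data.join('\n'), id: lastId });
        type = 'message';
        data = [];
      } else if (!line.startsWith(':')) { // lines starting with ":" are comments
        const colon = line.indexOf(':');
        const field = colon === -1 ? line : line.slice(0, colon);
        const value = colon === -1 ? '' : line.slice(colon + 1).replace(/^ /, '');
        if (field === 'data') data.push(value);
        else if (field === 'event') type = value;
        else if (field === 'id') lastId = value;
      }
    }
  };
}

// Read a streaming response body and hand each parsed event to onEvent.
// `init` is passed straight to fetch, so method, headers, and body are up to you.
async function streamSse(url, onEvent, init = {}) {
  const res = await fetch(url, init);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  const feed = createSseParser(onEvent);
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    feed(decoder.decode(value, { stream: true }));
  }
}
```

Because the request goes through fetch, you can send a POST body or an Authorization header, which EventSource does not allow. The price is that reconnection and Last-Event-ID handling become your job.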

When SSE is the wrong choice

Three cases where you reach for something else:

Bidirectional traffic in the same channel. Use WebSocket. Multiplayer games, voice/video signaling, collaborative editing.
Binary payloads. SSE is text-only. You can base64-encode binary, but if your stream is mostly binary, WebSocket (or a custom HTTP/2 stream) is the right tool.
Sub-millisecond latency. SSE rides on HTTP, which has framing overhead. For ticker feeds or trading systems where you measure latency in microseconds, WebSocket and raw TCP are still ahead.
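
For the occasional binary payload, the base64 route looks roughly like this in Node. The function names are hypothetical, and Buffer is Node-specific; a browser client would decode with atob instead. Base64 turns every 3 bytes into 4 characters, so expect about a third more bytes on the wire.

```javascript
// Wrap raw bytes in a single SSE event by base64-encoding them.
function encodeBinaryEvent(bytes) {
  return `data: ${Buffer.from(bytes).toString('base64')}\n\n`;
}

// Turn the `data` string of a received event back into bytes.
function decodeBinaryData(data) {
  return new Uint8Array(Buffer.from(data, 'base64'));
}

console.log(JSON.stringify(encodeBinaryEvent(Uint8Array.of(1, 2, 3))));
// prints "data: AQID\n\n"
```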

What I reach for now

For any "server pushes updates while client watches" UX, SSE is the default. LLM streaming, live logs, deployment progress, notifications, dashboards. The protocol fits the shape of the problem and stays out of the way of every HTTP tool you already use.

For anything bidirectional, WebSocket.

And the thing that always catches teams: turn off proxy buffering on day one. Send a keepalive every 30 seconds. Wire up Last-Event-ID resume on day two. Those three pieces of operational hygiene are the difference between SSE that works once and SSE that works in production for a year.

Tags: HTTP, WebSocket, Streaming, Real-time, API Design

Umur Inan

Principal Software Engineer

Backend engineer focused on JVM systems, distributed architecture, and the failure modes that only show up in production. I write about what I learn building and breaking things at scale.
