Realtime & Streaming APIs

Backpressure in Realtime Connections

Ishtmeet Singh @ishtms/June 11, 2026/48 min read

#nodejs#realtime#websocket#sse#backpressure

One browser tab sits on a slow network while your server keeps broadcasting. The socket stays open, the user still shows up as online, and your writes keep succeeding for a while. Then memory climbs, latency creeps into everything, and the process spends more time holding pending messages than it spends serving the clients that can actually keep up.

A file stream or a single HTTP response body has a clear endpoint. The transfer finishes, the pressure drains, and the handler is done. Realtime connections stay open across a long run of unrelated updates, so pressure builds across the whole session instead of during one transfer. A user can miss five typing updates, still get the next room message, then reconnect an hour later and ask for durable history. The next user receives every price tick and only ever looks at the latest one. A dashboard pushes a new counter value every 250ms to a phone that drains the socket once every few seconds.

The stream mechanics from Chapter 3 still apply underneath. What this chapter adds is the policy around them, how much each client may buffer, for how long, and which messages still count once it falls behind.

Follow one application update from the moment your code produces it to the moment the remote client can actually use it. That whole route is the realtime send path for one connected client. In a Node server it runs through serialization, an application queue, protocol framing, a WebSocket library or SSE response writer, Node's writable stream layer, the kernel send buffer, TCP flow control, the peer's receive buffer, and finally the peer application.

Several of those stages hold their own buffer.

application event
  -> encode message
  -> per-connection send queue
  -> WebSocket frame or SSE event block
  -> Node writable state
  -> kernel send buffer
  -> remote receive path

The realtime send path as a vertical stack of buffers for one client, with the gauge that observes each layer. — One application update flows down through the per-connection queue, the WebSocket sender, Node's writable state, the kernel send buffer, and the network before the peer can process it. queue.length and queuedBytes watch the application queue, ws.bufferedAmount covers the sender plus the writable bytes under the socket, write() returning false watches the writable layer, and a successful send never proves the peer received the bytes.

Chapter 3 already covered backpressure for a single stream. The difference in a realtime server is how much more the application has to decide. Stream backpressure only tells one writer that one writable destination has crossed a local threshold. A realtime server tracks how many unsent messages a single client may hold, how long they can sit there, and which of them still count once that client falls behind. Those decisions are yours, made in application code.

The Send Path Stays Open

An ordinary HTTP handler deals with one response. A route receives a request, writes headers and some body bytes, and finishes. Backpressure still happens along the way, but the request ends and everything it held goes away.

Realtime has no such end. A single WebSocket connection can receive ten thousand unrelated outbound messages during one tab session. An SSE event stream can hold one ServerResponse open for hours. Long polling breaks into shorter requests, but the client can still lag behind the application event rate if it polls too slowly or keeps reconnecting from an old cursor.

Take WebSocket first. Each connection carries a little state object that your code controls.

const client = {
  ws,
  queue: [],
  queuedBytes: 0,
  flushing: false,
  closed: false
};

That object is the per-connection send queue. Your code attaches it to one live client and uses it to hold outbound messages until they enter the WebSocket sender and the socket underneath.

The queue exists because application events arrive at one rate and each socket drains at another. Your room service might publish one message to 20,000 clients at once, and every connection drains at its own pace after that. A few are on the same datacenter network, plenty are on home Wi-Fi, some are on phones, and a handful are backgrounded tabs that the browser has throttled. A single shared outbound queue would blend all of those into one pressure number. A queue per connection keeps each client's pressure tied to that client.

A single outbound update changes form several times on the way out.

domain event
  -> JSON string or Buffer
  -> queued message record
  -> WebSocket frame or SSE lines
  -> bytes accepted by socket write

Every one of those forms costs something. The domain event is often a single object shared across many recipients. The encoded payload is bytes sitting on the V8 heap or in external memory. The queued record adds metadata, things like byte length, sequence number, deadline, channel, and sometimes a coalescing key. The WebSocket layer tacks on frame headers, and compression spends both temporary memory and CPU before any byte reaches the socket.

The per-connection queue is also a small state machine, even when the code starts life as one array and one boolean.

idle
  -> queued
  -> flushing
  -> blocked
  -> closing

State machine for one connection's send queue with states idle, queued, flushing, blocked, closing, and released. — The queue moves from idle to queued on enqueue, into flushing as it drains, and into blocked when write() returns false or the connection goes over budget. Closing is reachable from any state and frees the queued bytes while rejecting new work.

In the idle state the connection has nothing pending. It moves to queued once messages exist in application state but have not yet reached the protocol writer. flushing is the active state where one or more of them are moving into the WebSocket sender or the HTTP response stream. The connection goes blocked when the writable path below reports pressure or the library's queued bytes climb past budget. And closing covers two cases, either policy has decided to end the connection or the transport is already shutting down.

Different layers control different transitions. Your application code manages the queued records. The WebSocket library builds frames and runs its own sender queue. Node handles the writable stream state, the kernel handles the socket buffer, and the remote endpoint deals with its own receive path. One send() call passes through several of those layers at once, but the first decision, whether to enqueue the next message at all, is the application's alone.

The queue record is where the application keeps the facts it needs for policy. A socket buffer counts bytes and nothing else, so it draws no distinction between a byte range from a typing indicator and one from a payment status update. That distinction has to live in your code. The WebSocket sender works at the level of frames, with no concept of which pending frame a newer state snapshot could replace. The per-connection queue holds that policy metadata during the window where the bytes still carry meaning, before they turn into undifferentiated transport bytes.

Encoding time forces a similar choice. Encode the message once at publish time and every client can share that one payload, though you burn CPU encoding for clients that later drop it. Hold off and encode at flush time instead, and you skip the work for dropped items while risking the same serialization once per recipient. Most realtime servers run a mix. Shared encoding handles the common room message, per-client encoding handles personalized fields, and lazy encoding suits lossy channels that often coalesce before they ever flush.

Close state belongs in the queue too. A WebSocket can start closing while your code still holds a reference to the client object. An SSE response fires close when the browser disconnects. A long-poll request can time out while the server is still selecting events for it. Once a connection is closing, the queue should refuse new work and free its queued bytes as the transport shuts down. Skip that and the disconnected clients stay around as live JavaScript objects, each holding payloads that will never be delivered.

SSE behaves the same way with different wire units. The server writes an event block to http.ServerResponse, and from the server's side that response is just a Writable stream.

if (!res.write(block)) {
  res.once('drain', flush);
  return;
}

res.write() hands you the same stream-level signal Chapter 3 described. What it does not handle is the realtime question, what to do with later application events while that response waits for drain. One answer is to hold every event in an array until the client catches up, which is also the answer that runs you out of memory.

ws.send() has the same gap. The call accepts your message, frames it, and pushes it toward the socket, but the only thing it reports back is a local enqueue or send attempt. Knowing the peer actually received the bytes takes a protocol-level or application-level acknowledgement, and plenty of realtime feeds skip that step on purpose, trading per-message confirmation for lower latency.

⚠️

Warning

ws.send() never pushes back. It takes the message, frames it, and queues the bytes whether or not the socket can keep up, then returns with no signal that the peer has fallen behind. A broadcast loop that calls send() once per recipient will push gigabytes into per-connection buffers for clients that stalled, and nothing throws. Check your own queue or bufferedAmount before you call send(), while you can still act on the answer.

Long polling moves the queue somewhere else again. A pending poll request can grab the next available event right away, but once the server sends that response the client has to open a fresh poll. Events that arrive in the gap between two requests need somewhere to wait, so the server keeps state keyed by an event cursor. Slow consumption is still here too. It takes the form of stale cursors, oversized replay responses, and clients that keep asking from a position they never catch up from.

All three transports come back to the same requirement. Every one of them needs a bounded place to hold pending outbound work, and code for the moment that place fills up.

Where Pressure Accumulates

Pressure almost never sits in a single variable. A Node realtime server can stack up pending work in application arrays, per-connection queues, the WebSocket sender's internals, bufferedAmount, Node's writable state, kernel send buffers, a reverse proxy's buffers, and the remote client itself. You can read some of those numbers directly, infer others only roughly, and several of them overlap.

The version that shows up by accident is an unbounded array.

function sendLater(client, payload) {
  client.queue.push(payload);
}

That code takes every payload, no limit anywhere. A send queue limit fixes that by capping how much pending outbound work one connection may hold before your policy code steps in. You can express the cap in bytes, in message count, in age, or some combination. Bytes keep memory under control. A message count protects CPU and keeps one connection from hogging flush time. Age controls how stale the feed is allowed to look to the user.

A byte limit has to count encoded size, not the number of source objects.

const MAX_BYTES = 256 * 1024;

function canQueue(client, bytes) {
  return client.queuedBytes + bytes <= MAX_BYTES;
}

This check runs before any data reaches the library, so it is the first pressure signal your code sees. The moment canQueue() returns false, that connection is in queue overflow, meaning the next message would push it past its configured limit. From there your policy decides what to do, whether that is to drop, coalesce, close, or mark the client for replay.

WebSocket bufferedAmount is a second signal. In the browser, it reports how many bytes the WebSocket object has accepted through the API and still has waiting to go out. Server libraries like ws expose a similar property on their own WebSocket object, and there the value covers bytes in the sender plus writable bytes already sitting under the socket. Read it as a local queued-byte gauge for that one WebSocket object, never as proof of delivery.

if (ws.bufferedAmount > 512 * 1024) {
  ws.close(1013, 'client too slow');
  return;
}

That guard looks at the library and socket side, after messages have already crossed into the WebSocket layer, so it sits alongside the application queue limit instead of replacing it. A lot of memory can already be tied up in message records, JSON strings, and compression buffers before bufferedAmount so much as moves.

bufferedAmount is a late signal by nature, because it only grows after send() has already run. Say a broadcast eagerly calls send() for every recipient and only checks bufferedAmount once the loop finishes. By then the process has paid almost the entire cost. The per-connection queue is the earlier signal, the one to check before you serialize an expensive per-client payload, run compression, or pile more work onto a connection that is already behind.

💡

Tip

Check the per-connection queue before you spend anything on per-client work. bufferedAmount only grows after send() has already framed the payload and, with permessage-deflate, compressed it. A loop that serializes, compresses, and sends to every recipient and only then reads bufferedAmount has already spent the full cost. Application queue length and queued-byte count are the early signals, so read those before serialization and compression.

A low or zero bufferedAmount can fool you in the other direction. All it says is that the local WebSocket object has little queued outbound data at this instant. It tells you nothing about whether the user's view is fresh. The server might have dropped lossy messages before they ever reached the WebSocket layer, the client might still be behind in a replayable feed, or the browser might hold the bytes while the UI update sits in a task queue waiting its turn. bufferedAmount measures transport-side pressure. It is one input to the realtime contract, and you read it alongside the others.

Node's writable stream state is one more layer. A false return from socket.write() or res.write() means Node took the chunk into local writable state and crossed the stream's high water mark. Chapter 3 went through that mechanism in detail. In realtime code the usual mistake is reading that return value and then writing the next chunk anyway.

const ok = socket.write(payload);
if (!ok) socket.once('drain', flush);

⚠️

Warning

Register the drain handler once per blocked write, and keep flush idempotent. The trap is calling socket.once('drain', flush) (or res.once('drain', flush)) on every write attempt while the socket is still blocked. None of those listeners has fired yet, so they pile up, you hit MaxListenersExceededWarning, and several overlapping flush loops all fire the instant the socket drains. Keep a single flushing flag on the connection so only one drain wait is ever outstanding.

Your code has to stop feeding that destination until drain fires. Keep writing past the false and Node keeps accepting chunks until some other limit, or plain process memory, finally stops it. Treat the false as a hard stop.

Underneath Node, the kernel send buffer holds whatever bytes the OS has accepted for that TCP connection, and TCP flow control governs how much the peer is ready to take. Chapter 9 digs into those internals. The one thing to carry forward from here is that a JavaScript send can return long before delivery. The bytes can still be in the local kernel, or in flight between the two machines, while the peer application stays behind.

The remote side runs its own buffers and its own scheduling. Browser JavaScript can be paused, a mobile OS can throttle a backgrounded tab, a client process can parse messages slowly. Underneath all of that, TCP and the WebSocket layer can hold the connection open while the application-level feed goes stale without any error.

All of this makes pressure hard to detect cleanly, because several signals tend to move at the same time.

application queue length rises
queued application bytes rise
WebSocket bufferedAmount rises
write() returns false more often
send callback latency rises
oldest queued message age rises

On its own, not one of those tells the whole truth. A short bufferedAmount spike during a big broadcast can disappear again within milliseconds. The opposite case, a small queue whose oldest message keeps getting older, often means the connection has stalled. And a large queue can be completely harmless when its updates coalesce, because the next flush collapses them to a single latest value, while a slow trickle of durable chat messages is far more expensive, since every one of those has to survive and stay in order.

Compression complicates it further. permessage-deflate shrinks the bytes on the wire but adds CPU work before every send. A server that compresses the same broadcast separately for thousands of clients can burn that CPU well before any socket pressure shows up. It also muddies byte accounting, since the object size, the uncompressed payload size, the compressed frame size, and the temporary compression memory are all different numbers.

The practical move is to keep accounting simple and local. Track application queued bytes before compression, the message count, and the age of the oldest queued item. Watch bufferedAmount, and count your drops and closes. That short list of numbers is enough local evidence to run policy, without pretending you can see the full state of the network.

Slow Consumers

Call a connection a slow consumer when its outbound path drains slower than the server fills it, and stays that way for a meaningful stretch of time.

The key part is rate over time. A single stalled write during a network hiccup is just noise. A connection counts as slow only when the queue keeps growing across several flush attempts, when the oldest queued message keeps aging while new ones arrive, or when bufferedAmount stays over budget even after a grace period.

You have to classify this from inside the server, because TCP accept behavior looks healthy the entire time. One slow WebSocket connection can build up memory in its own send path while the server happily accepts new connections. The listening socket, the accept queue, the active connection count, all of them read as normal. The real problem sits in per-client outbound state, out of view of the usual metrics.

Detection can stay small.

function isSlow(client, now) {
  return client.queuedBytes > client.maxBytes ||
    now - client.oldestQueuedAt > client.maxAgeMs ||
    client.ws.bufferedAmount > client.maxBuffered;
}

Each clause covers a different way to fall behind. The byte budget catches a memory problem, the age check catches a freshness problem, and bufferedAmount catches pressure that has already reached the WebSocket layer. Real code adds hysteresis on top, so the condition has to hold for a short window, or fail repeatedly across several flush attempts, before it counts. Without that, one large payload can flag a connection as slow for a few milliseconds and then clear it again.

False positives are normal, not rare. A client moves between networks and pauses for a second, a browser tab hits a temporary scheduler delay, a mobile radio wakes up after sitting idle. React to the first failed write by closing the connection and you create needless churn. Never close it and you eventually run out of memory. What you want is a middle zone between those two extremes.

One layout that works splits the budget into three tiers.

under budget
  -> send normally
near budget
  -> coalesce or skip lossy updates
over budget
  -> drop, replay later, or close

That middle zone exists because realtime traffic carries mixed value. Presence state, cursor position, typing indicators, counter values, progress percentages, these can usually collapse down to whatever the newest value is. Chat messages, payment status transitions, and audit events generally need ordered delivery or replay instead. So the slow-consumer policy has to know which channel it is dealing with.

SSE needs the same classification. When res.write() keeps returning false and the oldest queued event keeps aging, that client is slow for the feed. Long polling shows it a little differently, because a client that keeps requesting from old cursors is consuming slower than the stream advances. One old cursor right after a reconnect means nothing on its own. A client that requests from behind over and over is genuine pressure.

SSE has one extra trap. The HTTP response object can read as alive while the useful stream is already stale. The TCP connection still accepts tiny heartbeat comments, and the server still writes :\n\n often enough to keep an idle timeout from firing, all while the application event queue sits full of old event blocks stuck behind a blocked res.write(). Heartbeats only prove the connection has some life left in it. For the data to be fresh, the event queue has to actually drain.

So SSE detection has to look at both the event stream queue and the response object. A response that keeps writing heartbeats while data events stay queued is still behind on the actual feed. And if one queue holds both heartbeats and data, the heartbeats can make it look busier than it really is, so keep them out of durable event accounting or mark them as discardable control traffic.

For long polling the symptom turns into cursor lag. A client sends a poll cursor, waits for the response, processes it, then comes back with a newer cursor. If the newest retained event on the server is 9000 and the client keeps asking from 8300, it is consuming behind the stream. One larger replay batch might let it catch up that one time, but batch after large batch means it is staying behind at the current event rate.

server newest event: 9000
client cursor:       8300
cursor lag:           700

Cursor lag does for long polling roughly what queue age does for a socket connection. It tells you how far behind the client is in application terms, while a byte queue tells you how much pending transport work is sitting around. They answer different questions, so track both.

The false positives themselves differ by transport. A WebSocket stalls when the client connection is slow. SSE can stall because a proxy buffers response chunks or the browser throttles the tab. Long polling can look slow only because the client deliberately backs off between polls. Detection should take each transport's normal behavior as the baseline. A one-second gap is alarming on a market feed and totally fine on a notification badge.

The danger this all points at is per-connection memory multiplying. Take one queued message at 4KB, 10,000 clients each fallen behind by 500 messages, and the total climbs fast. Shared references help only up to the point where each connection needs its own per-recipient metadata, compression state, personalized fields, or ordering state. A bounded design starts from the assumption that any connection can go slow, and keeps the cost of each one contained.

Handling slow consumers belongs in the normal operating path, not in some emergency code branch. Most clients stay under budget, a few drift up toward the limit and then recover, and a small number go past it and hit policy. The goal is narrow, to keep the cost of every over-budget connection bounded.

Budgets and Bounds

A backpressure budget is just the pending outbound work you allow one connection to hold, written down as real numbers. So make it concrete. Pick a byte count, a message count, a maximum age, and a send deadline where one helps, then put those values in code or configuration. Skip the actual numbers and your backpressure plan collapses into an unbounded queue the first time a broadcast loop hits a slow socket.

A workable per-connection budget usually combines four fields.

maxQueuedBytes
maxQueuedMessages
maxOldestMessageAgeMs
maxBufferedAmount

Each field guards something different. Bytes guard memory. The message count guards CPU and keeps flushing fair across connections. Oldest-message age guards freshness. And bufferedAmount tracks the WebSocket library and the socket beneath it. A connection that blows past any one of the four drops into pressure handling.

Send deadlines add a time dimension to the policy. A deadline is the last moment a queued message is still useful to this connection, and after it passes the server treats the message as expired. In a realtime send path a deadline is nothing more than a local freshness limit. The broader theory of deadlines and cancellation shows up later in the book. For now it is one timestamp on an outbound message.

const item = {
  payload,
  bytes: Buffer.byteLength(payload),
  deadline: Date.now() + 2000
};

At flush time the server can skip an expired item, fold it into a newer value, or turn the expiry into a close or replay decision. Two seconds is long enough to make a typing indicator worthless. A durable order-status event that waited those same two seconds can still be entirely valid. The channel decides which case you are in.

The queue record usually needs a little more metadata.

const item = {
  seq,
  channel,
  key,
  payload,
  bytes,
  deadline
};

Each field has a job. seq drives gap detection, channel keeps policy separate per stream, and key is what coalescing matches on. bytes feeds memory accounting, while deadline carries the age limit. The payload holds the encoded message itself, or a reference to shared encoded bytes.

Byte accounting needs discipline. Bump the counter up when the queue accepts an item, and bring it back down when the item leaves, regardless of how it left, whether sent, dropped, expired, or thrown away during close. Miss a single decrement and the connection reads as over budget forever, even when it is healthy. Miss an increment instead and the limit quietly stops applying at all.

⚠️

Warning

Balance queued-byte accounting on every exit path. Increment when an item enters the queue, and decrement when it leaves, however it leaves, whether sent, dropped, expired, coalesced, or discarded on close. A missing decrement leaves queuedBytes permanently inflated, so the connection reads as over budget forever and starts dropping or closing healthy traffic for no reason. A missing increment does the opposite and lets the limit stop applying. Route every removal through one helper so the counter changes in exactly one place.

function removeQueued(c) {
  const item = c.queue.shift();
  if (item) c.queuedBytes -= item.bytes;
  return item;
}

That helper looks too trivial to bother with, right up until three different overflow paths all start removing items. Putting the decrement in one place is what keeps the count correct across all of them.

Message count deserves the same care. One queue holding a single 2MB payload and another holding 2,000 tiny payloads put very different pressure on the server. The big payload is a memory and write-size problem. The 2,000 small ones are an iteration, callback, bookkeeping, and fairness problem. A byte-only budget misses the second situation, and a count-only budget misses the first.

Oldest age needs a stable creation timestamp. Set it when the queue accepts the item, not when the domain event first came into existence, because that event may have passed through other systems before this connection ever saw it. Local send pressure only cares how long this connection has been holding the message. Replay freshness is the case where you might also want the original event time. Keep both timestamps only when both decisions genuinely use them.

const item = {
  createdAt: Date.now(),
  eventTime,
  payload,
  bytes
};

The per-connection queue also wants a maximum flush batch. Say one client has 1,000 queued items and the transport suddenly goes writable again. Draining all of them in a single synchronous pass can starve every other connection of CPU. A batch limit lets the queue make progress without turning recovery into one long uninterrupted JavaScript run.

const BATCH = 32;
for (let i = 0; i < BATCH; i++) {
  if (!sendNext(client)) break;
}

Batching is a separate concern from dropping. Dropping is about which messages survive in the first place. Batching is only about how much of the surviving work one turn of the loop gets through. They are easier to reason about, and to change, when they stay apart in the code.

The flush loop should work one connection at a time.

function flush(c) {
  if (c.flushing || c.queue.length === 0) return;
  c.flushing = true;
  const item = c.queue.shift();
  c.ws.send(item.payload, () => {
    c.flushing = false;
    flush(c);
  });
}

That snippet is deliberately stripped down. Real code layers error handling, byte accounting, close-state checks, and deadline checks on top. The structure underneath is the part to hold onto. A connection drains its own queue, and it only schedules the next item after the previous send callback has fired. A different client runs the same loop over its own separate queue.

The send callback still only reports local progress. That is enough for pacing the queue and measuring send latency, though confirming the peer received anything still takes an application-level response.

The budget decision belongs before the enqueue, not after it.

function enqueue(c, item) {
  if (!canQueue(c, item.bytes)) return overflow(c, item);
  c.queue.push(item);
  c.queuedBytes += item.bytes;
  flush(c);
}

That runs the overflow decision right at enqueue time. Either the item fits the budget and goes in, or policy runs immediately, before a single extra queued item has grown memory.

Age has to be checked both during flush and during enqueue. On a quiet connection, items can go stale while no new messages arrive at all, so nothing triggers a flush to notice them. On a busy one, every incoming item can produce fresh overflow. A periodic sweep handles the quiet case, but enqueue still has to enforce the limits itself, since enqueue is the moment the queue actually grows.

The actual budget numbers are product and infrastructure choices. A collaborative editor with lossy cursor updates can get away with small queues and aggressive coalescing. A chat room might allow larger ordered queues but close clients that fall too far behind, then rebuild them from history. A financial ticker often wants latest-value channels for display state and a separate replayable channel for the trades themselves. Whatever the choice, the server code should state it openly instead of leaving it implicit.

A global memory limit helps as well, but the real decision lives in the per-connection budgets. A global cap can only tell the process that total queued work is high. The per-connection budgets are what identify which connection caused the pressure and what response fits it. Without them, a handful of slow clients can consume the whole global budget and drag healthy clients down with them.

A process-wide budget still has a place in protecting the runtime. Sum queued bytes across all connections and start refusing new lossy work once the process nears its own cap. Even then the action should run through per-connection policy. Closing clients at random because the global number looks high leads to confusing behavior. Shedding the specific clients that went over their own budgets is much easier to follow.

Admission control is the last lever. A server under heavy broadcast pressure can refuse new realtime connections, or accept them with reduced subscriptions. That decision belongs to capacity and load-shedding design, which the book covers later. Right here, the send path only has to expose enough state to make that call possible, things like total queued bytes, the count of over-budget clients, average queue age, and close counts split out by reason.

Drop, Coalesce, Close, Or Replay

Once a connection actually hits queue overflow, the budget you sketched turns into real policy. The server has four common moves here, drop, coalesce, close, or replay. Each one fits some kinds of message and is wrong for others.

A drop policy spells out which messages the server may discard when a connection cannot keep up, and you want it stated per channel rather than left to chance. With drop-newest, the server keeps the existing queue intact and ignores the message that just arrived. Drop-oldest goes the other way and removes older queued messages to make space for newer ones. Drop-by-type sheds low-value channels first, so durable channels lose messages only after the lossy ones already have.

Drop-newest is the conservative choice for ordered streams.

if (!canQueue(client, item.bytes)) {
  client.dropped++;
  return false;
}

That code turns away the new item and leaves the queue as it was. It fits cases where the older messages count for more than the newest one, or where a separate replay path can fill the gap afterward. The downside is that it can keep a client stale, with the queue packed full of old messages while fresh state never gets in.

Drop-oldest favors freshness instead.

while (!canQueue(client, item.bytes) && client.queue.length > 0) {
  removeQueued(client);
  client.dropped++;
}

if (!canQueue(client, item.bytes)) {
  return closeForPressure(client);
}

That code clears room by deleting older queued items through the accounting helper from earlier. The second budget check catches an incoming item that is bigger than the entire connection budget, where pressure policy has no choice but to run. Closing is one response. A lossy channel could just as well drop the incoming item.

Drop-oldest suits lossy realtime streams where the newest value is the only one the client actually needs. Before an ordered durable stream can use the same policy, it needs gap detection and stored-history recovery in place.

Coalescing replaces several pending messages with one newer message that stands for the current state of the same key. It helps when the feed carries state snapshots rather than a history you need to keep.

function coalesce(c, next) {
  const i = c.queue.findIndex(x => x.key === next.key);
  const old = i === -1 ? undefined : c.queue[i];
  if (old) c.queuedBytes -= old.bytes;
  if (old) c.queue[i] = next;
  else c.queue.push(next);
  c.queuedBytes += next.bytes;
}

Replacing a record still moves the byte budget. The counter drops the old record's bytes and adds the new one's. Passing the connection into the helper keeps coalescing and accounting together in the same spot.

A latest-value channel is one where the current value cancels out any earlier unsent values. Presence state works this way, and so does cursor position. If user u1 moves from coordinate A to B to C while the client is blocked, delivering A and B afterward is wasted, stale work. C alone says everything that channel needs.

presence:u1 online
presence:u1 away
presence:u1 offline

On a latest-value channel the queue can hold only presence:u1 offline for that key and drop the rest. The earlier values were meaningful when they were produced, but they expired before the client could receive them.

A lossy realtime stream is a feed where skipping some messages still leaves the current experience correct. Typing indicators, cursor movement, live progress percentages, and high-frequency dashboard samples usually qualify. Calling a stream lossy is a statement about the application contract, that gaps are allowed on specific, declared channels. The stream should make clear to the client which channels can skip and which ones need recovery.

Closing the connection releases pressure outright. A server can close a WebSocket once it has been over budget for too long. Close code 1013 commonly tells the client to retry later, and some systems use 1008 for policy violations. Whichever you choose, line the code up with the actual reason. The reason string is diagnostic only, so do not make client protocol correctness depend on its exact text.

Closing makes sense when reconnect and recovery are already part of the contract. Reconnect behavior comes up in Chapter 13.4, but the send-path policy can still decide on its own that an over-budget connection has to go. Holding a broken connection open indefinitely just spends memory on a client that cannot use the feed right now.

Replay leans on stored history. A replayable event, which the previous subchapter went through, can be sent again from retained state when the client reconnects or asks from an event cursor. What sets replay apart is where the cost lands. It moves pressure off per-connection memory and onto retained event history and recovery reads, and for durable application events that is often the right trade.

In practice the channels sort out like this.

chat message
  -> keep order, close on overflow, replay from history
typing indicator
  -> drop or coalesce, no replay
presence state
  -> latest-value coalescing
price display sample
  -> lossy stream with sequence gaps allowed

That policy table carries more weight than the transport choice does. WebSocket, SSE, and long polling can all carry any of these channels. The transport settles framing and how the connection stays open, and the channel policy settles how the server reacts under pressure.

Mixed channels need extra care. When one WebSocket carries both durable chat messages and lossy typing indicators, the send queue has to stop the typing indicators from eating the budget the chat needs. The fix can be separate per-channel queues under one connection, priority classes, or simply dropping lossy channels first when overflow hits.

Priority classes should stay bounded too.

connection budget
  durable queue: 192KB
  lossy queue: 32KB
  control queue: 32KB

Those numbers are only an example. Splitting the budget stops low-value traffic from swallowing the whole connection, and it also keeps durable traffic from taking every byte when control messages still need room for close, resume, or protocol coordination.

Control messages deserve handling of their own. A server closing a slow connection still has to send a close frame, and a server asking a client to resync may need to send a compact "replay required" message. When the normal queue is already full, the control path needs either a small reserved budget or a straight socket close.

Backpressure policy should be deterministic. Feed it the same queue state and the same incoming item, and it should reach the same decision every time. Random drops make debugging harder, and priority rules hidden across the code produce surprises in production. One small, explicit policy function is far easier to test than pressure logic smeared across every broadcast call.

The version I reach for in real code stays this small.

function overflow(c, item) {
  if (item.lossy) return dropOrCoalesce(c, item);
  if (item.replayable) return closeForReplay(c);
  return closeForPressure(c);
}

Under budget the server sends normally, and near budget it coalesces or skips lossy updates. Once the queue overflows, lossy messages shed work locally, replayable messages close and recover from stored history, and messages that are neither force a close because there is no safe slot for the next required message.

The function names hold the whole policy. Lossy traffic sheds work locally. Replayable traffic can close and then recover from stored history. Traffic that is neither lossy nor replayable runs out of options the moment the queue fills, because there is nowhere safe to put the next required message for that client. Closing the connection at that point at least reflects reality, instead of accepting writes that will never arrive.

Dropping should also raise a local signal to the client when the channel contract calls for one. Some lossy feeds can stay quiet, while others need to send a compact state reset once the pressure clears. A collaborative editor can drop cursor movements without a word, then push the latest cursor map afterward. A dashboard can skip samples silently, since the next sample overwrites them anyway. A room feed with replayable durable messages, on the other hand, should tell the client to resume from a cursor after it reconnects.

Coalescing has one real failure mode, which is breaking per-message side effects. If the client plays a sound for every alert, then folding three queued alerts into one latest alert changes what the user actually hears. A client that only shows the current server count is unaffected. So coalesce state snapshots, and leave any event that represents user-visible history alone.

⚠️

Warning

Only coalesce channels that carry state snapshots, never channels with per-message side effects. If the client plays a sound, bumps a badge, or appends to a log for each message, then collapsing three queued alerts into the latest one changes what the user sees. Coalescing is safe for presence, cursor position, and counters, where the newest value replaces the rest. It is wrong for anything the user is meant to observe one message at a time.

Close policy needs a reason that matches the recovery you expect. An overloaded server closing with the equivalent of "try again later" is one case. "Resume required" is the case where the server deliberately dropped replayable data and wants the client back with a cursor. A "policy violation" close belongs to clients that keep blowing past a documented subscription budget. When a close code and an application message both go out before the close, make them tell the same story.

Ordering And Gaps

Both dropping and replay only work if the client can tell what it missed, and that comes from ordering signals.

An application sequence number is one such signal, a number your code assigns to outbound messages in a stream or channel so the client can spot a missing one cheaply. The transport has its own ordering rules already, but those only describe bytes or frames on a single connection. A sequence number describes the logical feed instead.

let seq = 0;

function event(type, data) {
  return { seq: ++seq, type, data };
}

The number has to climb monotonically within whatever scope the client tracks, whether that is a room, a document, a user-specific feed, or one durable event stream. A single global sequence across the entire system costs coordination you rarely need, since per-channel numbers usually do the job.

A sequence gap is what the client sees when the next number it receives is higher than the next number it expected. Receive 41 and then 44, and the client knows 42 and 43 went missing within that scope.

let expected = 42;

function receive(msg) {
  if (msg.seq < expected) return;
  if (msg.seq > expected) return requestReplay(expected);
  applyMessage(msg);
  expected++;
}

That snippet skips a lot. A stale or duplicate message leaves expected where it was. A future message triggers a replay request from the missing sequence and then waits for recovery. Only the exact next message gets applied to state and bumps expected forward. Real clients also deal with stream resets and replay responses, but the point that carries over is that the client can detect a missing range from its own sequence state, with no help from the server.

A gap means different things on different channels. On a lossy cursor-position channel it can be completely fine. A durable chat stream treats the very same gap as something that demands replay or a visible resync state. On a presence channel, the next full snapshot repairs it on its own.

Gap handling has to run before the client applies the message. Once state has already changed, recovery code is stuck unwinding or overwriting half-applied data. A small client state machine can sort each message into old, next, future, or reset for that feed. Old ones are duplicates or late replays. The next one applies normally. A future one signals a gap. A reset replaces local state outright and sets a new expected sequence.

seq < expected  -> duplicate or stale
seq == expected -> apply and advance
seq > expected  -> gap, recover
reset           -> replace state

None of that logic depends on the transport. WebSocket messages, SSE events, and long-poll responses can all carry the same sequence fields. Only the delivery mechanism differs from one to the next, and the client state machine stays the same size regardless.

The sequence scope has to be stable. A room feed can keep one counter per room, a notification feed one per user, a document feed one per document, or even split counters for durable operations versus lossy presence. Whatever you pick, the scope has to match the recovery path. If a client asks for replay from room:7 at sequence 42, the server needs stored history under that exact scope.

Several scopes can ride one connection, which means the client has to keep a separate expected value for each scope.

const expected = new Map();

function nextSeq(scope) {
  return expected.get(scope) ?? 1;
}

A single WebSocket carrying room messages, notifications, and presence updates should not collapse them onto one global sequence unless every one of those channels replays from the same history. You do not want a gap in notifications forcing a room-message replay, or a gap in lossy presence corrupting durable message state.

The server can also state its gap policy right in the message.

{
  "seq": 44,
  "channel": "room:7",
  "loss": "replayable",
  "data": { "text": "ship it" }
}

The loss field here is only an example. Other protocols encode the same contract through channel names, subscription metadata, or message types. The goal either way is to stop the client from treating every gap identically.

Event cursors and replayable events from the previous subchapter slot in here. The cursor records where the client last reached. The replayable event is the stored history the server answers from at that point. And the application sequence number is what tells the client it needs to ask in the first place.

SSE already ships id: and Last-Event-ID, which can carry cursor or sequence information. WebSocket has no built-in event ID, so application messages have to carry their own. Long polling usually puts the cursor in the query string or the request body. The policy stays identical across all three, and only the place it lives changes.

Ordering and coalescing also interact. When a latest-value channel replaces seq: 10 with seq: 12, the client can see a gap. That gap is only acceptable if the channel contract allows intermediate values to be skipped. A client that treats every gap as replayable will turn coalescing into a stream of needless recovery requests.

For gap detection, monotonic IDs beat timestamps. A timestamp can run backward across hosts, collide under bursty writes, or simply lack the precision you need. A per-channel counter or a stored event ID gives much cleaner semantics. Timestamps still work as metadata, but they make a weak ordering contract unless the system was built around them from the start.

Idempotent client updates help here too, though the deeper theory sits elsewhere. The local rule for this chapter is enough. After a replay, a reconnect, or a duplicated delivery, the client should be able to apply the same message twice without corrupting its state, or recognize the duplicate by sequence number and drop it.

Broadcast Pressure

Broadcast pressure shows up when one application event turns into many connection writes at once.

Sending one chat message to one client is a unicast. Sending one room message to 20,000 connected clients is a broadcast inside the process. The event is still a single object, but the send path fans out into 20,000 per-connection operations, and each recipient brings its own queue, socket state, compression cost, and policy decision.

The event path runs in stages.

room event
  -> select recipients
  -> encode or reuse payload
  -> enqueue per recipient
  -> flush each recipient by budget

The failure-prone version uses one global pause.

one client slow
  -> global broadcast queue backs up
  -> healthy clients wait behind slow connection

Broadcast fan-out comparing one shared global queue against independent per-recipient queues. — A single global queue lets one slow client back up the shared path so healthy clients wait behind it. Per-recipient queues keep each client's pressure local, so a slow client absorbs its own backlog while the others keep draining at full speed.

That design makes healthy clients pay for the slow one. A slow consumer should only build pressure in its own send queue. Broadcast code should keep walking the recipient list and apply policy per recipient. When client A overflows, A is the one that drops, coalesces, closes, or marks for replay. Client B keeps receiving as long as B is under budget.

Reusing serialization saves CPU. When every recipient gets the same JSON payload, encode it one time and share that one string or Buffer reference around. The per-recipient metadata still has to stay separate.

const payload = JSON.stringify(event);

for (const client of room.clients) {
  enqueue(client, makeItem(payload));
}

That snippet keeps encoding outside the loop. There is a subtlety in it, though. makeItem() has to build per-client metadata, the deadline, the byte accounting, maybe a sequence scope. The encoded bytes are safe to share, but the queue record is not, because each client drops, replaces, or flushes its own copy on its own schedule.

Compression can undo that reuse. With permessage-deflate, the compression context and the negotiated options can depend on the individual connection, and some libraries compress on every send. A single broadcast can then expand into thousands of separate compression operations. When clients are already slow, the server burns CPU compressing messages it is about to drop. For high-fanout feeds, think about turning compression off on the noisy channels, or only compressing messages above a size that makes it worthwhile.

Broadcast loops need fairness too. A synchronous loop over 100,000 clients can hold the event loop for too long even when each enqueue is cheap on its own. Chapter 1 already covers the event loop model in full. The practical takeaway here is narrow. Large fanout should work in batches and yield between them once the recipient set gets big.

for (const client of batch) {
  enqueue(client, itemFor(client));
}
setImmediate(sendNextBatch);

That stops a single broadcast from monopolizing JavaScript execution. It does nothing for cross-process fanout, which Chapter 13.5 takes on. Within one process, though, it keeps the per-recipient pressure checks from piling up into one giant synchronous burst.

Broadcast loops should also keep failures isolated. An error thrown while sending to one client should flag that client and let the loop carry on. If a single exception escapes the broadcast loop, every remaining recipient in that batch silently loses the message, and one failed connection becomes a room-wide bug.

⚠️

Warning

Wrap each recipient send in its own try/catch inside the broadcast loop. If an exception from one connection escapes the loop, every remaining client in that batch silently misses the message, so one broken socket becomes a room-wide delivery bug. Catch transport errors per recipient, mark or close that connection, count the failure, and move on to the next client.

for (const client of batch) {
  trySend(client, itemFor(client));
}

trySend() should catch transport-specific failures, close the connection when it has to, and return cleanly. The broadcast loop counts the failures and keeps going.

Large rooms also push a decision about serialization. When every client receives identical bytes, one encode covers all of them. When each client receives a personalized field, the better approach is to split the payload into a shared part and a per-client part wherever you can. The room message body might be shared, while a per-recipient unread count goes out on a separate latest-value channel. That stops high-fanout durable traffic from carrying personalized state through every broadcast payload.

A global pause is tempting because it spreads the wait evenly across everyone. The producer just stops until every client has caught up. In a realtime server that ties healthy clients to the slowest connection in the room. Per-recipient queues and channel policy behave more fairly, since each slow client absorbs its own pressure while the healthy ones keep draining.

Broadcast pressure shifts close policy as well. When a broadcast pushes many clients over budget at the same instant, closing all of them at once just creates a wave of reconnect churn. A common approach drops lossy channels first, closes only the clients still over budget after a grace window, and asks the reconnecting ones to resume from cursors. The reconnect strategy itself comes up in the next section. The send path only has to produce the pressure signal that sets it off.

The part to get right in all of this is per-recipient accounting. A global counter can only say that broadcast pressure exists somewhere in the process. The per-connection counters are what identify the clients that are actually behind, and from there channel policy decides what to do with each one.

Local Pressure Signals

Pressure-aware code begins with local state. Hang a handful of fields off the connection object. For WebSocket, track queue length, queued bytes, oldest queued age, and bufferedAmount. For SSE, track how often write() comes back false. Add send callback latency, a drop count, and a close reason while you are there.

const pressure = {
  queued: client.queue.length,
  bytes: client.queuedBytes,
  buffered: client.ws.bufferedAmount,
  drops: client.dropped
};

Those numbers can land in a development log, an admin endpoint, or a metrics pipeline later on. Metrics systems and SLOs get their own chapter. Either way, the local code is what has to gather the facts in the first place.

Of all of these, oldest queued age tells you the most when debugging.

const oldest = client.queue[0];
const ageMs = oldest ? Date.now() - oldest.createdAt : 0;

Twenty queued messages with an oldest entry 15ms old is almost certainly a short burst. Twenty messages where the oldest has been sitting for 30 seconds is a stalled connection. A byte count on its own cannot tell those two apart.

Close reasons help as well. Count them separately for byte overflow, message overflow, age overflow, buffered amount, send error, and idle timeout. Label every close "transport error" and you throw away the policy signal completely. Give every tiny condition its own unique reason and you create high-cardinality noise instead. A small set works best, each reason tied to a real decision in the code.

For SSE, record how many times res.write() returned false and how long you waited for drain. WebSocket calls for bufferedAmount and send callback latency. Long polling calls for cursor lag and replay response size. Every transport surfaces its pressure somewhere different.

Local instrumentation should match the policy table.

channel
  budget
  overflow action
  drops
  closes
  replay requests

That pays off the next time you are debugging. When a dashboard reports that clients missed updates, you can immediately ask which channel dropped, whether gaps were allowed there, whether replay ran, and which budget fired.

During development, a small local pressure snapshot is often all you need.

function snapshot(c) {
  return {
    id: c.id,
    bytes: c.queuedBytes,
    buffered: c.ws.bufferedAmount,
    oldestAge: oldestAge(c)
  };
}

That endpoint should hand back metadata, not the payloads themselves. Dumping payloads leaks user data and piles on more pressure at the exact moment you are trying to debug pressure. IDs, counts, ages, channel names, and close reasons are usually plenty to find the faulty path.

⚠️

Warning

Keep pressure snapshots to metadata only, never payloads. Connection IDs, queued byte counts, message counts, oldest-message age, channel names, and close reasons are enough to find the stuck path. Dumping the actual queued messages into a debug endpoint or a log leaks user data and adds memory pressure at the exact moment you are trying to relieve it.

Pressure logs belong at decision points, not on every message. Log the moment a connection crosses a budget, when overflow policy drops or coalesces, when the server closes for pressure, and when a replay gets requested. Logging every single enqueue creates its own throughput problem the instant a broadcast storm hits.

accepted
near_budget
overflow_drop
overflow_close
replay_required

Those names are local states, not a finished metrics taxonomy, which can come later. What makes them worth having is that each transition lines up with one branch in the code. When a connection closes, its last pressure state should point to the exact branch that closed it.

Long polling wants its own snapshot with different fields, the current cursor, the newest retained event, cursor lag, replay batch size, and response age. SSE tracks queued event count, queued bytes, oldest event age, last successful write time, and drain wait time. WebSocket needs the queue fields plus bufferedAmount and send callback latency. One shared dashboard can display all of it, as long as the collection underneath stays transport-aware.

Every code path should end in one clear decision, whether that is accept within budget, coalesce, drop, close, or mark for replay. Anything short of a decision tends to turn into accidental buffering.

Realtime systems usually fail quietly. The send path keeps taking work long after a client has stopped keeping up, and nothing ever throws. A bounded per-connection queue is what turns that invisible state into something you can actually inspect, how much work is pending, how old it is, which messages still count, and the point at which a falling-behind connection has to pay for it.

The Send Path Stays Open

Where Pressure Accumulates

Slow Consumers

Budgets and Bounds

Drop, Coalesce, Close, Or Replay

Ordering And Gaps

Broadcast Pressure

Local Pressure Signals

Related Reading