thenodebook.com
Lab 03 / Node Runtime Labs

Stream Processing Workbench.

Build a local Node.js workbench that generates NDJSON, processes it through whole-file and streaming paths, records backpressure, compresses output, cancels pipelines, and compares the evidence.

Stream commands: workbench command path
  • 01 Generate
    $ npm run generate -- --rows 1000 --payload-bytes 32

    Creates repeatable NDJSON for small test runs.

  • 02 Baseline
    $ npm run naive

    Measures the whole-file implementation first.

  • 03 Stream
    $ npm run stream -- --where "status >= 500"

    Runs the streaming path with a real filter.

  • 04 Pressure
    $ npm run pressure -- --slow-write-ms 5

    Makes drain timing and buffer growth visible.

  • 05 Compare
    $ npm run compare

    Checks output parity and reports differences.

Generate data, compare processors, then pressure the pipeline.
Phases: 14
Difficulty: 8.5/10
Package installs: 0 deps
What this lab builds

The finished project measures stream behavior.

Controlled data generator

Generate predictable NDJSON with row count, payload size, error ratio, seed, output path, byte count, and a generator report.

Naive and stream processors

Build a whole-file baseline, then build a stream path with source adapters, line splitting, JSON parsing, filtering, plugins, and stringification.

Backpressure evidence

Simulate a slow writable destination, record queue length and drain events, and produce a pressure timeline with memory and throughput samples.

Comparison reports

Write generator, naive, stream, split-check, pressure, gzip, abort, and mode comparison reports. The public page describes these artifacts but does not ship a solution.

Complete phase plan

Every phase in Lab 03.

The build starts with a predictable CLI shell and ends with a comparison report across naive, stream, pressure, gzip, and highWaterMark variants. Each phase ships one visible capability and one artifact to inspect.

00

Create the project

Create the workbench shell with ESM enabled, Node v24+ declared, stable folders, npm scripts, help output, and strict flag handling.

  • Create the package manifest with ESM mode and a Node v24 or newer runtime contract.
  • Separate source code, generated input, processed output, and reports into dedicated folders.
  • Create the overview CLI entry that prints available commands, default paths, and common flags.
  • Add scripts for workbench, generate, naive, stream, split-check, pressure, and compare.
  • Treat --help as a success path and unknown flags as clear failures.
  • Keep the first argument parser small enough to inspect branch by branch.
A runnable project shell with predictable commands, paths, help behavior, and failure behavior.
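A minimal sketch of the phase 00 argument handling, assuming a hand-rolled parser instead of a library. `parseFlags` and `main` are hypothetical names. Only `--rows` and `--payload-bytes` come from the command table above, and the allow-list is where each command would declare its own flags. `--help` returns immediately, and any flag outside the allow-list fails with a message on stderr.

```javascript
// Hypothetical first-pass parser: --help wins, known "--name value" pairs are
// collected, and anything else fails loudly.
function parseFlags(argv, allowed) {
  const flags = {};
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--help" || arg === "-h") return { help: true, flags };
    if (!arg.startsWith("--")) throw new Error(`Unexpected argument: ${arg}`);
    const name = arg.slice(2);
    if (!allowed.includes(name)) throw new Error(`Unknown flag: --${name} (try --help)`);
    const value = argv[i + 1];
    if (value === undefined || value.startsWith("--")) throw new Error(`Flag --${name} needs a value`);
    flags[name] = value;
    i++; // skip the value we just consumed
  }
  return { help: false, flags };
}

// Returns an exit code instead of calling process.exit(), so each branch is testable.
// Usage and results go to stdout; failures go to stderr.
function main(argv, allowed = ["rows", "payload-bytes"]) {
  try {
    const { help, flags } = parseFlags(argv, allowed);
    if (help) {
      console.log(`Usage: npm run generate -- ${allowed.map((f) => `--${f} <value>`).join(" ")}`);
      return 0;
    }
    console.log(JSON.stringify(flags));
    return 0;
  } catch (err) {
    console.error(`error: ${err.message}`);
    return 1;
  }
}

// In a real entry file: process.exitCode = main(process.argv.slice(2));
```

Setting `process.exitCode` lets pending stdout writes flush before the process exits. Calling `process.exit()` directly can cut those writes off.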
01

Generate large NDJSON data

Generate data/access.ndjson with configurable row count, payload size, error ratio, seed, and output location.

  • Create the generator entrypoint and connect it to the generate command.
  • Support row count, payload bytes, error ratio, seed, and output path options.
  • Emit deterministic rows with id, timestamp, method, path, status, IP, user agent, and payload fields.
  • Write each row incrementally through a file stream while respecting writable backpressure.
  • Report the generated line count and byte size after the writer finishes.
  • Write reports/generator.json with settings, output path, line count, and byte count.
A repeatable NDJSON workload plus a generator report that later phases can cite.
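One way to make the generator repeatable is to drive every random choice from a seeded PRNG. The sketch below uses mulberry32, a small non-cryptographic 32-bit generator, as an assumed choice. `makeRow`, `generateLines`, the value pools, and the fixed `baseTime` default are illustrative and are not the lab's reference code. Timestamps are derived from the row id instead of `Date.now()`, so two runs with the same seed produce byte-identical output.

```javascript
// mulberry32: a tiny 32-bit seeded PRNG. Deterministic, not cryptographically secure.
function mulberry32(seed) {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Illustrative value pools; the real lab may use different ones.
const METHODS = ["GET", "GET", "GET", "POST", "PUT", "DELETE"];
const PATHS = ["/", "/login", "/api/orders", "/api/users", "/static/app.js"];
const OK_STATUSES = [200, 200, 201, 204, 304, 404];
const ERROR_STATUSES = [500, 502, 503];

const pick = (rand, list) => list[Math.floor(rand() * list.length)];

// Every field comes from the seed or the row id, never from the clock.
function makeRow(rand, id, { payloadBytes = 32, errorRatio = 0.05, baseTime = Date.UTC(2024, 0, 1) } = {}) {
  const isError = rand() < errorRatio;
  return {
    id,
    timestamp: new Date(baseTime + id * 1000).toISOString(),
    method: pick(rand, METHODS),
    path: pick(rand, PATHS),
    status: pick(rand, isError ? ERROR_STATUSES : OK_STATUSES),
    ip: `10.0.${Math.floor(rand() * 256)}.${Math.floor(rand() * 256)}`,
    userAgent: "workbench-generator/1.0",
    payload: "x".repeat(payloadBytes),
  };
}

// Yields one NDJSON line per row, newline included.
function* generateLines({ rows, seed, ...options }) {
  const rand = mulberry32(seed);
  for (let id = 1; id <= rows; id++) yield JSON.stringify(makeRow(rand, id, options)) + "\n";
}
```

In the generate command, each yielded line would go to a file stream through the usual loop: call `write()`, and when it returns `false`, wait for `'drain'` before you continue.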
02

Build the naive baseline

Process the whole NDJSON file with readFile, filter error rows, write filtered output, and report the memory-heavy baseline.

  • Create the naive processor and connect it to the naive command.
  • Support input, output, and --where options, with status >= 500 as the default filter.
  • Load the full input file, decode it, split it into lines, and count rows.
  • Parse every non-empty line, retain matching rows, and report parse failures with line numbers.
  • Write filtered NDJSON with the same logical format the stream path will produce.
  • Write reports/naive.json with rows, matches, elapsed time, bytes, and peak memory.
A whole-file processor that intentionally retains the input, decoded text, line array, matches, and serialized output.
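Once the bytes are loaded, the core of the whole-file baseline fits in one function. The load can be done, for example, with `await readFile(path, "utf8")` from `node:fs/promises`. This sketch assumes the predicate is passed in, and `naiveProcess` is a hypothetical name. Every intermediate value stays alive until the function returns, which is the behavior the naive report is meant to expose.

```javascript
// Hypothetical whole-file core: text in, counts and serialized output out.
// The line array, parsed matches, and output string all coexist in memory.
function naiveProcess(text, predicate) {
  const lines = text.split("\n");
  const matches = [];
  const parseErrors = [];
  let rows = 0;
  for (let index = 0; index < lines.length; index++) {
    const line = lines[index].endsWith("\r") ? lines[index].slice(0, -1) : lines[index];
    if (line.trim() === "") continue; // includes the empty tail after a final newline
    rows++;
    let row;
    try {
      row = JSON.parse(line);
    } catch (err) {
      parseErrors.push({ line: index + 1, message: err.message }); // 1-based line numbers
      continue;
    }
    if (predicate(row)) matches.push(row);
  }
  const output = matches.map((row) => JSON.stringify(row) + "\n").join("");
  return { rows, matches: matches.length, parseErrors, output };
}
```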
03

Add measurement utilities

Create shared timing, memory, byte-count, and report-writing utilities so later modes can be compared.

  • Create the measurement module used by every command that produces evidence.
  • Record wall-clock start and end timestamps plus monotonic elapsed milliseconds.
  • Sample RSS, heap used, heap total, external memory, and ArrayBuffer memory on an interval.
  • Track input bytes, output bytes, and mode-specific byte counters as raw numbers.
  • Write pretty JSON reports with stable top-level groups for mode, status, input, output, counts, timing, and memory.
  • Move the naive command onto the shared timer, sampler, counters, and report writer.
One report shape shared by naive, stream, pressure, compression, abort, and comparison runs.
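Here is a sketch of the shared measurement helper. It assumes that peak values are enough for the report, although a real timeline would keep every sample. It uses only globals: `performance.now()` for monotonic elapsed time and `process.memoryUsage()`, which reports `rss`, `heapTotal`, `heapUsed`, `external`, and `arrayBuffers`. The function names and the report field layout are hypothetical.

```javascript
// Hypothetical shared helper: wall-clock bounds, monotonic elapsed time, and
// peak memory taken from periodic process.memoryUsage() samples.
function startMeasurement({ sampleMs = 50 } = {}) {
  const startedAt = new Date().toISOString();
  const t0 = performance.now();
  const peak = { rss: 0, heapUsed: 0, heapTotal: 0, external: 0, arrayBuffers: 0 };
  const sample = () => {
    const usage = process.memoryUsage();
    for (const key of Object.keys(peak)) peak[key] = Math.max(peak[key], usage[key]);
  };
  sample();
  const timer = setInterval(sample, sampleMs);
  timer.unref(); // sampling alone should never keep the process alive
  return {
    stop() {
      clearInterval(timer);
      sample(); // one last sample so short runs still report something
      return {
        timing: { startedAt, endedAt: new Date().toISOString(), elapsedMs: performance.now() - t0 },
        memory: { peak: { ...peak } },
      };
    },
  };
}

// Fixed top-level key order keeps every mode's report diffable.
function formatReport({ mode, status, input, output, counts, timing, memory }) {
  return JSON.stringify({ mode, status, input, output, counts, timing, memory }, null, 2) + "\n";
}
```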
04

Build source adapters

Create compatible readable sources for regular files, stdin, and gzipped input.

  • Create the stream command boundary with source, input, and read highWaterMark options.
  • Create a source adapter module that returns a readable stream and source metadata.
  • Implement file input through a file stream with optional highWaterMark tuning.
  • Implement stdin input while keeping diagnostics off stdout.
  • Implement gzip input by reading compressed bytes and exposing decompressed NDJSON downstream.
  • Record source kind, input path, compressed state, declared bytes, and counted bytes read.
  • Keep all source-specific branching inside the adapter so downstream stages receive one byte-stream contract.
File, stdin, and gzip sources that produce compatible readable streams and source reports.
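This sketch of the adapter contract rests on one design assumption. Instead of returning a single pre-piped stream, the adapter returns an ordered list of stages that the caller spreads into `pipeline()`. That way, errors from the file stream, the counter, and `createGunzip()` all surface in one place. `openSource`, the `kind` values, and the metadata field names are hypothetical. The counter is a pass-through `Transform`. Attaching a `'data'` listener to the source instead would switch it into flowing mode and bypass backpressure.

```javascript
import { createReadStream } from "node:fs";
import { stat } from "node:fs/promises";
import { Transform } from "node:stream";
import { createGunzip } from "node:zlib";

// Hypothetical adapter: every branch ends in the same contract, an ordered list
// of stages that yields plain NDJSON bytes, plus metadata filled in as bytes flow.
async function openSource({ kind, input, readHighWaterMark }) {
  const meta = { kind, input: input ?? null, compressed: kind === "gzip", declaredBytes: null, bytesRead: 0 };
  let source;
  if (kind === "stdin") {
    source = process.stdin; // size is unknown up front
  } else if (kind === "file" || kind === "gzip") {
    meta.declaredBytes = (await stat(input)).size;
    source = createReadStream(input, readHighWaterMark ? { highWaterMark: readHighWaterMark } : undefined);
  } else {
    throw new Error(`Unknown source kind: ${kind}`);
  }
  // Counts raw bytes leaving the source (compressed bytes for gzip input).
  const counter = new Transform({
    transform(chunk, _encoding, callback) {
      meta.bytesRead += chunk.length;
      callback(null, chunk);
    },
  });
  const stages = kind === "gzip" ? [source, counter, createGunzip()] : [source, counter];
  return { stages, meta };
}
```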
05

Build a line splitter transform

Split arbitrary byte chunks into complete NDJSON lines while carrying partial lines between chunks.

  • Create the line splitter module as a Transform stream or async generator stage.
  • Decode incoming Buffer chunks without corrupting UTF-8 characters split across chunk boundaries.
  • Carry the incomplete final segment after each newline split.
  • Flush the final carry when the upstream source ends and ignore the empty carry created by a final newline.
  • Normalize CRLF input by stripping one trailing carriage return from emitted lines.
  • Create a split checker that reads with tiny chunks to force carry-buffer behavior.
  • Use the splitter in the stream command so file, stdin, and gzip runs report both byte count and row count.
A chunk-safe NDJSON splitter plus reports/split-check.json with expected lines, seen lines, read buffer size, and pass/fail status.
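Here is a sketch of the splitter as a `Transform`. It assumes object mode on the readable side, so each pushed value is one line string. `StringDecoder` from `node:string_decoder` holds back incomplete UTF-8 sequences between chunks, and `carry` holds the incomplete line. `splitCheck` is a hypothetical version of the tiny-chunk checker, and its result fields mirror reports/split-check.json. Empty lines in the middle of the input are emitted as empty strings, and the parser stage is left to skip them.

```javascript
import { Readable, Transform } from "node:stream";
import { pipeline } from "node:stream/promises";
import { StringDecoder } from "node:string_decoder";

// Hypothetical splitter: Buffer chunks in, one line string out per push.
function createLineSplitter() {
  const decoder = new StringDecoder("utf8"); // buffers partial multi-byte characters
  let carry = ""; // text after the last newline seen so far
  const emit = (stream, line) => stream.push(line.endsWith("\r") ? line.slice(0, -1) : line);
  return new Transform({
    readableObjectMode: true,
    transform(chunk, _encoding, callback) {
      const parts = (carry + decoder.write(chunk)).split("\n");
      carry = parts.pop(); // the final segment is incomplete until a newline arrives
      for (const line of parts) emit(this, line);
      callback();
    },
    flush(callback) {
      const rest = carry + decoder.end();
      if (rest !== "") emit(this, rest); // a trailing newline leaves an empty carry: skip it
      callback();
    },
  });
}

// Hypothetical split checker: replays the same bytes in tiny chunks.
async function splitCheck(text, chunkSize) {
  const bytes = Buffer.from(text, "utf8");
  const chunks = [];
  for (let i = 0; i < bytes.length; i += chunkSize) chunks.push(bytes.subarray(i, i + chunkSize));
  const seen = [];
  await pipeline(Readable.from(chunks), createLineSplitter(), async (lines) => {
    for await (const line of lines) seen.push(line);
  });
  const expected = text.split(/\r?\n/);
  if (expected.at(-1) === "") expected.pop();
  return {
    expectedLines: expected.length,
    seenLines: seen.length,
    readBufferSize: chunkSize,
    pass: JSON.stringify(seen) === JSON.stringify(expected),
  };
}
```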
06

Build a filter DSL

Add a small query syntax for filtering parsed rows and share it between naive and stream modes.

  • Create the filter module with separate parsing and predicate compilation boundaries.
  • Parse one-clause expressions for status comparisons, method equality, path containment, and inequality.
  • Compile each filter once so the row loop receives a ready predicate function.
  • Add a JSON row parser stage that converts line strings into row objects and reports malformed lines with numbers.
  • Add the stream filter stage and count rows seen and rows matched.
  • Move naive mode onto the shared compiled predicate so output comparisons use the same selection rules.
Shared filter behavior for whole-file and streaming modes, with matching counts for the same input and query.
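Here is a sketch of the parse/compile split. It assumes a one-clause grammar of the form `<field> <op> <value>`, with the operators `==`, `!=`, `>=`, `<=`, `>`, `<`, and `contains`. The exact syntax, `parseFilter`, and `compileFilter` are hypothetical. Only `status >= 500` appears in the lab text. Values that look numeric are compared as numbers, and surrounding double quotes are stripped.

```javascript
// Hypothetical one-clause grammar: <field> <op> <value>.
const CLAUSE = /^\s*(\w+)\s*(==|!=|>=|<=|>|<|contains)\s*(.+?)\s*$/;

function parseFilter(expression) {
  const match = CLAUSE.exec(expression);
  if (!match) throw new Error(`Cannot parse filter: ${JSON.stringify(expression)}`);
  const [, field, op, rawValue] = match;
  const unquoted = rawValue.replace(/^"(.*)"$/, "$1");
  const numeric = Number(unquoted);
  const value = unquoted !== "" && Number.isFinite(numeric) ? numeric : unquoted;
  return { field, op, value };
}

// Runs once per command; the row loop only ever calls the returned function.
function compileFilter({ field, op, value }) {
  switch (op) {
    case "==": return (row) => row[field] === value;
    case "!=": return (row) => row[field] !== value;
    case ">=": return (row) => row[field] >= value;
    case "<=": return (row) => row[field] <= value;
    case ">": return (row) => row[field] > value;
    case "<": return (row) => row[field] < value;
    case "contains": return (row) => String(row[field] ?? "").includes(String(value));
    default: throw new Error(`Unsupported operator: ${op}`);
  }
}
```

The naive and stream commands can share the same `compileFilter(parseFilter(where))` result, so both modes apply identical selection rules.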
07

Add transform plugins

Support pluggable row transformations for redaction, field selection, and enrichment.

  • Create a plugin module or plugin directory that builds an ordered list of row transforms from CLI options.
  • Implement redaction for requested fields before data leaves the process.
  • Implement field selection while preserving the requested output field order.
  • Implement enrichment fields such as status class and payload length.
  • Encode the plugin order in one place: filter, redact, enrich, select, then stringify.
  • Add a composition check that proves redaction, enrichment, and selection run in the intended order.
  • Record active plugins and their order in the stream report.
A stream path that can redact, enrich, narrow, and serialize rows while preserving auditable plugin order.
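Here is a sketch of ordered plugin composition. It assumes that each plugin is a plain `row => row` function and that filtering has already happened upstream. The factory names, the `[REDACTED]` marker, and the `statusClass`/`payloadLength` fields are illustrative. The order is fixed inside `buildPlugins`, so the CLI decides which plugins are active but never their sequence.

```javascript
// Illustrative plugin factories; each returns a row -> row function.
const redact = (fields) => (row) => {
  const copy = { ...row };
  for (const field of fields) if (field in copy) copy[field] = "[REDACTED]";
  return copy;
};

const enrich = () => (row) => ({
  ...row,
  statusClass: `${Math.floor(row.status / 100)}xx`,
  payloadLength: typeof row.payload === "string" ? row.payload.length : 0,
});

const select = (fields) => (row) => {
  const out = {};
  for (const field of fields) if (field in row) out[field] = row[field]; // requested order wins
  return out;
};

// The canonical order (redact, enrich, select) lives here and only here.
function buildPlugins({ redactFields = [], enrichRows = false, selectFields = [] }) {
  const plugins = [];
  if (redactFields.length) plugins.push({ name: "redact", apply: redact(redactFields) });
  if (enrichRows) plugins.push({ name: "enrich", apply: enrich() });
  if (selectFields.length) plugins.push({ name: "select", apply: select(selectFields) });
  return {
    order: plugins.map((p) => p.name), // recorded in the stream report
    run: (row) => plugins.reduce((acc, p) => p.apply(acc), row),
  };
}
```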
08

Add a slow sink simulator

Create a Writable stream that intentionally writes slowly and records backpressure signals.

  • Create a custom SlowWritable destination with configurable delay, highWaterMark, discard mode, and optional output path.
  • Count chunks, bytes, drain events, current writable length, writable highWaterMark, and max buffered bytes.
  • Delay each write callback so the destination becomes slower than the upstream stages.
  • Create the pressure command using the existing source, splitter, parser, filter, plugin, stringifier, and slow sink stages.
  • Support slow-write delay, write highWaterMark, read highWaterMark, and filter options.
  • Write reports/pressure.json with slow sink metrics and elapsed time.
A controlled slow writable destination that makes backpressure measurable.
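Here is a sketch of the slow sink. It assumes the delay is applied with `setTimeout` before each write callback. It covers the delay, the writable `highWaterMark`, and the counters. Discard mode and the optional output path are left out. `SlowWritable` and `writeAll` are hypothetical names. `writableLength` counts the chunk in flight plus everything queued behind it, so its maximum shows how far the producer ran ahead.

```javascript
import { once } from "node:events";
import { Writable } from "node:stream";

// Hypothetical slow destination: each write completes only after delayMs.
class SlowWritable extends Writable {
  constructor({ delayMs = 5, highWaterMark = 16 * 1024 } = {}) {
    super({ highWaterMark });
    this.delayMs = delayMs;
    this.metrics = { chunks: 0, bytes: 0, drains: 0, maxBufferedBytes: 0, highWaterMark: this.writableHighWaterMark };
    this.on("drain", () => {
      this.metrics.drains++;
    });
  }

  _write(chunk, _encoding, callback) {
    this.metrics.chunks++;
    this.metrics.bytes += chunk.length;
    // writableLength = this in-flight chunk plus everything queued behind it.
    this.metrics.maxBufferedBytes = Math.max(this.metrics.maxBufferedBytes, this.writableLength);
    setTimeout(callback, this.delayMs);
  }
}

// Producer that respects backpressure: stop on false, resume on 'drain'.
async function writeAll(writable, chunks) {
  for (const chunk of chunks) {
    if (!writable.write(chunk)) await once(writable, "drain");
  }
  writable.end();
  await once(writable, "finish");
}
```

A producer that honors `write()`'s return value keeps buffered bytes at or below the highWaterMark. A producer that ignores it lets the queue grow with the input.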
09

Add backpressure visualization

Produce a timeline of readable pressure, writable pressure, throughput, drain events, and memory.

  • Add a sample interval option to the pressure command and record it in the report.
  • Collect timeline samples during the run and force one final sample during shutdown.
  • Sample RSS, heap used, external memory, readable queue length when available, and writable queue length from the slow sink.
  • Sample bytes read, bytes written, rows seen, rows matched, and per-window throughput.
  • Record drain events with elapsed timestamps from inside the drain listener.
  • Summarize max readable length, max writable length, max RSS, total drains, total bytes, and elapsed time.
A pressure timeline that shows queue length, drain events, throughput, and memory over time.
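Here is a sketch of the timeline recorder. It assumes the pressure command can hand it a getter for its current counters, for example `source.readableLength` and `sink.writableLength`. `createTimeline`, `recordDrain`, and the sample field names are hypothetical. Throughput is computed per sampling window. The maxima use `reduce` instead of `Math.max(...array)`, so long runs do not hit argument-count limits.

```javascript
// Hypothetical timeline recorder. `read` is a caller-supplied getter returning
// { readableLength, writableLength, bytesRead, bytesWritten, rowsSeen, rowsMatched }.
function createTimeline({ sampleMs = 100, read }) {
  const t0 = performance.now();
  const samples = [];
  const drains = [];
  let previous = { at: t0, bytesWritten: 0 };

  const sample = () => {
    const now = performance.now();
    const counters = read();
    const { rss, heapUsed, external } = process.memoryUsage();
    const windowMs = now - previous.at;
    samples.push({
      elapsedMs: now - t0,
      ...counters,
      rss,
      heapUsed,
      external,
      // Throughput for this window only, not a running average.
      bytesWrittenPerSec: windowMs > 0 ? ((counters.bytesWritten - previous.bytesWritten) * 1000) / windowMs : 0,
    });
    previous = { at: now, bytesWritten: counters.bytesWritten };
  };

  const timer = setInterval(sample, sampleMs);
  timer.unref();
  const maxOf = (key) => samples.reduce((max, s) => Math.max(max, s[key] ?? 0), 0);

  return {
    // Attach as: sink.on("drain", timeline.recordDrain)
    recordDrain: () => drains.push({ elapsedMs: performance.now() - t0 }),
    finish() {
      clearInterval(timer);
      sample(); // forced final sample during shutdown
      const last = samples[samples.length - 1];
      return {
        sampleMs,
        samples,
        drains,
        summary: {
          maxReadableLength: maxOf("readableLength"),
          maxWritableLength: maxOf("writableLength"),
          maxRss: maxOf("rss"),
          totalDrains: drains.length,
          totalBytesRead: last.bytesRead,
          totalBytesWritten: last.bytesWritten,
          elapsedMs: last.elapsedMs,
        },
      };
    },
  };
}
```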
10

Make pipeline lifecycle explicit

Record completion, failure, and cleanup details around the stream/promises.pipeline boundary.

  • Refactor stream and pressure commands so each run creates fresh source, transform, and destination stages.
  • Use the promise-based pipeline API as the operation boundary.
  • Normalize the stream command stage list: source, counters, optional gunzip, splitter, parser, filter, plugins, stringifier, optional gzip, destination.
  • Normalize the pressure command through the same pipeline boundary while preserving slow sink metrics.
  • Record pipeline status, completion, error code, and tracked stage-close information.
  • Keep stream errors flowing through one try/catch/finally path before report finalization.
Stream and pressure reports that say whether the pipeline completed, failed, or cleaned up after an error.
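Here is a sketch of the lifecycle boundary. It assumes the caller passes a factory, so every run builds fresh stream instances, because a destroyed stream cannot be reused. `runPipeline` and the lifecycle field names are hypothetical. `pipeline` is the real promise API from `node:stream/promises`, and it destroys every stage when any one of them fails.

```javascript
import { pipeline } from "node:stream/promises";

// Hypothetical run boundary: build fresh stages, await one pipeline, and
// finalize the lifecycle record in exactly one place.
async function runPipeline(makeStages) {
  const stages = makeStages(); // fresh instances every run
  const closed = new Set();
  stages.forEach((stage, index) => stage.once("close", () => closed.add(index)));
  const lifecycle = { status: "running", completed: false, errorCode: null, errorMessage: null };
  try {
    await pipeline(...stages);
    lifecycle.status = "completed";
    lifecycle.completed = true;
  } catch (err) {
    lifecycle.status = err.name === "AbortError" ? "aborted" : "failed";
    lifecycle.errorCode = err.code ?? null;
    lifecycle.errorMessage = err.message;
  } finally {
    // Runs on success and failure alike, before the report is written.
    lifecycle.stageCount = stages.length;
    lifecycle.closedStages = [...closed].sort((a, b) => a - b);
  }
  return lifecycle;
}
```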
11

Add abort and partial output cleanup

Cancel a running pipeline safely and keep partial files from being mistaken for successful output.

  • Support row-count aborts, timer aborts, and a keep-partial inspection flag.
  • Create one AbortController per run and pass its signal into the pipeline boundary.
  • Abort after the configured row threshold from inside the data path.
  • Abort after the configured timeout from outside the stream graph and clear the timer during cleanup.
  • Write file output through a temporary path and promote it only after successful completion.
  • Remove or mark partial output on abort while preserving the original abort reason in the report.
Aborted runs with status, reason, partial file policy, cleanup state, and no corrupt final output promotion.
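Here is a sketch of abort handling with temp-file promotion. It assumes the caller supplies a `stages(controller)` factory, so a row-counting stage can call `controller.abort(reason)` from inside the data path. `runWithAbort`, the `.partial` suffix, and the report fields are hypothetical. `pipeline()` accepts a `signal` option and rejects with an `AbortError` when the signal fires. The sketch reads the original reason back from `controller.signal.reason`.

```javascript
import { createWriteStream } from "node:fs";
import { rename, rm } from "node:fs/promises";
import { pipeline } from "node:stream/promises";

// Hypothetical: output goes to "<finalPath>.partial" and is renamed into place
// only after the pipeline completes.
async function runWithAbort({ stages, finalPath, timeoutMs, keepPartial = false }) {
  const controller = new AbortController(); // one controller per run
  const tempPath = `${finalPath}.partial`;
  const timer = timeoutMs
    ? setTimeout(() => controller.abort(new Error(`timed out after ${timeoutMs} ms`)), timeoutMs)
    : null;
  const report = { status: "running", reason: null, partialFile: null, promoted: false };
  try {
    await pipeline(...stages(controller), createWriteStream(tempPath), { signal: controller.signal });
    await rename(tempPath, finalPath);
    report.status = "completed";
    report.promoted = true;
  } catch (err) {
    const aborted = controller.signal.aborted;
    report.status = aborted ? "aborted" : "failed";
    // Keep the caller's reason rather than the AbortError's generic message.
    report.reason = aborted ? controller.signal.reason?.message ?? String(controller.signal.reason) : err.message;
    if (keepPartial) report.partialFile = tempPath;
    else await rm(tempPath, { force: true });
  } finally {
    if (timer) clearTimeout(timer);
  }
  return report;
}
```

On POSIX systems, `rename()` replaces the destination atomically when both paths are on the same filesystem. That is why the temp file sits next to the final path instead of in a separate temp directory.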
12

Add compression

Add gzip compression for output and decompression for input, then report compression ratio and pressure effects.

  • Support gzip output mode, gzip level, and compressed output paths.
  • Read gzipped input through the existing gzip source path before line splitting.
  • Track compressed input bytes, uncompressed input bytes, uncompressed output bytes, and compressed output bytes.
  • Record gzip mode, gzip level, and compression ratio when compression is enabled.
  • Compare decompressed gzip content against raw NDJSON content for equivalent runs.
  • Run pressure tests in raw and gzip modes and compare drain counts, elapsed time, and byte counts.
Stream reports that show compression settings, byte boundaries, ratios, and pressure changes.
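Here is a sketch of byte accounting around the gzip stage. It assumes output compression and an in-memory sink, so the result can be checked. `countBytes`, `gzipWithCounts`, `sameContent`, and the report field names are hypothetical. `createGzip({ level })` and `gunzipSync` are real `node:zlib` APIs. The ratio here is compressed bytes divided by uncompressed bytes, so smaller is better, and the report should state whichever convention it uses.

```javascript
import { Readable, Transform, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";
import { createGzip, gunzipSync } from "node:zlib";

// Pass-through stage that adds each chunk's byte length to counters[key].
const countBytes = (counters, key) =>
  new Transform({
    transform(chunk, _encoding, callback) {
      counters[key] += chunk.length;
      callback(null, chunk);
    },
  });

// Hypothetical: count bytes on both sides of gzip and collect the output in memory.
async function gzipWithCounts(lines, level = 6) {
  const counters = { uncompressedOutputBytes: 0, compressedOutputBytes: 0 };
  const parts = [];
  await pipeline(
    Readable.from(lines),
    countBytes(counters, "uncompressedOutputBytes"),
    createGzip({ level }),
    countBytes(counters, "compressedOutputBytes"),
    new Writable({
      write(chunk, _encoding, callback) {
        parts.push(chunk);
        callback();
      },
    }),
  );
  const { uncompressedOutputBytes, compressedOutputBytes } = counters;
  return {
    gzip: {
      mode: "output",
      level,
      // compressed / uncompressed: 0.25 means the output shrank to a quarter.
      ratio: uncompressedOutputBytes > 0 ? compressedOutputBytes / uncompressedOutputBytes : null,
    },
    counters,
    body: Buffer.concat(parts),
  };
}

// Equivalence check for gzip vs raw runs: decompress, then compare bytes.
const sameContent = (gzipped, raw) => gunzipSync(gzipped).equals(Buffer.from(raw, "utf8"));
```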
13

Compare modes

Compare naive, stream, slow sink, gzip, and highWaterMark variants in one report.

  • Create the comparison command and connect it to the compare script.
  • Define named scenarios for naive default, stream default, pressure default, slow pressure, gzip output, low read highWaterMark, and larger write highWaterMark.
  • Compute line counts, match counts, and SHA-256 content hashes for outputs that should be equivalent.
  • Read rows, matches, byte counts, elapsed time, peak RSS, peak heap, peak external memory, and drain count from scenario reports.
  • Write reports/comparison.md with one row per scenario and links to the JSON reports used.
  • Fail the comparison command when equivalent logical outputs disagree.
  • End the comparison with the production rule you would apply based on the measured result.
A mode comparison report that validates output equality and exposes memory, timing, pressure, and compression differences.
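Here is a sketch of the equivalence gate. It assumes every scenario's output is plain NDJSON on disk, so gzip outputs would be decompressed first. `fingerprint` and `assertEquivalent` are hypothetical names. Hashing streams the file through `createHash("sha256")`, so large outputs are never loaded whole.

```javascript
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

// Stream a file through SHA-256 and count newline bytes along the way.
async function fingerprint(path) {
  const hash = createHash("sha256");
  let lines = 0;
  for await (const chunk of createReadStream(path)) {
    hash.update(chunk);
    for (let i = chunk.indexOf(0x0a); i !== -1; i = chunk.indexOf(0x0a, i + 1)) lines++;
  }
  return { sha256: hash.digest("hex"), lines };
}

// Hypothetical gate: logically equivalent scenarios must share one hash.
function assertEquivalent(results) {
  const distinct = new Set(results.map((r) => r.sha256));
  if (distinct.size > 1) {
    const detail = results.map((r) => `${r.scenario}=${r.sha256.slice(0, 12)}`).join(", ");
    throw new Error(`Equivalent outputs disagree: ${detail}`);
  }
}
```

The compare command could catch this error, write it into reports/comparison.md, and set `process.exitCode = 1`, so the run fails visibly while still leaving the report behind.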
Workbench evidence

The reports make each mode auditable.

Lab 03 keeps the experiment grounded in files you can inspect: generator settings, whole-file memory, stream counts, splitter checks, pressure timelines, compression data, abort cleanup, and final mode comparison.

  • generator.json: NDJSON settings, output path, rows, and bytes
  • naive.json: whole-file counts, timing, bytes, and peak memory
  • stream.json: source, filter, plugin, lifecycle, and output data
  • split-check.json: tiny-chunk line splitting validation
  • pressure.json: slow sink queues, drains, memory, and throughput
  • stream-gzip.json: compressed and uncompressed byte boundaries
  • stream-abort.json: abort reason, cleanup state, and partial files
  • comparison.md: mode table with hashes, metrics, and final rule

Choose your NodeBook package.

Pricing is available for a single reader or as a team license for up to 25 team members.

Individual pricing covers one reader and one personal purchase record. Team licenses include up to 25 members, and downloaded content may be shared internally with up to 30 people in total.
Downloadable book bundle

Digital Bundle

Volume I as EPUB, light/dark PDFs, slides, cheatsheets, and future updates.

$19.99 (was $49.99)
One-time purchase
  • Volume I EPUB for offline reading
  • Light and dark PDF editions
  • Slide decks for chapter review
  • Cheatsheets for quick lookup
  • Future Digital Bundle updates
  • Lifetime access to the files
This is the downloadable Volume I study bundle. It does not include Node Runtime Labs.
Best value
Everything together

NodeBook Pro

All Labs plus the downloadable Volume I bundle in one purchase. Save $9.99 vs buying separately.

$49.99 (was $99.99)
One-time purchase
Node Runtime Labs + Digital Bundle
  • Everything in Node Runtime Labs
  • Everything in the Digital Bundle
  • Seven included lab projects
  • Three upcoming labs when released
  • Future updates for both products
  • Lifetime access to purchased files
This includes both paid products: Node Runtime Labs and the Digital Bundle.
Premium labs
Complex runtime projects

Node Runtime Labs

Seven long-form builds: recorder, binary store, stream workbench, resolver, watcher, task runtime, and protocol gateway.

$39.99 (was $79.99)
One-time purchase
  • Node Runtime Flight Recorder
  • Binary File Store / Append-Only Log Database
  • Stream Processing Workbench
  • Module Resolution Inspector
  • Atomic File Watcher + Incremental Build Cache
  • Async Task Runtime / Local Job Orchestrator
  • Custom Binary Protocol Gateway
  • Three more labs upcoming
This is the paid labs bundle. It does not include EPUB, PDFs, slides, or cheatsheets.
Team downloadable files

Digital Bundle Team

Volume I files, slides, cheatsheets, and updates for a small engineering team.

$59.99
One-time team license
  • Everything in the Digital Bundle
  • Supports up to 25 team members
  • Share internally with up to 30 people
  • Single purchase for the team
  • Future Digital Bundle updates
  • Lifetime access to purchased files
Covers up to 25 team members and internal sharing with up to 30 people.
Team value
Everything for the team

NodeBook Pro Team

Node Runtime Labs plus the downloadable Volume I bundle in one team license.

$149.99
One-time team license
Node Runtime Labs + Digital Bundle
  • Everything in NodeBook Pro
  • Supports up to 25 team members
  • Share internally with up to 30 people
  • One receipt and license record
  • Future updates for both products
  • Save $29.99 vs buying the team products separately
Covers up to 25 team members and internal sharing with up to 30 people.
Team labs
Runtime training projects

Node Runtime Labs Team

Seven long-form builds for onboarding, study groups, and internal training.

$119.99
One-time team license
  • Node Runtime Flight Recorder
  • Binary File Store / Append-Only Log Database
  • Stream Processing Workbench
  • Module Resolution Inspector
  • Supports up to 25 team members
  • Share internally with up to 30 people
  • Three more labs upcoming
Covers up to 25 team members and internal sharing with up to 30 people.
Other labs in the bundle

Lab 03 sits inside the runtime lab set.

The bundle includes seven runtime projects covering process observation, binary storage, streams, module resolution, file watching, async orchestration, and custom protocols.
