Lab 02 / Node Runtime Labs

Binary File Store / Append-Only Log Database.

Build a small local database from byte layout up: binary records, append-only segment files, direct indexed reads, corruption checks, recovery, tombstones, compaction, Buffer ownership measurements, and benchmark reports.

Storage commands: append-only database loop
  • 01 Help
    $ npm run db -- --help

    Checks the command router before storage exists.

  • 02 Write
    $ npm run put -- name ish

    Appends a binary record to the segment file.

  • 03 Read
    $ npm run get -- name

    Uses the latest offset from the in-memory index.

  • 04 Recover
    $ npm run recover

    Truncates broken tails after a simulated crash.

  • 05 Bench
    $ npm run bench

    Compares replay, persisted index, and compaction cost.

Write records, read by index, then test recovery paths.
Phases: 14
Difficulty: Advanced
Package installs: 0 deps

What this lab builds

The finished tool is a real file-backed store.

Binary record format

Define a 28-byte header with magic bytes, version, record type, reserved bytes, key length, value length, timestamp, and checksum, then fix the offsets where the key and value payload begin.

Append-only storage

Build a CLI that encodes records, appends them to a segment file, tracks byte offsets, scans records sequentially, and inspects one record by offset.

Indexes and recovery

Replay the segment into an in-memory index, persist the index, validate segment size, detect corruption, and truncate a partial tail to the last valid record.

Compaction and reports

Represent deletes as tombstones, compact live records into a replacement segment, measure Buffer ownership modes, and write benchmark reports.

Complete phase plan

Every phase in Lab 02.

The build starts with a command shell and ends with benchmark reports. Each phase adds one storage capability and one artifact a developer can inspect.

00

Create the project

Create the storage engine shell with a runnable CLI, fixed folders, npm scripts, help output, strict command errors, and command placeholders.

  • Create a private ESM package with a Node.js v24-or-newer engine requirement.
  • Separate source files, mutable database state, generated reports, and repeatable fixtures into dedicated folders.
  • Create the CLI command router and make help succeed without creating database files.
  • Add scripts for db, put, get, scan, delete, inspect, inspect-format, encode, decode, recover, compact, bench, and test.
  • Print command names, default paths, and the main command shapes in the help output.
  • Reject unknown commands through stderr with a failure exit code.
  • Add placeholders for put, get, scan, and inspect so routing is proven before storage logic exists.
A project shell with package metadata, folders, command routing, help output, npm scripts, and strict placeholder behavior.
01

Define the record format

Design the byte layout for one database record and document the file format beside the implementation constants.

  • Create a format module that owns magic bytes, version, record types, header size, and field offsets.
  • Define the 28-byte header with magic bytes, version, type, reserved bytes, key length, value length, timestamp, and checksum.
  • Choose big-endian numeric fields and UTF-8 key/value payload bytes.
  • Define PUT as record type 1 and DELETE as record type 2.
  • Add an inspect-format command that prints offset, width, field name, and validation rule from the shared constants.
  • Write docs/format.md with record types, payload order, byte order, and checksum coverage.
A documented binary disk contract with one source of truth for header offsets, widths, type values, and payload rules.
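
One way to keep a single source of truth is a field table that both the encoder and the inspect-format output read from. The sketch below is illustrative only: the field order follows the outline above, but the magic value "NBDB" and the individual widths (4 + 1 + 1 + 2 + 4 + 4 + 8 + 4 = 28) are assumptions, not the lab's published layout.

```javascript
// Hypothetical layout. Field order comes from the outline; the magic value
// and per-field widths are assumptions chosen so they sum to 28 bytes.
const FIELDS = [
  { name: 'magic', offset: 0, width: 4, rule: 'must equal "NBDB"' },
  { name: 'version', offset: 4, width: 1, rule: 'must equal 1' },
  { name: 'type', offset: 5, width: 1, rule: '1 = PUT, 2 = DELETE' },
  { name: 'reserved', offset: 6, width: 2, rule: 'must be zero' },
  { name: 'keyLength', offset: 8, width: 4, rule: 'uint32 big-endian, at least 1' },
  { name: 'valueLength', offset: 12, width: 4, rule: 'uint32 big-endian' },
  { name: 'timestamp', offset: 16, width: 8, rule: 'uint64 big-endian' },
  { name: 'checksum', offset: 24, width: 4, rule: 'uint32 big-endian, CRC-32' },
];
const HEADER_SIZE = 28;
const RECORD_TYPES = Object.freeze({ PUT: 1, DELETE: 2 });

// Guard against drift: fields must be contiguous and fill the header exactly.
function validateLayout(fields, headerSize) {
  let expected = 0;
  for (const field of fields) {
    if (field.offset !== expected) {
      throw new Error(`gap or overlap before field "${field.name}"`);
    }
    expected += field.width;
  }
  if (expected !== headerSize) {
    throw new Error(`fields cover ${expected} bytes, expected ${headerSize}`);
  }
  return true;
}

validateLayout(FIELDS, HEADER_SIZE);
console.table(FIELDS); // roughly what an inspect-format command could print
```

Running the layout check at module load means a mistyped offset fails immediately instead of producing unreadable segment files later.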
02

Encode records into Buffers

Convert key/value data into one contiguous Buffer that matches the record format.

  • Create the encoder module and expose one record construction function.
  • Convert keys and values into UTF-8 Buffers and validate byte length instead of string length.
  • Reject empty keys, unknown record types, and fields that cannot fit into unsigned 32-bit length slots.
  • Allocate exactly header size plus key bytes plus value bytes, then write the fixed header fields.
  • Copy key bytes after the header and value bytes immediately after key bytes.
  • Add an encode CLI, a hex dump helper, and tests for length, header fields, and payload order.
An encoder that produces inspectable record bytes and tests that protect the format before disk writes enter the project.
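
The encoding steps above can be sketched as follows. This is a minimal illustration under assumptions: the magic bytes "NBDB", the field offsets, and the function name encodeRecord are hypothetical, and the checksum field is left at zero because checksums arrive in a later phase.

```javascript
const HEADER_SIZE = 28;
const MAGIC = Buffer.from('NBDB', 'latin1'); // hypothetical magic bytes
const TYPES = { PUT: 1, DELETE: 2 };

function encodeRecord({ type, key, value = '', timestamp = Date.now() }) {
  if (type !== TYPES.PUT && type !== TYPES.DELETE) {
    throw new RangeError(`unknown record type: ${type}`);
  }
  // Validate byte length, not string length: 'é' is 1 character but 2 bytes.
  const keyBytes = Buffer.from(key, 'utf8');
  const valueBytes = Buffer.from(value, 'utf8');
  if (keyBytes.length === 0) throw new RangeError('key must not be empty');
  if (keyBytes.length > 0xffffffff || valueBytes.length > 0xffffffff) {
    throw new RangeError('key or value does not fit a uint32 length field');
  }

  // One contiguous allocation: header, then key bytes, then value bytes.
  const record = Buffer.alloc(HEADER_SIZE + keyBytes.length + valueBytes.length);
  MAGIC.copy(record, 0);
  record.writeUInt8(1, 4); // version (assumed offset)
  record.writeUInt8(type, 5); // record type (assumed offset)
  // Bytes 6-7 are reserved and stay zero from Buffer.alloc.
  record.writeUInt32BE(keyBytes.length, 8);
  record.writeUInt32BE(valueBytes.length, 12);
  record.writeBigUInt64BE(BigInt(timestamp), 16);
  // Bytes 24-27 are the checksum slot, left at zero in this sketch.
  keyBytes.copy(record, HEADER_SIZE);
  valueBytes.copy(record, HEADER_SIZE + keyBytes.length);
  return record;
}

const record = encodeRecord({ type: TYPES.PUT, key: 'name', value: 'ish', timestamp: 0 });
console.log(record.length); // 35 = 28 header + 4 key bytes + 3 value bytes
console.log(record.toString('hex'));
```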
03

Decode records from Buffers

Parse a Buffer back into record data, validate malformed input, and prove encode/decode round trips.

  • Create the decoder module and return type, key, value, timestamp, checksum, byte lengths, record length, and payload offset.
  • Validate incomplete headers, invalid magic bytes, unsupported versions, and unknown record types before reading payload data.
  • Compute the full record length from header fields and reject incomplete payload buffers.
  • Decode key and value bytes into UTF-8 strings only after boundary validation succeeds.
  • Add copy, slice, and subarray payload ownership modes for later memory measurement.
  • Add a decode CLI and round-trip tests for ASCII, non-ASCII, invalid headers, and incomplete payloads.
A decoder that validates fixed headers, calculates payload boundaries, supports payload ownership modes, and round-trips encoded records.
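
The validation order described above (header completeness, magic, version, type, then payload boundaries, and only then string decoding) might look like the sketch below. The magic bytes, field offsets, and the name decodeRecord are assumptions for illustration; checksum verification and the payload ownership modes are left out to keep it short.

```javascript
const HEADER_SIZE = 28;
const MAGIC = Buffer.from('NBDB', 'latin1'); // hypothetical magic bytes

function decodeRecord(buf) {
  if (buf.length < HEADER_SIZE) throw new Error('incomplete header');
  if (!buf.subarray(0, 4).equals(MAGIC)) throw new Error('invalid magic');
  const version = buf.readUInt8(4);
  if (version !== 1) throw new Error(`unsupported version: ${version}`);
  const type = buf.readUInt8(5);
  if (type !== 1 && type !== 2) throw new Error(`unknown record type: ${type}`);

  // Lengths come from the header, so boundaries are known before any payload read.
  const keyLength = buf.readUInt32BE(8);
  const valueLength = buf.readUInt32BE(12);
  const recordLength = HEADER_SIZE + keyLength + valueLength;
  if (buf.length < recordLength) throw new Error('incomplete payload');

  const valueStart = HEADER_SIZE + keyLength;
  return {
    type,
    key: buf.toString('utf8', HEADER_SIZE, valueStart),
    value: buf.toString('utf8', valueStart, recordLength),
    timestamp: buf.readBigUInt64BE(16),
    checksum: buf.readUInt32BE(24),
    keyLength,
    valueLength,
    recordLength,
    payloadOffset: HEADER_SIZE,
  };
}

// Build one PUT record by hand so the example is self-contained.
const key = Buffer.from('città', 'utf8'); // 6 bytes
const value = Buffer.from('Zürich', 'utf8'); // 7 bytes
const sample = Buffer.alloc(HEADER_SIZE + key.length + value.length);
MAGIC.copy(sample, 0);
sample.writeUInt8(1, 4);
sample.writeUInt8(1, 5);
sample.writeUInt32BE(key.length, 8);
sample.writeUInt32BE(value.length, 12);
sample.writeBigUInt64BE(1700000000000n, 16);
key.copy(sample, HEADER_SIZE);
value.copy(sample, HEADER_SIZE + key.length);

console.log(decodeRecord(sample));
```

Returning recordLength lets a scanner advance to the next record without re-deriving the arithmetic.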
04

Append records to a segment file

Write encoded records to disk through append-only segment storage and report the byte address for each write.

  • Centralize paths for the segment file, persisted index, and reports.
  • Create a segment writer that opens the segment in append mode and creates the data directory when needed.
  • Read the current file size before writing so the append offset becomes the record address.
  • Write every byte of the encoded record: either loop until the write completes or treat a short write as a failure.
  • Add an explicit sync mode that calls FileHandle sync after the append.
  • Replace the put placeholder with a real append path that prints segment path, offset, length, and timestamp.
A segment file that grows by encoded record length and CLI output that exposes the address of every appended record.
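
A minimal append path could look like the sketch below, written as ESM to match the package type from phase 00. The function name appendRecord, the option name sync, and the temporary demo directory are assumptions. It also assumes a single writer: taking the offset from the file size before writing is only reliable when nothing else appends at the same time.

```javascript
import { mkdir, mkdtemp, open } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { dirname, join } from 'node:path';

async function appendRecord(segmentPath, record, { sync = false } = {}) {
  await mkdir(dirname(segmentPath), { recursive: true });
  const handle = await open(segmentPath, 'a'); // append mode, creates the file
  try {
    // The size before the write is the address of the new record.
    const { size: offset } = await handle.stat();
    let written = 0;
    while (written < record.length) {
      // Loop so a short write never leaves a silently truncated record.
      const { bytesWritten } = await handle.write(record, written, record.length - written);
      written += bytesWritten;
    }
    if (sync) await handle.sync(); // ask the OS to flush to the device
    return { segmentPath, offset, length: record.length };
  } finally {
    await handle.close();
  }
}

// Demo in a throwaway directory; the payloads are placeholder bytes, not real records.
const dir = await mkdtemp(join(tmpdir(), 'segment-demo-'));
const segment = join(dir, 'data', 'segment-0001.dat');
const first = await appendRecord(segment, Buffer.from('first record'));
const second = await appendRecord(segment, Buffer.from('second'), { sync: true });
console.log(first.offset, first.length); // 0 12
console.log(second.offset, second.length); // 12 6
```

Keeping sync opt-in makes the durability cost visible later, when the benchmark phase compares modes.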
05

Sequentially scan segment files

Read records from disk in order without using an index, then inspect one record by byte offset.

  • Create an async generator that yields offset, length, and decoded record data for each segment entry.
  • Read the fixed header first so payload lengths are known before the full record Buffer is requested.
  • Read the remaining payload and handle empty files, partial headers, and partial payloads as separate cases.
  • Decode complete records and advance the scanner by the decoder's record length.
  • Replace the scan placeholder and print offset, length, type, key, value, and timestamp in file order.
  • Add offset inspection so a record reported by put can be decoded directly from its byte position.
A scanner that treats the segment as the durable record of writes and an inspect path for direct offset reads.
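
The header-first read pattern can be sketched as an async generator. Everything named here (scanSegment, readFully, makeRecord, the "NBDB" magic, the field offsets) is hypothetical, and magic, version, and checksum validation are omitted so the read loop stays visible.

```javascript
import { mkdtemp, open, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

const HEADER_SIZE = 28;

// Fill `target` from `position`; returns false if the file ends first.
async function readFully(handle, target, position) {
  let filled = 0;
  while (filled < target.length) {
    const { bytesRead } = await handle.read(target, filled, target.length - filled, position + filled);
    if (bytesRead === 0) return false;
    filled += bytesRead;
  }
  return true;
}

async function* scanSegment(path) {
  const handle = await open(path, 'r');
  try {
    const { size } = await handle.stat();
    let offset = 0; // an empty file never enters the loop
    while (offset < size) {
      const header = Buffer.alloc(HEADER_SIZE);
      if (!(await readFully(handle, header, offset))) {
        throw new Error(`partial header at offset ${offset}`);
      }
      const keyLength = header.readUInt32BE(8);
      const valueLength = header.readUInt32BE(12);
      const length = HEADER_SIZE + keyLength + valueLength;
      // Check against the file size before allocating, so a corrupt
      // length field cannot trigger a huge allocation.
      if (offset + length > size) throw new Error(`partial payload at offset ${offset}`);
      const payload = Buffer.alloc(keyLength + valueLength);
      if (!(await readFully(handle, payload, offset + HEADER_SIZE))) {
        throw new Error(`partial payload at offset ${offset}`);
      }
      yield {
        offset,
        length,
        type: header.readUInt8(5),
        key: payload.toString('utf8', 0, keyLength),
        value: payload.toString('utf8', keyLength),
        timestamp: header.readBigUInt64BE(16),
      };
      offset += length;
    }
  } finally {
    await handle.close();
  }
}

// Hypothetical helper that builds one PUT record for the demo.
function makeRecord(key, value) {
  const k = Buffer.from(key, 'utf8');
  const v = Buffer.from(value, 'utf8');
  const record = Buffer.alloc(HEADER_SIZE + k.length + v.length);
  record.write('NBDB', 0, 'latin1');
  record.writeUInt8(1, 4);
  record.writeUInt8(1, 5);
  record.writeUInt32BE(k.length, 8);
  record.writeUInt32BE(v.length, 12);
  k.copy(record, HEADER_SIZE);
  v.copy(record, HEADER_SIZE + k.length);
  return record;
}

const dir = await mkdtemp(join(tmpdir(), 'scan-demo-'));
const segmentPath = join(dir, 'segment-0001.dat');
const bytes = Buffer.concat([makeRecord('name', 'ish'), makeRecord('city', 'Pune')]);
await writeFile(segmentPath, bytes);
for await (const entry of scanSegment(segmentPath)) console.log(entry);
```

The try/finally closes the file handle even when the consumer stops iterating early or a record fails validation.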
06

Add checksums

Compute and verify CRC-32 checksums so corrupted record bytes are detected during reads.

  • Reset the mutable segment data copied from earlier phases so checksum-aware records start from a clean file.
  • Create checksum helpers based on Node's zlib CRC-32 support.
  • Compute the checksum after header and payload bytes are final, then store it in the checksum field.
  • Verify checksums during decode, with an explicit option for checksum-disabled reads when needed.
  • Report checksum failures from scans with the segment offset and a failure exit code.
  • Add a repeatable corruption helper or test that flips one payload byte at a known offset.
  • Restore a clean checksum-valid working segment before continuing into index work.
Checksum-valid records, repeatable corruption detection, offset-aware scan failures, and a clean segment for the next phase.
07

Build an in-memory index

Map keys to latest record offsets so get can seek directly to the newest value.

  • Create an index module that builds a Map from segment scan output and applies each record in order.
  • Store key, type, segment, offset, length, and timestamp in each index entry.
  • Add point reads that decode one record from a segment path and byte offset.
  • Build the index during CLI startup, find the requested key, seek to the latest offset, and print the value.
  • Test repeated puts for the same key so the newest offset wins after replay.
An index derived from the append-only log, plus direct lookup behavior that reads the latest record by byte address.
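
Replaying scan output into a Map is short enough to show in full. In this sketch the entry shape follows the fields listed above, while buildIndex, the sample offsets, and the segment name are illustrative assumptions.

```javascript
// Apply scan entries in file order. Map.set overwrites, so the newest offset wins.
function buildIndex(entries, segment = 'segment-0001.dat') {
  const index = new Map();
  for (const { key, type, offset, length, timestamp } of entries) {
    index.set(key, { key, type, segment, offset, length, timestamp });
  }
  return index;
}

// Stand-in for scanner output: "name" is written twice.
const scanned = [
  { key: 'name', type: 1, offset: 0, length: 35, timestamp: 1 },
  { key: 'city', type: 1, offset: 35, length: 37, timestamp: 2 },
  { key: 'name', type: 1, offset: 72, length: 36, timestamp: 3 },
];

const index = buildIndex(scanned);
console.log(index.size); // 2
console.log(index.get('name').offset); // 72
```

A get then needs one Map lookup plus one positioned read of length bytes at offset, instead of a scan of the whole segment.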
08

Persist the index

Save and reload the index from disk so startup can skip a full replay when the cached view matches the segment.

  • Persist the index as data/index.json to keep the phase focused on cache validity.
  • Store version, segment, segment size, update timestamp, and an array of entries.
  • After successful puts, update the in-memory index and write the persisted index for the new segment state.
  • Write the index atomically through a temp file in data and rename it into place.
  • During get, load the persisted index when segment size matches and replay the segment when it does not.
  • Write an index-load report with mode, entry count, segment size, index load time, replay time, and lookup time.
A validated persisted index cache and a report that shows whether lookup used persisted-index or full-replay mode.
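
The write-then-rename pattern and the size check can be sketched as below. The JSON field names, saveIndex, and loadIndex are assumptions. The sketch also skips syncing the temp file before the rename, which a durability-focused version would add.

```javascript
import { mkdtemp, readFile, rename, stat, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

async function saveIndex(indexPath, segmentPath, index) {
  const { size } = await stat(segmentPath);
  const body = JSON.stringify({
    version: 1,
    segment: segmentPath,
    segmentSize: size, // used later to decide whether the cache is still valid
    updatedAt: new Date().toISOString(),
    entries: [...index.values()],
  }, null, 2);
  // Write beside the target, then rename, so readers never see a half-written file.
  const tmpPath = `${indexPath}.tmp`;
  await writeFile(tmpPath, body);
  await rename(tmpPath, indexPath);
}

// Returns a Map when the cache matches the segment, or null when the caller should replay.
async function loadIndex(indexPath, segmentPath) {
  try {
    const saved = JSON.parse(await readFile(indexPath, 'utf8'));
    const { size } = await stat(segmentPath);
    if (saved.version !== 1 || saved.segmentSize !== size) return null;
    return new Map(saved.entries.map((entry) => [entry.key, entry]));
  } catch (err) {
    if (err.code === 'ENOENT') return null; // no cache yet
    throw err;
  }
}

const dir = await mkdtemp(join(tmpdir(), 'index-demo-'));
const segmentPath = join(dir, 'segment-0001.dat');
const indexPath = join(dir, 'index.json');
await writeFile(segmentPath, Buffer.alloc(35)); // placeholder segment bytes
const index = new Map([['name', { key: 'name', type: 1, offset: 0, length: 35, timestamp: 1 }]]);

await saveIndex(indexPath, segmentPath, index);
const fresh = await loadIndex(indexPath, segmentPath);
console.log(fresh.size); // 1, persisted-index mode

await writeFile(segmentPath, Buffer.alloc(70)); // the segment grew behind the cache
const stale = await loadIndex(indexPath, segmentPath);
console.log(stale); // null, fall back to full replay
```

A size match is a cheap validity check, not proof that the bytes are unchanged, which is one reason to treat the persisted index as a cache rather than the durable record.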
09

Recover from partial writes

Detect records cut off mid-write and truncate the segment to the end of the last complete valid record.

  • Create a recovery module that returns a report object and owns the truncation write.
  • Scan from offset zero with checksum verification while tracking last valid offset, last valid length, and valid record count.
  • Stop at the first incomplete header, incomplete payload, invalid magic, invalid version, invalid type, or checksum failure.
  • Truncate the segment to the end of the last valid record, then rebuild and persist the index.
  • Write a recovery report with original size, recovered size, lost bytes, failure offset, failure reason, and valid record count.
An explicit recovery command that preserves the valid prefix, removes a broken tail, rebuilds indexes, and leaves an audit report.
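
Finding the truncation point is the core of recovery, and it can be shown on an in-memory Buffer. This sketch assumes the same hypothetical layout as earlier examples ("NBDB" magic, lengths at bytes 8 and 12); the names planRecovery and makeRecord are invented, and the version, type, and checksum checks are left out for brevity.

```javascript
const HEADER_SIZE = 28;
const MAGIC = Buffer.from('NBDB', 'latin1'); // hypothetical magic bytes

// Walk records from offset zero and stop at the first one that fails.
function planRecovery(segment) {
  let offset = 0;
  let validRecords = 0;
  let failureReason = null;
  while (offset < segment.length) {
    const remaining = segment.length - offset;
    if (remaining < HEADER_SIZE) { failureReason = 'incomplete header'; break; }
    if (!segment.subarray(offset, offset + 4).equals(MAGIC)) { failureReason = 'invalid magic'; break; }
    const length = HEADER_SIZE + segment.readUInt32BE(offset + 8) + segment.readUInt32BE(offset + 12);
    if (remaining < length) { failureReason = 'incomplete payload'; break; }
    offset += length; // this offset is the end of the last valid record
    validRecords += 1;
  }
  return {
    originalSize: segment.length,
    recoveredSize: offset,
    lostBytes: segment.length - offset,
    failureOffset: failureReason === null ? null : offset,
    failureReason,
    validRecords,
  };
}

// Hypothetical helper that builds one PUT record for the demo.
function makeRecord(key, value) {
  const k = Buffer.from(key, 'utf8');
  const v = Buffer.from(value, 'utf8');
  const record = Buffer.alloc(HEADER_SIZE + k.length + v.length);
  MAGIC.copy(record, 0);
  record.writeUInt8(1, 4);
  record.writeUInt8(1, 5);
  record.writeUInt32BE(k.length, 8);
  record.writeUInt32BE(v.length, 12);
  k.copy(record, HEADER_SIZE);
  v.copy(record, HEADER_SIZE + k.length);
  return record;
}

// Simulated crash: the second record loses its last 3 bytes.
const full = Buffer.concat([makeRecord('a', '1'), makeRecord('b', '22')]);
const crashed = full.subarray(0, full.length - 3);
console.log(planRecovery(crashed));
// recoveredSize 30, lostBytes 28, failureReason 'incomplete payload', validRecords 1
```

On a real segment file the recoveredSize value would be passed to a truncate call such as truncate(path, recoveredSize) from node:fs/promises, followed by an index rebuild.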
10

Add tombstones for deletes

Support deletion by appending DELETE records instead of rewriting existing records.

  • Teach the encoder to create DELETE records with a key and zero-length value.
  • Replace the delete placeholder with a command that appends a tombstone record.
  • Update index application so a key can have a latest deleted state with offset, length, and timestamp.
  • Make get report deleted keys differently from missing keys.
  • Keep scan historical so old PUT records and later DELETE tombstones still print in file order.
Deletes represented as durable tombstones, lookup output that separates deleted from missing, and historical scan output.
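
The difference between a deleted key and a missing key comes down to whether the index remembers the tombstone. A minimal sketch, with hypothetical function names and status strings:

```javascript
const PUT = 1;
const DELETE = 2;

// A tombstone overwrites the entry but keeps the key, marked as deleted.
function applyRecord(index, { key, type, offset, length, timestamp }) {
  index.set(key, { type, offset, length, timestamp, deleted: type === DELETE });
}

function lookup(index, key) {
  const entry = index.get(key);
  if (entry === undefined) return { status: 'missing', key };
  if (entry.deleted) return { status: 'deleted', key, offset: entry.offset };
  return { status: 'found', key, offset: entry.offset };
}

// Stand-in for scan output: "name" is written and then deleted.
const log = [
  { key: 'name', type: PUT, offset: 0, length: 35, timestamp: 1 },
  { key: 'name', type: DELETE, offset: 35, length: 32, timestamp: 2 },
  { key: 'city', type: PUT, offset: 67, length: 36, timestamp: 3 },
];

const index = new Map();
for (const record of log) applyRecord(index, record);

console.log(lookup(index, 'name').status); // deleted
console.log(lookup(index, 'city').status); // found
console.log(lookup(index, 'zip').status); // missing
```

Because the log array itself is never modified, a scan over it still shows the original PUT followed by the DELETE, which is the historical view the phase asks for.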
11

Add compaction

Rewrite live records into a smaller segment and drop overwritten records plus tombstones.

  • Create the compaction module and route the compact command to it.
  • Build the current index from the segment and select only live PUT entries.
  • Read each live record by offset and write it into a compacting segment with new offsets.
  • Sync the compacting file, move the current segment to a backup, rename compacting into place, and write a fresh index.
  • Add startup cleanup for leftover compacting or backup files from interrupted compactions.
  • Write a compaction report with before/after size, records read, records written, dropped tombstones, dropped overwritten records, and duration.
A smaller live segment, a fresh index with new offsets, deterministic startup cleanup, and a compaction report.
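
The selection step of compaction can be illustrated in memory, leaving out the file swap. In this sketch each record carries its encoded bytes as a placeholder Buffer; the compact function and the report field names are assumptions modeled on the report described above.

```javascript
const PUT = 1;
const DELETE = 2;

function compact(records) {
  // Latest record per key wins, exactly as in index replay.
  const latest = new Map();
  for (const record of records) latest.set(record.key, record);

  // Only live PUTs survive; tombstones and superseded records are dropped.
  const live = [...latest.values()].filter((record) => record.type === PUT);

  const index = new Map();
  const chunks = [];
  let offset = 0;
  for (const record of live) {
    index.set(record.key, { offset, length: record.bytes.length }); // new addresses
    chunks.push(record.bytes);
    offset += record.bytes.length;
  }

  const droppedTombstones = records.filter((record) => record.type === DELETE).length;
  return {
    segment: Buffer.concat(chunks),
    index,
    report: {
      sizeBefore: records.reduce((total, record) => total + record.bytes.length, 0),
      sizeAfter: offset,
      recordsRead: records.length,
      recordsWritten: live.length,
      droppedTombstones,
      droppedOverwritten: records.length - live.length - droppedTombstones,
    },
  };
}

// Placeholder bytes stand in for encoded records.
const records = [
  { key: 'a', type: PUT, bytes: Buffer.from('PUT a=1') },
  { key: 'b', type: PUT, bytes: Buffer.from('PUT b=2') },
  { key: 'a', type: PUT, bytes: Buffer.from('PUT a=3') },
  { key: 'b', type: DELETE, bytes: Buffer.from('DEL b') },
  { key: 'c', type: PUT, bytes: Buffer.from('PUT c=9') },
];

const result = compact(records);
console.log(result.segment.toString()); // PUT a=3PUT c=9
console.log(result.report);
```

Every surviving record gets a new offset, which is why the phase writes a fresh index rather than reusing the old one.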
12

Measure Buffer ownership

Compare copy-heavy and view-heavy parsing modes during large scans.

  • Wire copy, slice, and subarray payload modes through scan.
  • Generate a large deterministic dataset so external memory and ArrayBuffer memory move enough to measure.
  • Sample rss, heapUsed, external, and arrayBuffers before scan, during scan, and after cleanup.
  • Add retention mode so decoded key/value Buffers can stay alive until the final memory sample.
  • Write a buffer ownership report that compares parser mode, record count, retained mode, and memory fields.
A memory report that compares copy, slice, and subarray behavior during retained and non-retained scans.
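
The distinction being measured is whether a decoded payload owns its bytes or only points into the scanned Buffer. The sketch below shows that difference and one memory sample; the 64 KiB size and the sample helper are arbitrary choices for illustration. Note that in Node.js, Buffer's slice also returns a view over the same memory and is documented as deprecated in favor of subarray, so any difference between those two modes in a report comes from somewhere other than copying.

```javascript
// 64 KiB filled with 'a' as a stand-in for bytes read from a segment.
const segment = Buffer.alloc(64 * 1024, 0x61);

const view = segment.subarray(28, 32); // shares the segment's memory
const copy = Buffer.from(segment.subarray(28, 32)); // owns 4 fresh bytes

segment[28] = 0x7a; // mutate the parent after "decoding" ('z')
console.log(view.toString()); // zaaa, the view sees the change
console.log(copy.toString()); // aaaa, the copy does not
console.log(view.buffer === segment.buffer); // true
console.log(copy.buffer === segment.buffer); // false

// One sample of the fields the report compares.
function sample(label) {
  const { rss, heapUsed, external, arrayBuffers } = process.memoryUsage();
  return { label, rss, heapUsed, external, arrayBuffers };
}
console.table([sample('after decode')]);
```

Retaining the 4-byte view keeps the whole 64 KiB parent allocation reachable, while retaining the copy does not, which is the effect retention mode is meant to expose.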
13

Add benchmarks

Benchmark puts, gets, scans, recovery, and compaction, then write machine-readable and readable reports.

  • Create a benchmark runner that owns setup, measurement, cleanup, and report generation.
  • Generate deterministic data for sequential puts, repeated puts, deletes, recovery, and compaction.
  • Measure sequential puts, repeated puts, indexed gets, full scans, recovery after a truncated tail, and compaction after overwrites and deletes.
  • Capture segment size, index size, rss, heapUsed, external, arrayBuffers, recovery time, compaction time, Node version, platform, CPU count, and timestamp.
  • Write JSON and Markdown benchmark reports with ops/sec, file size, memory, sync mode, dataset size, and checksum context.
Benchmark reports that measure disk size, memory, throughput, recovery cost, and compaction cost from a clean generated dataset.
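
A benchmark runner reduces to timing a fixed number of operations and recording the context beside the numbers. The sketch below uses an in-memory Map as a stand-in workload, so the figures it prints say nothing about the real store; measure, toMarkdown, and the report shape are assumptions. CPU count is omitted here and would come from node:os in a fuller version.

```javascript
// Time `ops` calls of `fn` with the monotonic high-resolution clock.
function measure(name, ops, fn) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < ops; i += 1) fn(i);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  const opsPerSec = elapsedMs > 0 ? Math.round((ops * 1000) / elapsedMs) : null;
  return { name, ops, elapsedMs, opsPerSec };
}

function toMarkdown(report) {
  const rows = report.results.map(
    (r) => `| ${r.name} | ${r.ops} | ${r.elapsedMs.toFixed(2)} | ${r.opsPerSec ?? 'n/a'} |`,
  );
  return [
    `Node ${report.node} on ${report.platform}, ${report.timestamp}`,
    '',
    '| benchmark | ops | ms | ops/sec |',
    '| --- | --- | --- | --- |',
    ...rows,
  ].join('\n');
}

// Stand-in workload: deterministic keys against a Map instead of the segment file.
const store = new Map();
const results = [
  measure('sequential puts', 10000, (i) => store.set(`key-${i}`, `value-${i}`)),
  measure('indexed gets', 10000, (i) => store.get(`key-${i}`)),
];

const { rss, heapUsed, external, arrayBuffers } = process.memoryUsage();
const report = {
  node: process.version,
  platform: process.platform,
  timestamp: new Date().toISOString(),
  memory: { rss, heapUsed, external, arrayBuffers },
  results,
};

console.log(JSON.stringify(report, null, 2)); // machine-readable form
console.log(toMarkdown(report)); // readable form
```

Generating both outputs from the same report object keeps the JSON and Markdown versions from disagreeing.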
Storage evidence

The reports make the file behavior visible.

Lab 02 treats generated files as part of the learning surface: format notes, segment inspection, index-load timing, recovery evidence, compaction deltas, Buffer ownership measurements, and benchmark summaries.

docs/format.md: record layout, byte order, and payload rules
segment-0001.dat: append-only binary records
data/index.json: latest key-to-offset entries
index-load.json: persisted-index versus replay timing
recovery.json: valid prefix, broken tail, and lost bytes
compaction.json: before/after size and dropped records
buffer-ownership.json: copy, slice, and subarray memory data
bench.md: throughput, size, memory, and recovery results

Other labs in the bundle

Lab 02 sits beside six more builds.

The bundle includes seven runtime projects covering process observation, binary storage, streams, module resolution, file watching, async orchestration, and custom protocols.
