Lab 02 · Node Runtime Labs

Binary log database.

Build a small local database from byte layout up: binary records, append-only segment files, direct indexed reads, corruption checks, recovery, tombstones, compaction, Buffer ownership measurements, and benchmark reports.


Phases: 14

Difficulty: Advanced

Package installs: 0 deps

Storage commands

Append-only database loop

01 Help
$ npm run db -- --help
Checks the command router before storage exists.
02 Write
$ npm run put -- name ish
Appends a binary record to the segment file.
03 Read
$ npm run get -- name
Uses the latest offset from the in-memory index.
04 Recover
$ npm run recover
Truncates broken tails after a simulated crash.
05 Bench
$ npm run bench
Compares replay, persisted index, and compaction cost.

Write records, read by index, then test recovery paths.

What this lab builds

The finished tool is a real file-backed store.

Binary record format

Define a 28-byte header with magic bytes, version, record type, reserved bytes, key length, value length, timestamp, and checksum. Payload offsets are derived from the header size and the two length fields.

Append-only storage

Build a CLI that encodes records, appends them to a segment file, tracks byte offsets, scans records sequentially, and inspects one record by offset.

Indexes and recovery

Replay the segment into an in-memory index, persist the index, validate segment size, detect corruption, and truncate a partial tail to the last valid record.

Compaction and reports

Represent deletes as tombstones, compact live records into a replacement segment, measure Buffer ownership modes, and write benchmark reports.

Complete phase plan

Every phase in Lab 02.

The build starts with a command shell and ends with benchmark reports. Each phase adds one storage capability and one artifact a developer can inspect.

00 Create the project

Create the storage engine shell with a runnable CLI, fixed folders, npm scripts, help output, strict command errors, and command placeholders.

Tasks

Create a private ESM package with a Node.js v24-or-newer engine requirement.
Separate source files, mutable database state, generated reports, and repeatable fixtures into dedicated folders.
Create the CLI command router and make help succeed without creating database files.
Add scripts for db, put, get, scan, delete, inspect, inspect-format, encode, decode, recover, compact, bench, and test.
Print command names, default paths, and the main command shapes in the help output.
Reject unknown commands through stderr with a failure exit code.
Add placeholders for put, get, scan, and inspect so routing is proven before storage logic exists.

Artifact

A project shell with package metadata, folders, command routing, help output, npm scripts, and strict placeholder behavior.
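
The routing tasks above can be sketched as a pure function that maps arguments to an exit code and output, which keeps the router testable before any storage exists. This is a minimal sketch, not the lab's source: `route`, `main`, the placeholder message, and the choice to make placeholders exit non-zero are all assumptions made for illustration.

```javascript
// Hypothetical router sketch. Command names follow the placeholders named in
// the task list; messages and exit codes are illustrative assumptions.
const PLACEHOLDERS = ["put", "get", "scan", "inspect"];

function helpText() {
  return [
    "Usage: npm run db -- <command> [args]",
    "Commands: " + PLACEHOLDERS.join(", "),
  ].join("\n");
}

// Returns { code, stdout, stderr } so routing can be tested without touching
// process.exit or creating database files.
function route(argv) {
  const [command] = argv;
  if (command === undefined || command === "--help" || command === "help") {
    return { code: 0, stdout: helpText(), stderr: "" };
  }
  if (PLACEHOLDERS.includes(command)) {
    // Assumption: a strict placeholder fails loudly instead of pretending to work.
    return { code: 1, stdout: "", stderr: `${command}: not implemented yet` };
  }
  return { code: 1, stdout: "", stderr: `Unknown command: ${command}` };
}

// Entry point wrapper: prints to the right stream and sets the exit code.
function main(argv = process.argv.slice(2)) {
  const { code, stdout, stderr } = route(argv);
  if (stdout) console.log(stdout);
  if (stderr) console.error(stderr);
  process.exitCode = code;
}
```

In an entry file, calling `main()` at the bottom would wire this to the real process. Setting `process.exitCode` instead of calling `process.exit()` lets pending output flush before the process ends.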

01 Define the record format

Design the byte layout for one database record and document the file format beside the implementation constants.

Tasks

Create a format module that owns magic bytes, version, record types, header size, and field offsets.
Define the 28-byte header with magic bytes, version, type, reserved bytes, key length, value length, timestamp, and checksum.
Choose big-endian numeric fields and UTF-8 key/value payload bytes.
Define PUT as record type 1 and DELETE as record type 2.
Add an inspect-format command that prints offset, width, field name, and validation rule from the shared constants.
Write docs/format.md with record types, payload order, byte order, and checksum coverage.

Artifact

A documented binary disk contract with one source of truth for header offsets, widths, type values, and payload rules.
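
One way to keep a single source of truth is a field table that both the codec and the inspect-format output read from. The sketch below assumes field widths of 4 + 1 + 1 + 2 + 4 + 4 + 8 + 4 bytes, which add up to the stated 28; the page fixes the total, the field order, the byte order, and the type values, but the individual widths, the rule wording, and the names `FIELDS`, `checkLayout`, and `formatTable` are assumptions.

```javascript
// Assumed layout: the widths are illustrative and chosen to total 28 bytes.
const HEADER_SIZE = 28;
const RECORD_TYPES = Object.freeze({ PUT: 1, DELETE: 2 });

const FIELDS = Object.freeze([
  { name: "magic", offset: 0, width: 4, rule: "must equal the magic bytes" },
  { name: "version", offset: 4, width: 1, rule: "must be a supported version" },
  { name: "type", offset: 5, width: 1, rule: "1 (PUT) or 2 (DELETE)" },
  { name: "reserved", offset: 6, width: 2, rule: "written as zero" },
  { name: "keyLength", offset: 8, width: 4, rule: "uint32 big-endian, at least 1" },
  { name: "valueLength", offset: 12, width: 4, rule: "uint32 big-endian" },
  { name: "timestamp", offset: 16, width: 8, rule: "uint64 big-endian" },
  { name: "checksum", offset: 24, width: 4, rule: "uint32 big-endian" },
]);

// Fails fast if fields overlap, leave gaps, or do not add up to the header size.
function checkLayout(fields, headerSize) {
  let cursor = 0;
  for (const field of fields) {
    if (field.offset !== cursor) {
      throw new Error(`gap or overlap at field "${field.name}"`);
    }
    cursor += field.width;
  }
  if (cursor !== headerSize) {
    throw new Error(`header is ${cursor} bytes, expected ${headerSize}`);
  }
}

// Renders one line per field: offset, width, name, validation rule.
function formatTable(fields) {
  return fields
    .map((f) =>
      [String(f.offset).padStart(2), String(f.width).padStart(2), f.name.padEnd(12), f.rule].join("  ")
    )
    .join("\n");
}

checkLayout(FIELDS, HEADER_SIZE);
console.log(formatTable(FIELDS));
```

Running the layout check at module load means a mistyped offset fails immediately rather than surfacing later as a corrupt record.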

02 Encode records into Buffers

Convert key/value data into one contiguous Buffer that matches the record format.

Tasks

Create the encoder module and expose one record construction function.
Convert keys and values into UTF-8 Buffers and validate byte length instead of string length.
Reject empty keys, unknown record types, and fields that cannot fit into unsigned 32-bit length slots.
Allocate exactly header size plus key bytes plus value bytes, then write the fixed header fields.
Copy key bytes after the header and value bytes immediately after key bytes.
Add an encode CLI, a hex dump helper, and tests for length, header fields, and payload order.

Artifact

An encoder that produces inspectable record bytes and tests that protect the format before disk writes enter the project.
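
The encoding steps above might look like the following sketch. It assumes the same illustrative field offsets (version at 4, type at 5, lengths at 8 and 12, timestamp at 16, checksum at 24) and a made-up magic value `NBDB`; none of these specifics come from the lab itself. The checksum field is left at zero here because checksums arrive in a later phase.

```javascript
// Illustrative constants: offsets and the "NBDB" magic are assumptions.
const HEADER_SIZE = 28;
const MAGIC = Buffer.from("NBDB", "latin1");
const VERSION = 1;
const KNOWN_TYPES = new Set([1, 2]); // 1 = PUT, 2 = DELETE
const MAX_U32 = 0xffffffff;

function encodeRecord({ type, key, value = "", timestamp = Date.now() }) {
  if (!KNOWN_TYPES.has(type)) throw new RangeError(`unknown record type: ${type}`);

  // Lengths are measured in UTF-8 bytes, not in string characters.
  const keyBytes = Buffer.from(key, "utf8");
  const valueBytes = Buffer.from(value, "utf8");
  if (keyBytes.length === 0) throw new RangeError("key must not be empty");
  if (keyBytes.length > MAX_U32 || valueBytes.length > MAX_U32) {
    throw new RangeError("key or value does not fit a uint32 length field");
  }

  // Allocate exactly header + key + value, zero-filled.
  const record = Buffer.alloc(HEADER_SIZE + keyBytes.length + valueBytes.length);
  MAGIC.copy(record, 0);
  record.writeUInt8(VERSION, 4);
  record.writeUInt8(type, 5);
  record.writeUInt16BE(0, 6); // reserved
  record.writeUInt32BE(keyBytes.length, 8);
  record.writeUInt32BE(valueBytes.length, 12);
  record.writeBigUInt64BE(BigInt(timestamp), 16);
  record.writeUInt32BE(0, 24); // checksum placeholder

  // Payload order: key bytes, then value bytes.
  keyBytes.copy(record, HEADER_SIZE);
  valueBytes.copy(record, HEADER_SIZE + keyBytes.length);
  return record;
}
```

A key such as `"é"` is one character but two UTF-8 bytes, which is why the length fields must come from the Buffers. For a quick hex dump, `record.toString("hex")` is enough to eyeball the header.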

03 Decode records from Buffers

Parse a Buffer back into record data, validate malformed input, and prove encode/decode round trips.

Tasks

Create the decoder module and return type, key, value, timestamp, checksum, byte lengths, record length, and payload offset.
Validate incomplete headers, invalid magic bytes, unsupported versions, and unknown record types before reading payload data.
Compute the full record length from header fields and reject incomplete payload buffers.
Decode key and value bytes into UTF-8 strings only after boundary validation succeeds.
Add copy, slice, and subarray payload ownership modes for later memory measurement.
Add a decode CLI and round-trip tests for ASCII, non-ASCII, invalid headers, and incomplete payloads.

Artifact

A decoder that validates fixed headers, calculates payload boundaries, supports payload ownership modes, and round-trips encoded records.
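
A decoder following the validation order above could be sketched like this. It uses the same assumed offsets and hypothetical `NBDB` magic as an illustration, and it leaves out the payload ownership modes and checksum verification that other phases add.

```javascript
// Illustrative constants: offsets and the "NBDB" magic are assumptions.
const HEADER_SIZE = 28;
const MAGIC = Buffer.from("NBDB", "latin1");
const SUPPORTED_VERSION = 1;
const KNOWN_TYPES = new Set([1, 2]); // 1 = PUT, 2 = DELETE

function decodeRecord(buffer, offset = 0) {
  const available = buffer.length - offset;

  // Header checks run before any payload byte is touched.
  if (available < HEADER_SIZE) throw new RangeError("incomplete header");
  if (!buffer.subarray(offset, offset + 4).equals(MAGIC)) {
    throw new Error("invalid magic bytes");
  }
  const version = buffer.readUInt8(offset + 4);
  if (version !== SUPPORTED_VERSION) throw new Error(`unsupported version: ${version}`);
  const type = buffer.readUInt8(offset + 5);
  if (!KNOWN_TYPES.has(type)) throw new Error(`unknown record type: ${type}`);

  // The full record length follows from the two length fields.
  const keyLength = buffer.readUInt32BE(offset + 8);
  const valueLength = buffer.readUInt32BE(offset + 12);
  const recordLength = HEADER_SIZE + keyLength + valueLength;
  if (available < recordLength) throw new RangeError("incomplete payload");

  // Strings are decoded only after the boundaries are known to be valid.
  const payloadOffset = offset + HEADER_SIZE;
  const keyEnd = payloadOffset + keyLength;
  return {
    type,
    key: buffer.toString("utf8", payloadOffset, keyEnd),
    value: buffer.toString("utf8", keyEnd, keyEnd + valueLength),
    timestamp: buffer.readBigUInt64BE(offset + 16),
    checksum: buffer.readUInt32BE(offset + 24),
    keyLength,
    valueLength,
    recordLength,
    payloadOffset,
  };
}
```

Returning `recordLength` lets a caller advance to the next record without recomputing anything from the header.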

04 Append records to a segment file

Write encoded records to disk through append-only segment storage and report the byte address for each write.

Tasks

Centralize paths for the segment file, persisted index, and reports.
Create a segment writer that opens the segment in append mode and creates the data directory when needed.
Read the current file size before writing so the append offset becomes the record address.
Write every byte of the encoded record and treat short writes as failures or loop until complete.
Add an explicit sync mode that calls FileHandle sync after the append.
Replace the put placeholder with a real append path that prints segment path, offset, length, and timestamp.

Artifact

A segment file that grows by encoded record length and CLI output that exposes the address of every appended record.
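
The append path described above can be sketched with `node:fs/promises`. The function name `appendRecord` and its option shape are hypothetical. The sketch also assumes a single writer: reading the file size and then appending only yields a correct address when no other process writes to the segment in between.

```javascript
import { mkdir, open } from "node:fs/promises";
import { dirname } from "node:path";

// Appends one encoded record and returns the byte address it landed at.
// Assumes a single writer per segment (see the note above).
async function appendRecord(segmentPath, record, { sync = false } = {}) {
  await mkdir(dirname(segmentPath), { recursive: true });
  const handle = await open(segmentPath, "a");
  try {
    // The size before the write becomes the record's offset.
    const { size: offset } = await handle.stat();

    // Loop until every byte is written; a write that makes no progress fails.
    let written = 0;
    while (written < record.length) {
      const { bytesWritten } = await handle.write(record, written, record.length - written);
      if (bytesWritten === 0) throw new Error("write made no progress");
      written += bytesWritten;
    }

    // Optional durability step: flush file data to the storage device.
    if (sync) await handle.sync();
    return { segmentPath, offset, length: record.length };
  } finally {
    await handle.close();
  }
}
```

Opening and closing the file on every call keeps the sketch short; a longer-lived writer would likely hold one FileHandle and track the offset itself.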

05 Sequentially scan segment files

Read records from disk in order without using an index, then inspect one record by byte offset.

Tasks

Create an async generator that yields offset, length, and decoded record data for each segment entry.
Read the fixed header first so payload lengths are known before the full record Buffer is requested.
Read the remaining payload and handle empty files, partial headers, and partial payloads as separate cases.
Decode complete records and advance the scanner by the decoder's record length.
Replace the scan placeholder and print offset, length, type, key, value, and timestamp in file order.
Add offset inspection so a record reported by put can be decoded directly from its byte position.

Artifact

A scanner that treats the segment as the durable record of writes and an inspect path for direct offset reads.
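
A header-first scanner along these lines could be written as an async generator. This sketch assumes the illustrative offsets used earlier (type at 5, key length at 8, value length at 12) and reports partial headers and partial payloads with distinct errors; it skips magic, version, and checksum validation, which the real decoder would own. `readAt` and `scanSegment` are hypothetical names.

```javascript
import { open } from "node:fs/promises";

const HEADER_SIZE = 28; // per the format; field offsets below are assumptions

// Reads up to `length` bytes at `position`; returns fewer if the file ends.
async function readAt(handle, position, length) {
  const buffer = Buffer.alloc(length);
  let filled = 0;
  while (filled < length) {
    const { bytesRead } = await handle.read(buffer, filled, length - filled, position + filled);
    if (bytesRead === 0) break;
    filled += bytesRead;
  }
  return buffer.subarray(0, filled);
}

async function* scanSegment(segmentPath) {
  const handle = await open(segmentPath, "r");
  try {
    const { size } = await handle.stat();
    let offset = 0; // an empty file never enters the loop
    while (offset < size) {
      const header = await readAt(handle, offset, HEADER_SIZE);
      if (header.length < HEADER_SIZE) {
        throw new Error(`partial header at offset ${offset}`);
      }
      const keyLength = header.readUInt32BE(8);
      const valueLength = header.readUInt32BE(12);
      const length = HEADER_SIZE + keyLength + valueLength;

      // Check against the file size before allocating the payload Buffer.
      if (offset + length > size) {
        throw new Error(`partial payload at offset ${offset}`);
      }
      const payload = await readAt(handle, offset + HEADER_SIZE, keyLength + valueLength);
      if (payload.length < keyLength + valueLength) {
        throw new Error(`partial payload at offset ${offset}`);
      }
      yield {
        offset,
        length,
        type: header.readUInt8(5),
        key: payload.toString("utf8", 0, keyLength),
        value: payload.toString("utf8", keyLength),
      };
      offset += length;
    }
  } finally {
    await handle.close();
  }
}
```

A caller consumes it with `for await (const entry of scanSegment(path))`. Checking the declared length against the file size first means a corrupted length field cannot trigger a huge allocation.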

06 Add checksums

Compute and verify CRC-32 checksums so corrupted record bytes are detected during reads.

Tasks

Reset copied mutable segment data so checksum-aware records start from a clean file.
Create checksum helpers based on Node's zlib CRC-32 support.
Compute the checksum after header and payload bytes are final, then store it in the checksum field.
Verify checksums during decode, with an explicit option for checksum-disabled reads when needed.
Report checksum failures from scans with the segment offset and a failure exit code.
Add a repeatable corruption helper or test that flips one payload byte at a known offset.
Restore a clean checksum-valid working segment before continuing into index work.

Artifact

Checksum-valid records, repeatable corruption detection, offset-aware scan failures, and a clean segment for the next phase.

07 Build an in-memory index

Map keys to latest record offsets so get can seek directly to the newest value.

Tasks

Create an index module that builds a Map from segment scan output and applies each record in order.
Store key, type, segment, offset, length, and timestamp in each index entry.
Add point reads that decode one record from a segment path and byte offset.
Build the index during CLI startup, find the requested key, seek to the latest offset, and print the value.
Test repeated puts for the same key so the newest offset wins after replay.

Artifact

An index derived from the append-only log, plus direct lookup behavior that reads the latest record by byte address.
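
Index replay reduces to applying scan output in file order so that the last record for a key wins. The sketch below assumes scan entries shaped like `{ key, type, offset, length, timestamp }`; the function name `buildIndex` is hypothetical.

```javascript
// Builds a Map of key -> latest entry from scan output.
// `records` may be an array or an async generator; `for await` accepts both.
async function buildIndex(records, segment) {
  const index = new Map();
  for await (const { key, type, offset, length, timestamp } of records) {
    // Later records overwrite earlier ones, so the newest offset wins.
    index.set(key, { key, type, segment, offset, length, timestamp });
  }
  return index;
}
```

For example, replaying two puts for `name` at offsets 0 and 68 leaves a single entry whose offset is 68, which is the address a point read would seek to.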

08 Persist the index

Save and reload the index from disk so startup can skip a full replay when the cached view matches the segment.

Tasks

Persist the index as data/index.json to keep the phase focused on cache validity.
Store version, segment, segment size, update timestamp, and an array of entries.
After successful puts, update the in-memory index and write the persisted index for the new segment state.
Write the index atomically through a temp file in data and rename it into place.
During get, load the persisted index when segment size matches and replay the segment when it does not.
Write an index-load report with mode, entry count, segment size, index load time, replay time, and lookup time.

Artifact

A validated persisted index cache and a report that shows whether lookup used persisted-index or full-replay mode.
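
The save and load steps can be sketched as a temp-file write followed by a rename, plus a size check on load. The JSON field names (`segmentSize`, `updatedAt`, `entries`), the `.tmp` suffix, and the function names are assumptions. Entries must hold JSON-safe values, so a BigInt timestamp would need converting to a number or string first.

```javascript
import { readFile, rename, stat, writeFile } from "node:fs/promises";

const INDEX_VERSION = 1; // hypothetical cache format version

// Writes the index next to its final path, then renames it into place so a
// reader never observes a half-written file.
async function saveIndex(indexPath, segmentPath, index) {
  const { size } = await stat(segmentPath);
  const body = JSON.stringify({
    version: INDEX_VERSION,
    segment: segmentPath,
    segmentSize: size,
    updatedAt: new Date().toISOString(),
    entries: [...index.values()],
  });
  const tempPath = `${indexPath}.tmp`;
  await writeFile(tempPath, body);
  await rename(tempPath, indexPath);
}

// Returns a Map when the cache matches the segment, or null to request a replay.
async function loadIndex(indexPath, segmentPath) {
  let parsed;
  try {
    parsed = JSON.parse(await readFile(indexPath, "utf8"));
  } catch {
    return null; // missing or unreadable cache
  }
  const { size } = await stat(segmentPath);
  if (parsed.version !== INDEX_VERSION || parsed.segmentSize !== size) return null;
  return new Map(parsed.entries.map((entry) => [entry.key, entry]));
}
```

Keeping the temp file in the same directory matters because a rename is only atomic within one file system. A size match is a cheap validity signal rather than a proof; it would not notice a segment rewritten to the same length.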

09 Recover from partial writes

Detect records cut off mid-write and truncate the segment to the end of the last complete valid record.

Tasks

Create a recovery module that returns a report object and owns the truncation write.
Scan from offset zero with checksum verification while tracking last valid offset, last valid length, and valid record count.
Stop at the first incomplete header, incomplete payload, invalid magic, invalid version, invalid type, or checksum failure.
Truncate the segment to the end of the last valid record, then rebuild and persist the index.
Write a recovery report with original size, recovered size, lost bytes, failure offset, failure reason, and valid record count.

Artifact

An explicit recovery command that preserves the valid prefix, removes a broken tail, rebuilds indexes, and leaves an audit report.
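
The recovery scan can be sketched as a walk over the segment that stops at the first record that fails validation, followed by a truncate. To stay short, this sketch reads the whole segment into memory and checks only the header size, a hypothetical `NBDB` magic, and the payload length. The version, type, and checksum checks listed above, and the index rebuild, are left out. Offsets and names are assumptions.

```javascript
import { readFile, truncate } from "node:fs/promises";

const HEADER_SIZE = 28;
const MAGIC = Buffer.from("NBDB", "latin1"); // hypothetical magic bytes

// Returns { length } for a complete record at `offset`, else { reason }.
function inspectAt(data, offset) {
  if (data.length - offset < HEADER_SIZE) return { reason: "incomplete header" };
  if (!data.subarray(offset, offset + 4).equals(MAGIC)) return { reason: "invalid magic" };
  const length = HEADER_SIZE + data.readUInt32BE(offset + 8) + data.readUInt32BE(offset + 12);
  if (data.length - offset < length) return { reason: "incomplete payload" };
  return { length };
}

async function recoverSegment(segmentPath) {
  const data = await readFile(segmentPath);
  let validEnd = 0; // end of the last valid record
  let validRecords = 0;
  let failureReason = null;

  while (validEnd < data.length) {
    const result = inspectAt(data, validEnd);
    if (result.reason) {
      failureReason = result.reason;
      break;
    }
    validEnd += result.length;
    validRecords += 1;
  }

  // Only touch the file when there is a broken tail to remove.
  if (validEnd < data.length) await truncate(segmentPath, validEnd);

  return {
    originalSize: data.length,
    recoveredSize: validEnd,
    lostBytes: data.length - validEnd,
    failureOffset: failureReason ? validEnd : null,
    failureReason,
    validRecords,
  };
}
```

The returned object doubles as the recovery report. Everything after the first failure is discarded, so a corrupt record in the middle of the segment also removes the valid records that follow it.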

10 Add tombstones for deletes

Support deletion by appending DELETE records instead of rewriting existing records.

Tasks

Teach the encoder to create DELETE records with a key and zero-length value.
Replace the delete placeholder with a command that appends a tombstone record.
Update index application so a key can have a latest deleted state with offset, length, and timestamp.
Make get report deleted keys differently from missing keys.
Keep scan historical so old PUT records and later DELETE tombstones still print in file order.

Artifact

Deletes represented as durable tombstones, lookup output that separates deleted from missing, and historical scan output.
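
Distinguishing deleted keys from missing ones mostly comes down to keeping the tombstone in the index rather than removing the key. A minimal sketch, assuming entries carry a `deleted` flag and lookups return a `status` string; both are illustrative choices, as are the names `applyRecord` and `lookup`.

```javascript
const PUT = 1;
const DELETE = 2;

// Applies one scanned record to the index. A tombstone replaces the entry
// instead of removing it, so the deleted state keeps its offset and timestamp.
function applyRecord(index, record) {
  const { key, type, offset, length, timestamp } = record;
  index.set(key, { key, type, offset, length, timestamp, deleted: type === DELETE });
  return index;
}

// Separates three outcomes: never written, deleted, and live.
function lookup(index, key) {
  const entry = index.get(key);
  if (entry === undefined) return { status: "missing" };
  if (entry.deleted) {
    return { status: "deleted", offset: entry.offset, timestamp: entry.timestamp };
  }
  return { status: "found", offset: entry.offset, length: entry.length };
}
```

A later PUT for the same key simply overwrites the tombstone entry, so the key becomes live again without any special handling.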

11 Add compaction

Rewrite live records into a smaller segment and drop overwritten records plus tombstones.

Tasks

Create the compaction module and route the compact command to it.
Build the current index from the segment and select only live PUT entries.
Read each live record by offset and write it into a compacting segment with new offsets.
Sync the compacting file, move the current segment to a backup, rename compacting into place, and write a fresh index.
Add startup cleanup for leftover compacting or backup files from interrupted compactions.
Write a compaction report with before/after size, records read, records written, dropped tombstones, dropped overwritten records, and duration.

Artifact

A smaller live segment, a fresh index with new offsets, deterministic startup cleanup, and a compaction report.
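
The copy-and-swap sequence above could be sketched as follows. It assumes index entries shaped like `{ key, type, offset, length }`, where the index already points at the newest record per key, and it uses hypothetical `.compacting` and `.bak` suffixes. Report writing, index persistence, and startup cleanup are left out.

```javascript
import { open, rename } from "node:fs/promises";

const PUT = 1;

async function compactSegment(segmentPath, index) {
  const compactingPath = `${segmentPath}.compacting`; // hypothetical suffix
  const backupPath = `${segmentPath}.bak`; // hypothetical suffix

  // Live records only, in their original file order.
  const live = [...index.values()]
    .filter((entry) => entry.type === PUT)
    .sort((a, b) => a.offset - b.offset);

  const source = await open(segmentPath, "r");
  const target = await open(compactingPath, "w");
  const newIndex = new Map();
  try {
    let offset = 0;
    for (const entry of live) {
      const record = Buffer.alloc(entry.length);
      const { bytesRead } = await source.read(record, 0, entry.length, entry.offset);
      if (bytesRead !== entry.length) throw new Error(`short read at offset ${entry.offset}`);
      const { bytesWritten } = await target.write(record, 0, record.length);
      if (bytesWritten !== record.length) throw new Error("short write while compacting");
      newIndex.set(entry.key, { ...entry, offset }); // same record, new address
      offset += entry.length;
    }
    await target.sync(); // make the replacement durable before the swap
  } finally {
    await Promise.all([source.close(), target.close()]);
  }

  // Swap: keep the old segment as a backup, then move the new one into place.
  await rename(segmentPath, backupPath);
  await rename(compactingPath, segmentPath);
  return newIndex;
}
```

Between the two renames there is a brief window with no file at `segmentPath`, which is one reason the startup cleanup task for leftover compacting or backup files exists.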

12 Measure Buffer ownership

Compare copy-heavy and view-heavy parsing modes during large scans.

Tasks

Wire copy, slice, and subarray payload modes through scan.
Generate a large deterministic dataset so external memory and ArrayBuffer memory move enough to measure.
Sample rss, heapUsed, external, and arrayBuffers before scan, during scan, and after cleanup.
Add retention mode so decoded key/value Buffers can stay alive until the final memory sample.
Write a buffer ownership report that compares parser mode, record count, retained mode, and memory fields.

Artifact

A memory report that compares copy, slice, and subarray behavior during retained and non-retained scans.
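
The three payload modes differ in whether the returned Buffer owns its bytes. A minimal sketch of the distinction, with `extractPayload` and `sampleMemory` as hypothetical helper names:

```javascript
// Hands a payload out of a larger read buffer in one of three modes.
function extractPayload(source, start, end, mode) {
  switch (mode) {
    case "copy":
      // New allocation: the result no longer depends on `source`.
      return Buffer.from(source.subarray(start, end));
    case "slice":
      // Buffer.prototype.slice is deprecated in the Node.js docs and returns
      // a view over the same memory, like subarray, not a copy.
      return source.slice(start, end);
    case "subarray":
      // View: shares memory with `source` and keeps it alive while retained.
      return source.subarray(start, end);
    default:
      throw new RangeError(`unknown payload mode: ${mode}`);
  }
}

// Captures the memory fields the ownership report compares.
function sampleMemory(label) {
  const { rss, heapUsed, external, arrayBuffers } = process.memoryUsage();
  return { label, rss, heapUsed, external, arrayBuffers };
}
```

Writing to the source after extraction shows the difference: a `copy` result keeps its old bytes, while `slice` and `subarray` results reflect the change. That sharing is also why retaining a small view can keep a large read buffer from being collected.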

13 Add benchmarks

Benchmark puts, gets, scans, recovery, and compaction, then write machine-readable and readable reports.

Tasks

Create a benchmark runner that owns setup, measurement, cleanup, and report generation.
Generate deterministic data for sequential puts, repeated puts, deletes, recovery, and compaction.
Measure sequential puts, repeated puts, indexed gets, full scans, recovery after a truncated tail, and compaction after overwrites and deletes.
Capture segment size, index size, rss, heapUsed, external, arrayBuffers, recovery time, compaction time, Node version, platform, CPU count, and timestamp.
Write JSON and Markdown benchmark reports with ops/sec, file size, memory, sync mode, dataset size, and checksum context.

Artifact

Benchmark reports that measure disk size, memory, throughput, recovery cost, and compaction cost from a clean generated dataset.
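
A benchmark runner needs little more than a timer, an operation count, and some environment context. The sketch below shows one possible shape; the result fields, the Markdown columns, and the names `measure`, `environment`, and `toMarkdown` are assumptions rather than the lab's report format.

```javascript
import os from "node:os";

// Times one async workload and derives ops/sec from the operation count.
async function measure(name, operations, run) {
  const start = performance.now();
  await run();
  const elapsedMs = performance.now() - start;
  // null instead of Infinity keeps the JSON report valid for instant runs.
  const opsPerSec = elapsedMs > 0 ? operations / (elapsedMs / 1000) : null;
  return { name, operations, elapsedMs, opsPerSec };
}

// Context that makes two benchmark reports comparable.
function environment() {
  return {
    node: process.version,
    platform: process.platform,
    cpuCount: os.cpus().length,
    timestamp: new Date().toISOString(),
  };
}

// Renders results as a small Markdown table for the readable report.
function toMarkdown(results) {
  const rows = results.map((r) => {
    const rate = r.opsPerSec === null ? "n/a" : String(Math.round(r.opsPerSec));
    return `| ${r.name} | ${r.operations} | ${r.elapsedMs.toFixed(2)} | ${rate} |`;
  });
  return ["| benchmark | ops | ms | ops/sec |", "| --- | --- | --- | --- |", ...rows].join("\n");
}
```

The JSON report can then be `JSON.stringify({ environment: environment(), results }, null, 2)`, with the Markdown table generated from the same `results` array so the two reports cannot drift apart.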

Storage evidence

The reports make the file behavior visible.

Lab 02 treats generated files as part of the learning surface: format notes, segment inspection, index-load timing, recovery evidence, compaction deltas, Buffer ownership measurements, and benchmark summaries.

docs/format.md: record layout, byte order, and payload rules

segment-0001.dat: append-only binary records

data/index.json: latest key-to-offset entries

index-load.json: persisted-index versus replay timing

recovery.json: valid prefix, broken tail, and lost bytes

compaction.json: before/after size and dropped records

buffer-ownership.json: copy, slice, and subarray memory data

bench.md: throughput, size, memory, and recovery results


Other labs in the bundle

Lab 02 sits beside six more builds.

The bundle includes seven runtime projects covering process observation, binary storage, streams, module resolution, file watching, async orchestration, and custom protocols.

Lab 01