
Performance — Where Kolu Can Get Faster

Analysis · seedling · proposed

A living tuning map of the Kolu monorepo. Built from a 77-agent survey (10 subsystem investigators → adversarial verification → synthesis), it ledgers what's already shipped and ranks the real-but-bounded opportunities that remain — so we keep Kolu nimble and fast, by measurement, over time.

This is the Atlas hub for keeping Kolu nimble and fast — a living map of the monorepo’s performance surfaces, the wins already banked, and the opportunities worth tuning next. It was built by a survey workflow: 10 investigators, one per subsystem, each reading real source; adversarial verification of every finding against the code (this repo’s history shows plausible, code-cited perf diagnoses are often wrong — see memory-learnings and dock-and-eventloop-1308); then synthesis into the themes below. Of 66 raw findings, 35 survived as real or partial, 6 were confirmed already-shipped, and 25 were dropped as speculative or mechanically wrong.

The map

Green is banked; amber is open. The opportunities cluster on the client render path, the wire, the kaval backend, mobile timers, and the dev loop — with the GPU/canvas/diff hot paths already green.

[Figure: Kolu's performance surfaces. Legend: shipped / to tune.]
CLIENT (render & load): Reactivity (metadata derivation re-runs); Canvas render loop (rAF-coalesced ✓ #1368); WebGL contexts (capped ✓ #1399); Bundle, Code tab (171 kB, deferred); Markdown / Pierre (worker pool ✓ #1363); Memory lifecycle (storesByKey freed ✓).
WIRE (surface / oRPC subscriptions): collection key-set re-published in full; base64 stdio framing (+33% bytes).
BACKEND (server + kaval PTY host): kaval: Channel queues (item-bounded), scrollback snapshot per attach; server: full metadata object published per live-field update.
MOBILE (battery): heartbeat 15s, staleness ticker, and per-terminal visibility listeners, all waking regardless of visibility.
DEV-LOOP / CI: vitest serial across 44 packages; nix pnpmDeps hash double-fetch (2m45s darwin / 25s linux).

Top of the backlog

The actionable shortlist, ranked by leverage. Impact and effort are the verified estimates (post-adversarial-correction), not the original claims. The keystone (row 1) is now shipped; it clears the way for the rest of the reactivity cluster.

| # | Opportunity | Surface | Impact | Effort | The fix, in one line |
|---|---|---|---|---|---|
| 1 | Stabilize the terminalIds memo reference (shipped) | Reactivity | med | med | Done #1425: the memo keeps its prior array when the id order is unchanged, so terminalIds() stops notifying downstream on non-display metadata writes (displayInfos still re-runs correctly for git/cwd/parentId changes via its own field-level subscription). Clears the rest of the reactivity cluster. |
| 2 | Per-key collection deltas on the wire | Wire | med | med | Publish {added/changed/removed} keys, not the full key array, on every upsert/remove. |
| 3 | Heartbeat: reconnect-on-resume fixed; hidden-tab battery open | Mobile | med | med | Reconnect-on-resume shipped: a measured wall-vs-monotonic clock gap voids a probe that a sleep/freeze interrupted, and a focus / tab-visible event re-probes at once. Hidden-tab radio still wakes ≈240×/h, but gating the probe on hidden blinds the watchdog (a hidden tab still runs), so a battery fix must lengthen the hidden interval, not stop it. |
| 4 | Workspace-level test parallelism | Dev-loop | med | low | Add --workspace-concurrency so 44 packages don’t test one-at-a-time. |
| 5 | Lazy-load the Code tab (measured, deferred) | Bundle | med | high | A/B build: the Code-tab tree is 171 kB gzip / 23% of the eager chunk. But activeTab defaults to code, so for most desktop sessions lazy-loading defers it async past first paint rather than skipping it: a faster-first-paint-vs-cold-flash trade, TTI untraced. Deferred. (investigation) |
| 6 | One-shot Nix pnpmDeps hash check | Dev-loop | med | high | Compute the hash once instead of two sequential builds (2m45s on darwin). |
| 7 | Avoid the Markdown Source⇄Rendered toggle re-sanitize (shipped) | Markdown | med | med | Done #1446: FileView keeps both toggle modes alive (the #818 pattern), so a flip is a visibility change, not a remount: the full marked→DOMPurify→Shiki→innerHTML pipeline runs 0× per toggle (was 1×). Replaces the refuted “stabilize the resolver reference” claim, a measured no-op. |

Already banked

So this map isn’t re-litigated: the wins below are shipped and verified — do not re-report them as opportunities. Remaining slivers inside them are noted in their themes as “remaining within …”.

Frontend hot paths

The client is where users feel speed. The canvas, WebGL, and diff paths are already green; what’s left is reactive over-derivation, eager bundle weight, and a few markdown tree-walks — all bounded at current scale.

Reactivity granularity

Low Keystone — stabilize the terminalIds memo reference — ✓ shipped

terminalIds was a createMemo running meta.keys().filter(...) (useTerminalMetadata.ts), returning a new array reference on every run even when the contents were identical. The dependent displayInfos memo tracked that reference, so any single terminal’s metadata mutation re-ran buildTerminalDisplayInfos for all terminals (terminalDisplay.ts), allocating 4–5 intermediate collections each pass, and re-evaluated every tile’s Show gate on getDisplayInfo (TerminalCanvas.tsx).

Done #1425: the memo now carries a sameTerminalIdOrder equals gate, so it keeps the prior array whenever the top-level ids and their order are unchanged. The set-shaped re-run path (terminalIds() returning a fresh reference) now fires only on a real add / remove / reorder, not on every metadata mutation. The accessor still re-runs cheaply; what it no longer does is notify downstream when the ids are identical. This is proven by a re-run-count regression test (useTerminalMetadata.test.ts), so banking it didn’t need the live trace that the coverage gaps flag as still pending. (Verification had corrected the original “O(n³)” claim to O(n log n): the cost was wasted allocations plus re-derivation, not algorithmic blowup.)

One nuance the gate does not change: displayInfos reads the display-relevant fields (git, cwd, parentId) of each terminal’s metadata inside its own scope, so it carries a second, field-level subscription to those paths. Because the surface store writes via reconcile, PR / agent / foreground churn (the dominant ~1/s updates) never reached displayInfos through that field-level path, even before this gate; a real git/cwd/parentId change still re-runs it, which is correct (the displayed identity genuinely changed). The gate closes the set-reference path; the field-level path is already as narrow as it should be.
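To make the equals gate concrete, here is a minimal sketch of an order-sensitive id comparison and a tiny model of a memo that only notifies when the gate reports a change. It is not the repo's code: the body of sameTerminalIdOrder and the createGatedMemo helper are assumptions written for illustration, and the model deliberately avoids importing Solid so it runs on its own.

```typescript
// Hypothetical stand-in for the gate; the real sameTerminalIdOrder in
// useTerminalMetadata.ts may be written differently.
function sameTerminalIdOrder(
  prev: readonly string[],
  next: readonly string[],
): boolean {
  if (prev === next) return true;
  if (prev.length !== next.length) return false;
  for (let i = 0; i < prev.length; i++) {
    if (prev[i] !== next[i]) return false;
  }
  return true;
}

// Illustrative model of a memo with an `equals` gate (not Solid itself):
// every recomputation is offered to the gate, but the stored reference is
// swapped, and a downstream notification counted, only on a real change.
function createGatedMemo<T>(equals: (prev: T, next: T) => boolean) {
  let current: T | undefined;
  let hasValue = false;
  let notifications = 0;
  return {
    set(next: T): T {
      if (hasValue && equals(current as T, next)) return current as T;
      current = next;
      hasValue = true;
      notifications++;
      return next;
    },
    get notifications(): number {
      return notifications;
    },
  };
}

const memo = createGatedMemo<readonly string[]>(sameTerminalIdOrder);
const first = memo.set(["t1", "t2"]);
// A metadata write re-runs the filter and yields a fresh array with the same ids.
const afterMetadataWrite = memo.set(["t1", "t2"]);
// Real structural changes still get through.
const afterAdd = memo.set(["t1", "t2", "t3"]);
const afterReorder = memo.set(["t2", "t1", "t3"]);

console.log(afterMetadataWrite === first, memo.notifications);
```

In Solid, a comparison like this would typically be passed as the `equals` option, as in `createMemo(fn, undefined, { equals: sameTerminalIdOrder })`. Comparing by position rather than as a set is what lets a reorder count as a change.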

Bundle & startup weight

Medium Lazy-load the Code tab — measured (171 kB gzip), deferred

An A/B production build measured the Code-tab tree (@kolu/solid-pierre’s FileTree, the @kolu/solid-markdown renderer, the diff/source view wrappers, and the comment system) at 629 kB raw / 171 kB gzip (23%) of the eager index chunk; it was a static import in RightPanel. Lazy-loading it works (built, lens/codex/simplify/police-reviewed, e2e-verified 115/115 on a pu box), but it’s deferred, not shipped, because the value is narrower than the framing: activeTab defaults to code and the desktop panel opens by default, so on a typical desktop session CodeTab loads anyway, just async past first paint rather than skipped. The “many who only open terminals never parse it” win holds only for mobile (drawer closed) or a collapsed panel. The common desktop case trades a faster first paint for a cold-load flash, and whether that nets out as a perceptual win depends on a cold-start TTI that has not been traced: a speculative bet against a visible regression on the default surface.

Two premises this refuted: Shiki grammars are already lazy (a dynamic import("shiki"), never on the eager path, and @pierre/diffs already runs in a Worker), and “lazy-load Image on first use” is mechanically impossible (ImageAddon must precede the image escape sequence). Full write-up + the unblock path (TTI trace, default-tab decision, or idle-preload): bundle-codetab-lazyload.

Markdown & code-tab rendering

Medium Markdown preview render cost — the resolver 'fix' was a no-op; the toggle remount is real

The original claim was that BrowseFileDispatcher passes resolveImageSrc as an inline arrow (BrowseFileDispatcher.tsx:374-376), so Markdown’s memo re-runs. That is a measured no-op. Stabilizing the reference eliminates zero sanitizeHtml runs, confirmed three ways: a faithful jsdom reproduction on the repo’s real Solid build (fresh arrow vs stable callback ⇒ byte-identical memo counts), FileView’s own design (FileView.tsx:91-98: the appliance is re-rendered on every props.file snapshot, a remount), and the Solid compiler (the inline-arrow prop is static, never a reactive dependency). The defense-in-depth marker is inert too: sanitize re-parses from raw markdown each run, so a marker never survives to the next.

The real cost the reproduction surfaced is now fixed. active() returned only the active branch, so a Source⇄Rendered toggle remounted and re-sanitized the whole doc (50-image doc: ~50 image-resolutions + a full parse/sanitize/highlight/DOM-reparse per toggle). Fixed #1446: FileView now keeps both toggle modes alive (the #818 RightPanel pattern), so a flip is a visibility change, not a remount; the pipeline runs 0× per toggle. A per-slot heldFile snapshot defers a hidden mode’s refresh (no double-render on edit, no render(file) API change), and each comment overlay took a per-instance CSS-highlight name so the two kept-alive surfaces don’t contend. Proven by an e2e (the rendered preview element survives the round-trip). The content-keyed sanitize cache alternative was dropped as accidental complexity. Full write-up + reproduction: markdown-image-resolver-and-toggle.
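The keep-alive behaviour can be modelled without any UI framework. The sketch below is only an illustration of the idea, not FileView's implementation: KeepAliveToggle, runPipeline, and the use of a plain string as the file snapshot are all hypothetical, and it assumes a mode is first rendered when it first becomes visible. Each mode remembers the snapshot it last rendered, and the expensive pipeline runs only when the visible mode is stale.

```typescript
type Mode = "source" | "rendered";

// Hypothetical model: one held snapshot per mode, pipeline runs only when
// the mode that is currently visible has not yet rendered the latest file.
class KeepAliveToggle {
  private active: Mode;
  private latestFile: string;
  private readonly heldFile: Record<Mode, string | null> = {
    source: null,
    rendered: null,
  };
  private readonly runPipeline: (mode: Mode, file: string) => void;

  constructor(
    initial: Mode,
    file: string,
    runPipeline: (mode: Mode, file: string) => void,
  ) {
    this.active = initial;
    this.latestFile = file;
    this.runPipeline = runPipeline;
    this.refreshActive();
  }

  // A flip only changes which mode is visible; it re-renders nothing unless
  // the newly visible mode missed an edit while it was hidden.
  toggle(): void {
    this.active = this.active === "source" ? "rendered" : "source";
    this.refreshActive();
  }

  // An edit refreshes the visible mode only; the hidden one is deferred.
  setFile(file: string): void {
    this.latestFile = file;
    this.refreshActive();
  }

  private refreshActive(): void {
    if (this.heldFile[this.active] === this.latestFile) return;
    this.heldFile[this.active] = this.latestFile;
    this.runPipeline(this.active, this.latestFile);
  }
}

const runs: string[] = [];
const view = new KeepAliveToggle("rendered", "v1", (mode) => {
  runs.push(mode);
});

view.toggle(); // first time "source" is shown: it renders once
const afterFirstFlip = runs.length;
view.toggle(); // back to "rendered": nothing to do
view.toggle(); // back to "source": nothing to do
const afterRoundTrip = runs.length;
view.setFile("v2"); // edit while "source" is visible: one render, not two
const afterEdit = runs.length;
view.toggle(); // "rendered" catches up on the deferred edit
const afterCatchUp = runs.length;

console.log({ afterFirstFlip, afterRoundTrip, afterEdit, afterCatchUp });
```

The design choice this illustrates is that the cost moves from "per toggle" to "per edit per mode actually viewed", which is why a content-keyed cache is not needed to get the same effect.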

Memory lifecycle

After the shipped storesByKey and history-browser fixes, the residual is one belt-and-suspenders item, rated low: the useComments persistedPref factory hand-rolls a per-terminalId signal (useComments.ts:36-83). Consumers wrap it in createMemo, so owners do auto-dispose (the leak claim was overstated), but moving to makePersisted from @solid-primitives would make owner cleanup automatic.

Backend, wire & streaming

The path from server to client. The strongest new item is per-key wire deltas; the kaval items are hardening/observability wins, not the production OOM fix — the #1420 RCA explicitly ruled out the Channel queue, so the real leak lives elsewhere (likely scrollback/snapshots) and needs a dedicated heap snapshot.

Surface wire & subscriptions

Medium Publish per-key collection deltas, not the full key set

Every upsert/remove publishes the entire key array via keysBus.publish(Array.from(...)) — a fresh object each time — which crosses the wire and triggers client mapArray reconciliation (surface/server.ts:1218-1223, useCollection.ts:60-65). mapArray avoids per-key subscription churn, so the cost is wasted allocations + transmission + memo recompute, not thrashing. Fix: publish discriminated {added:[k]}/{changed:[k]}/{removed:[k]} deltas, emitting the full set only on init; let useCollection apply them. Measure: keysBus publish frequency and payload sizes during terminal spawn/metadata churn.
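A minimal sketch of what such a delta protocol could look like on the receiving side. The KeyDelta shape (with a `kind` discriminant) and applyKeyDelta are hypothetical names chosen for illustration; the actual message format and how useCollection would consume it are not specified in the text above.

```typescript
// Hypothetical wire messages: a full key set once, then small deltas.
type KeyDelta<K> =
  | { kind: "init"; keys: K[] }
  | { kind: "added"; keys: K[] }
  | { kind: "changed"; keys: K[] }
  | { kind: "removed"; keys: K[] };

// Applies one delta to the client's key list. Returns the same array
// reference whenever the key set did not change, so reference-tracking
// consumers have nothing to reconcile.
function applyKeyDelta<K>(
  current: readonly K[],
  delta: KeyDelta<K>,
): readonly K[] {
  switch (delta.kind) {
    case "init":
      return [...delta.keys];
    case "added": {
      const known = new Set(current);
      const next = [...current];
      for (const k of delta.keys) {
        if (!known.has(k)) {
          known.add(k);
          next.push(k);
        }
      }
      return next.length === current.length ? current : next;
    }
    case "removed": {
      const gone = new Set(delta.keys);
      const kept = current.filter((k) => !gone.has(k));
      return kept.length === current.length ? current : kept;
    }
    case "changed":
      // Values changed, membership did not: the key list is untouched.
      return current;
  }
}

let keys: readonly string[] = [];
keys = applyKeyDelta(keys, { kind: "init", keys: ["t1", "t2"] });
const beforeChange = keys;
keys = applyKeyDelta(keys, { kind: "changed", keys: ["t1"] });
const changedKeptReference = keys === beforeChange;
keys = applyKeyDelta(keys, { kind: "added", keys: ["t3"] });
keys = applyKeyDelta(keys, { kind: "removed", keys: ["t1"] });

console.log(keys, changedKeptReference);
```

With this shape, a metadata upsert on an existing key sends one key instead of the whole array, and the client's key list keeps its identity.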

kaval memory & streaming

Low Bound subscriber queues by bytes, not just item count

Each subscriber queue caps at maxQueue (10k items) with no byte bound (kaval/channel.ts:54-132), so a stalled subscriber on a 1 KB/event PTY could pin ~10 MB before being dropped. Fix: track queue byte size at publish and drop when either item-count or a new maxQueueBytes is exceeded. Note: the #1420 RCA rules this out as the OOM source — this is a known-constant memory cap, not the leak fix.
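The dual bound can be sketched as a queue that tracks its byte total alongside its item count. This is an illustration under stated assumptions, not kaval's Channel: ByteBoundedQueue, its method names, and the 1 MiB byte cap used in the example are hypothetical; only the 10k item cap and the 1 KB/event figure come from the text above.

```typescript
// Hypothetical queue with two bounds. push() reports false when accepting
// the chunk would exceed either bound, leaving the caller to drop the
// stalled subscriber.
class ByteBoundedQueue {
  private readonly items: Uint8Array[] = [];
  private bytes = 0;
  private readonly maxItems: number;
  private readonly maxBytes: number;

  constructor(maxItems: number, maxBytes: number) {
    this.maxItems = maxItems;
    this.maxBytes = maxBytes;
  }

  push(chunk: Uint8Array): boolean {
    const overItems = this.items.length + 1 > this.maxItems;
    const overBytes = this.bytes + chunk.byteLength > this.maxBytes;
    if (overItems || overBytes) return false;
    this.items.push(chunk);
    this.bytes += chunk.byteLength;
    return true;
  }

  shift(): Uint8Array | undefined {
    const chunk = this.items.shift();
    if (chunk !== undefined) this.bytes -= chunk.byteLength;
    return chunk;
  }

  get size(): number {
    return this.items.length;
  }

  get byteSize(): number {
    return this.bytes;
  }
}

// 10k items of 1 KB each would pin about 10 MB under the item bound alone.
// With an assumed 1 MiB byte cap, the same stalled subscriber is cut off
// far earlier.
const queue = new ByteBoundedQueue(10_000, 1024 * 1024);
const event = new Uint8Array(1024);
let accepted = 0;
while (queue.push(event)) accepted++;

console.log(accepted, queue.byteSize);
```

Tracking the running total at push and shift keeps the check O(1), so the byte bound adds no scan over the queue.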

Mobile battery & wake-ups

Timers and listeners that run regardless of visibility — the clearest new cluster, and the one most likely to matter on a real phone (where it is, so far, unmeasured).

Medium Heartbeat probe: reconnect-on-resume fixed; hidden-tab battery still open

createHeartbeat() runs system.live / identity.info() every 15s while the socket is OPEN. Two distinct costs hid behind one finding:

  • Spurious reconnect on resume — FIXED #1598. A laptop sleep / tab freeze / app-switch paused the event loop; the probe’s 10s timeout fired overdue on resume and forced a reconnect over a still-healthy socket (the brief “Disconnected” flash — a regression from the default-on watchdog #1545). The watchdog now compares elapsed WALL time against elapsed MONOTONIC time across each probe and voids-and-re-probes a window a suspension crossed — the browser-leg analog of the ssh leg’s wall-clock-gap wake watcher #1078 — and a window-focus / tab-visible wake event re-probes immediately; the full-screen overlay is grace-windowed too, so a sub-second reconnect never flashes.
  • Hidden-tab radio wake — still open. The probe still runs ≈240×/hour while backgrounded, forcing the mobile radio idle→active. The tempting fix — stop the interval while document.visibilityState === 'hidden' — is a coverage regression: a hidden tab is still running, so its probe timeout is real, and gating it blinds the watchdog to a genuine half-open during a long background. A battery fix must keep watching — lengthen the hidden-tab interval (30–60s), not stop it. Measure: packet-capture probes/hour backgrounded on a real phone — target under ~10, vs ~240 today.
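The two ideas above, voiding a probe window that a suspension crossed and slowing (not stopping) the probe while hidden, can be sketched as pure functions. Everything here is hypothetical except the 15s interval and the 30–60s range quoted in the text: the function names, the 2s tolerance, and the choice of 60s for the hidden interval are assumptions, and the clock samples are passed in rather than read from the browser so the logic is testable in isolation.

```typescript
const VISIBLE_INTERVAL_MS = 15_000; // from the text
const HIDDEN_INTERVAL_MS = 60_000; // assumption: upper end of the 30-60s range
const SUSPEND_TOLERANCE_MS = 2_000; // assumption: slack for normal clock jitter

interface ClockSample {
  wallMs: number; // wall-clock reading, e.g. Date.now()
  monoMs: number; // monotonic reading that does not advance while suspended
}

// True when wall time advanced noticeably more than monotonic time across
// the window, i.e. the event loop was likely suspended in between. A probe
// timeout observed in such a window should be voided and re-probed rather
// than treated as a dead socket.
function crossedSuspension(
  start: ClockSample,
  end: ClockSample,
  toleranceMs: number = SUSPEND_TOLERANCE_MS,
): boolean {
  const wallElapsed = end.wallMs - start.wallMs;
  const monoElapsed = end.monoMs - start.monoMs;
  return wallElapsed - monoElapsed > toleranceMs;
}

// Keep probing while hidden, just less often, so the watchdog stays sighted.
function probeIntervalMs(visibility: "visible" | "hidden"): number {
  return visibility === "hidden" ? HIDDEN_INTERVAL_MS : VISIBLE_INTERVAL_MS;
}

function probesPerHour(intervalMs: number): number {
  return 3_600_000 / intervalMs;
}

const healthyWindow = crossedSuspension(
  { wallMs: 0, monoMs: 0 },
  { wallMs: 10_000, monoMs: 10_000 },
);
const sleptWindow = crossedSuspension(
  { wallMs: 0, monoMs: 0 },
  { wallMs: 3_600_000, monoMs: 12_000 },
);

console.log(
  healthyWindow,
  sleptWindow,
  probesPerHour(probeIntervalMs("visible")),
  probesPerHour(probeIntervalMs("hidden")),
);
```

As a worked figure: a 15s interval is 240 probes/hour, while 30s and 60s give 120 and 60. Getting to roughly 10 per hour by interval alone would take an interval of about 6 minutes, so the measured target and the chosen hidden interval need to be picked together.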

Dev-loop & CI

Iteration speed is a performance surface too. Two confirmed-real wins, both with concrete cost.

Medium Parallelize tests across the workspace

pnpm -r serializes package test runs (no --workspace-concurrency), so on a multi-core machine the 44 packages run roughly one-at-a-time even though each vitest threads internally — workspace-level parallelism is unused (package.json:7, per-package vitest.config.ts). Fix: enable workspace concurrency in the test:unit recipe; consider vitest --shard for the slowest packages (e.g. git/index.test.ts). Measure: just test-unit baseline vs --workspace-concurrency N.

Medium Compute the Nix pnpmDeps hash once, not twice

ci::pnpm-hash-fresh runs two sequential nix builds (the second --rebuild) with no cache reuse, so pnpm install runs fully twice — measured at 2m45s on darwin / 25s on linux (ci/mod.just:82-84, default.nix:154-159). Fix: compute the actual hash once into a temp derivation, then compare against the declared hash in a pure eval step, deferring the fetch to a single store-locked derivation. Measure: cold-store nix build .#pnpmDeps --no-link then --rebuild --no-link, total wall-clock before/after.

From static reads to live traces

Every finding above is a static code read. Verification corrected several overstated claims (no “high” survived; “O(n³)” was O(n log n); a “100–400 KB” snapshot is likely ~4 KB) precisely because nobody had a number. The next round moves from reading to measuring — these are the gaps this map does not yet rest on: