Performance — Where Kolu Can Get Faster

This is the Atlas hub for keeping Kolu nimble and fast — a living map of the monorepo’s performance surfaces, the wins already banked, and the opportunities worth tuning next. It was built by a survey workflow: 10 investigators, one per subsystem, each reading real source; adversarial verification of every finding against the code (this repo’s history shows plausible, code-cited perf diagnoses are often wrong — see memory-learnings and dock-and-eventloop-1308); then synthesis into the themes below. Of 66 raw findings, 35 survived as real or partial, 6 were confirmed already-shipped, and 25 were dropped as speculative or mechanically wrong.

The honest headline — Kolu is already measurement-disciplined

Verification downgraded every “high-impact” claim to medium. The big wins are already in: off-screen aura mounting, rAF-coalesced canvas gestures, the worker-pool diff highlighter, lazy/capped WebGL, and the storesByKey cleanup. What remains is mostly real-but-bounded structural inefficiency, not acute regressions — so the discipline this map enforces is: measure before you tune. Two clusters carry the most leverage — (1) reactivity granularity in the terminal-metadata pipeline (an unstable array reference made terminalIds() notify downstream on every metadata write, even ones that left the id set unchanged) and the wire layer’s full-set broadcasts; and (2) mobile battery hygiene (timers that wake the device regardless of visibility). Nearly every item below ends in “…but confirm with a trace first.”

The map

Green is banked; amber is open. The opportunities cluster on the client render path, the wire, the kaval backend, mobile timers, and the dev loop — with the GPU/canvas/diff hot paths already green.

Top of the backlog

The actionable shortlist, ranked by leverage. Impact and effort are the verified estimates (post-adversarial-correction), not the original claims. The keystone (row 1) is now shipped; it clears the way for the rest of the reactivity cluster.

#	Opportunity	Surface	Impact	Effort	The fix, in one line
1	Stabilize the `terminalIds` memo reference (shipped)	Reactivity	med	med	Done in #1425: the memo keeps its prior array when the id order is unchanged, so `terminalIds()` stops notifying downstream on non-display metadata writes (`displayInfos` still re-runs correctly for `git`/`cwd`/`parentId` changes via its own field-level subscription). Clears the rest of the reactivity cluster.
2	Per-key collection deltas on the wire	Wire	med	med	Publish `{added/changed/removed}` keys, not the full key array, on every upsert/remove.
3	Heartbeat: reconnect-on-resume fixed; hidden-tab battery open	Mobile	med	med	Reconnect-on-resume shipped: a measured wall-vs-monotonic clock gap voids a probe that a sleep/freeze interrupted, and a focus / tab-visible event re-probes at once. The hidden-tab radio still wakes ≈240×/h, but gating the probe on `hidden` blinds the watchdog (a hidden tab still runs), so a battery fix must lengthen the hidden interval, not stop it.
4	Workspace-level test parallelism	Dev-loop	med	low	Add `--workspace-concurrency` so 44 packages don’t test one-at-a-time.
5	Lazy-load the Code tab — measured, deferred	Bundle	med	high	A/B build: the Code-tab tree is 171 kB gzip / 23% of the eager chunk. But `activeTab` defaults to `code`, so lazy-loading defers it async past first paint rather than skipping it for most desktop sessions — a faster-first-paint-vs-cold-flash trade, TTI untraced. Deferred. (investigation)
6	One-shot Nix `pnpmDeps` hash check	Dev-loop	med	high	Compute the hash once instead of two sequential builds (2m45s on darwin).
7	Avoid the Markdown Source⇄Rendered toggle re-sanitize (shipped)	Markdown	med	med	Done in #1446: `FileView` keeps both toggle modes alive (the #818 pattern), so a flip is a visibility change, not a remount, and the full marked→DOMPurify→Shiki→`innerHTML` pipeline runs 0× per toggle (was 1×). Replaces the refuted “stabilize the resolver reference” claim, a measured no-op.

Already banked

So this map isn’t re-litigated: the wins below are shipped and verified — do not re-report them as opportunities. Remaining slivers inside them are noted in their themes as “remaining within …”.

WebGL context cap #1416 #1399 — admit the whole working set under a 12-context cap; killed the focus-churn VRAM leak on Chrome+AMD.
OpenCode-derived wins — @pierre/diffs 1.2.10 + Shiki 4.2.0 #1360, off-thread diff highlighting #1363, the canvas gesture-p99 harness + rAF-coalesced pan/zoom #1368. Full write-up: opencode-perf.
Compositor paint storms — canvas tile-aura + dock CSS animations moved to compositor-friendly properties #1354 #1308.
Off-screen work elimination — covered tiles reuse the viewport box; no redundant ResizeObserver fit() cycles on hidden terminals.
Memory — storesByKey released on terminal deletion; per-terminal history-browser state reset on repo change #610.
Reactivity keystone #1425 — the terminalIds memo keeps a stable reference when the top-level id order is unchanged (sameTerminalIdOrder equals gate), so the accessor stops notifying downstream on non-display metadata writes; proven by a re-run-count regression test. (displayInfos keeps its own field-level subscriptions to the display-relevant metadata — git / cwd / parentId — via the surface store’s reconcile writes, so PR/agent/foreground churn never reached it even before this gate; a real git/cwd/parentId change still re-runs it, correctly — that path is left intact, by design.)
Markdown toggle keep-alive #1446 — FileView keeps both Source ⇄ Rendered modes alive, so toggling a .md preview is a visibility flip, not a remount + full re-sanitize of the doc (the marked→DOMPurify→Shiki→innerHTML pipeline runs 0× per toggle, was 1×); a per-slot heldFile snapshot keeps reload-on-edit intact with no render(file) API change, and each comment overlay took a per-instance CSS-highlight name so the two kept-alive surfaces don’t contend. The companion “stabilize the markdown image resolver reference” claim was refuted as a measured no-op.
Nix dev-shell eval — 35× faster (docs/nix-eval-perf-report.md).

Frontend hot paths

The client is where users feel speed. The canvas, WebGL, and diff paths are already green; what’s left is reactive over-derivation, eager bundle weight, and a few markdown tree-walks — all bounded at current scale.

Reactivity granularity

Low Keystone — stabilize the terminalIds memo reference — ✓ shipped

terminalIds was a createMemo running meta.keys().filter(...) (useTerminalMetadata.ts), returning a new array reference every run even when the contents were identical. The dependent displayInfos memo tracked that reference, so any single terminal’s metadata mutation re-ran buildTerminalDisplayInfos for all terminals (terminalDisplay.ts), allocating 4–5 intermediate collections each pass, and re-evaluated every tile’s Show gate on getDisplayInfo (TerminalCanvas.tsx).

Done #1425: the memo now carries a sameTerminalIdOrder equals gate, so it keeps the prior array whenever the top-level id order is unchanged. The set-shaped re-run path (terminalIds() returning a fresh reference) now fires only on a real add / remove / reorder, not on every metadata mutation. The accessor still re-runs cheaply; what it no longer does is notify downstream when the ids are identical. Proven by a re-run-count regression test (useTerminalMetadata.test.ts), so banking it didn’t need the live trace the coverage gaps flag as still-pending. (Verification had corrected the original “O(n³)” claim to O(n log n): the cost was wasted allocations + re-derivation, not algorithmic blowup.)

One nuance the gate does not change: displayInfos reads the display-relevant fields (git, cwd, parentId) of each terminal’s metadata inside its own scope, so it carries a second, field-level subscription to those paths. Because the surface store writes via reconcile, PR / agent / foreground churn (the dominant ~1/s updates) never reached displayInfos through this field-level path, even before the gate; a real git/cwd/parentId change still re-runs it, which is correct (the displayed identity genuinely changed). The gate closes the set-reference path; the field-level path is already as narrow as it should be.
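The equals gate described above can be sketched as a plain comparator. The name sameTerminalIdOrder comes from the text; its body, the TerminalId alias, and the gatedMemo helper are assumptions made for illustration, not the repo's code. In Solid, a comparator like this would be passed as the equals option of createMemo; the stand-in below only mimics the "keep the prior reference, skip the notification" behavior so the sketch runs without a framework.

```typescript
// Hypothetical alias; the real id type lives in the repo.
type TerminalId = string;

// Order-sensitive equality: true when both arrays hold the same ids in the
// same order. The name is from the text; the body is an assumption.
function sameTerminalIdOrder(
  prev: readonly TerminalId[],
  next: readonly TerminalId[],
): boolean {
  if (prev === next) return true;
  if (prev.length !== next.length) return false;
  for (let i = 0; i < prev.length; i++) {
    if (prev[i] !== next[i]) return false;
  }
  return true;
}

// Minimal stand-in for a memo with an equals gate (illustration only, not
// Solid's implementation): every update is compared, but the stored reference
// is replaced and downstream is notified only when the gate reports a change.
function gatedMemo<T>(
  equals: (a: T, b: T) => boolean,
  onChange: (value: T) => void,
): (next: T) => T {
  let current: T | undefined;
  let initialized = false;
  return (next: T): T => {
    if (initialized && equals(current as T, next)) return current as T;
    current = next;
    initialized = true;
    onChange(next);
    return next;
  };
}

let notifications = 0;
const update = gatedMemo<readonly TerminalId[]>(sameTerminalIdOrder, () => {
  notifications++;
});

const first = update(["a", "b"]);
const second = update(["a", "b"]); // fresh array, same order: prior reference kept
const third = update(["b", "a"]); // reorder: new reference, downstream notified
```

The comparator is order-sensitive on purpose: a reorder is treated as a real change, matching the add / remove / reorder wording above.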

getSubTerminalIds O(n) scan, called per top-level terminal inside the display derivation → O(n²) per metadata update (useTerminalMetadata.ts:54-56, terminalDisplay.ts:80). Fix with a Map<ParentId, TerminalId[]> index in the same memo. low impact.
terminalLabel O(n) indexOf per access (useTerminalMetadata.ts:94-96) — real, but only 2 call sites, both at event boundaries; bundle it with the index work above, don’t chase it alone. low
Per-tile geometry arithmetic (onScreen, tileTransformCSS in CanvasTile.tsx:114-130) recomputes per pan/zoom frame — but the big win (not mounting off-screen auras) already shipped; the residual ~4 ops/tile/rAF is likely below noise. Remaining within the canvas work. low
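The parent index suggested for getSubTerminalIds could look roughly like the following. This is a sketch under assumptions: the TerminalMeta shape, buildSubTerminalIndex, and the lookup signature are hypothetical, and only the Map-of-parent-to-children idea comes from the text.

```typescript
type TerminalId = string;

// Hypothetical metadata shape: only the field this sketch needs.
interface TerminalMeta {
  parentId?: TerminalId;
}

// Build the parent -> children index in one O(n) pass, so each later lookup
// is a Map read instead of a scan over every terminal.
function buildSubTerminalIndex(
  meta: ReadonlyMap<TerminalId, TerminalMeta>,
): Map<TerminalId, TerminalId[]> {
  const index = new Map<TerminalId, TerminalId[]>();
  for (const [id, m] of meta) {
    if (m.parentId === undefined) continue;
    const children = index.get(m.parentId);
    if (children) children.push(id);
    else index.set(m.parentId, [id]);
  }
  return index;
}

// Shared empty result so terminals without children do not allocate.
const NO_CHILDREN: readonly TerminalId[] = [];

function getSubTerminalIds(
  index: ReadonlyMap<TerminalId, TerminalId[]>,
  parent: TerminalId,
): readonly TerminalId[] {
  return index.get(parent) ?? NO_CHILDREN;
}

const meta = new Map<TerminalId, TerminalMeta>([
  ["t1", {}],
  ["t2", { parentId: "t1" }],
  ["t3", { parentId: "t1" }],
  ["t4", {}],
]);
const subIndex = buildSubTerminalIndex(meta);
```

Building the index once per derivation turns the per-parent scan into a single pass, which is where the O(n²) in the item above would drop to O(n).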

Bundle & startup weight

Medium Lazy-load the Code tab — measured (171 kB gzip), deferred

An A/B production build measured the Code-tab tree — @kolu/solid-pierre’s FileTree, the @kolu/solid-markdown renderer, the diff/source view wrappers, and the comment system — at 629 kB raw / 171 kB gzip (23%) of the eager index chunk (it was a static import in RightPanel). Lazy-loading it works (built, lens/codex/simplify/police-reviewed, e2e-verified 115/115 on a pu box) — but it’s deferred, not shipped, because the value is narrower than the framing: activeTab defaults to code and the desktop panel opens by default, so on a typical desktop session CodeTab loads anyway, just async past first paint rather than skipped. The “many who only open terminals never parse it” win holds only for mobile (drawer closed) / collapsed panel; the common desktop case is a faster-first-paint-vs-cold-load-flash trade whose perceptual net is the untraced cold-start TTI — a speculative bet against a visible regression on the default surface. Two premises this refuted: Shiki grammars are already lazy (a dynamic import("shiki"), never on the eager path — and @pierre/diffs already runs in a Worker), and “lazy-load Image on first use” is mechanically impossible (ImageAddon must precede the image escape sequence). Full write-up + the unblock path (TTI trace, default-tab decision, or idle-preload): bundle-codetab-lazyload.

Eager per-terminal addons — Search/Image/Serialize are instantiated per terminal (Terminal.tsx:490-510) though conditional; but ImageAddon can’t lazy-on-first-use (it must precede the image escape sequence), and Serialize/Search minify to ~10–15 kB gzip + add async to the hot path. Low value. low
WebglAddon static import — the GPU renderer sits in the eager bundle though it’s lazily constructed; but every terminal needs it, so it’s core, not deferrable. low

Markdown & code-tab rendering

Medium Markdown preview render cost — the resolver 'fix' was a no-op; the toggle remount is real

The original claim — BrowseFileDispatcher passes resolveImageSrc as an inline arrow (BrowseFileDispatcher.tsx:374-376), so Markdown’s memo re-runs — is a measured no-op. Stabilizing the reference eliminates zero sanitizeHtml runs, confirmed three ways: a faithful jsdom reproduction on the repo’s real Solid build (fresh arrow vs stable callback ⇒ byte-identical memo counts), FileView’s own design (FileView.tsx:91-98 — the appliance is re-rendered on every props.file snapshot, a remount), and the Solid compiler (the inline-arrow prop is static, never a reactive dependency). The defense-in-depth marker is inert too — sanitize re-parses from raw markdown each run, so a marker never survives to the next. The real cost the reproduction surfaced — now fixed: active() returned only the active branch, so a Source⇄Rendered toggle remounted and re-sanitized the whole doc (50-image doc: ~50 image-resolutions + a full parse/sanitize/highlight/DOM-reparse per toggle). Fixed #1446 — FileView now keeps both toggle modes alive (the #818 RightPanel pattern), so a flip is a visibility change, not a remount; the pipeline runs 0× per toggle. A per-slot heldFile snapshot defers a hidden mode’s refresh (no double-render on edit, no render(file) API change), and each comment overlay took a per-instance CSS-highlight name so the two kept-alive surfaces don’t contend. Proven by an e2e (the rendered preview element survives the round-trip). The content-keyed sanitize cache alternative was dropped as accidental complexity. Full write-up + reproduction: markdown-image-resolver-and-toggle.

sanitizeHtml does 6 sequential full-tree walks per parse (sanitize.ts:359-410) — memo-gated on content, so it only bites very large documents; collapse into a single walk if/when that surfaces. low
File-search ancestor recompute per keystroke (fileSearch.ts:50-62) — measured at 0.076 ms/200 calls, below perception; a module-level memo is cheap insurance, not urgent. low

Memory lifecycle

After the shipped storesByKey and history-browser fixes, the residual is one belt-and-suspenders item: the useComments persistedPref factory hand-rolls a per-terminalId signal (useComments.ts:36-83). Consumers wrap it in createMemo, so owners do auto-dispose — the leak claim was overstated — but moving to makePersisted from @solid-primitives would make owner cleanup automatic. low

Backend, wire & streaming

The path from server to client. The strongest new item is per-key wire deltas; the kaval items are hardening/observability wins, not the production OOM fix — the #1420 RCA explicitly ruled out the Channel queue, so the real leak lives elsewhere (likely scrollback/snapshots) and needs a dedicated heap snapshot.

Surface wire & subscriptions

Medium Publish per-key collection deltas, not the full key set

Every upsert/remove publishes the entire key array via keysBus.publish(Array.from(...)) — a fresh object each time — which crosses the wire and triggers client mapArray reconciliation (surface/server.ts:1218-1223, useCollection.ts:60-65). mapArray avoids per-key subscription churn, so the cost is wasted allocations + transmission + memo recompute, not thrashing. Fix: publish discriminated {added:[k]}/{changed:[k]}/{removed:[k]} deltas, emitting the full set only on init; let useCollection apply them. Measure: keysBus publish frequency and payload sizes during terminal spawn/metadata churn.
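A minimal sketch of what the delta messages and the client-side apply step might look like. The added / changed / removed split is from the text; the kind field, the init variant's name, and applyKeysDelta are assumptions, not the existing useCollection API.

```typescript
// Hypothetical wire shape for key-set updates.
type KeysDelta<K> =
  | { kind: "init"; keys: K[] } // full set, sent once on subscribe
  | { kind: "added"; keys: K[] }
  | { kind: "changed"; keys: K[] }
  | { kind: "removed"; keys: K[] };

// Apply one delta to the client's ordered key list. The same array reference
// is returned whenever the key set does not change, so downstream
// derivations keyed on the list can skip work.
function applyKeysDelta<K>(current: readonly K[], delta: KeysDelta<K>): readonly K[] {
  switch (delta.kind) {
    case "init":
      return [...delta.keys];
    case "added": {
      const known = new Set(current);
      const fresh = delta.keys.filter((k) => !known.has(k));
      return fresh.length === 0 ? current : [...current, ...fresh];
    }
    case "removed": {
      const gone = new Set(delta.keys);
      const kept = current.filter((k) => !gone.has(k));
      return kept.length === current.length ? current : kept;
    }
    case "changed":
      // Values changed but the key set did not: keep the reference.
      return current;
  }
}

let keys: readonly string[] = applyKeysDelta<string>([], { kind: "init", keys: ["a", "b"] });
const afterInit = keys;
keys = applyKeysDelta(keys, { kind: "changed", keys: ["a"] });
const afterChanged = keys;
keys = applyKeysDelta(keys, { kind: "added", keys: ["c"] });
keys = applyKeysDelta(keys, { kind: "removed", keys: ["a"] });
```

A changed delta carries only the touched keys, so a metadata write on one terminal sends one key across the wire rather than the whole array.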

Full metadata object per live-field update (terminalEndpoint/metadata.ts:96-136) — real, but upstream dedup gates (prResultEqual, agentInfoEqual) already cap cadence to PR-poll 30s / screen-scrape 1s; splitting live vs persisted deltas is lower priority. low
base64 stdio framing adds ~33% (links/stdio-codec.ts:25-64) — framing is already swappable; a length-prefixed binary frame is the upgrade, gated on measured large-payload ops (git diff, fsListAll). low
useCollection subscribes to all keys even if one is consumed (useTerminalMetadata.ts:34) — bounded, since rendered terminals genuinely need metadata; only worth lazy-subscribing in 50+ terminal workspaces with most invisible. low
Three parallel git-status subscriptions per Code tab (CodeTab.tsx:314-349) — real duplication, but documented as load-bearing (the passive subs swallow BASE_BRANCH_NOT_FOUND while the active one revives after fetch). Do not coalesce blindly. low
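For the base64 framing item above, a length-prefixed binary frame could be sketched as follows. The layout (a 4-byte big-endian length followed by raw bytes) and all function names are assumptions for illustration; the text only says the upgrade is a length-prefixed frame. The base64Length helper is there to show where the roughly 33% comes from: base64 emits 4 characters for every 3 input bytes.

```typescript
// Hypothetical frame layout: 4-byte big-endian payload length, then the payload.
const HEADER_BYTES = 4;

function encodeFrame(payload: Uint8Array): Uint8Array {
  const frame = new Uint8Array(HEADER_BYTES + payload.byteLength);
  new DataView(frame.buffer).setUint32(0, payload.byteLength, false);
  frame.set(payload, HEADER_BYTES);
  return frame;
}

// Decode every complete frame in `buffer`; trailing partial bytes come back
// in `rest` so the caller can prepend them to the next chunk.
function decodeFrames(buffer: Uint8Array): { payloads: Uint8Array[]; rest: Uint8Array } {
  const payloads: Uint8Array[] = [];
  const view = new DataView(buffer.buffer, buffer.byteOffset, buffer.byteLength);
  let offset = 0;
  while (buffer.byteLength - offset >= HEADER_BYTES) {
    const length = view.getUint32(offset, false);
    if (buffer.byteLength - offset - HEADER_BYTES < length) break; // incomplete frame
    const start = offset + HEADER_BYTES;
    payloads.push(buffer.subarray(start, start + length));
    offset = start + length;
  }
  return { payloads, rest: buffer.subarray(offset) };
}

// Encoded size of the same payload under padded base64, for comparison.
function base64Length(byteLength: number): number {
  return Math.ceil(byteLength / 3) * 4;
}

const payload = new Uint8Array(3000).fill(7);
const frame = encodeFrame(payload);
const decoded = decodeFrames(frame);
```

For a 3000-byte payload the frame is 3004 bytes against 4000 base64 characters, which is why the saving only matters on large-payload operations.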

kaval memory & streaming

Low Bound subscriber queues by bytes, not just item count

Each subscriber queue caps at maxQueue (10k items) with no byte bound (kaval/channel.ts:54-132), so a stalled subscriber on a 1 KB/event PTY could pin ~10 MB before being dropped. Fix: track queue byte size at publish and drop when either item-count or a new maxQueueBytes is exceeded. Note: the #1420 RCA rules this out as the OOM source — this is a known-constant memory cap, not the leak fix.
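A sketch of the two-bound check, under assumptions: maxQueue and maxQueueBytes are the names used above, while the BoundedQueue class, its drop policy (clear the backlog and mark the subscriber dropped), and the Uint8Array item type are hypothetical and not taken from kaval/channel.ts.

```typescript
// Hypothetical per-subscriber queue bounded by item count and by bytes.
class BoundedQueue {
  private items: Uint8Array[] = [];
  private bytes = 0;
  private readonly maxQueue: number;
  private readonly maxQueueBytes: number;
  dropped = false;

  constructor(maxQueue: number, maxQueueBytes: number) {
    this.maxQueue = maxQueue;
    this.maxQueueBytes = maxQueueBytes;
  }

  // Track bytes at publish time. Returns false and marks the subscriber
  // dropped when accepting the chunk would exceed either bound.
  push(chunk: Uint8Array): boolean {
    if (this.dropped) return false;
    const overCount = this.items.length + 1 > this.maxQueue;
    const overBytes = this.bytes + chunk.byteLength > this.maxQueueBytes;
    if (overCount || overBytes) {
      this.dropped = true;
      this.items = []; // assumed policy: release the backlog immediately
      this.bytes = 0;
      return false;
    }
    this.items.push(chunk);
    this.bytes += chunk.byteLength;
    return true;
  }

  shift(): Uint8Array | undefined {
    const chunk = this.items.shift();
    if (chunk) this.bytes -= chunk.byteLength;
    return chunk;
  }

  get queuedBytes(): number {
    return this.bytes;
  }
}

const queue = new BoundedQueue(10_000, 2048);
const okFirst = queue.push(new Uint8Array(1024));
const okSecond = queue.push(new Uint8Array(1024));
const okThird = queue.push(new Uint8Array(1)); // 2049 bytes would exceed the byte bound
```

With only the item bound, the worst case is 10k items times the event size (about 10 MB at 1 KB per event); the byte bound makes that ceiling independent of event size.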

No backpressure / drop-visibility on proc.onData fan-out (ptyHost.ts:544-548) — publish() is fire-and-forget; add a dropped-subscriber counter/metric. (The original “O(N) push” claim was wrong — push is O(1).) low
Per-attach scrollback serialization — banked #1573: an already-aborted attach now does zero serialize(), and a burst of attaches to one PTY within a publish-epoch shares one memoized snapshot (ptyHost.ts attach() + Entry.snapshotCache). The leverage turned out to be the reconnect storm, not one snapshot’s size: a WebSocket disconnect reissues ~60 attaches, and the measured transient was 2–3.2 GB of concurrent full serializes (a filled 10 K/213-col snapshot is ~4 MB, not the ~4 KB guessed here) — now collapsed to O(live-terminal count). Bounding each snapshot to a viewport is the follow-up (PR2, kaval memory). banked
Exit-code tombstones FIFO-evicted, no TTL (ptyHost.ts:39-42) — intentional bounded design; a missing tombstone falls back harmlessly to 0. Add a TTL only if measurement shows the fallback is hit often. low

Mobile battery & wake-ups

Timers and listeners that run regardless of visibility — the clearest new cluster, and the one most likely to matter on a real phone (where it is, so far, unmeasured).

Medium Heartbeat probe: reconnect-on-resume fixed; hidden-tab battery still open

createHeartbeat() runs system.live / identity.info() every 15s while the socket is OPEN. Two distinct costs hid behind one finding:

Spurious reconnect on resume — FIXED #1598. A laptop sleep / tab freeze / app-switch paused the event loop; the probe’s 10s timeout fired overdue on resume and forced a reconnect over a still-healthy socket (the brief “Disconnected” flash — a regression from the default-on watchdog #1545). The watchdog now compares elapsed WALL time against elapsed MONOTONIC time across each probe and voids-and-re-probes a window a suspension crossed — the browser-leg analog of the ssh leg’s wall-clock-gap wake watcher #1078 — and a window-focus / tab-visible wake event re-probes immediately; the full-screen overlay is grace-windowed too, so a sub-second reconnect never flashes.
Hidden-tab radio wake — still open. The probe still runs ≈240×/hour while backgrounded, forcing the mobile radio idle→active. The tempting fix — stop the interval while document.visibilityState === 'hidden' — is a coverage regression: a hidden tab is still running, so its probe timeout is real, and gating it blinds the watchdog to a genuine half-open during a long background. A battery fix must keep watching — lengthen the hidden-tab interval (30–60s), not stop it. Measure: packet-capture probes/hour backgrounded on a real phone — target under ~10, vs ~240 today.
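The two mechanisms above reduce to small pure functions, sketched here under assumptions: the constant names, the 60 s hidden interval (the upper end of the suggested range), and the 1 s tolerance are illustrative, not the shipped watchdog. The suspension check takes both elapsed values as inputs and assumes the monotonic source did not advance while the device was suspended.

```typescript
// Illustrative constants; only the 15 s visible interval is from the text.
const VISIBLE_INTERVAL_MS = 15_000;
const HIDDEN_INTERVAL_MS = 60_000;

// Keep probing while hidden, just less often, so the watchdog is never blind.
function probeIntervalMs(hidden: boolean): number {
  return hidden ? HIDDEN_INTERVAL_MS : VISIBLE_INTERVAL_MS;
}

function probesPerHour(intervalMs: number): number {
  return 3_600_000 / intervalMs;
}

// A probe window is void when wall-clock time advanced much further than
// monotonic time across it, which suggests a sleep or freeze crossed the
// window. The tolerance value is an assumption.
function suspensionCrossed(
  wallElapsedMs: number,
  monoElapsedMs: number,
  toleranceMs: number = 1_000,
): boolean {
  return wallElapsedMs - monoElapsedMs > toleranceMs;
}

const visibleRate = probesPerHour(probeIntervalMs(false)); // 240 per hour
const hiddenRate = probesPerHour(probeIntervalMs(true)); // 60 per hour
const sleptThroughProbe = suspensionCrossed(610_000, 10_000); // ten-minute gap
const normalProbe = suspensionCrossed(10_020, 10_000); // 20 ms of jitter
```

A 30 to 60 s hidden interval works out to 120 to 60 probes per hour. Reaching the under-10 target by interval alone would take a hidden interval of about six minutes (3600 s / 10) or longer.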

Every-minute staleness ticker fires globally regardless of visibility (terminal/staleness.ts:26-57) — gate the setInterval on visibilitychange (reuse the refitOnTabVisible pattern). low
N per-terminal visibilitychange listeners for re-fit (refitOnTabVisible.ts) — collapse to one shared App-root listener fanning out to a Set of debouncedFit callbacks. low
WebGL cap oversized for phones — WEBGL_CONTEXT_CAP=12 suits desktop; mobile shows 1–2 tiles. Largely mitigated (Terminal.tsx:185-196 requires visible && holdsWebgl), but a layout-specific budget (1 on phone) would be tighter. Remaining within #1399. low
getBoundingClientRect per terminal tap for link detection (Terminal.tsx:572-591) — guarded to genuine taps, but each forces a sync layout read; cache the rect against the ResizeObserver. low
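The shared-listener item above could be sketched as a small registry. The Set of fit callbacks is from the text; registerRefit, onVisibilityChange, and the wiring shown in the comment are hypothetical. The handler takes the visibility state as an argument so the sketch runs outside a browser.

```typescript
type FitCallback = () => void;

// One shared Set for every terminal's debounced fit callback.
const fitCallbacks = new Set<FitCallback>();

// Each terminal registers once and gets a disposer back for cleanup.
function registerRefit(cb: FitCallback): () => void {
  fitCallbacks.add(cb);
  return () => {
    fitCallbacks.delete(cb);
  };
}

// The single shared handler. In the browser it would be attached once at the
// App root, for example:
//   document.addEventListener("visibilitychange", () =>
//     onVisibilityChange(document.visibilityState));
function onVisibilityChange(state: "visible" | "hidden"): void {
  if (state !== "visible") return;
  for (const cb of fitCallbacks) cb();
}

let fits = 0;
const disposeA = registerRefit(() => {
  fits++;
});
registerRefit(() => {
  fits++;
});

onVisibilityChange("hidden"); // no refit while hidden
onVisibilityChange("visible"); // both terminals refit
disposeA();
onVisibilityChange("visible"); // only the remaining terminal refits
```

The listener count stays at one regardless of terminal count; per-terminal cost moves into the Set, which each terminal leaves through its disposer.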

Dev-loop & CI

Iteration speed is a performance surface too. Two confirmed-real wins, both with concrete cost.

Medium Parallelize tests across the workspace

pnpm -r serializes package test runs (no --workspace-concurrency), so on a multi-core machine the 44 packages run roughly one-at-a-time even though each vitest threads internally — workspace-level parallelism is unused (package.json:7, per-package vitest.config.ts). Fix: enable workspace concurrency in the test:unit recipe; consider vitest --shard for the slowest packages (e.g. git/index.test.ts). Measure: just test-unit baseline vs --workspace-concurrency N.

Medium Compute the Nix pnpmDeps hash once, not twice

ci::pnpm-hash-fresh runs two sequential nix builds (the second --rebuild) with no cache reuse, so pnpm install runs fully twice — measured at 2m45s on darwin / 25s on linux (ci/mod.just:82-84, default.nix:154-159). Fix: compute the actual hash once into a temp derivation, then compare against the declared hash in a pure eval step, deferring the fetch to a single store-locked derivation. Measure: cold-store nix build .#pnpmDeps --no-link then --rebuild --no-link, total wall-clock before/after.

From static reads to live traces

Most findings above are static code reads. Verification corrected several overstated claims (no “high” survived; “O(n³)” was O(n log n); a “100–400 KB” snapshot was re-guessed at ~4 KB, and a filled 10 K/213-col snapshot later measured ~4 MB, #1573) precisely because nobody had a number. The next round moves from reading to measuring. These are the measurements this map does not yet rest on:

No live client trace. Capture LCP/INP/CLS and a flame chart of a real 20+ terminal session (chrome-devtools) to confirm which reactivity items actually surface. This gates backlog item #2. Items #1 and #7 did not need a trace: #1 shipped on a deterministic re-run-count test; #7’s resolver-reference claim was refuted by deterministic reproduction, and the Markdown toggle re-sanitize it surfaced instead shipped on an e2e.
The #1420 OOM root cause is still unidentified. The RCA ruled out the Channel queue; a dedicated kaval heap snapshot is needed to find the real scrollback/snapshot retention path.
Mobile rests on mechanism, not on-device traces. Battery wake-ups, GPU memory across swipes, and keystroke-to-paint on low-end Android are all unmeasured.
Wire payloads are uncounted. No captured byte sizes for full-key-set / full-object publishes or base64 framing across representative repos.
Bundle composition is now measured (item #5) — an A/B build attributes the Code-tab tree at 171 kB gzip of the eager chunk and the lazy-load works (e2e 115/115), but it’s deferred: with activeTab defaulting to code, the untraced question is cold-start LCP/INP — does deferring it (faster first paint vs a cold-load Code-tab flash) net out perceptually.
Server CPU under load (git-status polling, 1s agent screen-scrape, PR polling) hasn’t been profiled in aggregate, only mechanism-by-mechanism.