Performance — Where Kolu Can Get Faster
A living tuning map of the Kolu monorepo. Built from a 77-agent survey (10 subsystem investigators → adversarial verification → synthesis), it ledgers what's already shipped and ranks the real-but-bounded opportunities that remain — so we keep Kolu nimble and fast, by measurement, over time.
This is the Atlas hub for keeping Kolu nimble and fast — a living map of the
monorepo’s performance surfaces, the wins already banked, and the opportunities
worth tuning next. It was built by a survey workflow: 10 investigators, one
per subsystem, each reading real source; adversarial verification of every
finding against the code (this repo’s history shows plausible, code-cited perf
diagnoses are often wrong — see memory-learnings
and dock-and-eventloop-1308);
then synthesis into the themes below. Of 66 raw findings, 35 survived as real
or partial, 6 were confirmed already-shipped, and 25 were dropped as
speculative or mechanically wrong.
The map
Green is banked; amber is open. The opportunities cluster on the client render path, the wire, the kaval backend, mobile timers, and the dev loop — with the GPU/canvas/diff hot paths already green.
Top of the backlog
The actionable shortlist, ranked by leverage. Impact and effort are the verified estimates (post-adversarial-correction), not the original claims. The keystone (row 1) is now shipped; it clears the way for the rest of the reactivity cluster.
| # | Opportunity | Surface | Impact | Effort | The fix, in one line |
|---|---|---|---|---|---|
| 1 | Stabilize the terminalIds memo reference shipped | Reactivity | med | med | Done #1425 — the memo keeps its prior array when the id order is unchanged, so terminalIds() stops notifying downstream on non-display metadata writes (displayInfos still re-runs correctly for git/cwd/parentId changes via its own field-level subscription). Clears the rest of the reactivity cluster. |
| 2 | Per-key collection deltas on the wire | Wire | med | med | Publish {added/changed/removed} keys, not the full key array, on every upsert/remove. |
| 3 | Heartbeat: reconnect-on-resume fixed; hidden-tab battery open | Mobile | med | med | Reconnect-on-resume shipped — a measured wall-vs-monotonic clock gap voids a probe a sleep/freeze interrupted, and a focus / tab-visible event re-probes at once. Hidden-tab radio still wakes ≈240×/h; but gating the probe on hidden blinds the watchdog (a hidden tab still runs), so a battery fix must lengthen the hidden interval, not stop it. |
| 4 | Workspace-level test parallelism | Dev-loop | med | low | Add --workspace-concurrency so 44 packages don’t test one-at-a-time. |
| 5 | Lazy-load the Code tab — measured, deferred | Bundle | med | high | A/B build: the Code-tab tree is 171 kB gzip / 23% of the eager chunk. But activeTab defaults to code, so lazy-loading defers it async past first paint rather than skipping it for most desktop sessions — a faster-first-paint-vs-cold-flash trade, TTI untraced. Deferred. (investigation) |
| 6 | One-shot Nix pnpmDeps hash check | Dev-loop | med | high | Compute the hash once instead of two sequential builds (2m45s on darwin). |
| 7 | Avoid the Markdown Source⇄Rendered toggle re-sanitize shipped | Markdown | med | med | Done #1446 — FileView keeps both toggle modes alive (the #818 pattern), so a flip is a visibility change, not a remount: the full marked→DOMPurify→Shiki→innerHTML pipeline runs 0× per toggle (was 1×). Replaces the refuted “stabilize the resolver reference” claim — a measured no-op. |
Already banked
So this map isn’t re-litigated: the wins below are shipped and verified — do not re-report them as opportunities. Remaining slivers inside them are noted in their themes as “remaining within …”.
- WebGL context cap #1416 #1399 — admit the whole working set under a 12-context cap; killed the focus-churn VRAM leak on Chrome+AMD.
- OpenCode-derived wins — @pierre/diffs 1.2.10 + Shiki 4.2.0 #1360, off-thread diff highlighting #1363, the canvas gesture-p99 harness + rAF-coalesced pan/zoom #1368. Full write-up: opencode-perf.
- Compositor paint storms — canvas tile-aura + dock CSS animations moved to compositor-friendly properties #1354 #1308.
- Off-screen work elimination — covered tiles reuse the viewport box; no redundant ResizeObserver fit() cycles on hidden terminals.
- Memory — storesByKey released on terminal deletion; per-terminal history-browser state reset on repo change #610.
- Reactivity keystone #1425 — the terminalIds memo keeps a stable reference when the top-level id order is unchanged (sameTerminalIdOrder equals gate), so the accessor stops notifying downstream on non-display metadata writes; proven by a re-run-count regression test. (displayInfos keeps its own field-level subscriptions to the display-relevant metadata — git/cwd/parentId — via the surface store’s reconcile writes, so PR/agent/foreground churn never reached it even before this gate; a real git/cwd/parentId change still re-runs it, correctly — that path is left intact, by design.)
- Markdown toggle keep-alive #1446 — FileView keeps both Source ⇄ Rendered modes alive, so toggling a .md preview is a visibility flip, not a remount + full re-sanitize of the doc (the marked→DOMPurify→Shiki→innerHTML pipeline runs 0× per toggle, was 1×); a per-slot heldFile snapshot keeps reload-on-edit intact with no render(file) API change, and each comment overlay took a per-instance CSS-highlight name so the two kept-alive surfaces don’t contend. The companion “stabilize the markdown image resolver reference” claim was refuted as a measured no-op.
- Nix dev-shell eval — 35× faster (docs/nix-eval-perf-report.md).
Frontend hot paths
The client is where users feel speed. The canvas, WebGL, and diff paths are already green; what’s left is reactive over-derivation, eager bundle weight, and a few markdown tree-walks — all bounded at current scale.
Reactivity granularity
Low Keystone — stabilize the terminalIds memo reference — ✓ shipped
terminalIds was a createMemo running meta.keys().filter(...)
(useTerminalMetadata.ts), returning a new array reference every run even
when the contents were identical. The dependent displayInfos memo tracked
that reference, so any single terminal’s metadata mutation re-ran
buildTerminalDisplayInfos for all terminals (terminalDisplay.ts),
allocating 4–5 intermediate collections each pass, and re-evaluated every tile’s
Show gate on getDisplayInfo (TerminalCanvas.tsx). Done #1425 —
the memo now carries a sameTerminalIdOrder equals gate, so it keeps the prior array
whenever the top-level id set is unchanged — the set-shaped re-run path
(terminalIds() returning a fresh reference) now fires only on a real add /
remove / reorder, not on every metadata mutation. The accessor still re-runs
cheaply; what it no longer does is notify downstream when the set is identical.
Proven by a re-run-count regression test (useTerminalMetadata.test.ts), so banking
it didn’t need the live trace the coverage gaps
flag as still-pending. (Verification had corrected the original “O(n³)” claim to
O(n log n) — the cost was wasted allocations + re-derivation, not algorithmic
blowup.) One nuance the gate does not change: displayInfos reads the
display-relevant fields (git, cwd, parentId) of each terminal’s metadata
inside its own scope, so it carries a second, field-level subscription to those
paths. Because the surface store writes via reconcile, PR / agent / foreground
churn — the dominant ~1/s updates — never reached displayInfos even before this
gate; a real git/cwd/parentId change still re-runs it, which is correct (the
displayed identity genuinely changed). The gate closes the set-reference path;
the field-level path is already as narrow as it should be.
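The gate described above can be sketched without Solid. Only the name sameTerminalIdOrder comes from the text; its body here is a guess at the obvious implementation, and createGatedValue is a hypothetical stand-in for a memo with an equals option, not the repo's code.

```typescript
type TerminalId = string;

// Assumed implementation: true when both arrays hold the same ids in the same order.
function sameTerminalIdOrder(
  prev: readonly TerminalId[],
  next: readonly TerminalId[],
): boolean {
  if (prev === next) return true;
  if (prev.length !== next.length) return false;
  for (let i = 0; i < prev.length; i++) {
    if (prev[i] !== next[i]) return false;
  }
  return true;
}

// Hypothetical stand-in for a memo with an `equals` option: every call supplies
// a freshly derived value, but the stored reference is swapped (and a
// notification counted) only when the comparator reports a real change.
function createGatedValue<T>(equals: (a: T, b: T) => boolean) {
  let current: T | undefined;
  let hasValue = false;
  let notifications = 0;
  return {
    set(next: T): T {
      if (hasValue && equals(current as T, next)) return current as T;
      current = next;
      hasValue = true;
      notifications++;
      return next;
    },
    notifications: (): number => notifications,
  };
}

const gate = createGatedValue<readonly TerminalId[]>(sameTerminalIdOrder);
const first = gate.set(["t1", "t2"]);
const second = gate.set(["t1", "t2"]); // fresh array, same order: prior reference kept
const third = gate.set(["t2", "t1"]); // reorder: new reference, downstream notified
```

In Solid the comparator would go in the options argument of createMemo as equals, which is what lets the accessor re-run cheaply while dependents stay quiet.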
- getSubTerminalIds O(n) scan, called per top-level terminal inside the display derivation → O(n²) per metadata update (useTerminalMetadata.ts:54-56, terminalDisplay.ts:80). Fix with a Map<ParentId, TerminalId[]> index in the same memo. low impact
- terminalLabel O(n) indexOf per access (useTerminalMetadata.ts:94-96) — real, but only 2 call sites, both at event boundaries; bundle it with the index work above, don’t chase it alone. low
- Per-tile geometry arithmetic (onScreen, tileTransformCSS in CanvasTile.tsx:114-130) recomputes per pan/zoom frame — but the big win (not mounting off-screen auras) already shipped; the residual ~4 ops/tile/rAF is likely below noise. Remaining within the canvas work. low
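A minimal sketch of the Map<ParentId, TerminalId[]> index proposed for getSubTerminalIds. The TerminalMeta shape and both function names are assumptions for illustration; the real metadata type is not shown in this document.

```typescript
type TerminalId = string;
type ParentId = string;

// Assumed shape: only the field the index needs.
interface TerminalMeta {
  parentId?: ParentId; // absent for top-level terminals
}

// Build the parent -> children index in one O(n) pass over the metadata.
function buildSubTerminalIndex(
  metas: ReadonlyMap<TerminalId, TerminalMeta>,
): Map<ParentId, TerminalId[]> {
  const index = new Map<ParentId, TerminalId[]>();
  metas.forEach((meta, id) => {
    if (meta.parentId === undefined) return;
    const siblings = index.get(meta.parentId);
    if (siblings) siblings.push(id);
    else index.set(meta.parentId, [id]);
  });
  return index;
}

// O(1) lookup per top-level terminal, replacing a scan over every terminal.
function subTerminalIdsOf(
  index: ReadonlyMap<ParentId, TerminalId[]>,
  parentId: ParentId,
): readonly TerminalId[] {
  return index.get(parentId) ?? [];
}
```

Built once per derivation, the index turns the per-update cost from one scan per top-level terminal into a single pass plus constant-time lookups.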
Bundle & startup weight
Medium Lazy-load the Code tab — measured (171 kB gzip), deferred
An A/B production build measured the Code-tab tree — @kolu/solid-pierre’s
FileTree, the @kolu/solid-markdown renderer, the diff/source view wrappers, and
the comment system — at 629 kB raw / 171 kB gzip (23%) of the eager index
chunk (it was a static import in RightPanel). Lazy-loading it works (built,
lens/codex/simplify/police-reviewed, e2e-verified 115/115 on a pu box) — but it’s
deferred, not shipped, because the value is narrower than the framing:
activeTab defaults to code and the desktop panel opens by default, so on a
typical desktop session CodeTab loads anyway, just async past first paint rather
than skipped. The “many who only open terminals never parse it” win holds only for
mobile (drawer closed) / collapsed panel; the common desktop case is a
faster-first-paint-vs-cold-load-flash trade whose perceptual net is the untraced
cold-start TTI — a speculative bet against a visible regression on the default
surface. Two premises this refuted: Shiki grammars are already lazy (a dynamic
import("shiki"), never on the eager path — and @pierre/diffs already runs in a
Worker), and “lazy-load Image on first use” is mechanically impossible (ImageAddon
must precede the image escape sequence). Full write-up + the unblock path (TTI
trace, default-tab decision, or idle-preload):
bundle-codetab-lazyload.
- Eager per-terminal addons — Search/Image/Serialize are instantiated per terminal (Terminal.tsx:490-510) though conditional; but ImageAddon can’t lazy-on-first-use (it must precede the image escape sequence), and Serialize/Search minify to ~10–15 kB gzip + add async to the hot path. Low value. low
- WebglAddon static import — the GPU renderer sits in the eager bundle though it’s lazily constructed; but every terminal needs it, so it’s core, not deferrable. low
Markdown & code-tab rendering
Medium Markdown preview render cost — the resolver 'fix' was a no-op; the toggle remount is real
The original claim — BrowseFileDispatcher passes resolveImageSrc as an inline
arrow (BrowseFileDispatcher.tsx:374-376), so Markdown’s memo re-runs — is a
measured no-op. Stabilizing the reference eliminates zero sanitizeHtml
runs, confirmed three ways: a faithful jsdom reproduction on the repo’s real Solid
build (fresh arrow vs stable callback ⇒ byte-identical memo counts), FileView’s
own design (FileView.tsx:91-98 — the appliance is re-rendered on every props.file
snapshot, a remount), and the Solid compiler (the inline-arrow prop is static,
never a reactive dependency). The defense-in-depth marker is inert too — sanitize
re-parses from raw markdown each run, so a marker never survives to the next.
The real cost the reproduction surfaced — now fixed: active() returned only
the active branch, so a Source⇄Rendered toggle remounted and re-sanitized the
whole doc (50-image doc: ~50 image-resolutions + a full
parse/sanitize/highlight/DOM-reparse per toggle). Fixed #1446 —
FileView now keeps both toggle modes alive (the #818 RightPanel pattern), so a
flip is a visibility change, not a remount; the pipeline runs 0× per toggle.
A per-slot heldFile snapshot defers a hidden mode’s refresh (no double-render on
edit, no render(file) API change), and each comment overlay took a per-instance
CSS-highlight name so the two kept-alive surfaces don’t contend. Proven by an e2e
(the rendered preview element survives the round-trip). The content-keyed sanitize
cache alternative was dropped as accidental complexity. Full write-up + reproduction:
markdown-image-resolver-and-toggle.
- sanitizeHtml does 6 sequential full-tree walks per parse (sanitize.ts:359-410) — memo-gated on content, so it only bites very large documents; collapse into a single walk if/when that surfaces. low
- File-search ancestor recompute per keystroke (fileSearch.ts:50-62) — measured at 0.076 ms/200 calls, below perception; a module-level memo is cheap insurance, not urgent. low
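If the six sanitizeHtml walks ever need collapsing, the usual shape is one traversal that runs every per-node pass in turn. The sketch below uses a toy TreeNode and two illustrative visitors; none of these names or rules come from sanitize.ts, and the merge is only valid when no pass depends on another pass having already finished over the whole tree.

```typescript
// Toy tree shape, assumed for illustration only.
interface TreeNode {
  tag: string;
  attrs: Record<string, string>;
  children: TreeNode[];
}

type Visitor = (node: TreeNode) => void;

// Visit every node exactly once, applying all visitors at each node.
// Returns the number of nodes visited.
function walkOnce(root: TreeNode, visitors: readonly Visitor[]): number {
  let visited = 0;
  const stack: TreeNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as TreeNode;
    visited++;
    for (const visit of visitors) visit(node);
    for (const child of node.children) stack.push(child);
  }
  return visited;
}

// Illustrative passes, not the real sanitizer's rules.
const stripEventHandlers: Visitor = (node) => {
  for (const key of Object.keys(node.attrs)) {
    if (key.toLowerCase().startsWith("on")) delete node.attrs[key];
  }
};

const dropScriptUrls: Visitor = (node) => {
  const href = node.attrs["href"];
  if (href !== undefined && href.trim().toLowerCase().startsWith("javascript:")) {
    delete node.attrs["href"];
  }
};
```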
Memory lifecycle
After the shipped storesByKey and history-browser fixes, the residual is one
belt-and-suspenders item: the useComments persistedPref factory hand-rolls a
per-terminalId signal (useComments.ts:36-83). Consumers wrap it in
createMemo, so owners do auto-dispose — the leak claim was overstated — but
moving to makePersisted from @solid-primitives would make owner cleanup
automatic. low
Backend, wire & streaming
The path from server to client. The strongest new item is per-key wire deltas; the kaval items are hardening/observability wins, not the production OOM fix — the #1420 RCA explicitly ruled out the Channel queue, so the real leak lives elsewhere (likely scrollback/snapshots) and needs a dedicated heap snapshot.
Surface wire & subscriptions
Medium Publish per-key collection deltas, not the full key set
Every upsert/remove publishes the entire key array via
keysBus.publish(Array.from(...)) — a fresh object each time — which crosses the
wire and triggers client mapArray reconciliation (surface/server.ts:1218-1223,
useCollection.ts:60-65). mapArray avoids per-key subscription churn, so the
cost is wasted allocations + transmission + memo recompute, not thrashing.
Fix: publish discriminated {added:[k]}/{changed:[k]}/{removed:[k]} deltas,
emitting the full set only on init; let useCollection apply them. Measure:
keysBus publish frequency and payload sizes during terminal spawn/metadata
churn.
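A sketch of what the delta protocol could look like on the client side. The kind discriminant, the KeysMessage type, and applyKeysMessage are assumptions for illustration; the text only fixes the added/changed/removed vocabulary and the init-time full set.

```typescript
type Key = string;

// Assumed wire shape: a tagged union, with the full key set sent only on init.
type KeysMessage =
  | { kind: "init"; keys: Key[] }
  | { kind: "added"; keys: Key[] }
  | { kind: "changed"; keys: Key[] }
  | { kind: "removed"; keys: Key[] };

// Apply one message to the current key list. The previous array is returned
// unchanged (same reference) whenever membership did not change.
function applyKeysMessage(current: readonly Key[], msg: KeysMessage): readonly Key[] {
  switch (msg.kind) {
    case "init":
      return msg.keys.slice();
    case "added": {
      const known = new Set(current);
      const fresh = msg.keys.filter((k) => !known.has(k));
      return fresh.length === 0 ? current : current.concat(fresh);
    }
    case "removed": {
      const gone = new Set(msg.keys);
      const kept = current.filter((k) => !gone.has(k));
      return kept.length === current.length ? current : kept;
    }
    case "changed":
      // Membership is untouched; per-key value subscriptions carry the update.
      return current;
  }
}
```

Returning the same reference on changed is the point of the design: an upsert to an existing key no longer produces a new key array for the client to reconcile.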
- Full metadata object per live-field update (terminalEndpoint/metadata.ts:96-136) — real, but upstream dedup gates (prResultEqual, agentInfoEqual) already cap cadence to PR-poll 30s / screen-scrape 1s; splitting live vs persisted deltas is lower priority. low
- base64 stdio framing adds ~33% (links/stdio-codec.ts:25-64) — framing is already swappable; a length-prefixed binary frame is the upgrade, gated on measured large-payload ops (git diff, fsListAll). low
- useCollection subscribes to all keys even if one is consumed (useTerminalMetadata.ts:34) — bounded, since rendered terminals genuinely need metadata; only worth lazy-subscribing in 50+ terminal workspaces with most invisible. low
- Three parallel git-status subscriptions per Code tab (CodeTab.tsx:314-349) — real duplication, but documented as load-bearing (the passive subs swallow BASE_BRANCH_NOT_FOUND while the active one revives after fetch). Do not coalesce blindly. low
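The ~33% figure for base64 follows from its 4-characters-per-3-bytes encoding, and a length-prefixed frame replaces that proportional cost with a fixed header. The sketch below assumes a 4-byte big-endian length header; the header layout and all function names are illustrative and are not taken from links/stdio-codec.ts.

```typescript
// Padded base64 emits 4 output characters for every 3 input bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const HEADER_BYTES = 4; // assumed: unsigned 32-bit big-endian payload length

function encodeFrame(payload: Uint8Array): Uint8Array {
  const frame = new Uint8Array(HEADER_BYTES + payload.length);
  new DataView(frame.buffer).setUint32(0, payload.length, false);
  frame.set(payload, HEADER_BYTES);
  return frame;
}

// Returns the first complete payload plus any trailing bytes, or null when the
// buffer does not yet hold a whole frame (the caller keeps reading).
function decodeFrame(
  buffer: Uint8Array,
): { payload: Uint8Array; rest: Uint8Array } | null {
  if (buffer.length < HEADER_BYTES) return null;
  const view = new DataView(buffer.buffer, buffer.byteOffset, buffer.byteLength);
  const size = view.getUint32(0, false);
  const end = HEADER_BYTES + size;
  if (buffer.length < end) return null;
  return { payload: buffer.subarray(HEADER_BYTES, end), rest: buffer.subarray(end) };
}
```

For a 3000-byte payload, base64 produces 4000 characters where this frame is 3004 bytes, which is why the swap only pays off on the large-payload operations the item names.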
kaval memory & streaming
Low Bound subscriber queues by bytes, not just item count
Each subscriber queue caps at maxQueue (10k items) with no byte bound
(kaval/channel.ts:54-132), so a stalled subscriber on a 1 KB/event PTY could
pin ~10 MB before being dropped. Fix: track queue byte size at publish and
drop when either item-count or a new maxQueueBytes is exceeded. Note: the
#1420 RCA rules this out as the OOM source — this is a known-constant memory cap,
not the leak fix.
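A minimal sketch of the dual bound, assuming chunks arrive as Uint8Array. maxQueue is the existing cap named in the text and maxQueueBytes is the proposed one; the class and its other members are hypothetical and do not mirror kaval/channel.ts.

```typescript
interface QueueLimits {
  maxQueue: number; // item cap (the text cites 10k)
  maxQueueBytes: number; // proposed byte cap
}

class BoundedSubscriberQueue {
  private items: Uint8Array[] = [];
  private bytes = 0;
  private dropped = false;
  private limits: QueueLimits;

  constructor(limits: QueueLimits) {
    this.limits = limits;
  }

  // Returns false once the subscriber has been dropped for exceeding a bound.
  push(chunk: Uint8Array): boolean {
    if (this.dropped) return false;
    const overItems = this.items.length + 1 > this.limits.maxQueue;
    const overBytes = this.bytes + chunk.byteLength > this.limits.maxQueueBytes;
    if (overItems || overBytes) {
      // Drop the stalled subscriber and release everything it was pinning.
      this.dropped = true;
      this.items = [];
      this.bytes = 0;
      return false;
    }
    this.items.push(chunk);
    this.bytes += chunk.byteLength;
    return true;
  }

  // Consumer side: byte accounting shrinks as chunks are drained.
  shift(): Uint8Array | undefined {
    const chunk = this.items.shift();
    if (chunk) this.bytes -= chunk.byteLength;
    return chunk;
  }

  get queuedBytes(): number {
    return this.bytes;
  }

  get isDropped(): boolean {
    return this.dropped;
  }
}
```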
- No backpressure / drop-visibility on proc.onData fan-out (ptyHost.ts:544-548) — publish() is fire-and-forget; add a dropped-subscriber counter/metric. (The original “O(N) push” claim was wrong — push is O(1).) low
- Per-attach scrollback serialization — banked #1573: an already-aborted attach now does zero serialize(), and a burst of attaches to one PTY within a publish-epoch shares one memoized snapshot (ptyHost.ts attach() + Entry.snapshotCache). The leverage turned out to be the reconnect storm, not one snapshot’s size: a WebSocket disconnect reissues ~60 attaches, and the measured transient was 2–3.2 GB of concurrent full serializes (a filled 10 K/213-col snapshot is ~4 MB, not the ~4 KB guessed here) — now collapsed to O(live-terminal count). Bounding each snapshot to a viewport is the follow-up (PR2, kaval memory). banked
- Exit-code tombstones FIFO-evicted, no TTL (ptyHost.ts:39-42) — intentional bounded design; a missing tombstone falls back harmlessly to 0. Add a TTL only if measurement shows the fallback is hit often. low
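The per-attach snapshot fix in the list above amounts to memoizing serialize() per publish-epoch. The sketch below is a toy reconstruction under that reading: PtyEntry, its string buffer, and the counter are hypothetical, and only the ideas of an epoch-keyed snapshotCache and a zero-cost aborted attach come from the text.

```typescript
interface SnapshotCache {
  epoch: number; // publish-epoch the snapshot was taken at
  snapshot: string;
}

class PtyEntry {
  serializeCalls = 0; // exposed so the saving can be observed
  private epoch = 0;
  private buffer = "";
  private snapshotCache: SnapshotCache | undefined;

  publish(data: string): void {
    this.buffer += data;
    this.epoch++; // new output invalidates the memoized snapshot
  }

  attach(signal: { aborted: boolean }): string | undefined {
    if (signal.aborted) return undefined; // already aborted: no serialize at all
    if (!this.snapshotCache || this.snapshotCache.epoch !== this.epoch) {
      this.snapshotCache = { epoch: this.epoch, snapshot: this.serialize() };
    }
    return this.snapshotCache.snapshot;
  }

  private serialize(): string {
    this.serializeCalls++;
    return this.buffer; // stand-in for the real scrollback serialization
  }
}
```

With this shape a reconnect storm of attaches against a quiet PTY costs one serialization rather than one per attach.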
Mobile battery & wake-ups
Timers and listeners that run regardless of visibility — the clearest new cluster, and the one most likely to matter on a real phone (where it is, so far, unmeasured).
Medium Heartbeat probe: reconnect-on-resume fixed; hidden-tab battery still open
createHeartbeat() runs system.live / identity.info() every 15s while the
socket is OPEN. Two distinct costs hid behind one finding:
- Spurious reconnect on resume — FIXED #1598. A laptop sleep / tab freeze / app-switch paused the event loop; the probe’s 10s timeout fired overdue on resume and forced a reconnect over a still-healthy socket (the brief “Disconnected” flash — a regression from the default-on watchdog #1545). The watchdog now compares elapsed WALL time against elapsed MONOTONIC time across each probe and voids-and-re-probes a window a suspension crossed — the browser-leg analog of the ssh leg’s wall-clock-gap wake watcher #1078 — and a window-focus / tab-visible wake event re-probes immediately; the full-screen overlay is grace-windowed too, so a sub-second reconnect never flashes.
- Hidden-tab radio wake — still open. The probe still runs ≈240×/hour while backgrounded, forcing the mobile radio idle→active. The tempting fix — stop the interval while document.visibilityState === 'hidden' — is a coverage regression: a hidden tab is still running, so its probe timeout is real, and gating it blinds the watchdog to a genuine half-open during a long background. A battery fix must keep watching — lengthen the hidden-tab interval (30–60s), not stop it. Measure: packet-capture probes/hour backgrounded on a real phone — target under ~10, vs ~240 today.
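Both halves of the heartbeat item reduce to small pure functions. The sketch below assumes the watchdog treats any disagreement between wall-clock and monotonic elapsed time beyond a tolerance as a suspended window; the function names, the tolerance parameter, and the 60 s hidden interval are illustrative choices, not the shipped values.

```typescript
type TabVisibility = "visible" | "hidden";

// True when the two clocks disagree by more than the tolerance across a probe
// window, meaning the timeout cannot be trusted and the probe should be
// voided and re-issued instead of forcing a reconnect.
function suspensionCrossed(
  wallElapsedMs: number,
  monotonicElapsedMs: number,
  toleranceMs: number,
): boolean {
  return Math.abs(wallElapsedMs - monotonicElapsedMs) > toleranceMs;
}

const VISIBLE_INTERVAL_MS = 15000; // cadence cited in the text
const HIDDEN_INTERVAL_MS = 60000; // assumed: upper end of the suggested 30-60s range

// Keep probing while hidden, just less often, so the watchdog is never blinded.
function probeIntervalMs(visibility: TabVisibility): number {
  return visibility === "hidden" ? HIDDEN_INTERVAL_MS : VISIBLE_INTERVAL_MS;
}

function probesPerHour(intervalMs: number): number {
  return 3600000 / intervalMs;
}
```

On this arithmetic a 15 s cadence is 240 probes/hour and a 60 s hidden cadence is 60. Reaching the under-10 target would take an interval of about six minutes or longer, so the 30–60 s range and the target may need reconciling once the on-device numbers exist.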
- Every-minute staleness ticker fires globally regardless of visibility (terminal/staleness.ts:26-57) — gate the setInterval on visibilitychange (reuse the refitOnTabVisible pattern). low
- N per-terminal visibilitychange listeners for re-fit (refitOnTabVisible.ts) — collapse to one shared App-root listener fanning out to a Set of debouncedFit callbacks. low
- WebGL cap oversized for phones — WEBGL_CONTEXT_CAP=12 suits desktop; mobile shows 1–2 tiles. Largely mitigated (Terminal.tsx:185-196 requires visible && holdsWebgl), but a layout-specific budget (1 on phone) would be tighter. Remaining within #1399. low
- getBoundingClientRect per terminal tap for link detection (Terminal.tsx:572-591) — guarded to genuine taps, but each forces a sync layout read; cache the rect against the ResizeObserver. low
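The shared-listener item can be sketched as a small registry: one subscription to visibility changes, fanned out to a Set of callbacks. VisibilitySource is a hypothetical seam added so the logic runs without a DOM; in the browser it would wrap document.visibilityState and the visibilitychange event. The class and method names are illustrative.

```typescript
type FitCallback = () => void;

// Hypothetical seam over the page-visibility surface.
interface VisibilitySource {
  isVisible(): boolean;
  onChange(listener: () => void): () => void; // returns an unsubscribe function
}

class SharedVisibleRefit {
  private callbacks = new Set<FitCallback>();
  private unsubscribe: (() => void) | undefined;
  private source: VisibilitySource;

  constructor(source: VisibilitySource) {
    this.source = source;
  }

  // Register one terminal's fit callback; returns its disposer.
  register(cb: FitCallback): () => void {
    this.callbacks.add(cb);
    if (!this.unsubscribe) {
      // First registration installs the single shared listener.
      this.unsubscribe = this.source.onChange(() => {
        if (this.source.isVisible()) this.callbacks.forEach((fit) => fit());
      });
    }
    return () => {
      this.callbacks.delete(cb);
      if (this.callbacks.size === 0 && this.unsubscribe) {
        // Last disposer removes the shared listener again.
        this.unsubscribe();
        this.unsubscribe = undefined;
      }
    };
  }
}
```

The same registry shape would also serve the staleness ticker, since both items want work to resume only when the tab becomes visible.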
Dev-loop & CI
Iteration speed is a performance surface too. Two confirmed-real wins, both with concrete cost.
Medium Parallelize tests across the workspace
pnpm -r serializes package test runs (no --workspace-concurrency), so on a
multi-core machine the 44 packages run roughly one-at-a-time even though each
vitest threads internally — workspace-level parallelism is unused
(package.json:7, per-package vitest.config.ts). Fix: enable workspace
concurrency in the test:unit recipe; consider vitest --shard for the slowest
packages (e.g. git/index.test.ts). Measure: just test-unit baseline vs
--workspace-concurrency N.
Medium Compute the Nix pnpmDeps hash once, not twice
ci::pnpm-hash-fresh runs two sequential nix builds (the second
--rebuild) with no cache reuse, so pnpm install runs fully twice — measured at
2m45s on darwin / 25s on linux (ci/mod.just:82-84, default.nix:154-159).
Fix: compute the actual hash once into a temp derivation, then compare against
the declared hash in a pure eval step, deferring the fetch to a single
store-locked derivation. Measure: cold-store nix build .#pnpmDeps --no-link
then --rebuild --no-link, total wall-clock before/after.
From static reads to live traces
Most findings above are static code reads. Verification corrected several overstated claims (no “high” survived; “O(n³)” was O(n log n); a “100–400 KB” snapshot was guessed at ~4 KB and later measured at ~4 MB for a filled 10 K/213-col buffer) precisely because nobody had a number. The next round moves from reading to measuring; these are the measurement gaps the map still has:
- No live client trace. Capture LCP/INP/CLS and a flame chart of a real 20+ terminal session (chrome-devtools) to confirm which reactivity items actually surface — this gates backlog item #2 (#1 shipped, proven by a deterministic re-run-count test rather than a trace; #7’s resolver-reference claim was refuted by deterministic reproduction — no trace needed — and the Markdown toggle re-sanitize it surfaced instead shipped, proven by an e2e rather than a trace).
- The #1420 OOM root cause is still unidentified. The RCA ruled out the Channel queue; a dedicated kaval heap snapshot is needed to find the real scrollback/snapshot retention path.
- Mobile rests on mechanism, not on-device traces. Battery wake-ups, GPU memory across swipes, and keystroke-to-paint on low-end Android are all unmeasured.
- Wire payloads are uncounted. No captured byte sizes for full-key-set / full-object publishes or base64 framing across representative repos.
- Bundle composition is now measured (item #5) — an A/B build attributes the Code-tab tree at 171 kB gzip of the eager chunk and the lazy-load works (e2e 115/115), but it’s deferred: with activeTab defaulting to code, the untraced question is cold-start LCP/INP — does deferring it (faster first paint vs a cold-load Code-tab flash) net out perceptually.
- Server CPU under load (git-status polling, 1s agent screen-scrape, PR polling) hasn’t been profiled in aggregate, only mechanism-by-mechanism.