
Kaval sessions — dial daemons, multiplex the canvas

feature · seedling · accepted

A direction reframe for R-2 now that kaval ships. Dial kavals as first-class endpoints (local over a unix socket, remote over ssh, provisioned via surface-nix-host), with hosts auto-detected from ssh config. The same substrate supports either switching one kaval at a time (tmux-style) or multiplexing several on one canvas (each tile piggybacks on its kaval); the canvas metaphor leans multiplex. First step: kaval-tui create, then --host (reach + provision in one PR).

A reframe of remote-terminals’s R-2, proposed now kaval ships. Old R-2 made remote a heavyweight per-tile backend (a RemoteTerminalBackend, a parent-side screen mirror, a bespoke agent). This proposes a lighter substrate: a kaval is a daemon you dial — local over a unix socket, remote over ssh — and Kolu can put a tile on any kaval. Whether the canvas shows one kaval at a time (tmux-style switch) or multiplexes several (a tile per kaval, side by side) is a UI choice on the same substrate — and because kolu is a canvas, it leans multiplex.

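[Figure: the kolu canvas. A kaval strip shows local, prod (ssh), and "+ host" (from ssh config). Two tiles sit side by side: "build · ~/app" on local (▸ vite build, watching…) and "deploy · prod" over ssh ($ kubectl rollout status…).]
↳ one canvas, tiles on different kavals: local always present, remotes added on demand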

The substrate — a kaval is a daemon you dial

kolu-server already keys its pty-host endpoint + status by hostId (a Map, instantiated as one — local — today). This instantiates more keys: a registry of kaval endpoints, each reached by a driver — unix socket (local, shipped) or ssh stdio (HostSession, R-2’s only new transport, the same closure shipped by @kolu/surface-nix-host).

[Diagram: the kolu canvas (a tile lives on a kaval; local + ssh, switched or multiplexed) talks to kolu-server. Per tile, resolveEndpoint(location) → its kaval endpoint, looked up in the endpoint registry, Map<hostId, endpoint>, with entries local (unix socket) and prod (ssh stdio, HostSession). kaval @ local survives deploys and holds a tile's PTY, dialed over the unix socket. kaval @ prod is provisioned + dialed (R-2) over ssh: nix copy --derivation → realise → dial.]
One substrate. kolu-server holds a registry of kaval endpoints keyed by hostId; a tile resolves to its endpoint via the R-1 seam, and each endpoint is dialed by a driver (unix socket / ssh stdio). Switch keeps one endpoint live and swaps; multiplex keeps N live at once. Remote is just the ssh driver provisioning the same kaval closure.
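The registry described above can be pictured as a small map from hostId to an endpoint plus its driver. The sketch below is illustrative only: the type names, the TileLocation shape, the "local" default, and the socket path are assumptions, not the in-tree definitions. Only the hostId key, the Map, the two drivers, and the resolveEndpoint(location) step come from the text.

```typescript
// Hypothetical shapes. Only the hostId key, the Map-based registry, and the
// two drivers (unix socket, ssh stdio) are taken from the text above.
type HostId = string;

type Driver =
  | { kind: "unix"; socketPath: string }
  | { kind: "ssh-stdio"; sshTarget: string };

interface KavalEndpoint {
  hostId: HostId;
  driver: Driver;
}

// A tile's location. Assumption: a missing hostId means "local".
interface TileLocation {
  hostId?: HostId;
}

class EndpointRegistry {
  private readonly endpoints = new Map<HostId, KavalEndpoint>();

  register(endpoint: KavalEndpoint): void {
    this.endpoints.set(endpoint.hostId, endpoint);
  }

  // Resolve a tile to the endpoint that holds its PTY.
  resolveEndpoint(location: TileLocation): KavalEndpoint {
    const hostId = location.hostId ?? "local";
    const endpoint = this.endpoints.get(hostId);
    if (!endpoint) {
      throw new Error(`no kaval endpoint registered for ${hostId}`);
    }
    return endpoint;
  }

  // Switch would keep one of these live at a time; multiplex keeps all live.
  hostIds(): HostId[] {
    return [...this.endpoints.keys()];
  }
}

const registry = new EndpointRegistry();
// The socket path is a placeholder, not the real kaval socket location.
registry.register({
  hostId: "local",
  driver: { kind: "unix", socketPath: "/tmp/kaval-example.sock" },
});
registry.register({
  hostId: "prod",
  driver: { kind: "ssh-stdio", sshTarget: "nix@prod" },
});
```

Under this shape, switch versus multiplex changes only how many entries are dialed at once, not the registry itself.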

There is one backend, not many — and one backend means one shape, not one host. A backend is three surfaces bound to an endpoint — PTY · fs · git — and local vs remote is only the transport each is dialed over. The PTY surface already works this way (kaval at a unix socket since B2, or ssh stdio); fs and git extend the same pattern — served at the host, mirrored over the same HostSession link (mirrorRemoteCollection, the R-1.5 remote-process-monitor pattern). No reach-over-ssh per op, no polling.

At the endpoint | local | remote (same HostSession link)
PTY: kaval (fds + OSC taps) | unix socket | ssh stdio
provider DAG + fs/git watchers (run where the files are) | in kolu-server | in kolu-watcher → stream metadata
Code-tab fs (browse · read · watch) | node fs (native) | remote fs surface · snapshot-then-delta

The awareness providers (cwd · title · foreground · command · agent-detection) are tap-fed, so they are transport-free. kaval stays the durable survivor; kolu-watcher, a host-resident process running the provider DAG + fs/git, re-runs fresh (the #1031 line, now applied remote-side). The kolu- prefix is honest: kolu-watcher runs kolu’s own logic and is coupled to it (kolu’s soul), unlike the generic, kolu-agnostic kaval, which earns a standalone name. So R-1’s TerminalBackend interface (renamed TerminalEndpoint and shorn of its speculative resolver in P0, #1364) anticipated a second, remote implementation that the kaval model dissolves: not two backends, but one backend whose surfaces bind to an endpoint. Adding a kaval is adoption you already shipped: adoptOrEnsure (B3.3) adopts a live survivor’s PTYs (else provisions + spawns), now run per endpoint. Multiple local kavals fall out for free, since kaval is namespaced per kolu-server port (kaval-<port>/, #1313).

Switch or multiplex?

The open question — and the same substrate answers either, so it is a later, reversible UI call that the spike below does not depend on:

aspect | switch (tmux-style) | multiplex (canvas), the lean
the canvas | one kaval in view | tiles span kavals
per-tile location | unneeded | additive: the R-1 discriminator, finally used
live endpoints | one, swapped | N concurrent
host hint | one current pointer | a light per-tile tag; ChromeBar shows the active kavals
why it fits kolu | simplest | the canvas is spatial + heterogeneous by design

The lean is multiplex: tmux-switch is a terminal-multiplexer metaphor, but kolu is a canvas — “watch a local build beside a prod-ssh tile” is the point of a canvas, not a corner case. The cost over switch is modest and clean — per-tile location returns, but additively (no RemoteTerminalBackend, no screen mirror — just which endpoint holds this tile’s PTY), and the ChromeBar shows only the kavals this canvas actually has tiles on.
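As a rough illustration of how little the multiplex lean asks of per-tile state, the sketch below derives the set of kavals a ChromeBar-style strip would show from the tiles themselves. The Tile shape and the function name are hypothetical; the only rules taken from the text are that each tile carries a location and that local is always present.

```typescript
// Hypothetical tile record: the only multiplex-specific field is hostId,
// i.e. which endpoint holds this tile's PTY.
interface Tile {
  id: string;
  hostId: string;
}

// The kavals this canvas actually has tiles on, with local always present.
// A Set preserves insertion order, so "local" comes first.
function activeKavals(tiles: Tile[]): string[] {
  const seen = new Set<string>(["local"]);
  for (const tile of tiles) {
    seen.add(tile.hostId);
  }
  return [...seen];
}

const exampleTiles: Tile[] = [
  { id: "build", hostId: "local" },
  { id: "deploy", hostId: "prod" },
  { id: "logs", hostId: "prod" },
];
const strip = activeKavals(exampleTiles);
```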

Phases

Phase | Ships | Note
P0 · backend cleanup (shipped #1364) | collapsed R-1’s anticipated backend polymorphism: a backend is PTY · fs · git surfaces bound to an endpoint (TerminalEndpoint, renamed from TerminalBackend; no RemoteTerminalBackend); byte-identical locally. The interface’s own doc-comment is now the seam P3 fills with the remote impl | kolu-server only · landed · grounds P3
P1 · kaval-tui create (shipped #1370) | spawns a plain $SHELL (or a given command) and prints its id (the attach-needs-something prerequisite); thin over terminal.spawn, the whole input composed client-side (no kolu policy) | unblocks the create → attach half of the spike
P2 · kaval-tui --host (shipped #1373) | reach + provision in one PR (ssh): provision the daemon’s closure, run kaval --stdio, which fronts the durable daemon (adopt-or-start the host’s kaval, relay the link to its unix socket), and dial it as the same ptyHostSurface client, so a remote PTY survives the link (create on prod, attach later). ssh-config auto-detect deferred | the spike is the production driver; validated on a pu box
P2.5 · upstream frontDaemonOverStdio (shipped #1374) | extracted P2’s kaval --stdio durable-fronting bridge (stdioBridge.ts) into @kolu/surface-daemon as frontDaemonOverStdio, the durable counterpart to @kolu/surface’s serveOverStdio (ephemeral per-link agent ↔ a gate-held daemon fronted over ssh-stdio): a contract-agnostic relay + adopt-or-spawn, parameterized by socket path + daemon-spawn command | landed right after #1373, while the bridge was fresh; homes the durable transport as a library primitive before P3’s remote pair builds on it. The drishti companion is a verified-compatible kolu-pin bump (drishti CI green against the new surface-daemon API), not a durable-observer rewrite; see the note below
P2.7 · skip redundant provisioning (shipped #1377) | surface-nix-host re-ships the closure on every dial: a warm, already-provisioned host still runs nix copy --derivation → realise → pin (the copy reports “copying 0 paths”, the realise rebuilds nothing, the pin re-points an identical root). Make the shared provisionAgent ask nix whether the closure is already realisable on the host in one cheap round-trip that doubles as the pin, falling through to the full copy only on a miss; keyed on store-realisability, not the dangle-prone GC root | one internal change to the shared ensure verb, so drishti’s process-monitor-agent provisioning gets the same warm-skip free; same signature ⇒ not API-facing ⇒ no drishti PR compelled
P2.8 · multiplex the ssh connections (shipped #1378) | P2.7 cut the warm path’s work (≈5 ssh ops → 3) but each survivor was still a separate ssh handshake; nothing reused the tunnel. Now one connection is shared across the arch probe, the provision check, and the agent dial via ssh ControlMaster=auto/ControlPersist, added as -o flags to the existing SSH_OPT_PAIRS single-source-of-truth in host.ts (no ~/.ssh/config touch; a kolu-private %C-addressed ControlPath), plus an in-memory per-host arch cache. Warm dial → one handshake + near-instant channels | the deferred “per-connection villain” P2.7 named; the master-lifecycle edges (stale sockets, multi-process contention, keepalive-governs-the-tunnel) are all delegated to ssh’s auto, with no -O exit teardown that would defeat the cross-invocation warmth
P3 · kolu dials remotes | N-endpoint registry + the ssh driver; the remote pair, kaval + kolu-watcher (provider DAG + fs/git surface), mirrored over the HostSession link; location additive on records | builds on P0 + P2 · carries the remote seams (composeSpawnInput inversion · paste/upload PRECONDITION_FAILED guard · drvPath via resolveSystem)
P4 · the canvas UI + validate | build multiplex (the lean): active-kaval strip (local always present), per-tile host hint, ssh-config picker; then validate the feel (does N-at-once beat tmux-switch in daily use?); fall back to switch if it doesn’t earn its keep | the granularity call lands and is validated here
P1–P2 — the spike — in detail. kaval-tui is the minimal client; it proves the transport before any Kolu UI, and because it and kolu-server dial the same socket through the same client, the spike is the production driver. P1 (create) and P2 (--host) have both landed ( #1370 · #1373 ); the prose below reads as the record for both.
kaval-tui · ssh nix@prod (provision + dial)
$ kaval-tui list --host nix@prod
↳ ssh nix@prod · nix copy --derivation → realise … kaval up
↳ dialing the remote kaval over ssh stdio … connected · 0 terminals
 
$ kaval-tui create --host nix@prod
↳ spawned a8f1… plain shell on prod
$ kaval-tui attach a8f1… --host nix@prod
↳ ~/app on prod — remote PTY survives · detach/reattach anytime

Pick-up anchors

Concrete hooks — every path below is verified against the tree. Both phases live in packages/kaval-tui/src/main.ts (a cleye CLI: command() entries in a commands array, dispatched by a closed if (argv.command === …) chain) and call the already-complete terminal.spawn / surface contract — neither needs a contract bump. Both are now the record of what landed.

P1 · kaval-tui create shipped #1370 — the thin spawn, as it landed:

Concern | Where it landed
subcommand + dispatch | the create command() beside list/snapshot/attach and the else if (argv.command === "create") arm (main.ts cmdCreate)
the call | conn.client.surface.terminal.spawn(input) → { id, pid, cwd }; the existing PtyTuiClient (connect.ts) exposed it, so no client change
input shape | PtyHostSpawnInput (packages/kaval/src/ptyHostSurface.ts): { argv, cwd, env, initFiles }. Thin path: argv is the command, else [$SHELL] falling back to DEFAULT_SPAWN_SHELL, with initFiles: []. This is deliberately not kolu-server’s policy compose (composeSpawnInput, packages/server/src/ptyHost/index.ts), which layers env + shell-init for the rich client. Composed by the pure buildCreateInput in packages/kaval-tui/src/create.ts
minimal shape it shares | the in-tree spawnInput() helpers: packages/kaval/src/contractCorpus.testlib.ts and packages/kaval-tui/src/attach.test.ts (the latter delegates to buildCreateInput, so the test shape can’t drift)
the resolved calls | positional [command...] (a plain $SHELL by default, the command after --), no auto-attach (print the id and exit; the attach with … hint to stderr), and --json → { id, pid, cwd } for scripts
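The thin input composition in the table above can be sketched roughly as follows. This is not the in-tree buildCreateInput: the function name, its parameters, the pass-through of env, the element type of initFiles, and the value of DEFAULT_SPAWN_SHELL are all assumptions. Only the four field names and the "command, else $SHELL, else DEFAULT_SPAWN_SHELL, with empty initFiles" rule come from the text.

```typescript
// Field names follow the PtyHostSpawnInput shape quoted above; the element
// type of initFiles is not given in the text, so it is left as unknown.
interface SpawnInputSketch {
  argv: string[];
  cwd: string;
  env: Record<string, string>;
  initFiles: unknown[];
}

// Assumed value; the text names the constant but not what it holds.
const DEFAULT_SPAWN_SHELL = "/bin/sh";

// Hypothetical stand-in for the pure composer: no kolu policy, no shell-init.
function sketchCreateInput(
  command: string[],
  env: Record<string, string | undefined>,
  cwd: string,
): SpawnInputSketch {
  const shell = env.SHELL && env.SHELL.length > 0 ? env.SHELL : DEFAULT_SPAWN_SHELL;

  // Assumption: the environment is passed through as-is, minus unset entries.
  const cleanEnv: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value !== undefined) cleanEnv[key] = value;
  }

  return {
    argv: command.length > 0 ? command : [shell],
    cwd,
    env: cleanEnv,
    initFiles: [], // the thin path layers no init files
  };
}

const plainShell = sketchCreateInput([], { SHELL: "/bin/zsh" }, "/tmp");
const noShellSet = sketchCreateInput([], {}, "/tmp");
const givenCommand = sketchCreateInput(["htop", "-d", "10"], { SHELL: "/bin/zsh" }, "/tmp");
```

Keeping the composer pure, as the text says the real one is, means the test helper and the CLI can share one function and cannot drift apart.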

P2 · kaval-tui --host <ssh> shipped #1373 — reach + provision, as it landed:

Concern | Where it landed
the flag | --host beside socketFlag (now spread as endpointFlags) on every subcommand; mutually exclusive with --socket, guarded in main()
connect branch | connectPtyHostViaHost (packages/kaval-tui/src/hostConnect.ts) returns the same Connection shape the unix path does, so every cmd*(), including P1’s create, consumes it unchanged. markConnected() after the first RPC disarms the 30s connect-watchdog so a long attach isn’t reaped
dial + provision | getHostSession<typeof ptyHostSurface.contract>({ host, binary: "kaval", resolveDrvPath }) → provisionAgent (nix copy --derivation → realise → pin GC root) then stdioLink over ssh … kaval --stdio. session.pin() for the client; session.destroy() for dispose
durable fronting | kaval --stdio (packages/kaval/src/stdioBridge.ts) is the one daemon-side addition. Adopt-or-start the host’s durable kaval (pid-gated) and raw-byte-relay the ssh link to its unix socket (identical peer framing, no decode), so create → attach survive across links. After P2.5 (#1374) the kaval shim is a thin composition: it imports @kolu/surface-daemon’s frontDaemonOverStdio + socketPath; the generic relay (packages/surface-daemon/src/frontDaemonOverStdio.ts) is the one that’s contract-blind and node:*-only, which is what keeps the daemon-closure allow-list intact (buildId.closure.test.ts: @kolu/surface-daemon is a hashed root, so the shim’s edge into it is in-closure, not an external)
arch → drvPath | resolveSystem(host) feeds a per-system { system → kaval .drv } map baked into the kaval-tui Nix wrapper (KAVAL_AGENT_DRVS_JSON; built in flake.nix, baked in default.nix); openssh/nix on the wrapper PATH for the provision
open call | ssh-config host auto-detect (the “from ssh config” nicety) was deferred; not load-bearing for the spike

P2.5: the durable stdio front, upstreamed (shipped #1374). P2’s kaval --stdio bridge (packages/kaval/src/stdioBridge.ts) was a member of a real class: a remote, in-process stateful session that must outlive any one client link (detach → reattach hits the same live state), reached over ssh-stdio. This is the mosh / tmux / dtach / abduco lineage, generalized from a PTY to any @kolu/surface daemon. It is a different primitive from the ephemeral remote-agent the library already serves: mini-ci, remote-process-monitor, and drishti all re-run fresh per link, which @kolu/surface’s serveOverStdio covers (there the --stdio process is the server), whereas the durable bridge is a contract-agnostic proxy fronting a separate, gate-held daemon. So, right after #1373 merged and while it was fresh, it was homed as a named library concept rather than left to ossify as kaval-private before P3’s remote pair builds on the same transport.

P2.7: skip redundant provisioning (shipped #1377). Reproduced against a pu box: kaval-tui list --host run twice in a row re-provisions on the second call too. nix copy --derivation prints “copying 0 paths”, the realise rebuilds nothing, and the pin re-points an identical GC root. provisionAgent (packages/surface-nix-host/src/nixCopy.ts) runs copy → realise → pin unconditionally, and HostSession.spawn calls it on every dial, so a warm host pays the full ~30s provision, roughly half of it (copy + realise + pin) provably wasted, just to list its terminals. The fix keeps the function’s job (ensure the agent’s closure is realised + pinned on the host) but reaches that postcondition with less work:

[Diagram: provisionAgent(host, drv), the one 'ensure' verb that kaval --stdio and drishti both dial. First step (1 ssh): nix-store --realise <drv> --add-root … --indirect, instant when the closure is already present. WARM (present): repin + return out; done, copy skipped (1 ssh). COLD (drv absent on host): fast-fail → fall through to nix copy --derivation → realise → pin.]
provisionAgent gains a warm fast-path: try the cheap realise+pin FIRST (nix makes it instant when the closure is already on the host, and it doubles as the presence check + the GC-root refresh); only fall through to the full nix-copy when nix reports the closure isn't there. Keyed on nix's own store-realisability — not on reading the moving-result GC root, which can dangle past a store GC or a box reimage.
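The try-cheap-first control flow can be sketched as below. This is a minimal illustration, not the provisionAgent source: the ProvisionSteps interface, the function name, and the synchronous signatures are assumptions made so the flow is easy to read and test (the real steps are ssh subprocesses and would be asynchronous). The two steps stand in for the nix-store --realise … --add-root probe and the nix copy --derivation fallback named in the text.

```typescript
// Hypothetical seam: the two remote operations, injected so the control flow
// can be exercised without ssh. Synchronous here only for brevity.
interface ProvisionSteps {
  // One round-trip: realise + pin. Returns the out path, or null on a miss.
  realiseAndPin(host: string, drvPath: string): string | null;
  // The expensive path: ship the derivation closure to the host.
  copyDerivation(host: string, drvPath: string): void;
}

interface ProvisionResult {
  outPath: string;
  path: "warm" | "cold";
}

function ensureProvisioned(
  steps: ProvisionSteps,
  host: string,
  drvPath: string,
): ProvisionResult {
  // Warm fast-path: the probe doubles as presence check and GC-root refresh.
  const warm = steps.realiseAndPin(host, drvPath);
  if (warm !== null) return { outPath: warm, path: "warm" };

  // Cold: the closure is not on the host, so fall through to the full copy.
  steps.copyDerivation(host, drvPath);
  const cold = steps.realiseAndPin(host, drvPath);
  if (cold === null) {
    throw new Error(`closure still not realisable on ${host} after copy`);
  }
  return { outPath: cold, path: "cold" };
}

// A fake host set to exercise both branches; paths are placeholders.
const hostsWithClosure = new Set<string>(["warm-box"]);
const calls: string[] = [];
const fakeSteps: ProvisionSteps = {
  realiseAndPin(host, _drvPath) {
    calls.push(`realise:${host}`);
    return hostsWithClosure.has(host) ? "/nix/store/example-out" : null;
  },
  copyDerivation(host, _drvPath) {
    calls.push(`copy:${host}`);
    hostsWithClosure.add(host);
  },
};

const warmResult = ensureProvisioned(fakeSteps, "warm-box", "/nix/store/example.drv");
const coldResult = ensureProvisioned(fakeSteps, "cold-box", "/nix/store/example.drv");
```

The postcondition is the same on both branches (closure realised and pinned), which is why the change can stay behind an unchanged signature.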

P2.8: multiplex the ssh connections (shipped #1378). P2.7 removed the redundant work, but the warm path still opened three separate ssh connections: the arch probe (resolveSystem), the provision check (P2.7’s nix-store --realise probe), and the agent dial (ssh … kaval --stdio). Each paid its own handshake because nothing reused the tunnel. That per-connection cost, not the work, became the dominant warm-path latency. The fix is ssh connection multiplexing, and it needs no change to the user’s ~/.ssh/config: every ssh kolu spawns already draws its options from one place, SSH_OPT_PAIRS in host.ts, rendered into both the -o argv (SSH_COMMON_OPTS) and the NIX_SSHOPTS env that nix copy’s own ssh reads. The ControlMaster opts are therefore just rendered alongside them (kept in their own controlMaster.ts module since, unlike the static keepalive policy, the ControlPath is env-derived and its dir is a side effect).

[Diagram: kaval-tui --host, one dial. One ssh ControlMaster tunnel (opened by the first op; ControlPersist keeps it warm; -o ControlMaster=auto + a kolu-private ControlPath) carries three riders, each labelled "channel ~50ms": the arch probe (resolveSystem), the provision check (nix-store --realise, P2.7), and the agent dial (kaval --stdio).]
One ssh ControlMaster tunnel, three riders. The first op (the arch probe) opens the master; ControlPersist keeps it warm so the provision check and the agent dial ride it as near-instant channels instead of fresh ~5s handshakes. All via -o flags on the commands kolu already spawns — the ControlPath is a kolu-private socket, never ~/.ssh.
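A rough sketch of the single-source rendering described above: one list of option pairs, rendered once into -o argv and once into a NIX_SSHOPTS string. ControlMaster, ControlPersist, ControlPath, the %C token, and the ServerAlive options are real OpenSSH settings, but every value here (the persist duration, the keepalive numbers, the control directory) and every function name is an assumption, not what host.ts or controlMaster.ts contains.

```typescript
type SshOptPair = readonly [string, string];

// Assumed keepalive policy; the text says one exists but gives no values.
const KEEPALIVE_PAIRS: SshOptPair[] = [
  ["ServerAliveInterval", "15"],
  ["ServerAliveCountMax", "3"],
];

// The multiplexing options. %C is expanded by ssh itself to a hash of the
// connection tuple, so each destination gets its own control socket.
function controlMasterPairs(controlDir: string): SshOptPair[] {
  return [
    ["ControlMaster", "auto"],
    ["ControlPersist", "60s"], // assumed duration
    ["ControlPath", `${controlDir}/%C`],
  ];
}

// Render pairs as ssh argv: ["-o", "Key=Value", ...].
function renderArgv(pairs: readonly SshOptPair[]): string[] {
  return pairs.flatMap(([key, value]) => ["-o", `${key}=${value}`]);
}

// Render the same pairs for the NIX_SSHOPTS environment variable, which is a
// single space-separated string. Assumption: no value contains whitespace.
function renderNixSshOpts(pairs: readonly SshOptPair[]): string {
  return renderArgv(pairs).join(" ");
}

// Placeholder directory; a real one would be private to the tool and created
// with restrictive permissions before the first ssh is spawned.
const allPairs = [...KEEPALIVE_PAIRS, ...controlMasterPairs("/tmp/example-ssh-ctl")];
const sshArgv = renderArgv(allPairs);
const nixSshOpts = renderNixSshOpts(allPairs);
```

Because both renderings come from one list, the ssh that nix copy spawns and the ssh the tool spawns directly land on the same control socket and can share the tunnel.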

Measured (real ssh, median of 7, the exact keepalive + ControlMaster opts the code emits). The warm path issues 3 ssh round-trips; multiplexing turns 2 of the 3 handshakes into channel reuse. The per-connection saving is the host’s handshake latency, so it scales with distance:

host | fresh handshake | multiplexed channel | warm-path ssh cost: before (3×handshake) → after (1 handshake + 2 channels)
linux pu box (kolu-ci-1, Tailscale) | 5.42 s | 0.70 s | 16.3 s → 6.8 s (2.4×)
darwin (rasam, Tailscale WAN) | 2.89 s | 0.48 s | 8.7 s → 3.9 s (2.25×)

(This isolates the ssh-connection portion of a warm dial — the only thing P2.8 changes; the realise probe’s own remote work is unchanged from P2.7.)
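The last column of the table follows directly from the two measured columns: before is three handshakes, after is one handshake plus two channels. A small sketch of that arithmetic, using the measured medians from the table (the function name is illustrative):

```typescript
interface WarmPathCost {
  beforeS: number; // three fresh handshakes
  afterS: number; // one handshake + two multiplexed channels
  speedup: number;
}

function warmPathSshCost(handshakeS: number, channelS: number): WarmPathCost {
  const beforeS = 3 * handshakeS;
  const afterS = handshakeS + 2 * channelS;
  return { beforeS, afterS, speedup: beforeS / afterS };
}

// Measured medians from the table above.
const linuxBox = warmPathSshCost(5.42, 0.7); // about 16.3 s → 6.8 s
const darwinBox = warmPathSshCost(2.89, 0.48); // about 8.7 s → 3.9 s
```

The speedup approaches 3× as the channel cost approaches zero, so it is bounded by the number of handshakes the warm path used to pay.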

The remote process model (P3)

kaval and kolu-watcher are two separate processes — forced, not incidental. kaval is the durable survivor (it must outlive deploys); kolu-watcher (provider DAG + fs/git) re-runs fresh (it must be the current build’s code — the #1031 line); one process cannot be both. Locally that role isn’t even a separate process — the DAG + fs/git watchers run inside kolu-server (local.ts: “the DAG runs HERE, in kolu-server”), with kaval the separate daemon. Remotely, kolu-server isn’t on the host, so it splits out as its own kolu-watcher process beside kaval.

And because kaval serves a unix socket (serveOverUnixSocket), reaching it over the network needs a remote stdio endpoint — and a HostSession is one ssh subprocess per host. So the remote host exposes one ssh endpoint — kolu-watcher: kolu-server dials it over one HostSession stdioLink, and kolu-watcher locally fronts kaval (its unix socket) for PTY attach/control + taps while serving its own fs/git + metadata surface (mirrorRemoteCollection). The closure is shipped first via provisionAgent (nix copy --derivation → remote realise → pin a GC root). The diagram shows all three boundaries — host (dashed), process (boxed), package (packages/…):

[Diagram: three boundaries, host (dashed), process (boxed), package (packages/…). Host · local (where kolu-server runs): process · browser (packages/client, the canvas, SolidJS) connects over ws to process · kolu-server (packages/server, packages/surface-daemon-supervisor, packages/surface-nix-host, packages/integrations/git); for local tiles the provider DAG + fs/git watchers run here. kolu-server reaches the remote host over ssh · HostSession (stdioLink), carrying pty-host (proxied), fs/git, and metadata. Host · remote (prod) exposes one ssh endpoint, kolu-watcher, fronting kaval. Process · kolu-watcher re-runs fresh (P3) (packages/server R-1.6 provider DAG, packages/integrations/git, packages/surface): it fronts kaval, runs the DAG + native fs/git, and serves pty-host + fs/git over ssh. It talks over a unix socket (taps + pty-host) to process · kaval, durable, which survives (packages/kaval, packages/surface-daemon, packages/terminal-protocol): fds + OSC taps; single-instance pid-gate; serves a unix socket.]

If the host already has a kaval running, it is adopted — the same generic adoptOrEnsure spine (endpoint.ts), per endpoint, that the local boot uses. A duplicate is impossible by construction: a second launch hits the pid-gate’s atomic link(2) EEXIST, liveness-probes the holder (kill(pid,0)), and exits 0 (“yielding to the live instance”) without binding a socket — the running daemon is never disturbed. Re-provisioning is idempotent (nix realise no-ops if the closure is already present; an older build coexists and the moving GC root migrates to the new one). On dial:
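The single-instance gate described above (atomic link(2), EEXIST, then a kill(pid, 0) liveness probe) can be sketched with Node's fs.linkSync and process.kill. This is an illustration of the technique, not kaval's gate: the file layout, the function names, and the "stale" outcome are assumptions, and what to do with a stale gate is left out because the text does not describe it.

```typescript
import { linkSync, mkdtempSync, readFileSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type GateResult =
  | { kind: "acquired" }
  | { kind: "yield"; holderPid: number } // a live instance holds the gate
  | { kind: "stale"; holderPid: number }; // holder is gone; recovery not shown

// Signal 0 delivers nothing; it only checks that the pid can be signalled.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to someone else.
    return (err as { code?: string }).code === "EPERM";
  }
}

function tryAcquireGate(gatePath: string, pid: number = process.pid): GateResult {
  // Write our pid to a private temp file, then hard-link it into place.
  // link(2) is atomic and fails with EEXIST if the gate already exists.
  const tmpPath = `${gatePath}.${pid}.tmp`;
  writeFileSync(tmpPath, String(pid));
  try {
    linkSync(tmpPath, gatePath);
    return { kind: "acquired" };
  } catch (err) {
    if ((err as { code?: string }).code !== "EEXIST") throw err;
    const holderPid = Number(readFileSync(gatePath, "utf8").trim());
    return isAlive(holderPid)
      ? { kind: "yield", holderPid }
      : { kind: "stale", holderPid };
  } finally {
    unlinkSync(tmpPath); // the gate keeps its own link to the same inode
  }
}

// Demo in a throwaway directory: the second launch yields to the first.
const gateDir = mkdtempSync(join(tmpdir(), "gate-example-"));
const gatePath = join(gateDir, "daemon.pid");
const firstLaunch = tryAcquireGate(gatePath);
const secondLaunch = tryAcquireGate(gatePath);
```

A caller that receives "yield" would exit 0 without binding a socket, which is the behaviour the text describes for a second launch.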

the remote kaval is… | kolu does
absent | provision + spawn fresh (adoptOrEnsure → ensure)
live · wire-compatible | adopt: connect, never kill; its PTYs reconcile in (older startedAt)
live · a build behind | adopt + the B3.4 “update pending” nudge (an orthogonal axis)
live · contract-skewed | recycle (kill → respawn), only on a typed DaemonContractSkewError, no retry
unreachable (bad dial) | bounded retry, then degraded; never killed (a bad dial can’t cost live PTYs)
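The table above is effectively a total function from observed state to action, which an exhaustive switch makes explicit. The state and action names below are hypothetical labels for the table's rows, not identifiers from the codebase.

```typescript
// Hypothetical labels for the five rows of the dial table.
type RemoteKavalState =
  | "absent"
  | "live-wire-compatible"
  | "live-build-behind"
  | "live-contract-skewed"
  | "unreachable";

type DialAction =
  | "provision-and-spawn"
  | "adopt"
  | "adopt-and-nudge-update"
  | "recycle"
  | "retry-then-degrade";

function decideDial(state: RemoteKavalState): DialAction {
  switch (state) {
    case "absent":
      return "provision-and-spawn";
    case "live-wire-compatible":
      return "adopt";
    case "live-build-behind":
      return "adopt-and-nudge-update";
    case "live-contract-skewed":
      return "recycle";
    case "unreachable":
      return "retry-then-degrade";
    default: {
      // Compile-time exhaustiveness check: a new state must add a case.
      const unhandled: never = state;
      throw new Error(`unhandled state: ${String(unhandled)}`);
    }
  }
}

// Only a contract skew may cost live PTYs; every other row leaves the
// running daemon untouched.
function killsRunningDaemon(action: DialAction): boolean {
  return action === "recycle";
}

const allStates: RemoteKavalState[] = [
  "absent",
  "live-wire-compatible",
  "live-build-behind",
  "live-contract-skewed",
  "unreachable",
];
const killingStates = allStates.filter((s) => killsRunningDaemon(decideDial(s)));
```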

One kaval per remote host, adopted. Dialing a host joins its kaval — you reconnect to “the prod terminals” — and never clobbers it. The local per-port key (kaval-<port>/, #1313 ), which isolates two dev servers under always-recycle, isn’t needed across ssh: a remote host runs one shared kaval and dials adopt it (the table above), so two reconnects join the same terminals rather than racing to kill them. (The pid-gate is host-local — pid + filesystem on that host — not a cross-host or cross-container lock.)

Status

Accepted (the approach); P0 + P1 + P2 shipped. R-2 in remote-terminals is committed to this kaval-endpoint substrate — one backend, no complecting. P0 (backend cleanup) landed ( #1364 ): the TerminalEndpoint interface is now the live seam, its own doc-comment naming P3’s remote transport. P1 (kaval-tui create) landed ( #1370 ): the spike’s create → attach half works locally. P2 (kaval-tui --host) landed ( #1373 ): the genuinely-new ssh reach + provision works end to end, with the remote daemon durable behind kaval --stdio. P2.5 landed (spun out of P2’s review, right after #1373) — upstreamed kaval --stdio’s durable-fronting bridge to @kolu/surface-daemon as frontDaemonOverStdio, the durable counterpart to serveOverStdio (the drishti companion is a verified-compatible pin bump, not a durable-observer rewrite — that was assessed and set aside as illusory benefit + an identity conflict). P2.7 (skip redundant provisioning) shipped ( #1377 ) — a warm-host fast-path in surface-nix-host’s shared provisionAgent so kaval and drishti stop re-shipping an already-present closure on every dial. P2.8 (multiplex the ssh connections) shipped ( #1378 ) — shares one ssh tunnel (ControlMaster=auto, no user config) across the warm path’s three round-trips + caches the arch probe, the deferred per-connection-cost follow-up P2.7 named (measured ~2.3–2.4× on the warm dial’s ssh cost). Next is P3 (kolu dials remotes) — the N-endpoint registry reusing this same getHostSession driver — and P4 builds multiplex and validates the feel against tmux-switch, the one reversible call, made on real use, not up front.