pulam-web: a dead mirror lies as an empty fleet — surface the connection
pulam-web shows a green connected dot + no terminals when the backend↔remote ssh mirror is actually down/failed (build mismatch, unreachable host). Root cause — the browser only ever sees its own socket health, never the mirror's. Fix (one PR) — graduate a thin onState→cell projection into @kolu/surface-nix-host, declare a gate-closed connection cell on the surface, pipe the session's health into it, and gate the browser UI on it (drishti's proven shape, now shared).
pulam-web can sit in a state where a host reads as a healthy, empty fleet while its data link is dead: a green “connected” dot and “0 terminals / no terminals”, even though live terminals are running on that host. It is a confident lie, because nothing on screen says “this host is unreachable”. It was reproduced in the wild by a kolu build-version mismatch between the two ends of the mirror (gist: remote kaval speaks pty-host 3.2 while local pulam needs 3.3, so the remote agent exits and the session gives up into its terminal failed state, yet the browser stays green). Filed as #1564. The mismatch is only the trigger; any dead backend↔remote leg collapses into the same silent empty state.
The fix is small and already proven next door: drishti does not have this bug — it surfaces the mirror’s health to the browser as a first-class connection cell and gates its UI on it. pulam-web is the lone consumer that never plugged in. Shipped in #1568 (with the linked drishti gate PR adopting the shared cell).
User-facing description
The host card’s status indicator and body are now driven by the mirror’s real health, not by the browser’s own socket. The shape mirrors kolu’s own connection language and drishti’s exactly:
- copying / connecting / disconnected are in-flight: an amber, pulsing dot with a live status line (Copying agent to remote…, Connecting… 18s, Host unreachable — retrying…). The browser is honestly told that work is happening, and no terminal list is painted yet.
- failed is terminal: a solid red dot and a card carrying the real failure (lastError, plus the tail of the connection log, where the gist's pty-host 3.2 vs 3.3 line lands verbatim) and a ↻ Reconnect button, the only recovery short of a reload.
- connected is the only state that paints the terminal list. It is also the only state in which “no terminals” is allowed to mean an empty host (see zest above).
Architecture-level changes
The volatility already has a receptacle; pulam-web just never plugged in. The hard, changing thing here is the remote link's lifecycle: nix copy, ssh dial, reconnect/backoff, the connect watchdog, the network-vs-remote failure-cause split, and the give-up-into-failed. That volatility was lifted into @kolu/surface-nix-host long ago. HostSession exposes it as a domain-agnostic, snapshot-then-delta onState callback carrying HostSessionState (connection · lastError · failureCause · progressLines · remoteProgressLines). The mirror loop (pumpRemoteSurface) graduated out of drishti into the same package. What pulam-web is missing is the wiring that carries that already-owned state onto the browser surface. This PR graduates the common pieces of that wiring so that a third consumer can't neglect it either. The first piece is a composable connection cell fragment (schema plus gate-closed default). Apps compose it at the mirror seam via mirroredSurface(base) and never hand-spread it; the seam reserves the connection name and throws on a collision. The second piece is the node-side onState → cell projection. Only the UI and the per-site cell implementation stay app-local (see the verdict below).
drishti is the worked reference, three pieces:
- A browser-facing connection cell on its surface, seeded DEFAULT_CONNECTION with state: "connecting". It is gate-closed by default, so “healthy-empty before the first real frame” is structurally unrepresentable (drishti common/src/surface.ts).
- The parent pipes session.onState straight into that cell (router.ts: session.onState(s => connection.set({ state: s.connection, lastError, failureCause, progressLines }))). The agent serves an inert stub of the cell; only the parent writes it.
- The client gates all content on state === "connected" and otherwise renders a state-driven dot plus overlay (connectionColors.ts's STATE map, ConnectingOverlay / FailedCard).
Implementation details
One PR, threading seams that already exist — the new state rides the same per-host ws the awareness collection already uses, so there’s no new socket and no second surface. (Plus the surface gate’s linked drishti PR, step 7 — mandated, not deferred.)
1 — Add the composable cell to a new browser-safe @kolu/surface-nix-host/connection subpath, plus the node-side projection on the main entry. Two faces, split by the browser boundary:
- New subpath @kolu/surface-nix-host/connection (a new exports map entry). It imports only zod and @kolu/surface, with no node/ssh code, so it is safe in the browser bundle. It exports:
  - CONNECTION_STATES (the literal tuple);
  - ConnectionInfo (type) and ConnectionInfoSchema (zod);
  - DEFAULT_CONNECTION (state: "connecting", the gate-closed default);
  - connectionCell (the get-only { schema, default, verbs } descriptor);
  - mirroredSurface(base), the composer that adds connectionCell at the mirror seam and reserves the connection name.
  Apps reach for mirroredSurface(base), not a hand-spread of connectionCell into cells. To keep one source of truth, move the ConnectionState literal tuple here and have hostSession.ts derive its ConnectionState type from it (typeof CONNECTION_STATES[number]).
- Main entry @kolu/surface-nix-host (node-side, no new runtime dep; pure TS over the existing session.onState, hostSession.ts:265). It adds projectConnection(s: HostSessionState): ConnectionInfo (state: s.connection, progressLines: [...s.progressLines], …). It also adds pipeSessionStateToCell(session, set): () => void, which calls session.onState(s => set(projectConnection(s))) and returns the unsubscribe.
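// @kolu/surface-nix-host/connection: browser-safe (zod + @kolu/surface only)
import { z } from "zod";
export const CONNECTION_STATES = [
"copying", "connecting", "connected", "disconnected", "failed",
] as const;
// hostSession.ts derives its ConnectionState type from this tuple.
export type ConnectionState = (typeof CONNECTION_STATES)[number];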
export const ConnectionInfoSchema = z.object({
state: z.enum(CONNECTION_STATES),
lastError: z.string().nullable(),
failureCause: z.enum(["network", "remote"]).nullable(),
progressLines: z.array(z.string()).readonly(),
});
export type ConnectionInfo = z.infer<typeof ConnectionInfoSchema>;
// Gate-closed by default: a freshly-composed cell reads "connecting", so
// "healthy-empty before the first frame" is structurally unrepresentable.
export const DEFAULT_CONNECTION: ConnectionInfo = {
state: "connecting", lastError: null, failureCause: null, progressLines: [],
};
// The composable cell — the fragment `mirroredSurface(base)` spreads in at the
// mirror seam; apps compose via `mirroredSurface`, never hand-spread this.
// `verbs: ["get"]` makes it READ-ONLY over the wire: the parent host owns it
// (writes server-side off `session.onState`), so a remote client must never be
// able to `connection.set` the health to `connected` and forge the gate's
// signal. Without it, a no-`patchSchema` cell defaults to `["get", "set"]`.
export const connectionCell = {
schema: ConnectionInfoSchema,
default: DEFAULT_CONNECTION,
verbs: ["get"],
} as const;
// @kolu/surface-nix-host (node main entry)
import type { ConnectionInfo } from "./connection";
import type { HostSession, HostSessionState } from "./hostSession";
export const projectConnection = (s: HostSessionState): ConnectionInfo => ({
state: s.connection,
lastError: s.lastError,
failureCause: s.failureCause,
progressLines: [...s.progressLines],
});
// The dual of pumpRemoteSurface (which streams DATA out); this streams STATE
// out. Returns the unsubscribe.
export const pipeSessionStateToCell = <C>(
session: HostSession<C>,
set: (info: ConnectionInfo) => void,
): (() => void) => session.onState((s) => set(projectConnection(s)));
2 — Compose the cell at the mirror seam, NOT into the base surface. The base terminalWorkspaceSurface (packages/terminal-workspace/src/surface.ts) stays connection-free, because link health is not a property of the daemon's own surface (a direct/local link has no remote to be down). pulam-web's browser/mirror surface wraps it: pulamSurface = mirroredSurface(terminalWorkspaceSurface) (packages/pulam-web/src/shared/contract.ts). mirroredSurface adds the get-only connection cell with its gate-closed state: "connecting" seed baked in. drishti composes the same mirroredSurface(base) at its own mirror (the step-7 companion). Only the browser-safe @kolu/surface-nix-host/connection subpath and type-only imports reach the client bundle; the node main entry never does.
// packages/pulam-web/src/shared/contract.ts
import { mirroredSurface } from "@kolu/surface-nix-host/connection";
import { terminalWorkspaceSurface } from "@kolu/terminal-workspace/surface";
// The browser-facing surface = base + the get-only `connection` cell.
export const pulamSurface = mirroredSurface(terminalWorkspaceSurface);
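The following is a minimal sketch of what a composer like mirroredSurface could do to honour "reserves the connection name and throws on a collision". It assumes a surface is a plain object with a cells record of descriptors. The real @kolu/surface types are not shown in this issue, so SurfaceLike, CellDescriptor and the trimmed connectionCell below are hypothetical stand-ins.

```typescript
// Hypothetical stand-ins: the real @kolu/surface types are not shown here.
type CellDescriptor = { readonly default: unknown; readonly verbs?: readonly string[] };
type SurfaceLike = { readonly cells: Readonly<Record<string, CellDescriptor>> };

// Trimmed connection cell: gate-closed default, read-only over the wire.
const connectionCell = {
  default: { state: "connecting", lastError: null, failureCause: null, progressLines: [] },
  verbs: ["get"],
} as const;

const RESERVED = "connection";

type MirroredSurface<S extends SurfaceLike> = S & {
  cells: S["cells"] & { connection: typeof connectionCell };
};

// Adds the get-only `connection` cell at the mirror seam. It throws if the
// base already declares a cell with that name, so the mirror's health cell can
// never shadow, or be shadowed by, a base cell.
function mirroredSurface<S extends SurfaceLike>(base: S): MirroredSurface<S> {
  if (Object.prototype.hasOwnProperty.call(base.cells, RESERVED)) {
    throw new Error(`mirroredSurface: base already defines a "${RESERVED}" cell`);
  }
  // Spread into new objects; the base surface is left untouched.
  return {
    ...base,
    cells: { ...base.cells, [RESERVED]: connectionCell },
  } as MirroredSurface<S>;
}
```

Returning a fresh object keeps the base surface connection-free for direct consumers. This is the property step 2 relies on.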
3 — Only the re-serve implements the cell; the daemon serves the connection-free base. There is no inert per-site stub. The daemon (packages/pulam/src/daemon.ts) and every direct/local serve implement terminalWorkspaceSurface, which has no connection cell — fail-fast doesn’t ask for one. Only pulam-web’s re-serve implements the augmented pulamSurface, backing connection with a seeded local store (seedConnectionCell()) it writes from the session. The store is NOT folded by the mirror sink (it’s the session’s state, not the daemon’s data) and writes go through the framework-wrapped ctx.cells.connection.set (persist + PUBLISH the delta) so a browser already subscribed across a reconnect hears the new state.
// packages/pulam-web/src/server/reserve.ts — re-serve of the MIRRORED surface
const connection = seedConnectionCell(); // gate-closed "connecting" seed
const fragment = implementSurface(pulamSurface, {
cells: { version: { store: versionStore }, connection },
// collections / streams folded/forwarded from the daemon's base surface…
});
// Expose the framework-wrapped setter on the ReServe result:
return {
// …makeSink and the other ReServe members…
setConnection: (info: ConnectionInfo) => fragment.ctx.cells.connection.set(info),
};
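The re-serve writes through the framework-wrapped set rather than the raw store because only set publishes the delta. The sketch below models that difference with a tiny in-memory cell. It is hypothetical: this seedConnectionCell and its writeRaw contrast method are illustrative only, since the framework's real store interface is not shown in this issue.

```typescript
type ConnectionState = "copying" | "connecting" | "connected" | "disconnected" | "failed";
type ConnectionInfo = {
  state: ConnectionState;
  lastError: string | null;
  failureCause: "network" | "remote" | null;
  progressLines: readonly string[];
};

const DEFAULT_CONNECTION: ConnectionInfo = {
  state: "connecting", lastError: null, failureCause: null, progressLines: [],
};

// Hypothetical stand-in for seedConnectionCell(): the latest value plus subscribers.
function seedConnectionCell(seed: ConnectionInfo = DEFAULT_CONNECTION) {
  let value = seed;
  const listeners = new Set<(info: ConnectionInfo) => void>();
  return {
    get: (): ConnectionInfo => value,
    // Persist AND publish: a browser subscribed before a reconnect still hears this.
    set(info: ConnectionInfo): void {
      value = info;
      for (const fn of listeners) fn(info);
    },
    // Snapshot-then-delta: a new subscriber gets the current value immediately.
    subscribe(fn: (info: ConnectionInfo) => void): () => void {
      listeners.add(fn);
      fn(value);
      return () => {
        listeners.delete(fn);
      };
    },
    // For contrast only: a raw store write persists but notifies nobody.
    writeRaw(info: ConnectionInfo): void {
      value = info;
    },
  };
}
```

In this model a subscriber sees every set but never sees a writeRaw. That silent path is the one that would leave an already-open browser stuck on a stale state.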
4 — The pump carries session state into the cell — by construction. pumpRemoteSurface (the reconnect-mirror loop) takes a connection setter; passing it makes the pump wire pipeSessionStateToCell(session, set) itself for the session’s life. So packages/pulam-web/src/server/hostEntry.ts doesn’t pipe it by hand — it hands the setter to the pump, and the session’s existing lifecycle (first-version markConnected, the connect watchdog, give-up-into-failed) drives every transition. Pumping a session carries its health by construction (#1564), so it can’t be wired wrong.
// packages/pulam-web/src/server/hostEntry.ts
const reServe = buildReServe({ log: hostLog });
void pumpRemoteSurface({
source: terminalWorkspaceSurface,
session,
makeSink: () => reServe.makeSink(() => session.markConnected()),
// …live holders + onLinkDown reset…
connection: { set: reServe.setConnection }, // ← pump wires pipeSessionStateToCell
});
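Handing the setter to the pump makes the wiring correct by construction, because the pump owns the subscription for exactly as long as it runs. In this sketch pumpSketch is hypothetical and collapses the real reconnect loop into a synchronous run callback. projectConnection and pipeSessionStateToCell follow the step-1 shapes, with SessionLike standing in for HostSession.

```typescript
type ConnectionName = "copying" | "connecting" | "connected" | "disconnected" | "failed";
type HostSessionState = {
  connection: ConnectionName;
  lastError: string | null;
  failureCause: "network" | "remote" | null;
  progressLines: readonly string[];
};
type ConnectionInfo = {
  state: ConnectionName;
  lastError: string | null;
  failureCause: "network" | "remote" | null;
  progressLines: readonly string[];
};
// Stand-in for HostSession: onState delivers a snapshot, then deltas.
interface SessionLike {
  onState(cb: (s: HostSessionState) => void): () => void;
}

const projectConnection = (s: HostSessionState): ConnectionInfo => ({
  state: s.connection,
  lastError: s.lastError,
  failureCause: s.failureCause,
  progressLines: [...s.progressLines],
});

const pipeSessionStateToCell = (
  session: SessionLike,
  set: (info: ConnectionInfo) => void,
): (() => void) => session.onState((s) => set(projectConnection(s)));

// Hypothetical pump: `run` stands in for the real reconnect/mirror loop.
function pumpSketch(opts: {
  session: SessionLike;
  run: () => void;
  connection?: { set: (info: ConnectionInfo) => void };
}): void {
  // Wiring the pipe inside the pump means no caller can forget it.
  const unsubscribe = opts.connection
    ? pipeSessionStateToCell(opts.session, opts.connection.set)
    : () => {};
  try {
    opts.run();
  } finally {
    unsubscribe(); // detach when this session's pump ends
  }
}
```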
5 — Gate the client on the cell, not on version.pending(). In packages/pulam-web/src/client/HostGroup.tsx: read app.cells.connection.use() and make the top-level gate connection.state === "connected" — replacing the version.pending() “connecting…” gate (~line 289) and the awareness.keys().length “no terminals” decision (~line 297). Off-connected renders the state-driven body: provisioning / connecting+elapsed / reconnecting (refined by failureCause) / the failed card (lastError + progressLines tail + a Reconnect button that hits a small POST /api/reconnect?host= route calling registry.getSession(host).reconnect()). Only connected reaches the existing awareness rendering, where “no terminals” is finally honest. The header dot reads a pulam-web-local STATE map keyed by connection.state (its own palette — the UI the panel keeps app-local). statusForHost (the browser↔backend socket) stays a secondary indicator and no longer decides healthy-vs-empty. DEFAULT_VERSION keeps its real job (a version snapshot) and simply stops being abused as a link-live proxy.
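// packages/pulam-web/src/client/HostGroup.tsx
import { Show } from "solid-js";
import { DEFAULT_CONNECTION, type ConnectionInfo } from "@kolu/surface-nix-host/connection";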
const connection = app.cells.connection.use({ onError });
const info = (): ConnectionInfo => connection.value() ?? DEFAULT_CONNECTION;
<Show
when={info().state === "connected"}
fallback={<ConnectionView info={info()} host={props.host} />}
>
{/* the existing awareness rendering — "no terminals" is honest only here */}
</Show>
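The next sketch covers the pulam-web-local STATE map and the off-connected status line that step 5 describes. The colour names and the statusLine helper are hypothetical, not pulam-web's palette or API. The copy paraphrases the status lines quoted in the user-facing description, and the "Remote disconnected" variant is invented here to illustrate the failureCause refinement.

```typescript
type ConnectionState = "copying" | "connecting" | "connected" | "disconnected" | "failed";
type ConnectionInfo = {
  state: ConnectionState;
  lastError: string | null;
  failureCause: "network" | "remote" | null;
  progressLines: readonly string[];
};

// Dot style per state: in-flight states pulse amber, failed is solid red,
// and only connected is green. Illustrative values, not pulam-web's palette.
const STATE: Record<ConnectionState, { color: "amber" | "red" | "green"; pulse: boolean }> = {
  copying: { color: "amber", pulse: true },
  connecting: { color: "amber", pulse: true },
  disconnected: { color: "amber", pulse: true },
  failed: { color: "red", pulse: false },
  connected: { color: "green", pulse: false },
};

// Status line for the off-connected body. elapsedMs counts from the start of
// the current attempt. Returns null when the terminal list should render.
function statusLine(info: ConnectionInfo, elapsedMs: number): string | null {
  switch (info.state) {
    case "copying":
      return "Copying agent to remote…";
    case "connecting":
      return `Connecting… ${Math.floor(elapsedMs / 1000)}s`;
    case "disconnected":
      return info.failureCause === "network"
        ? "Host unreachable, retrying…"
        : "Remote disconnected, retrying…";
    case "failed":
      return info.lastError ?? "Connection failed";
  }
  return null; // connected: render the terminal list instead
}
```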
6 — Hermetic test. Extend reserve.test.ts (the existing agent→mirror→re-serve→browser-store proof): drive a session.onState sequence ending in failed with no awareness keys, and assert the browser-consumed connection cell reads failed with its lastError — the surface carries the down state — instead of the old empty/healthy path. Goes red if the gate ever reverts to the socket/version proxy. Visual proof is pulam-web over a real ws against a deliberately build-mismatched host (a chrome-devtools or e2e still of the failed card).
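The decision that the hermetic test pins down can be modelled as a pure function of the connection cell and the awareness keys. This is a sketch, not reserve.test.ts. hostBody and legacyHostBody are hypothetical, and the real test drives a session.onState sequence through the mirror and re-serve rather than assigning states directly.

```typescript
type ConnectionState = "copying" | "connecting" | "connected" | "disconnected" | "failed";
type ConnectionInfo = { state: ConnectionState; lastError: string | null };

type HostBody =
  | { kind: "status"; state: ConnectionState; lastError: string | null }
  | { kind: "no-terminals" }
  | { kind: "terminals"; keys: readonly string[] };

// The gate: only `connected` may read an empty key list as an empty host.
function hostBody(info: ConnectionInfo, awarenessKeys: readonly string[]): HostBody {
  if (info.state !== "connected") {
    return { kind: "status", state: info.state, lastError: info.lastError };
  }
  return awarenessKeys.length === 0
    ? { kind: "no-terminals" }
    : { kind: "terminals", keys: awarenessKeys };
}

// The old proxy, for comparison: it decides on the key count alone, so a dead
// mirror with no keys reads as a healthy, empty host.
function legacyHostBody(awarenessKeys: readonly string[]): HostBody {
  return awarenessKeys.length === 0
    ? { kind: "no-terminals" }
    : { kind: "terminals", keys: awarenessKeys };
}
```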
7 — The linked drishti PR (surface gate — ships with this one). Adding exports to @kolu/surface-nix-host is API-facing, so .claude/rules/surface.md requires a linked drishti PR with green CI. drishti already hand-rolls both halves — its own ConnectionSchema/DEFAULT_CONNECTION in drishti-common and the inline session.onState(s => cell.set({ state: s.connection, … })) in router.ts. The companion bumps drishti’s kolu pin and replaces them: wrap its mirror surface in mirroredSurface(base) (dropping its copy of the schema + default and its hand-spread cell) and let the pump wire pipeSessionStateToCell / projectConnection (dropping the inline mapping). Its green CI proves the lift behaviour-preserving — and makes drishti the second consumer that earns the extraction.
Unaffected, stated plainly: kolu’s Code tab and pulam-tui consume terminalWorkspaceSurface directly and track their link in-process. The base surface stays connection-free, so there is no new cell for them to read and nothing to stub — they build unchanged. The connection cell exists only on mirroredSurface(base), which only a re-serving parent (pulam-web, drishti) serves.
Done when a host whose mirror is copying/connecting/disconnected/failed renders an honest state (failure cause + Reconnect on failed) instead of green + “no terminals”; a genuinely-empty connected host still reads “no terminals”; the hermetic test asserts the failed mirror reaches the browser as a down state; and both kolu and the linked drishti PR are green. Issue #1564 tracks it.
The follow-on layer: client.health() — one complete fact, two policies
The connection cell above kills the one lie #1564 was filed for (a dead mirror painting green). But it left a class of the same lie open one level down: a surfaceClient runs many subscriptions, and any of them can be silently dead — a cell that 500s on resubscribe, a raw snapshot feed that stalled — while the surface still paints as if whole. The cell answers “is the mirror up?”; it can’t answer “is every subscription the browser depends on actually live?”. So the same review run grew a second, lower primitive: client.health() — a total subscription-health fact, { live: boolean, subs: [{ name, pending, error }] }. Every framework subscription enrols at its birth site (cell, the keys-stream, each per-key value, a stream); a raw stream joins structurally through client.rawStream(name, proc, input, { onItem }), which throws if driven outside a reactive owner — so a raw stream cannot silently escape the fact the way a hand-rolled loop could. Transport liveness is one leg, threaded by the socket owner: connectSurface passes { live: () => status() === "live" } off its own createSocketStatus, so health().live is the real socket state, not a constant true — and, as of the round-5 collapse, it carries more than the socket (see the callout below). The socket owner is now connectSurface (single-surface) or connectSurfaces (the multi-surface seam: one socket → a surfaceClients bundle + one merged fact), each wiring a default-on half-open heartbeat that probes the reserved system.live, so live means bytes are flowing, not merely no close event fired — a silent half-open ws reads not-live without any consumer hand-building a watchdog.
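The sketch below shows the fact's two legs without assuming anything about the framework's internals. The first is a heartbeat-based transport leg, live only while replies keep arriving, so a half-open socket goes not-live without any close event. The second is a registry in which each subscription enrols at birth and reports pending/error. createHeartbeat, createHealth and the frame/fail/dispose handle are hypothetical names; the real system.live probe and enrolment sites live inside the framework.

```typescript
type SubHealth = { name: string; pending: boolean; error: string | null };
type HealthFact = { live: boolean; subs: SubHealth[] };

// Hypothetical half-open detector: the transport counts as live only if a
// heartbeat reply arrived within `timeoutMs`, not merely because no close fired.
function createHeartbeat(timeoutMs: number, now: () => number = Date.now) {
  let lastReplyAt = now();
  return {
    reply: (): void => {
      lastReplyAt = now();
    },
    live: (): boolean => now() - lastReplyAt <= timeoutMs,
  };
}

// Hypothetical registry: each subscription enrols at birth and reports its own
// pending/error. health() reports them verbatim with no triage or verdict.
function createHealth(transport: { live: () => boolean }) {
  const subs = new Set<SubHealth>();
  return {
    enrol(name: string) {
      const entry: SubHealth = { name, pending: true, error: null };
      subs.add(entry);
      return {
        // Each frame clears a transient error, so a sub self-heals.
        frame(): void {
          entry.pending = false;
          entry.error = null;
        },
        fail(message: string): void {
          entry.error = message;
        },
        dispose(): void {
          subs.delete(entry);
        },
      };
    },
    health(): HealthFact {
      // Copies, so consumers cannot mutate the registry's entries.
      return { live: transport.live(), subs: Array.from(subs, (s) => ({ ...s })) };
    },
  };
}
```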
The fact carries no triage — no “connecting vs degraded” verdict, no human string. That precedence is policy, and policy is a separate primitive: <SurfaceGate>, which derives connecting > degraded > ready from the fact in exactly one place. Its default is stale-while-degraded (a sub erroring keeps the last-good children on screen, with a non-blocking notice) — the gentler of the two policies, and the right default for a fleet board; hard-gating (blank the surface on any sub error) is the explicit opt-in. The split exists because two real consumers disagree on the policy, and a framework that bakes one in forces the other to hand-roll a parallel gate (the very thing #1564 was):
- pulam-web hard-gates. HostGroup mounts <SurfaceGate ready={hostBodyReady}>, whose predicate is just h.live && no sub errors. As of round 5, h.live already carries the mirror state: the connection cell's liveWhen predicate AND-folds state === "connected" into live by construction (the callout below). The gate therefore no longer hand-ANDs connInfo.state === "connected". A half-open/reconnecting ws and a non-connected mirror both fail the gate closed off that one boolean. A persistent error must win over the body and never collapse to a healthy-looking empty host (the #1524 lesson), so it is the outermost gate. A transient error self-heals: each sub's error() clears on its next frame, so the host recovers without a reload (the zest launchd-restart fix). The same hostBodyReady predicate also governs the header dot, because the dot is now the shared <HostStatusPip> (callout below). Gate and dot share one verdict.
- drishti renders stale-while-degraded. It mounts its OWN <SurfaceGate> with the framework default policy (stale-while-degraded). A sub error or a transport blip keeps the body visible under a non-blocking amber notice, the opposite of pulam-web's hard gate over the same fact. drishti joins its raw metric feed structurally via app.rawStream, so the throw guards the real adopter and not just the example. It folds its admin-vs-app sibling clients into one fact with surfaceClientsHealth (Leak D), now threading the admin socket's live. A dead control plane therefore flips the merged fact instead of leaving a constant true.
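The split between the fact and its policy can be sketched as pure functions over a health fact. triage, gateRender, withLiveWhen and mergeHealth are hypothetical. They illustrate what <SurfaceGate>, the connection cell's liveWhen predicate and surfaceClientsHealth are described as doing, and are not their real signatures. Only the hostBodyReady predicate (live && no sub errors) comes from the text above, and mapping "not live" to connecting is one plausible reading of the precedence.

```typescript
type SubHealth = { name: string; pending: boolean; error: string | null };
type HealthFact = { live: boolean; subs: readonly SubHealth[] };
type ConnectionState = "copying" | "connecting" | "connected" | "disconnected" | "failed";
type Verdict = "connecting" | "degraded" | "ready";
type Policy = "stale-while-degraded" | "hard-gate";
type GateRender = "children" | "children+notice" | "placeholder";

// Hypothetical liveWhen fold: the mirror must be connected for `live` to hold,
// however healthy the socket is.
const withLiveWhen = (h: HealthFact, connection: ConnectionState): HealthFact => ({
  ...h,
  live: h.live && connection === "connected",
});

// Hypothetical merge of sibling clients (e.g. admin + app): live only if every
// client is live; sub names are prefixed with the client label.
function mergeHealth(facts: Record<string, HealthFact>): HealthFact {
  const entries = Object.entries(facts);
  return {
    live: entries.every(([, f]) => f.live),
    subs: entries.flatMap(([label, f]) => f.subs.map((s) => ({ ...s, name: `${label}/${s.name}` }))),
  };
}

// Precedence lives in one place: connecting > degraded > ready.
function triage(h: HealthFact): Verdict {
  if (!h.live || h.subs.some((s) => s.pending)) return "connecting";
  if (h.subs.some((s) => s.error !== null)) return "degraded";
  return "ready";
}

// Two policies over the same verdict. Stale-while-degraded keeps last-good
// children (plus a notice) once something rendered; hard-gate blanks them.
function gateRender(v: Verdict, policy: Policy, hadGoodFrame: boolean): GateRender {
  if (v === "ready") return "children";
  if (policy === "hard-gate" || !hadGoodFrame) return "placeholder";
  return "children+notice";
}

// pulam-web's single predicate for both the gate and the header dot.
const hostBodyReady = (h: HealthFact): boolean =>
  h.live && h.subs.every((s) => s.error === null);
```

Keeping triage out of the fact is what lets the two consumers differ only in the gateRender policy they pick.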