pulam-web: a dead mirror lies as an empty fleet — surface the connection

Bugs · seedling · implemented

pulam-web shows a green connected dot + no terminals when the backend↔remote ssh mirror is actually down/failed (build mismatch, unreachable host). Root cause — the browser only ever sees its own socket health, never the mirror's. Fix (one PR) — graduate a thin onState→cell projection into @kolu/surface-nix-host, declare a gate-closed connection cell on the surface, pipe the session's health into it, and gate the browser UI on it (drishti's proven shape, now shared).

pulam-web can sit in a state where a host reads as a healthy, empty fleet when its data link is dead — a green “connected” dot and “0 terminals / no terminals”, while that host actually has live terminals running on it. It is a confident lie: nothing on screen says “this host is unreachable”. Reproduced in the wild by a kolu build-version mismatch between the two ends of the mirror (gist: remote kaval speaks pty-host 3.2, local pulam needs 3.3, so the remote agent exits and the session gives up — terminal failed — yet the browser stays green). Filed as #1564. The mismatch is only the trigger; any dead backend↔remote leg collapses into the same silent empty state.

The fix is small and already proven next door: drishti does not have this bug — it surfaces the mirror’s health to the browser as a first-class connection cell and gates its UI on it. pulam-web is the lone consumer that never plugged in. Shipped in #1568 (with the linked drishti gate PR adopting the shared cell).

User-facing description

Today: the lie (mirror is failed, build mismatch)
  pureintent · 0 terminals
    no terminals

After: honest states, driven by the mirror's real health
  pureintent · provisioning… (provisioning agent…)
    Copying agent to remote… (nix copy)
  pureintent · connecting…
    Connecting… 18s
  pureintent · reconnecting…
    Host unreachable — retrying…
  pureintent · failed
    Remote connection failed
    Gave up after repeated connection failures.
    kaval speaks pty-host 3.2, pulam needs 3.3 — run them from the same build
    remote agent exited (code=1)
    ↻ Reconnect
  zest · 0 terminals
    no terminals
    ↑ the only place “no terminals” is honest: a genuinely-connected host with an empty roster.

The host card’s status indicator and body are now driven by the mirror’s real health, not by the browser’s own socket. The shape mirrors kolu’s own connection language and drishti’s exactly.

Architecture-level changes

Figure: pulam-web is a 3-tier bridge; only tier-1 health reaches the browser.

  browser (SolidJS · surfaceClient; renders the host card)
    ↕ ws
  pulam-web backend (re-serves the surface; owns the HostSession)
    ↕ ssh stdio
  remote pulam (the real terminals, over ssh / stdio)

  tier-1 health: browser↔backend socket. connectSurface status (connecting · live · reconnecting · down). ✓ already surfaced.
  tier-2 health: backend↔remote mirror (HostSession.onState). copying → connecting → connected → disconnected → failed (+ failureCause, progressLines).
  TODAY: never crosses to the browser → a dead mirror paints green + “no terminals”.
  FIX: a `connection` cell; parent writes session.onState → browser gates on it.
  drishti already does this (its reference shape); pulam-web is the lone consumer that never plugged in.
  The two health channels stay SEPARATE: conflating socket status with mirror health is the bug.

pulam-web is browser ↔ backend ↔ remote pulam. Tier-1 health (the browser↔backend socket) is already surfaced. Tier-2 health (the backend↔remote mirror, tracked by HostSession.onState as copying→…→failed) lives on the backend and TODAY never crosses to the browser — so a dead mirror paints green + “no terminals”. The fix adds a `connection` cell the parent writes from session.onState and the browser gates on. drishti already does exactly this.

The volatility is already a receptacle — pulam-web just never plugged in. The hard, changing thing here is the remote link’s lifecycle: nix copy, ssh dial, reconnect/backoff, the connect watchdog, the network-vs-remote failure-cause split, the give-up-into-failed. That volatility was lifted into @kolu/surface-nix-host long ago — HostSession exposes it as a domain-agnostic, snapshot-then-delta onState callback carrying HostSessionState (connection · lastError · failureCause · progressLines · remoteProgressLines). The mirror loop (pumpRemoteSurface) graduated out of drishti into the same package. What pulam-web is missing is the wiring that carries that already-owned state onto the browser surface — and this PR graduates the common pieces of that wiring so a third consumer can’t neglect it either: a composable connection cell fragment (schema + gate-closed default) apps compose at the mirror seam via mirroredSurface(base) — never hand-spread (the seam reserves the connection name and throws on a collision) — plus the node-side onState → cell projection. Only the UI and the per-site cell implementation stay app-local (see the verdict below).

drishti is the worked reference, three pieces:

  1. A browser-facing connection cell on its surface, seeded with DEFAULT_CONNECTION (state: "connecting"), gate-closed by default, so “healthy-empty before the first real frame” is structurally unrepresentable (drishti common/src/surface.ts).
  2. The parent pipes session.onState straight into that cell (router.ts: session.onState(s => connection.set({ state: s.connection, lastError, failureCause, progressLines }))). The agent serves an inert stub of the cell; only the parent writes it.
  3. The client gates all content on state === "connected" and renders a state-driven dot + overlay otherwise (connectionColors.ts’s STATE map + ConnectingOverlay/FailedCard).
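
The three pieces reduce to one small invariant that is easy to check in isolation. The sketch below is a toy model, not drishti's code: `Info`, `SEED`, and `bodyFor` are hypothetical names, and the parent's write is modelled as plain reassignment. Only the state names and the gate-closed "connecting" seed come from the text above.

```typescript
// Toy model of the gate; NOT drishti's actual code. All names are hypothetical
// except the state names and the "connecting" seed, which come from the note.
type ConnState = "copying" | "connecting" | "connected" | "disconnected" | "failed";

interface Info {
  state: ConnState;
  lastError: string | null;
}

// Piece 1: the cell is seeded gate-closed, never "connected".
const SEED: Info = { state: "connecting", lastError: null };

// Piece 3: the roster is reachable only through the "connected" branch, so an
// empty roster can only be reported for a host whose mirror is really up.
function bodyFor(info: Info, terminals: readonly string[]): string {
  if (info.state !== "connected") {
    return info.state === "failed"
      ? `failed: ${info.lastError ?? "unknown error"}`
      : `${info.state}…`;
  }
  return terminals.length === 0 ? "no terminals" : `${terminals.length} terminals`;
}

// Piece 2 (the parent's write) is modelled here as reassignment.
let cell: Info = SEED;
const beforeFirstFrame = bodyFor(cell, []); // "connecting…"
cell = { state: "connected", lastError: null };
const connectedEmpty = bodyFor(cell, []); // "no terminals"
cell = { state: "failed", lastError: "remote agent exited (code=1)" };
const afterGiveUp = bodyFor(cell, []); // "failed: remote agent exited (code=1)"
```

With the seed closed and the roster behind the gate, "no terminals" has exactly one path to the screen.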

Implementation details

One PR, threading seams that already exist — the new state rides the same per-host ws the awareness collection already uses, so there’s no new socket and no second surface. (Plus the surface gate’s linked drishti PR, step 7 — mandated, not deferred.)

1 — Add the composable cell to a new browser-safe @kolu/surface-nix-host/connection subpath, plus the node-side projection on the main entry. Two faces, split by the browser boundary:

// @kolu/surface-nix-host/connection — browser-safe (zod + @kolu/surface only)
import { z } from "zod";

export const CONNECTION_STATES = [
  "copying", "connecting", "connected", "disconnected", "failed",
] as const;

export const ConnectionInfoSchema = z.object({
  state: z.enum(CONNECTION_STATES),
  lastError: z.string().nullable(),
  failureCause: z.enum(["network", "remote"]).nullable(),
  progressLines: z.array(z.string()).readonly(),
});
export type ConnectionInfo = z.infer<typeof ConnectionInfoSchema>;

// Gate-closed by default: a freshly-composed cell reads "connecting", so
// "healthy-empty before the first frame" is structurally unrepresentable.
export const DEFAULT_CONNECTION: ConnectionInfo = {
  state: "connecting", lastError: null, failureCause: null, progressLines: [],
};

// The composable cell — the fragment `mirroredSurface(base)` spreads in at the
// mirror seam; apps compose via `mirroredSurface`, never hand-spread this.
// `verbs: ["get"]` makes it READ-ONLY over the wire: the parent host owns it
// (writes server-side off `session.onState`), so a remote client must never be
// able to `connection.set` the health to `connected` and forge the gate's
// signal. Without it, a no-`patchSchema` cell defaults to `["get", "set"]`.
export const connectionCell = {
  schema: ConnectionInfoSchema,
  default: DEFAULT_CONNECTION,
  verbs: ["get"],
} as const;

// @kolu/surface-nix-host (node main entry)
import type { ConnectionInfo } from "./connection";

export const projectConnection = (s: HostSessionState): ConnectionInfo => ({
  state: s.connection,
  lastError: s.lastError,
  failureCause: s.failureCause,
  progressLines: [...s.progressLines],
});

// The dual of pumpRemoteSurface (which streams DATA out); this streams STATE
// out. Returns the unsubscribe.
export const pipeSessionStateToCell = <C>(
  session: HostSession<C>,
  set: (info: ConnectionInfo) => void,
): (() => void) => session.onState((s) => set(projectConnection(s)));
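
To see how the projection behaves under the snapshot-then-delta contract, the sketch below drives it with a hand-rolled fake. `FakeSession` and `emit` are hypothetical test scaffolding, and `SessionState` is a reduced stand-in for `HostSessionState` that carries only the projected fields; the real `HostSession` is richer.

```typescript
type ConnState = "copying" | "connecting" | "connected" | "disconnected" | "failed";
type FailureCause = "network" | "remote" | null;

// Reduced stand-in for HostSessionState (assumption: projected fields only).
interface SessionState {
  connection: ConnState;
  lastError: string | null;
  failureCause: FailureCause;
  progressLines: readonly string[];
}

interface ConnectionInfo {
  state: ConnState;
  lastError: string | null;
  failureCause: FailureCause;
  progressLines: readonly string[];
}

const projectConnection = (s: SessionState): ConnectionInfo => ({
  state: s.connection,
  lastError: s.lastError,
  failureCause: s.failureCause,
  progressLines: [...s.progressLines],
});

// Hypothetical fake: onState replays the current snapshot, then forwards deltas.
class FakeSession {
  private listeners = new Set<(s: SessionState) => void>();
  private current: SessionState = {
    connection: "copying", lastError: null, failureCause: null, progressLines: [],
  };

  onState(cb: (s: SessionState) => void): () => void {
    cb(this.current); // snapshot first
    this.listeners.add(cb);
    return () => { this.listeners.delete(cb); };
  }

  emit(patch: Partial<SessionState>): void {
    this.current = { ...this.current, ...patch };
    this.listeners.forEach((cb) => cb(this.current));
  }
}

const pipeSessionStateToCell = (
  session: FakeSession,
  set: (info: ConnectionInfo) => void,
): (() => void) => session.onState((s) => set(projectConnection(s)));

const seen: ConnState[] = [];
const fakeSession = new FakeSession();
const unsubscribe = pipeSessionStateToCell(fakeSession, (info) => { seen.push(info.state); });
fakeSession.emit({ connection: "connecting" });
fakeSession.emit({ connection: "failed", lastError: "remote agent exited (code=1)", failureCause: "remote" });
unsubscribe();
fakeSession.emit({ connection: "connecting" }); // ignored after unsubscribe
// seen is now ["copying", "connecting", "failed"]
```

The first entry comes from the snapshot replay, which is what lets a late subscriber start from the session's current state rather than from nothing.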

2 — Compose the cell at the mirror seam, NOT into the base surface. The base terminalWorkspaceSurface (packages/terminal-workspace/src/surface.ts) stays connection-free — link health is not a property of the daemon’s own surface (a direct/local link has no remote to be down). pulam-web’s browser/mirror surface wraps it: pulamSurface = mirroredSurface(terminalWorkspaceSurface) (packages/pulam-web/src/shared/contract.ts). mirroredSurface adds the gate-closed get-only connection cell; the gate-closed state: "connecting" seed comes baked in. drishti composes the same mirroredSurface(base) at its own mirror (the step-7 companion). Only the browser-safe @kolu/surface-nix-host/connection subpath + import types reach the client bundle (the node main entry never does).

// packages/pulam-web/src/shared/contract.ts
import { mirroredSurface } from "@kolu/surface-nix-host/connection";
import { terminalWorkspaceSurface } from "@kolu/terminal-workspace/surface";

// The browser-facing surface = base + the get-only `connection` cell.
export const pulamSurface = mirroredSurface(terminalWorkspaceSurface);
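
The seam is described as reserving the `connection` name and throwing on a collision. A minimal sketch of that behaviour follows, assuming a surface is just an object with a `cells` record. `SurfaceLike`, `CellSpec`, `connectionCellSketch`, and `mirroredSurfaceSketch` are hypothetical names; the real `@kolu/surface` types and the real `mirroredSurface` may differ.

```typescript
// Simplified stand-ins for the real surface types (assumptions).
interface CellSpec { default: unknown; verbs?: readonly string[] }
interface SurfaceLike { cells: Record<string, CellSpec> }

// Get-only cell with the gate-closed seed, mirroring the fragment in step 1.
const connectionCellSketch: CellSpec = {
  default: { state: "connecting", lastError: null, failureCause: null, progressLines: [] },
  verbs: ["get"],
};

function mirroredSurfaceSketch(base: SurfaceLike): SurfaceLike {
  // Reserve the name: a base that already declares it is a programming error.
  if (Object.prototype.hasOwnProperty.call(base.cells, "connection")) {
    throw new Error('mirroredSurface: base surface already declares a "connection" cell');
  }
  // Return a new surface; the base stays connection-free.
  return { ...base, cells: { ...base.cells, connection: connectionCellSketch } };
}

const baseSurface: SurfaceLike = { cells: { version: { default: "0.0.0" } } };
const mirrored = mirroredSurfaceSketch(baseSurface);

let collided = false;
try {
  mirroredSurfaceSketch(mirrored); // wrapping twice trips the reserved-name check
} catch {
  collided = true;
}
```

Throwing at composition time, rather than silently overwriting, is what keeps a hand-spread cell from shadowing the shared one.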

3 — Only the re-serve implements the cell; the daemon serves the connection-free base. There is no inert per-site stub. The daemon (packages/pulam/src/daemon.ts) and every direct/local serve implement terminalWorkspaceSurface, which has no connection cell — fail-fast doesn’t ask for one. Only pulam-web’s re-serve implements the augmented pulamSurface, backing connection with a seeded local store (seedConnectionCell()) it writes from the session. The store is NOT folded by the mirror sink (it’s the session’s state, not the daemon’s data) and writes go through the framework-wrapped ctx.cells.connection.set (persist + PUBLISH the delta) so a browser already subscribed across a reconnect hears the new state.

// packages/pulam-web/src/server/reserve.ts — re-serve of the MIRRORED surface
const connection = seedConnectionCell(); // gate-closed "connecting" seed
const fragment = implementSurface(pulamSurface, {
  cells: { version: { store: versionStore }, connection },
  // collections / streams folded/forwarded from the daemon's base surface…
});
// Expose the framework-wrapped setter on the ReServe result:
return {
  // …other ReServe members…
  setConnection: (info: ConnectionInfo) => fragment.ctx.cells.connection.set(info),
};
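
Why the write goes through the wrapped setter rather than the bare store can be shown with a toy cell. Everything here is hypothetical (`makeCell`, `store.write`, `subscribe`), and it assumes a bare store write persists without notifying anyone; it illustrates the persist-then-publish behaviour described above, not the internals of `implementSurface`.

```typescript
type Listener<T> = (value: T) => void;

// Hypothetical toy cell: a persisted value plus a set of live subscribers.
function makeCell<T>(seed: T) {
  let value = seed;
  const listeners = new Set<Listener<T>>();
  return {
    store: {
      get: () => value,
      // Bare store write: persists, but no subscriber is told.
      write: (next: T) => { value = next; },
    },
    // Wrapped setter: persist, then publish the delta to live subscribers.
    set: (next: T) => {
      value = next;
      listeners.forEach((l) => l(next));
    },
    subscribe: (l: Listener<T>) => {
      l(value); // snapshot on subscribe
      listeners.add(l);
      return () => { listeners.delete(l); };
    },
  };
}

const toyCell = makeCell<string>("connecting");
const heard: string[] = [];
toyCell.subscribe((v) => { heard.push(v); });

toyCell.store.write("failed");        // silent: the subscriber still believes "connecting"
const afterBareWrite = heard.slice(); // ["connecting"]

toyCell.set("failed");                // published
// heard is now ["connecting", "failed"]
```

A browser that subscribed before the link dropped is in the position of `heard` after the bare write: it would keep showing stale state until something published.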

4 — The pump carries session state into the cell — by construction. pumpRemoteSurface (the reconnect-mirror loop) takes a connection setter; passing it makes the pump wire pipeSessionStateToCell(session, set) itself for the session’s life. So packages/pulam-web/src/server/hostEntry.ts doesn’t pipe it by hand — it hands the setter to the pump, and the session’s existing lifecycle (first-version markConnected, the connect watchdog, give-up-into-failed) drives every transition. Pumping a session carries its health by construction (#1564), so it can’t be wired wrong.

// packages/pulam-web/src/server/hostEntry.ts
const reServe = buildReServe({ log: hostLog });
void pumpRemoteSurface({
  source: terminalWorkspaceSurface,
  session,
  makeSink: () => reServe.makeSink(() => session.markConnected()),
  // …live holders + onLinkDown reset…
  connection: { set: reServe.setConnection }, // ← pump wires pipeSessionStateToCell
});

5 — Gate the client on the cell, not on version.pending(). In packages/pulam-web/src/client/HostGroup.tsx: read app.cells.connection.use() and make the top-level gate connection.state === "connected" — replacing the version.pending() “connecting…” gate (~line 289) and the awareness.keys().length “no terminals” decision (~line 297). Off-connected renders the state-driven body: provisioning / connecting+elapsed / reconnecting (refined by failureCause) / the failed card (lastError + progressLines tail + a Reconnect button that hits a small POST /api/reconnect?host= route calling registry.getSession(host).reconnect()). Only connected reaches the existing awareness rendering, where “no terminals” is finally honest. The header dot reads a pulam-web-local STATE map keyed by connection.state (its own palette — the UI the panel keeps app-local). statusForHost (the browser↔backend socket) stays a secondary indicator and no longer decides healthy-vs-empty. DEFAULT_VERSION keeps its real job (a version snapshot) and simply stops being abused as a link-live proxy.

// packages/pulam-web/src/client/HostGroup.tsx
const connection = app.cells.connection.use({ onError });
const info = (): ConnectionInfo => connection.value() ?? DEFAULT_CONNECTION;

<Show
  when={info().state === "connected"}
  fallback={<ConnectionView info={info()} host={props.host} />}
>
  {/* the existing awareness rendering — "no terminals" is honest only here */}
</Show>
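
The off-connected body can be kept as a pure function of the cell, which makes it testable without a DOM. The sketch below is one possible mapping, not the shipped `ConnectionView`: `offConnectedView` and `OffConnectedView` are hypothetical, the labels paraphrase the mock-up states, and the exact way `failureCause` refines the reconnecting text is an assumption.

```typescript
type ConnState = "copying" | "connecting" | "connected" | "disconnected" | "failed";

interface ConnectionInfo {
  state: ConnState;
  lastError: string | null;
  failureCause: "network" | "remote" | null;
  progressLines: readonly string[];
}

// Hypothetical view model for the fallback branch of the gate.
interface OffConnectedView {
  label: string;         // header text next to the host name
  detail: string;        // body line under the header
  canReconnect: boolean; // whether to show the Reconnect button
}

// Returns null when the gate is open (state === "connected").
function offConnectedView(info: ConnectionInfo): OffConnectedView | null {
  const n = info.progressLines.length;
  const lastProgress: string | undefined = n > 0 ? info.progressLines[n - 1] : undefined;
  switch (info.state) {
    case "connected":
      return null;
    case "copying":
      return { label: "provisioning…", detail: lastProgress ?? "Copying agent to remote…", canReconnect: false };
    case "connecting":
      return { label: "connecting…", detail: "Connecting…", canReconnect: false };
    case "disconnected":
      return {
        label: "reconnecting…",
        // Assumed refinement: only a network cause claims the host is unreachable.
        detail: info.failureCause === "network" ? "Host unreachable, retrying…" : "Reconnecting…",
        canReconnect: false,
      };
    case "failed":
      return { label: "failed", detail: info.lastError ?? "Remote connection failed", canReconnect: true };
  }
}

const failedView = offConnectedView({
  state: "failed", lastError: "remote agent exited (code=1)", failureCause: "remote", progressLines: [],
});
const openGate = offConnectedView({
  state: "connected", lastError: null, failureCause: null, progressLines: [],
});
```

Because the switch is exhaustive over `ConnState`, adding a sixth state to the schema would fail to compile here until the view handles it.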

6 — Hermetic test. Extend reserve.test.ts (the existing agent→mirror→re-serve→browser-store proof): drive a session.onState sequence ending in failed with no awareness keys, and assert the browser-consumed connection cell reads failed with its lastError — the surface carries the down state — instead of the old empty/healthy path. Goes red if the gate ever reverts to the socket/version proxy. Visual proof is pulam-web over a real ws against a deliberately build-mismatched host (a chrome-devtools or e2e still of the failed card).

7 — The linked drishti PR (surface gate — ships with this one). Adding exports to @kolu/surface-nix-host is API-facing, so .claude/rules/surface.md requires a linked drishti PR with green CI. drishti already hand-rolls both halves — its own ConnectionSchema/DEFAULT_CONNECTION in drishti-common and the inline session.onState(s => cell.set({ state: s.connection, … })) in router.ts. The companion bumps drishti’s kolu pin and replaces them: wrap its mirror surface in mirroredSurface(base) (dropping its copy of the schema + default and its hand-spread cell) and let the pump wire pipeSessionStateToCell / projectConnection (dropping the inline mapping). Its green CI proves the lift behaviour-preserving — and makes drishti the second consumer that earns the extraction.

Unaffected, stated plainly: kolu’s Code tab and pulam-tui consume terminalWorkspaceSurface directly and track their link in-process. The base surface stays connection-free, so there is no new cell for them to read and nothing to stub — they build unchanged. The connection cell exists only on mirroredSurface(base), which only a re-serving parent (pulam-web, drishti) serves.

Done when a host whose mirror is copying/connecting/disconnected/failed renders an honest state (failure cause + Reconnect on failed) instead of green + “no terminals”; a genuinely-empty connected host still reads “no terminals”; the hermetic test asserts the failed mirror reaches the browser as a down state; and both kolu and the linked drishti PR are green. Issue #1564 tracks it.

The follow-on layer: client.health() — one complete fact, two policies

The connection cell above kills the one lie #1564 was filed for (a dead mirror painting green). But it left a class of the same lie open one level down: a surfaceClient runs many subscriptions, and any of them can be silently dead — a cell that 500s on resubscribe, a raw snapshot feed that stalled — while the surface still paints as if whole. The cell answers “is the mirror up?”; it can’t answer “is every subscription the browser depends on actually live?”. So the same review run grew a second, lower primitive: client.health() — a total subscription-health fact, { live: boolean, subs: [{ name, pending, error }] }. Every framework subscription enrols at its birth site (cell, the keys-stream, each per-key value, a stream); a raw stream joins structurally through client.rawStream(name, proc, input, { onItem }), which throws if driven outside a reactive owner — so a raw stream cannot silently escape the fact the way a hand-rolled loop could. Transport liveness is one leg, threaded by the socket owner: connectSurface passes { live: () => status() === "live" } off its own createSocketStatus, so health().live is the real socket state, not a constant true — and, as of the round-5 collapse, it carries more than the socket (see the callout below). The socket owner is now connectSurface (single-surface) or connectSurfaces (the multi-surface seam: one socket → a surfaceClients bundle + one merged fact), each wiring a default-on half-open heartbeat that probes the reserved system.live, so live means bytes are flowing, not merely no close event fired — a silent half-open ws reads not-live without any consumer hand-building a watchdog.
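
A rough model of the fact's shape, assuming each subscription reports only `pending` and `error` and that `live` is supplied by the socket owner. `makeHealthRegistry`, `enrol`, and the subscription names are hypothetical; the real `surfaceClient` enrols subscriptions internally and this sketch does not model the heartbeat.

```typescript
interface SubHealth { name: string; pending: boolean; error: string | null }
interface Health { live: boolean; subs: SubHealth[] }

// Hypothetical registry: subscriptions enrol at birth and report their own state.
function makeHealthRegistry(transport: { live: () => boolean }) {
  const subs = new Map<string, SubHealth>();
  return {
    enrol(name: string) {
      const entry: SubHealth = { name, pending: true, error: null };
      subs.set(name, entry);
      return {
        ok: () => { entry.pending = false; entry.error = null; },
        fail: (message: string) => { entry.pending = false; entry.error = message; },
        dispose: () => { subs.delete(name); },
      };
    },
    // The fact carries no verdict: just transport liveness plus every sub's state.
    health(): Health {
      const out: SubHealth[] = [];
      subs.forEach((s) => out.push({ ...s }));
      return { live: transport.live(), subs: out };
    },
  };
}

let socketLive = true;
const registry = makeHealthRegistry({ live: () => socketLive });
const connectionSub = registry.enrol("cell:connection");
const awarenessSub = registry.enrol("collection:awareness");
connectionSub.ok();
awarenessSub.fail("500 on resubscribe");
const fact = registry.health(); // live socket, one healthy sub, one errored sub

socketLive = false;
const afterDrop = registry.health(); // same subs, live is now false
```

Nothing in `health()` decides what to show; that precedence is left to whatever policy consumes the fact.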

The fact carries no triage — no “connecting vs degraded” verdict, no human string. That precedence is policy, and policy is a separate primitive: <SurfaceGate>, which derives connecting > degraded > ready from the fact in exactly one place. Its default is stale-while-degraded (a sub erroring keeps the last-good children on screen, with a non-blocking notice) — the gentler of the two policies, and the right default for a fleet board; hard-gating (blank the surface on any sub error) is the explicit opt-in. The split exists because two real consumers disagree on the policy, and a framework that bakes one in forces the other to hand-roll a parallel gate (the very thing #1564 was):