Sleeping terminals · the kolu Atlas

You asked for Sleep: leave a Claude Code terminal blocked for days, sleep it (its PTY, xterm, WebGL context, and agent all released — gone, like closing it), and wake it later in place with the agent resumed.

The model is one move: make the terminal a sum — Terminal = active | sleeping. A dormant terminal carries the same persisted base — cwd, git, intent, its canvasLayout slot, the last agent command — so it keeps the canvas position, dock order, and persistence the live terminal had: it stays the same record under the same id in the one terminal registry the canvas already iterates, just without a PTY. Sleep flips its state flag in place and releases the live resources; wake flips it back and re-spawns through the path a server reboot already uses. Three phased PRs; the first is a zero-behavior-change foundation.

A terminal is active or sleeping

The fold already lives in the schema. surface.ts splits a terminal’s fields into a persisted base (cwd · git · intent · theme · canvasLayout · the last agent command — survives a restart) and a live overlay (agent status · foreground · live-PR · the PTY/xterm/attach handles — “never persisted; a restore must re-derive it”). That partition is active-vs-sleeping, so the sum maps onto bases that already exist — exactly one field is added:

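import { z } from "zod";
// PersistedTerminalFieldsSchema and LiveTerminalFieldsSchema are the existing
// base and overlay schemas described above (their imports are omitted here).

const ActiveTerminalSchema =
  PersistedTerminalFieldsSchema.merge(LiveTerminalFieldsSchema)   // live overlay present
    .extend({ state: z.literal("active") });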

const SleepingTerminalSchema =
  PersistedTerminalFieldsSchema                                   // base only — overlay absent by type
    .extend({ state: z.literal("sleeping"), sleptAt: z.number() });

const TerminalMetadataSchema =                                   // the wire / collection shape
  z.discriminatedUnion("state", [ActiveTerminalSchema, SleepingTerminalSchema]);

type Terminal =
  | ({ state: "active" }                    & PersistedTerminalFields & LiveTerminalFields)
  | ({ state: "sleeping"; sleptAt: number } & PersistedTerminalFields);

// Presence reads the union; touching a live field MUST narrow.
const placeTile  = (t: Terminal) => t.canvasLayout;             // both arms — no narrow
const routeInput = (t: Terminal) => {
  if (t.state !== "active") return;                             // compiler-forced narrow
  send(t /* now carries pr · agent · foreground + a live PTY */);
};

sleptAt is the sleeping arm’s analogue of the live overlay, and the only new scalar. An active terminal is base + overlay; a sleeping terminal is base + sleptAt. Either way it is one record whose state flag says whether its PTY is currently spawned: sleep clears the overlay and sets the flag, wake re-derives the overlay and clears the flag, and the id never changes.
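The two transitions can be sketched as plain functions over the union. This is a minimal illustration, not the kolu implementation: the field names in PersistedBase and LiveOverlay are hypothetical stand-ins for the real base and overlay in surface.ts, and the real code flips a record in place in the registry rather than returning a copy.

```typescript
// Hypothetical stand-ins for the persisted base and the live overlay.
interface PersistedBase {
  id: string;
  cwd: string;
  canvasLayout: { x: number; y: number; w: number; h: number };
  lastAgentCommand?: string;
}
interface LiveOverlay {
  ptyPid: number;
  agentStatus: "working" | "waiting" | "idle";
}

type ActiveTerminal = PersistedBase & LiveOverlay & { state: "active" };
type SleepingTerminal = PersistedBase & { state: "sleeping"; sleptAt: number };
type Terminal = ActiveTerminal | SleepingTerminal;

// Sleep: keep the base, drop the overlay, stamp sleptAt. The id is untouched.
function toSleeping(t: ActiveTerminal, now: number): SleepingTerminal {
  const { id, cwd, canvasLayout, lastAgentCommand } = t;
  return { id, cwd, canvasLayout, lastAgentCommand, state: "sleeping", sleptAt: now };
}

// Wake: keep the base, attach a freshly derived overlay, drop sleptAt.
function toActive(t: SleepingTerminal, overlay: LiveOverlay): ActiveTerminal {
  const { id, cwd, canvasLayout, lastAgentCommand } = t;
  return { id, cwd, canvasLayout, lastAgentCommand, ...overlay, state: "active" };
}
```

Because both functions copy only the base fields by name, a live handle cannot leak into the sleeping arm, and a stale sleptAt cannot survive a wake.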

Figure (canvas mock-up, “kolu · canvas”). Left tile: “api main”, active, with ☾ ▢ × title-bar buttons; status “● claude · working”, a live transcript (“> running the load test to repro…”, “⎿ 1,204 reqs · 3 failures”), and a marching-ants aura on the border. Right tile: “api main · asleep”, with a Wake button; a frozen last frame (“fix the auth race that only repros…”, “> claude --model sonnet”, “⎿ analyzing 14 files…”, “✓ wrote a failing test”) and the status “☾ frozen · asleep 3d · PTY released”.

Two terminals, one asleep: same card, same chrome, both draggable and resizable. The sleeping one (right) is the SAME terminal record with its PTY released: a frozen last frame, dimmed, with ☾ and a Wake button. Click it to focus it like any terminal; Wake respawns its PTY and resumes the agent.

Figure (dock mock-up, “kolu · dock”). One group, api, with three rows:

● load test · working
◐ review pr · awaiting you
☾ auth race · asleep · 3d · Wake

One list. A sleeping terminal is a row like any other: same group, same selection, dimmed with a ☾ state pip. Clicking it FOCUSES it (it can be the active/selected tile); a small Wake brings it back. No separate section, because there is no separate kind.

Presence reads the union, liveness narrows

Diagram: one registry · presence reads the union · liveness narrows

canvas · dock · minimap · arrange · cycle · switcher
  read the Terminal union: presence (exists, on canvas, focusable, draggable, has a dock row)
    ↓
terminal registry: Terminal = active | sleeping (one store, stable id)
  active → base + live overlay (PTY · xterm · agent)
  sleeping → base + sleptAt (overlay absent by type)
  setCanvasLayout · setTheme · rename → write the base of BOTH arms
    ↓ state === "active" narrow
live fields: PTY/xterm · agent stream · input routing (active arm only)

One registry holds the Terminal union under a stable id. Presence consumers read the union; a consumer that touches a live field must narrow state === 'active' — the compiler refuses a PTY/agent field on a bare terminal, so a sleeping terminal can sit on the canvas and in the MRU yet can never be an input or WebGL target. Sleep flips the state flag in place; the id, the layout slot, and the persisted base never move.

Putting the discriminant on the terminal buys two structural properties:

Type-safe presence vs liveness. A consumer that reads agent/foreground/pr/PTY/xterm must first narrow state === "active"; the compiler refuses a live field on the bare union. There is no live-only list to read from by mistake — so once a sleeping terminal reaches a presence surface (canvas, dock, minimap, arrange, cycle, switcher) it cannot be mis-rendered. Reaching it is a runtime fact, not a type one: the sum stops mis-rendering, it does not by itself deliver presence — a sleeping record appears because it is the same entry in the one registry the client already subscribes to, so it rides the existing id list with nothing extra. Input routing resolves inside the active narrow, so a sleeping terminal can be the active/selected/panned-to tile yet is never an input target. The WebGL budget keys on the active arm, so a sleeping terminal holds no WebGL context.
One persistence channel, one write sink. SavedTerminal is already PersistedTerminalFields + id — the sleeping arm’s exact payload minus sleptAt. A restored terminal and a slept terminal are the same on-disk shape, distinguished only by state, so the session snapshot serializes one list and the boot path rehydrates both arms through one seam. And because the record keeps its stable id, the ordinary write sinks (setCanvasLayout, setTheme, rename) find a sleeping entry and mutate its base in place — a sleeping tile drags, resizes, renames, and re-themes like any other, with only PTY input fenced (by type, not a guard).
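The two properties above can be seen side by side in a small sketch: a base write that needs no narrowing and therefore reaches a sleeping entry, next to a live write that must narrow first. The registry Map, the ptyWrites log, and the trimmed-down Terminal shape are illustrative assumptions; setCanvasLayout is named after the write sink in the text, but its signature here is a guess.

```typescript
type Layout = { x: number; y: number; w: number; h: number };

// Trimmed-down union: just enough fields to show the two kinds of write.
type Terminal =
  | { state: "active"; id: string; canvasLayout: Layout; ptyPid: number }
  | { state: "sleeping"; id: string; canvasLayout: Layout; sleptAt: number };

const registry = new Map<string, Terminal>();
const ptyWrites: string[] = []; // stand-in for real PTY writes

// Base write: canvasLayout exists on both arms, so no narrowing is needed
// and a sleeping entry is mutated in place like any other.
function setCanvasLayout(id: string, layout: Layout): boolean {
  const t = registry.get(id);
  if (!t) return false;
  t.canvasLayout = layout;
  return true;
}

// Live write: ptyPid exists only on the active arm, so the compiler
// rejects t.ptyPid until state has been narrowed to "active".
function sendInput(id: string, data: string): boolean {
  const t = registry.get(id);
  if (!t || t.state !== "active") return false;
  ptyWrites.push(`${t.ptyPid}:${data}`);
  return true;
}
```

Removing the `t.state !== "active"` check makes `t.ptyPid` a compile error, which is the sense in which input is fenced by type rather than by a guard someone has to remember.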

The phases

Each is one reviewable PR; each leaves master shippable.

Phase	What lands	Why separable
1 — Seat the sum	Add `state`; flip `TerminalMetadataSchema` to a `discriminatedUnion`; presence surfaces read the `Terminal` union off the terminal store; ship with only the `active` arm constructed	Zero behavior change — a pure structural move that makes the narrowing seam exist before any sleep logic depends on it
2 — Sleep / Wake (in place)	Populate the `sleeping` arm: sleep flips `state→sleeping` on the same record (capturing the last agent command) and releases PTY/xterm/agent; wake flips it back and re-spawns through the existing session-restore path, resuming the agent exactly as a reboot does; the sleeping tile stays a full canvas citizen	The user-facing core; reuses the proven restore path, no separate store, no merge seam, no minted id
3 — Frozen screenshot body	Capture just before sleep, write under `KOLU_STATE_DIR`, serve through a small static image route; the reference rides the record — and the captured frame is what the live→frozen swap cross-fades into, so the sleep transition turns seamless here	Isolated surface — one capture, one route, one fade

(The original plan had a fourth phase — “unify wake with session-restore.” The stable-id model makes wake literally session-restore-of-one from the start, so that unification is no longer a separate step; it is how Phase 2 is built.)

Phase 1 — seat the sum (zero behavior change) · shipped

#1449. TerminalMetadataSchema flipped from a flat .merge to z.discriminatedUnion("state", …) with only the active arm constructed — the UX stayed pixel-identical while the state === "active" seam every later phase leans on came into being. The union flows on the client, where every liveness reader narrows through one activeArm seam, so a live field on a bare terminal no longer compiles. A state.ts migration stamps state: "active" on legacy records and bumps SCHEMA_VERSION — the one sanctioned place the default is supplied; read sites narrow, never coalesce.
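A migration of the kind described might look like the following sketch. The SessionFile shape and the version number 2 are assumptions made for illustration; the text only says that state.ts stamps state: "active" on legacy records and bumps SCHEMA_VERSION.

```typescript
// Hypothetical version number; the source only states that it is bumped.
const SCHEMA_VERSION = 2;

// Hypothetical on-disk shape: a version tag plus loosely typed records.
interface SessionFile {
  version: number;
  terminals: Array<Record<string, unknown>>;
}

// The one place a default is supplied: legacy records without a state
// field are stamped "active". Records that already carry one are untouched.
function migrate(file: SessionFile): SessionFile {
  if (file.version >= SCHEMA_VERSION) return file;
  return {
    version: SCHEMA_VERSION,
    terminals: file.terminals.map((t) =>
      "state" in t ? t : { ...t, state: "active" },
    ),
  };
}
```

Supplying the default here, once, is what lets every read site narrow on state instead of coalescing a missing value.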

Phase 2 — Sleep / Wake (in place)

#1487. Implemented exactly as planned: the one registry holds the Terminal union under a stable id (TerminalProcess is itself a discriminated type, so the sleeping arm’s PTY handle is absent by type), sleep flips in place and persists before it kills, wake re-spawns on the same id and replays the observed lastAgentCommand through resumeAgentCommand, and boot re-seeds sleeping records (adopt-or-reap). The dormant tile surfaces the last-known context it was working in: cwd and branch ride the persisted base, while the live PR is snapshotted onto the sleeping arm at sleep and discarded on wake (the PR sensor re-resolves it live). The journey e2e asserts the real outcomes, not counts: wake resumes the same conversation, drag a dormant tile then reload, reboot then wake, and reboot mid-sleep converges. (An agent launched through a nix run …#agent wrapper, whose observed head token is nix rather than the agent, is not resumed on wake; it wakes to a bare shell, tracked as #1492.)

Re-planned after a first cut (PR #1466, discarded)

The first Phase-2 implementation built the sleeping arm as an immutable record minted with a fresh id into a separate store. Hands-on testing surfaced two bugs in a minute (you couldn’t drag a sleeping tile, and waking one didn’t resume the agent), and an audit found 38 issues of the same class (15 high-severity). Both bugs trace back to the same choice: an immutable record in a separate store has no write sink (so drag / resize / rename / theme were disabled), and its thin schema classified the agent as live overlay and stripped it (so wake resumed nothing). The tests passed because they asserted invariants (a dormant body renders, a record survives reboot) instead of journeys (sleep a Claude session → wake → keep talking; move a dormant tile). This revision keeps the type, since the active | sleeping sum was always right, and replaces the mechanism: one mutable record, stable id, flipped in place, with wake reusing the path a reboot already runs.

Populate the sleeping arm by flipping a flag, not minting a record.

Sleep flips active → sleeping in place. It captures the agent’s resume input (the last agent command) onto the persisted base, flips the state flag on the same record under the same id, writes the session durably, then releases the PTY/xterm/WebGL/agent — persist before kill, so a crash mid-sleep loses nothing. No new id, no second store, no retire-the-predecessor: the record the canvas was already showing simply changes state, so the tile keeps its slot, dock order, selection, and id with zero swap.
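The ordering above (capture, flip, persist, then release) can be sketched as follows. Everything here is illustrative: SleepDeps, the loose Rec shape, and the synchronous signatures are assumptions, and a real durable write would likely be asynchronous. The rollback on a failed persist is also an assumption, since the text only specifies the happy-path order.

```typescript
// Hypothetical side effects, injected so the ordering is easy to see.
interface SleepDeps {
  persist(): void; // durable session write
  release(): void; // kill the PTY, dispose xterm / WebGL / agent
}

// Loose record shape for brevity; the real type is the discriminated union.
interface Rec {
  id: string;
  state: "active" | "sleeping";
  lastAgentCommand?: string;
  sleptAt?: number;
}

function sleepTerminal(
  rec: Rec,
  observedCommand: string | undefined,
  deps: SleepDeps,
  now: number,
): void {
  if (rec.state !== "active") return; // already asleep: nothing to do

  // 1. Capture the resume input onto the persisted base.
  if (observedCommand !== undefined) rec.lastAgentCommand = observedCommand;

  // 2. Flip the flag on the same record, under the same id.
  rec.state = "sleeping";
  rec.sleptAt = now;

  // 3. Persist BEFORE killing anything.
  try {
    deps.persist();
  } catch (err) {
    // Assumed policy: undo the flip so memory matches what is on disk.
    rec.state = "active";
    delete rec.sleptAt;
    throw err;
  }

  // 4. Only now release the live resources.
  deps.release();
}
```

If the process dies between steps 3 and 4, the disk already says "sleeping" while the PTY may still be alive, which is the window the boot-time reconcile has to cover.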

Wake is session-restore-of-one — literally the path a reboot runs. kolu already rehydrates terminals on server restart: it re-spawns the PTY in the saved cwd and resumes the agent with resumeAgentCommand. With the persisted agentSession ref (juspay/kolu#1495) that resume targets the exact conversation that was running on this terminal — claude --resume <id>, codex resume <id>, opencode --session <id> — and falls back to the cwd-most-recent form (claude -c &c.) only when no session was ever captured. Wake flips the record back to active and replays that same path on the one record. So wake resumes your agent to exactly the degree a reboot does — the bar you already trust — with no bespoke sleep-only resume. The persisted base carries cwd + the last agent command + the conversation ref, which is everything that path needs; the in-place flip keeps them on the record by default.
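The resume-command choice described above can be sketched like this. The function name sketchResumeCommand and the AgentKind list are hypothetical, and this is not the real resumeAgentCommand. The by-id forms are copied from the text; of the cwd-most-recent fallbacks, only the Claude form (claude -c) appears in the text, so the others are deliberately left out rather than guessed.

```typescript
type AgentKind = "claude" | "codex" | "opencode";

// Persisted conversation ref, shaped like agentSession = { kind, id }.
interface AgentSession {
  kind: AgentKind;
  id: string;
}

// Resume-by-id forms, as listed in the text.
const RESUME_BY_ID: Record<AgentKind, (id: string) => string> = {
  claude: (id) => `claude --resume ${id}`,
  codex: (id) => `codex resume ${id}`,
  opencode: (id) => `opencode --session ${id}`,
};

// cwd-most-recent fallbacks. Only the Claude form is given in the text.
const RESUME_MOST_RECENT: Partial<Record<AgentKind, string>> = {
  claude: "claude -c",
};

function sketchResumeCommand(
  session: AgentSession | undefined,
  lastAgentCommand: string | undefined,
): string | undefined {
  // Preferred: resume the exact conversation by its session id.
  if (session) return RESUME_BY_ID[session.kind](session.id);

  // Fallback: infer the agent from the head token of the last command.
  const head = lastAgentCommand?.trim().split(/\s+/)[0];
  if (head === "claude" || head === "codex" || head === "opencode") {
    return RESUME_MOST_RECENT[head];
  }

  // Unknown head token (for example a wrapper): nothing to resume,
  // so wake lands in a bare shell.
  return undefined;
}
```

Because reboot and wake would call the same function, any improvement to it benefits both paths at once.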

One registry, one list — no merge seam. A sleeping terminal is the same entry in the one terminal registry, so it rides the one id list the client already subscribes to: no second store to union, no “three snapshot reads must each include sleeping” seam (the first cut’s most error-prone surface). Liveness is the state discriminant that one canonical classifier reads, so dock, minimap, switcher, and mobile all show a sleeping terminal coherently from a single source — no per-surface sleeping branch to forget.
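A single classifier of this kind might be sketched as below. It also folds in the dock's activity-window rule from the non-negotiables (parked is checked before sleeping, and a sleeping tile's recency is its sleptAt). The Bucket names other than parked and sleeping, the Meta shape, and the isStale signature are assumptions; only rowRecencyAt(meta), isStale, and the lastActivityAt === 0 exemption are named in the text.

```typescript
// Trimmed-down metadata: just the fields the classifier reads.
type Meta =
  | { state: "active"; lastActivityAt: number }
  | { state: "sleeping"; sleptAt: number; lastActivityAt: number };

// "live" is a placeholder for whatever buckets active terminals really use.
type Bucket = "parked" | "sleeping" | "live";

// The recency a row displays and the window acts on: the deliberate sleep
// moment for a sleeping tile, otherwise the last agent transition.
const rowRecencyAt = (m: Meta): number =>
  m.state === "sleeping" ? m.sleptAt : m.lastActivityAt;

// Stale means older than the window. A timestamp of 0 (a plain shell that
// never had agent activity) is exempt.
const isStale = (at: number, now: number, windowMs: number): boolean =>
  at !== 0 && now - at > windowMs;

function classify(m: Meta, now: number, windowMs: number): Bucket {
  // Parked is checked BEFORE sleeping, so old dormant tiles drop out too.
  if (isStale(rowRecencyAt(m), now, windowMs)) return "parked";
  if (m.state === "sleeping") return "sleeping";
  return "live";
}
```

Keying on sleptAt is what lets an agent-less dormant shell (lastActivityAt of 0) still park, while a tile slept a moment ago keeps its ☾ even if its agent went quiet days earlier.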

A sleeping tile is a first-class canvas citizen. Because the record keeps its stable id, the normal write sinks find it and mutate in place — it drags, resizes, renames, and re-themes like any live tile. The only thing it can’t do is take PTY input, and that’s a type fact (the overlay is absent on the sleeping arm), not a runtime lockout. The first cut disabled these because an immutable record had nowhere to write the change — the reset removes the lockout by removing the immutability.

Sleep is manual, Wake is explicit, navigation never wakes. A ☾ Sleep button on the tile title bar, a Sleep/Wake palette command, and a discoverability tip are the only triggers — no global keybind, no auto-sleep. Landing on a sleeping tile (cycle, MRU, dock click, switcher, mobile swipe) focuses it frozen: it becomes the active/selected tile showing its dormant body and an explicit Wake, never an auto-respawn — so the right panel, inspector, and theme for an active-but-sleeping tile fall back to the frozen, no-live-content view plus a Wake call-to-action. Closing a sleeping tile routes through the same close-confirm dialog, reworded to discard sleeping terminal and driven off the still-persisted git/worktree info — it removes the record (no PTY to kill) and still offers worktree removal.

Non-negotiables for Phase 2

These are requirements, not nice-to-haves — each closes a failure mode the first cut shipped without noticing.

Wake must resume the agent — identical to a reboot. Wake re-spawns through the existing resumeAgentCommand path; the persisted base MUST carry cwd + the last agent command so that path has its inputs. The first cut classified the agent as “live overlay” and stripped it, so wake resumed nothing — this is what the wake-resumes-the-same-conversation journey test below exists to catch.
A sleeping tile drags, resizes, and renames. The stable-id record has a real write sink — never disable a canvas interaction because a tile is sleeping. Only PTY input is fenced, and by type (the overlay is absent on the sleeping arm), not a guard. The first cut disabled drag/resize because its immutable record had nowhere to persist the change.
Close the splits, then sleep the top terminal, and say so. A sleeping record is a single terminal; any sub-terminals are closed (not frozen) before the flip. Ask for confirmation first so that, say, three splits don’t vanish silently. The result is one top-terminal record carrying that terminal’s base, with no orphan or dangling-child record.
Tolerate a corrupt persisted record; never let one poison the set. A malformed record is a saved sleeping entry that no longer validates — its base truncated by a crash mid-write, hand-edited, or left by an older build. Validate shape in the schema, then drop a record that fails the cross-field invariant at the read boundary — never a fatal validator on the persisted collection, so one bad entry can’t break the load for every other terminal. (See the persisted-schema-stays-tolerant code-police rule.)
First-class and visually distinct in every presence surface — from ONE classifier. A sleeping terminal renders — moonlit, ☾, dimmed — in the canvas, dock, minimap, switcher, and mobile; arrange clusters it with its repo; the cycle traverses it. The paint distinction keys on state === "sleeping", decoupled from the staleness / “parked” vocabulary — a fresh slept tile wears its ☾ row and moonlit treatment, never reading as merely idle. The first cut routed sleeping through a parallel check that the switcher/minimap/mobile classifiers never saw — so make the one canonical bucket classifier branch on the discriminant, and verify presence on each surface (omission is a runtime fact). Mobile must render a dormant body, not attempt a live PTY attach. The dock’s activity-window hide is the one place staleness wins over the ☾ (#1593): a tile slept longer ago than the window — keyed on its sleptAt (the deliberate sleep moment), not its last agent transition — routes to parked and drops, so the window compresses yesterday’s dormant terminals too instead of letting them pile up. Keying on sleptAt is load-bearing: a plain shell carries lastActivityAt === 0 (which isStale exempts), so an agent-less dormant tile would otherwise never park; and a just-slept tile whose agent went quiet days ago still keeps its ☾. parked is checked before sleeping in the dock’s classifier. One rowRecencyAt(meta) derives this recency once and the row’s “Xs ago” cell displays it too, so the age a row shows is the exact age the window acts on — no “why is my 3h-ago row hidden?” gap. (“Show all” re-reveals them, and the “N hidden by … window” footer counts them.)
A sleeping tile can be the active tile. Click/select focuses it; input routing narrows to active, so it is never an input target — a type fact, not a runtime guard. Its content panels (right panel, inspector, theme) show the frozen view + Wake, never a live attach.
The boot path reconciles, it doesn’t assume. Sleep persists durably then kills the PTY; a crash in that window can leave a sleeping record on disk with its PTY briefly alive. On cold boot, reconcile each sleeping record against any surviving PTY (adopt-or-reap) so the cold path converges like the adopt path — and the boot/restore seed spawns active terminals only, never waking one.
An e2e that drives the real journeys, asserting outcomes not counts. The first cut’s wake scenario only ran echo and asserted a live shell — it would PASS with a blank new Claude, which is the literal hole the agent-resume bug fell through. This time: (1) wake resumes the same conversation — capture the agent before sleep, wake, assert it re-runs the resume form in the right cwd and lands the prior conversation, not a fresh shell; (2) drag a sleeping tile then reload — the moved layout persisted; (3) reboot then wake — the slept session survives a full restart and still resumes; (4) reboot mid-sleep — the record converges with no orphan PTY. Keep the malformed-record and last-terminal (don’t clear the session) scenarios. Driven like a user; a green count-only test proves nothing.
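The tolerant-load requirement in the list above can be sketched as a per-record parse that drops failures instead of throwing for the whole collection. The SavedRecord shape is a hypothetical subset, and the cross-field invariant used here (a sleeping record must carry a numeric sleptAt) is an assumption; the text does not spell the invariant out.

```typescript
// Hypothetical subset of a persisted record.
interface SavedRecord {
  id: string;
  state: "active" | "sleeping";
  cwd: string;
  sleptAt?: number;
}

// Shape check plus the assumed cross-field invariant. Returns undefined
// for anything malformed and never throws.
function parseRecord(raw: unknown): SavedRecord | undefined {
  if (typeof raw !== "object" || raw === null) return undefined;
  const r = raw as Record<string, unknown>;
  const id = r["id"];
  const cwd = r["cwd"];
  const sleptAt = r["sleptAt"];
  if (typeof id !== "string" || typeof cwd !== "string") return undefined;
  if (r["state"] === "active") return { id, cwd, state: "active" };
  if (r["state"] === "sleeping" && typeof sleptAt === "number") {
    return { id, cwd, state: "sleeping", sleptAt };
  }
  return undefined;
}

// Read boundary: one bad entry is dropped and counted, and every other
// terminal still loads.
function loadTolerant(raws: unknown[]): { kept: SavedRecord[]; dropped: number } {
  const kept: SavedRecord[] = [];
  let dropped = 0;
  for (const raw of raws) {
    const rec = parseRecord(raw);
    if (rec) kept.push(rec);
    else dropped++;
  }
  return { kept, dropped };
}
```

Returning the dropped count gives the caller something to log, so a discarded record is visible rather than silent.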

Trade-offs & when we’d revisit

A migration is mandatory. SavedTerminal is persisted, so adding state needs the state.ts migration above. There is no shipping this as a pure refactor.
One record, mutated in place; the id is stable. state flips on the same record across sleep/wake — sleep releases the live overlay, wake re-derives it, but the id, layout slot, and dock order never move. This reverses the first cut, which minted a fresh id per transition (immutable records) on the theory that “immutability keeps identity un-complected.” In practice that split one terminal’s identity across two ids and two stores, then had to re-knit them with a merge seam and a write-sink lockout — and it stripped the agent session as “live overlay,” so wake resumed nothing. A stable id makes the normal write sink and the existing restore path just work: wake is restore-one, drag is a base write, presence is one classifier. We’d revisit only if mutating in place ever proved to need boot-hydration surgery (it doesn’t — wake replays the path reboot already runs).
Wake resumes the exact conversation that was running (resolved in juspay/kolu#1495). Originally wake reused only the cwd-scoped resumeAgentCommand (claude -c &c.), so a cwd with two conversations could wake on the wrong one. The follow-on persisted the agent’s native session id (agentSession = { kind, id }, captured live from agent.sessionId) and resumes by it (claude --resume <id> &c.), falling back to cwd-most-recent only when no session was ever captured. This is exactly the shared improvement to the one restore path that this caveat anticipated: it benefits reboot and wake together, never a wake-only fork.
Wake lands a clean resumed session — no repainted scrollback. Identical to a reboot: the conversation is back, but the prior on-screen text is not repainted. Phase 3’s frozen-frame capture is what shows the last visual state during the swap; persisting the live scrollback buffer is a separate, addable follow-on if the blank-screen feel proves unacceptable.