Phantom running_background — Claude shows "running" forever
Why the spinner survives a Claude restart — an orphaned background-launch marker has no completion, so the waiting→running_background promotion never decays. Diagnosis + fix.
Diagnosis & fix plan · implemented in
#1109 · introduced by
#1015 · the background-task sibling of #1017
· verified against the live reload-html transcript, the on-disk task artifacts,
and a 13-agent design/adversarial-verify pass.
The report, reproduced
kolu showed the reload-html worktree with the spinning “working” pip, and it
stayed lit after the user restarted Claude. The transcript tells the story:
$ F=~/.claude/projects/-home-srid-code-kolu--worktrees-reload-html/c43efadd-….jsonl
# newest real assistant message → end_turn (this alone = "waiting", no spinner)
{"type":"assistant","stop_reason":"end_turn","ts":"2026-06-01T22:29:26.464Z"}
# but line 1177 launched a background command, and it never completed:
$ grep -o 'Command running in background with ID: [a-z0-9]*' "$F"
Command running in background with ID: bi8olsr8z # `just ai::apm >/tmp/apm2.log 2>&1`, auto-backgrounded
$ grep -c 'task-id>bi8olsr8z' "$F"
0 # ← zero completions for this id, anywhere in 3.6 MB
$ grep -oE '<status>completed</status>' "$F" | wc -l
17 # 17 completions exist — all for OTHER task ids
The decisive corroboration that the task is dead, not running: after the
20:52 launch the agent serviced two more human turns (“Merged! I’ll test it next
PRs” at 21:13, “ok” at 22:29) and ended each with stop_reason="end_turn", 96.7
minutes apart. A genuine busy-wait does not sit there servicing new prompts.
How the promotion is wired
Nothing on this path consults wall-clock time or process liveness:
tail 256 KB of JSONL → outstandingBackgroundTasks → [bi8olsr8z] (never completes) → deriveState → waiting → running_background → isWorkingState → working bucket → working pip → animate-spin (RowPips.tsx:143-144)
The promotion exists for a real feature — dynamic-workflow fan-out: when the
agent launches a background task and yields its turn (end_turn) while genuinely
busy-waiting, bare waiting would wrongly read as idle. The defect is that it has
no exit condition other than a completion marker an orphaned task can never
produce.
What was verified
| Claim | Status | Source |
|---|---|---|
deriveState promotes waiting→running_background unconditionally on any outstanding task | the bug | core.ts:383-386 |
outstandingBackgroundTasks reconciles launched − completed from JSONL markers; no staleness check | true | core.ts:441-487 |
For orphaned bi8olsr8z the completion is absent (grep -c ⇒ 0); the launch line is permanent | true | transcript line 1177; core.ts:458-462 |
Newest real message is end_turn; two genuine human turns were serviced after the launch | true | lines 1177/1212/1241; core.ts:349-352 |
| Every line carries an ISO timestamp — staleness is computable in the existing pass | true | launch 20:52:46Z vs newest 22:29:26Z |
The on-disk tasks/bi8olsr8z.output is 0 bytes, mtime precedes the launch, no sidecar | true | /tmp/claude-1000/…/tasks/ |
Provenance — the PR that introduced it, and the issue it predicted
Introduced by
#1015 (“Detect Claude Code’s running-in-background
state for dynamic workflows”, c1e8613b · 2026-05-28), the same commit that added
running_background, outstandingBackgroundTasks, and the launch-marker regexes.
- The backgrounded-Bash/Agent promotion was deliberate. #1015: “The Bash/Agent coverage was added after dog-fooding caught a session busy-waiting on backgrounded CI still reading as
waiting.” So the “narrow the trigger” fix below is a genuine product regression, not a free simplification. - #1015 predicted this exact bug class and filed #1017 — since closed by
#1115 . “An abandoned session with a stale trailing entry reads as
running(needs an mtime/liveness heuristic).” Our bug is the background-task manifestation of #1017.
Precedent + north star:
#1019 closed the sibling
#1018 with a structural transcript
marker (isInterruptMarker → waiting) rather than a timer — the shape of the
human-turn guard. #1011 (structured
agent-status side-channel via Claude Code hooks, OPEN) would moot every transcript
heuristic here; until it lands, the transcript is the only source of truth.
Four fix candidates — and why two are traps
Each design was handed to an adversarial verifier told to refute it against the real data. Two look obvious and are wrong on this exact transcript:
| Candidate | Idea | Verdict |
|---|---|---|
| Liveness gate (kolu owns the PTY) | Veto the promotion if the launching Claude process is dead, via the session file’s (pid, procStart) against /proc. | fails — 2/2 A restart reuses the same sessionId and JSONL and re-keys the session file to the new, live foreground pid. “Is the session alive?” returns true → spinner survives. Linux-only besides. |
| On-disk probe (poll the .output file) | Treat a task as dead if its tasks/<id>.output is stale / has no fd-holder. | fails — 2/2 The command redirected output away (>/tmp/apm2.log). The file is 0 bytes, mtime precedes the launch — the probe reads “dead” for live and dead alike, and would prematurely clear the pip for any redirecting background command. |
| Ordering guard (transcript-only) | Drop a task if a genuine human turn appears after its launch marker. | correct, but narrow — 6/2 Fixes this bug deterministically and is restart-robust. But as the sole fix it breaks a supported case: a genuinely-running CI run + interleaved human prompt would be wrongly de-promoted. Adopted as a fallback. |
| Staleness veto (transcript-relative) | Drop a task whose launch predates the transcript’s newest timestamp by more than STALE_BG_MS. | correct & non-breaking — 6/2 96.7 min > threshold ⇒ dropped. Anchored to the transcript’s newest line (not Date.now()), immune to clock skew. Weakness: arbitrary magic number; a fully-quiet orphan never advances newestTs. |
The two “obvious” process-liveness fixes are exactly the ones to avoid — they measure an available-but-wrong signal. After a restart, Claude is alive yet idle; only the transcript’s own record carries the discriminating fact, and it survives the restart.
Recommended fix — only promote a backed outstanding task
The entire behavioral change is one predicate at the promotion site:
// packages/integrations/claude-code/src/core.ts — deriveState (383-386)
let state = stateAndModel.state;
if (state === "waiting") {
const bg = outstanding ?? outstandingBackgroundTasks(lines);
- if (bg.length > 0) state = "running_background";
+ if (bg.some((t) => t.runId !== null)) state = "running_background"; // Workflow runs only — Bash/Agent have no journal
}
outstandingBackgroundTasks stays a faithful “launched − completed” set; only the
promotion policy narrows — the correct Lowy seam, because what shifts is “what
counts as working” (deriveState’s concern). The BackgroundTask.runId field that
already distinguishes the two does all the work; no new state, no new inputs.
Alternative — the transcript-only veto (keep Bash/Agent spinning)
If a detached Bash/Agent busy-wait must keep lighting the pip (preserving
#1015 exactly), the fallback makes the outstanding-set self-expiring instead of
narrowing the trigger — a layered, transcript-only veto inside
outstandingBackgroundTasks. Heavier (a magic threshold + a small classifier) but
behavior-preserving for the live case:
// core.ts — outstandingBackgroundTasks (441-487), sketch of the layered veto
const launched = new Map(); // taskId → { runId, index, atMs }
const completed = new Set();
let newestMs = null; // newest timestamp across ALL entries (metadata too)
let lastHumanTurn = -1; // index of the newest genuine human prompt
lines.forEach((raw, i) => {
let entry; try { entry = JSON.parse(raw); } catch { return; }
const ms = Date.parse(entry.timestamp ?? ""); // NaN-safe
if (!Number.isNaN(ms)) newestMs = Math.max(newestMs ?? ms, ms);
if (entry.type === "queue-operation") { /* …completed.add(id)… */ return; }
if (entry.type !== "user") return;
if (isGenuineHumanTurn(entry)) lastHumanTurn = i; // real prompt, not machinery
// …launched.set(taskId, { runId, index: i, atMs }) …
});
const out = [];
for (const [taskId, { runId, index, atMs }] of launched) {
if (completed.has(taskId)) continue;
if (atMs !== null && newestMs !== null && newestMs - atMs > STALE_BG_MS) continue; // staleness
if (lastHumanTurn > index) continue; // human spoke after
out.push({ taskId, runId });
}
return out;
isGenuineHumanTurn reuses isInterruptMarker + toolResultBlock: a user
entry is a real prompt only if it carries no tool_result block, isn’t an
interrupt marker, and its text doesn’t start with < (filters injected
<command-*> / <task-notification> strings). Both vetoes fail safe (absent
timestamp ⇒ not-stale; classifier defaults to “machinery” on ambiguity), so older
fixtures keep today’s behavior. This is the fallback, not the lead: it carries
the STALE_BG_MS magic number and a heuristic classifier — accidental complexity
the recommended narrowing doesn’t have.
Test plan — close both coverage gaps
The bug shipped because neither layer exercises a launch that never
completes: every promotion unit test supplies a completion marker, and the
running_background e2e scenario mocks the final state rather than deriving it.
- Unit (recommended fix):
deriveState([bashLaunch('bi8olsr8z'), endTurn])⇒waiting; flip the now-wrong “promotes a backgrounded Bash” assertion to ⇒waiting;deriveState([bgLaunch('t1','wf_1'), endTurn])⇒running_background(workflow still promotes); workflow completion still clears. - E2e: add “a backgrounded Bash launch does not spin” driving the real watcher +
deriveState; keep the existingrunning_backgroundscenario green but make its launch a Workflow so it proves the legitimate fan-out still spins.
Open risks & residuals
If we ship the recommended fix: a detached Bash/Agent busy-wait no longer
lights the pip (deliberate; recoverable via the alternative veto scoped to
runId == null). The Workflow-orphan residual is closed by liveOutstandingTasks
gating on a fresh, non-terminal journal. Not a data-level fix — the launch marker
stays in the transcript forever; only the promotion policy reads it differently.
If we ship the alternative veto: STALE_BG_MS is policy and can’t distinguish
an orphaned-and-talked-past task from a genuine long run; a quiet-idle orphan with
no further lines persists until the next write; the human-vs-machinery classifier
keys off injected user strings and must stay in sync with the format.
Shipped in
#1109 : the trigger-narrowing (deriveState promotes
only runId != null) plus the journal-liveness gate (liveOutstandingTasks) —
closing both the Bash orphan and the Workflow-orphan-after-restart. Grounded by a
13-agent design pass: 4 candidate fixes × adversarial verification over the live
transcript, the on-disk artifacts, and the #1015 / #1017 / #1018→#1019 history.