
Phantom running_background — Claude shows "running" forever

bug · budding

Why the spinner survives a Claude restart — an orphaned background-launch marker has no completion, so the waiting→running_background promotion never decays. Diagnosis + fix.

Diagnosis & fix plan · implemented in #1109 · introduced by #1015 · the background-task sibling of #1017 · verified against the live reload-html transcript, the on-disk task artifacts, and a 13-agent design/adversarial-verify pass.

The report, reproduced

kolu showed the reload-html worktree with the spinning “working” pip, and it stayed lit after the user restarted Claude. The transcript tells the story:

$ F=~/.claude/projects/-home-srid-code-kolu--worktrees-reload-html/c43efadd-….jsonl

# newest real assistant message → end_turn (this alone = "waiting", no spinner)
{"type":"assistant","stop_reason":"end_turn","ts":"2026-06-01T22:29:26.464Z"}

# but line 1177 launched a background command, and it never completed:
$ grep -o 'Command running in background with ID: [a-z0-9]*' "$F"
Command running in background with ID: bi8olsr8z          # `just ai::apm >/tmp/apm2.log 2>&1`, auto-backgrounded

$ grep -c 'task-id>bi8olsr8z' "$F"
0                                                          # ← zero completions for this id, anywhere in 3.6 MB
$ grep -oE '<status>completed</status>' "$F" | wc -l
17                                                         # 17 completions exist — all for OTHER task ids

The decisive corroboration that the task is dead, not running: after the 20:52 launch the agent serviced two more human turns (“Merged! I’ll test it next PRs” at 21:13, “ok” at 22:29) and ended each with stop_reason="end_turn", the last one 96.7 minutes after the launch. A genuine busy-wait does not sit there servicing new prompts.
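
The two grep checks above amount to a launched-minus-completed reconciliation. A minimal TypeScript sketch of that check follows. `orphanedTaskIds` and the sample transcript are hypothetical, and treating any `task-id>ID` fragment as a completion is an assumption that mirrors the grep rather than kolu's actual parser.

```typescript
// Hypothetical helper (not kolu's code): list background task ids that were
// launched in a transcript but never appear in a task-id fragment.
function orphanedTaskIds(transcript: string): string[] {
  // Same launch marker the grep above searches for.
  const launchRe = /Command running in background with ID: ([a-z0-9]+)/g;
  const result: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = launchRe.exec(transcript)) !== null) {
    const id = m[1];
    // Assumption: a completion always carries a `task-id>ID` fragment, as in
    // `grep -c 'task-id>bi8olsr8z'`. Ids are [a-z0-9], so they can be embedded
    // in a regular expression without escaping.
    const completionRe = new RegExp("task-id>" + id + "(?![a-z0-9])");
    if (!completionRe.test(transcript) && result.indexOf(id) === -1) {
      result.push(id);
    }
  }
  return result;
}

// Made-up three-line transcript: one orphaned launch, one completed launch.
const sampleTranscript = [
  "Command running in background with ID: bi8olsr8z",
  "Command running in background with ID: aaa111",
  "<task-id>aaa111</task-id><status>completed</status>",
].join("\n");

const found = orphanedTaskIds(sampleTranscript); // ["bi8olsr8z"]
```

Only the launch without a matching task-id fragment is reported, which is the situation the transcript shows for bi8olsr8z.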

How the promotion is wired

Nothing on this path consults wall-clock time or process liveness:

tail 256 KB of JSONL
→ outstandingBackgroundTasks: [bi8olsr8z] (never completes)
→ deriveState: waiting → running_background
→ isWorkingState: working bucket
→ working pip
→ animate-spin (RowPips.tsx:143-144)

The promotion exists for a real feature — dynamic-workflow fan-out: when the agent launches a background task and yields its turn (end_turn) while genuinely busy-waiting, bare waiting would wrongly read as idle. The defect is that it has no exit condition other than a completion marker an orphaned task can never produce.

What was verified

| Claim | Status | Source |
| --- | --- | --- |
| deriveState promotes waiting → running_background unconditionally on any outstanding task | the bug | core.ts:383-386 |
| outstandingBackgroundTasks reconciles launched − completed from JSONL markers; no staleness check | true | core.ts:441-487 |
| For orphaned bi8olsr8z the completion is absent (grep -c ⇒ 0); the launch line is permanent | true | transcript line 1177; core.ts:458-462 |
| Newest real message is end_turn; two genuine human turns were serviced after the launch | true | lines 1177/1212/1241; core.ts:349-352 |
| Every line carries an ISO timestamp, so staleness is computable in the existing pass | true | launch 20:52:46Z vs newest 22:29:26Z |
| The on-disk tasks/bi8olsr8z.output is 0 bytes, mtime precedes the launch, no sidecar | true | /tmp/claude-1000/…/tasks/ |
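
The timestamp claim can be checked with a single pass over the lines. The sketch below is illustrative only: `newestTimestampMs` is a hypothetical helper, the `timestamp` field name is an assumption about the entry shape, and the launch date is assumed to be the same calendar day as the newest message.

```typescript
// Hypothetical helper: newest parseable `timestamp` across JSONL lines.
// Malformed lines and missing or invalid timestamps are skipped.
function newestTimestampMs(lines: string[]): number | null {
  let newest: number | null = null;
  for (let i = 0; i < lines.length; i++) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(lines[i]);
    } catch {
      continue; // not JSON: ignore the line
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const ts = (parsed as { timestamp?: unknown }).timestamp;
    if (typeof ts !== "string") continue;
    const ms = Date.parse(ts);
    if (isNaN(ms)) continue;
    newest = newest === null ? ms : Math.max(newest, ms);
  }
  return newest;
}

// Made-up lines carrying the two timestamps from the table above.
const sampleLines = [
  '{"type":"user","timestamp":"2026-06-01T20:52:46Z"}', // launch (date assumed)
  "not json",
  '{"type":"assistant","timestamp":"2026-06-01T22:29:26.464Z"}',
];

const launchMs = Date.parse("2026-06-01T20:52:46Z");
const newestMs = newestTimestampMs(sampleLines);
// Gap between launch and newest line, in minutes (about 96.7 here).
const elapsedMin = newestMs === null ? null : (newestMs - launchMs) / 60000;
```

Because both values come from the transcript itself, the gap does not depend on the reader's wall clock.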

Provenance — the PR that introduced it, and the issue it predicted

Introduced by #1015 (“Detect Claude Code’s running-in-background state for dynamic workflows”, c1e8613b · 2026-05-28), the same commit that added running_background, outstandingBackgroundTasks, and the launch-marker regexes.

Precedent + north star: #1019 closed the sibling #1018 with a structural transcript marker (isInterruptMarker → waiting) rather than a timer: the shape of the human-turn guard. #1011 (structured agent-status side-channel via Claude Code hooks, OPEN) would moot every transcript heuristic here; until it lands, the transcript is the only source of truth.

Four fix candidates — and why two are traps

Each design was handed to an adversarial verifier told to refute it against the real data. Two look obvious and are wrong on this exact transcript:

| Candidate | Idea | Verdict |
| --- | --- | --- |
| Liveness gate (kolu owns the PTY) | Veto the promotion if the launching Claude process is dead, via the session file’s (pid, procStart) against /proc. | fails (2/2). A restart reuses the same sessionId and JSONL and re-keys the session file to the new, live foreground pid. “Is the session alive?” returns true, so the spinner survives. Linux-only besides. |
| On-disk probe (poll the .output file) | Treat a task as dead if its tasks/<id>.output is stale / has no fd-holder. | fails (2/2). The command redirected output away (>/tmp/apm2.log). The file is 0 bytes and its mtime precedes the launch, so the probe reads “dead” for live and dead alike, and would prematurely clear the pip for any redirecting background command. |
| Ordering guard (transcript-only) | Drop a task if a genuine human turn appears after its launch marker. | correct, but narrow (6/2). Fixes this bug deterministically and is restart-robust. But as the sole fix it breaks a supported case: a genuinely-running CI run + interleaved human prompt would be wrongly de-promoted. Adopted as a fallback. |
| Staleness veto (transcript-relative) | Drop a task whose launch predates the transcript’s newest timestamp by more than STALE_BG_MS. | correct & non-breaking (6/2). 96.7 min > threshold ⇒ dropped. Anchored to the transcript’s newest line (not Date.now()), immune to clock skew. Weakness: arbitrary magic number; a fully-quiet orphan never advances newestTs. |

The two “obvious” process-liveness fixes are exactly the ones to avoid — they measure an available-but-wrong signal. After a restart, Claude is alive yet idle; only the transcript’s own record carries the discriminating fact, and it survives the restart.

The entire behavioral change is one predicate at the promotion site:

// packages/integrations/claude-code/src/core.ts — deriveState (383-386)
  let state = stateAndModel.state;
  if (state === "waiting") {
    const bg = outstanding ?? outstandingBackgroundTasks(lines);
-   if (bg.length > 0) state = "running_background";
+   if (bg.some((t) => t.runId !== null)) state = "running_background";  // Workflow runs only — Bash/Agent have no journal
  }

outstandingBackgroundTasks stays a faithful “launched − completed” set; only the promotion policy narrows — the correct Lowy seam, because what shifts is “what counts as working” (deriveState’s concern). The BackgroundTask.runId field that already distinguishes the two does all the work; no new state, no new inputs.
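
To make the effect of the one-line change concrete, here is a minimal before/after sketch. The types are simplified stand-ins for those in core.ts, `promoteBeforeFix` / `promoteAfterFix` are hypothetical names, and the Workflow task and run ids are made up.

```typescript
// Simplified stand-ins; the real types in core.ts carry more fields and states.
interface BackgroundTask {
  taskId: string;
  runId: string | null; // non-null only for Workflow runs
}
type SketchState = "waiting" | "running_background";

// Old policy: any outstanding task promotes waiting to running_background.
function promoteBeforeFix(state: SketchState, bg: BackgroundTask[]): SketchState {
  return state === "waiting" && bg.length > 0 ? "running_background" : state;
}

// Narrowed policy: only tasks that carry a runId promote.
function promoteAfterFix(state: SketchState, bg: BackgroundTask[]): SketchState {
  return state === "waiting" && bg.some((t) => t.runId !== null)
    ? "running_background"
    : state;
}

const bashOrphan: BackgroundTask = { taskId: "bi8olsr8z", runId: null };
const workflowRun: BackgroundTask = { taskId: "wf1", runId: "run-1" }; // made-up ids

const before = promoteBeforeFix("waiting", [bashOrphan]); // "running_background"
const after = promoteAfterFix("waiting", [bashOrphan]); // "waiting"
const withWorkflow = promoteAfterFix("waiting", [bashOrphan, workflowRun]); // "running_background"
```

The orphaned Bash launch remains in the outstanding set in both cases; only the decision about whether it counts as working changes.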

Alternative — the transcript-only veto (keep Bash/Agent spinning)

If a detached Bash/Agent busy-wait must keep lighting the pip (preserving #1015 exactly), the fallback makes the outstanding-set self-expiring instead of narrowing the trigger — a layered, transcript-only veto inside outstandingBackgroundTasks. Heavier (a magic threshold + a small classifier) but behavior-preserving for the live case:

// core.ts — outstandingBackgroundTasks (441-487), sketch of the layered veto
  const launched = new Map();      // taskId → { runId, index, atMs }
  const completed = new Set();
  let newestMs = null;             // newest timestamp across ALL entries (metadata too)
  let lastHumanTurn = -1;          // index of the newest genuine human prompt

  lines.forEach((raw, i) => {
    let entry; try { entry = JSON.parse(raw); } catch { return; }
    const ms = Date.parse(entry.timestamp ?? "");                 // NaN-safe
    if (!Number.isNaN(ms)) newestMs = Math.max(newestMs ?? ms, ms);
    if (entry.type === "queue-operation") { /* …completed.add(id)… */ return; }
    if (entry.type !== "user") return;
    if (isGenuineHumanTurn(entry)) lastHumanTurn = i;             // real prompt, not machinery
    // …launched.set(taskId, { runId, index: i, atMs }) …
  });

  const out = [];
  for (const [taskId, { runId, index, atMs }] of launched) {
    if (completed.has(taskId)) continue;
    if (atMs !== null && newestMs !== null && newestMs - atMs > STALE_BG_MS) continue;  // staleness
    if (lastHumanTurn > index) continue;                                                // human spoke after
    out.push({ taskId, runId });
  }
  return out;

isGenuineHumanTurn reuses isInterruptMarker + toolResultBlock: a user entry is a real prompt only if it carries no tool_result block, isn’t an interrupt marker, and its text doesn’t start with < (filters injected <command-*> / <task-notification> strings). Both vetoes fail safe (absent timestamp ⇒ not-stale; classifier defaults to “machinery” on ambiguity), so older fixtures keep today’s behavior. This is the fallback, not the lead: it carries the STALE_BG_MS magic number and a heuristic classifier — accidental complexity the recommended narrowing doesn’t have.
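
The classifier and the two vetoes described above can be sketched in a self-contained form. Everything here is an illustrative assumption rather than the code in core.ts: the entry shape, the interrupt-marker text, the `…Sketch` names, and the 30-minute threshold are all hypothetical. The line numbers 1177 and 1241 are reused from the verification table as indices.

```typescript
// Illustrative stand-ins only; none of these shapes or values come from core.ts.
type ContentBlock = { type: string; text?: string };
interface UserEntry {
  type: "user";
  message: { content: string | ContentBlock[] };
}

const STALE_BG_MS_SKETCH = 30 * 60 * 1000; // hypothetical threshold

// Stand-in for the real isInterruptMarker; the marker text is assumed.
function isInterruptMarkerSketch(text: string): boolean {
  return text.indexOf("[Request interrupted by user") === 0;
}

// True only for a real prompt; anything ambiguous counts as machinery.
function isGenuineHumanTurnSketch(entry: UserEntry): boolean {
  const content = entry.message.content;
  const blocks: ContentBlock[] =
    typeof content === "string" ? [{ type: "text", text: content }] : content;
  if (blocks.some((b) => b.type === "tool_result")) return false;
  const text = blocks
    .filter((b) => b.type === "text")
    .map((b) => b.text ?? "")
    .join("")
    .trim();
  if (text === "") return false; // nothing to judge: fail safe
  if (isInterruptMarkerSketch(text)) return false;
  if (text.charAt(0) === "<") return false; // injected <command-*> / <task-notification>
  return true;
}

interface Launch {
  taskId: string;
  index: number; // position of the launch line in the transcript
  atMs: number | null; // launch timestamp, null when absent
}

// A launch stays outstanding unless one of the two vetoes fires.
function survivesVetoes(
  launch: Launch,
  newestMs: number | null,
  lastHumanTurn: number,
): boolean {
  const stale =
    launch.atMs !== null &&
    newestMs !== null &&
    newestMs - launch.atMs > STALE_BG_MS_SKETCH;
  const talkedPast = lastHumanTurn > launch.index;
  return !stale && !talkedPast;
}

const okTurn: UserEntry = { type: "user", message: { content: "ok" } };
const injected: UserEntry = {
  type: "user",
  message: { content: "<task-notification>done</task-notification>" },
};
const orphan: Launch = {
  taskId: "bi8olsr8z",
  index: 1177,
  atMs: Date.parse("2026-06-01T20:52:46Z"), // launch date assumed
};
const newest = Date.parse("2026-06-01T22:29:26.464Z");

const orphanKept = survivesVetoes(orphan, newest, 1241); // false
const untimedKept = survivesVetoes({ taskId: "bi8olsr8z", index: 1177, atMs: null }, null, -1); // true
```

With no timestamps and no human turn recorded, neither veto fires, which is the fail-safe behavior that keeps older fixtures unchanged.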

Test plan — close both coverage gaps

The bug shipped because neither layer exercises a launch that never completes: every promotion unit test supplies a completion marker, and the running_background e2e scenario mocks the final state rather than deriving it.

Open risks & residuals

If we ship the recommended fix: a detached Bash/Agent busy-wait no longer lights the pip (deliberate; recoverable via the alternative veto scoped to runId == null). The Workflow-orphan residual is closed by liveOutstandingTasks gating on a fresh, non-terminal journal. Not a data-level fix — the launch marker stays in the transcript forever; only the promotion policy reads it differently.

If we ship the alternative veto: STALE_BG_MS is policy and can’t distinguish an orphaned-and-talked-past task from a genuine long run; a quiet-idle orphan with no further lines persists until the next write; the human-vs-machinery classifier keys off injected user strings and must stay in sync with the format.


Shipped in #1109: the trigger-narrowing (deriveState promotes only runId != null) plus the journal-liveness gate (liveOutstandingTasks), closing both the Bash orphan and the Workflow-orphan-after-restart. Grounded by a 13-agent design pass: 4 candidate fixes × adversarial verification over the live transcript, the on-disk artifacts, and the #1015 / #1017 / #1018→#1019 history.