
Flaky Test Tracker

Reference · seedling

A backlog of flaky tests — e2e (Cucumber + Playwright) and unit (vitest). Drop a row when you hit one; an agent clears the queue from time to time.

A fix-queue for tests that go red on one CI run and green on the next with no code change. See a flake → add a row. An agent works the backlog over time.

Flake vs. break

A flake fails nondeterministically (timing, ordering, or environment) and would pass on a same-SHA rerun; a break fails the same way on every run and is a real defect, so file a bug instead. A one-off rerun via the odu MCP is fine for telling the two apart, but a rerun is never how a flake gets fixed (see the fixing routine below).
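The distinction can be sketched as a small classifier over same-SHA run outcomes. This is only an illustration: `RunOutcome`, `Verdict`, and `classifyTest` are hypothetical names, not part of the odu MCP or this repo, and treating "failed on every run" as a break is a simplification (a flake can fail several times in a row on a loaded box).

```typescript
// Outcome of one CI run of a single test at a fixed commit SHA.
type RunOutcome = "pass" | "fail";

// "unknown" means there is not enough evidence yet to decide.
type Verdict = "flake" | "break" | "healthy" | "unknown";

// Classify a test from its outcomes across runs of the SAME SHA.
// Assumption: every outcome in the array comes from the same commit and lane.
function classifyTest(outcomes: RunOutcome[]): Verdict {
  if (outcomes.length === 0) return "unknown";

  const failures = outcomes.filter((o) => o === "fail").length;
  if (failures === 0) return "healthy";

  if (failures === outcomes.length) {
    // A single failure cannot separate a flake from a break;
    // at least one rerun is needed before calling it a break.
    return outcomes.length >= 2 ? "break" : "unknown";
  }

  // Mixed results on an unchanged SHA: nondeterministic, so a flake.
  return "flake";
}
```

A red run followed by a green single-node rerun, `classifyTest(["fail", "pass"])`, comes out as a flake and belongs in the backlog; `classifyTest(["fail", "fail"])` comes out as a break and belongs in the bug tracker.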

Backlog

Status: open → fixing → fixed (then strike the row).

The first eight rows below were fixed in #1440 (struck = done; the reusable patterns live in Common flake classes). The rows after them are still open.

| Test | Lane | Symptom | Repro’d in | Status | Fix |
| --- | --- | --- | --- | --- | --- |
| Code tab history survives switching between terminals in different repos (code-tab.feature:714) | e2e@aarch64-darwin | back button never enabled; waitFor 20s timeout | #1440 | fixed | #1440 |
| Clicking a folder ref reveals and expands the directory (file-ref-link.feature:69) | e2e@aarch64-darwin | app/core never expands: a fresh-open (cold panel) reveal races fsListAll’s first snapshot via a one-shot resolve with no re-yield, so a commit-marker barrier can’t fix it. Mount the tree first; the fresh-open resolve is covered by lineRef.test.ts | #1440 | fixed | #1440 |
| Selected file survives switching to another terminal and back [branch] (code-tab.feature:148) | e2e@aarch64-darwin | tree never hydrated: branch-mode gitStatus stuck on BASE_BRANCH_NOT_FOUND | #1440 | fixed | #1440 |
| Regaining window focus repaints a render-stalled terminal (render_recovery.feature:16) | e2e@aarch64-darwin | AssertionError: screen not repainted on focus regain | #1440 | fixed | #1440 |
| Close sub-terminal via tab close button (sub-terminal.feature:107) | e2e@x86_64-linux | sub-terminal should have keyboard focus: waitForFunction timeout (focus not restored after close) | #1440 | fixed | #1440 |
| Scroll on terminal does not pan the canvas (canvas.feature:161) | e2e@aarch64-darwin | tile-centering pan raced the recorded baseline → canvas transform changed unexpectedly | #1440 | fixed | #1440 |
| Tile chrome shows task progress when Claude has tasks (claude-code.feature:77) | e2e@aarch64-darwin | appended-transcript fs event dropped → task progress 3/5 never showed | #1440 | fixed | #1440 |
| pulam daemon — dials a kaval, serves awareness (daemon.test.ts) | unit@x86_64-linux | waitFor didn’t catch a transient oRPC stream error from the live awareness collection → test threw (vitest, no retry) | #1440 | fixed | #1440 |
| createPtyHost — routes write() to the child and lists live PTYs (kaval/src/ptyHost.test.ts:373) | unit@aarch64-darwin | Test timed out in 5000ms: a real-PTY spawn-then-write test stalled past the 5s default on the darwin box; the linux lane passed the same SHA, and the PR touches no packages/kaval file (single-node rerun green) | #1497 | open | |
| Sub-terminal keeps keyboard focus after close (sub-terminal.feature, sub_terminal_steps.ts:132) | e2e@aarch64-darwin | the sub-terminal should have keyboard focus: waitForFunction timeout (focus not restored after close); the same flake fixed for the linux lane in #1440, recurring on the loaded darwin box (472/473 scenarios passed), unrelated to the PR’s surface/pulam changes | #1497 | open | |
| Clicking a folder ref while already browsing expands it in the live tree (file-ref-link.feature:112) | e2e@aarch64-darwin | lib/ui never reaches aria-expanded=true: locator.waitFor 60s timeout across all 3 retries on both runs. A live change into an already-mounted Pierre tree updates the model but never repaints: the unfixed #1534 swallow-emit class (sibling of the fresh-open :69 case fixed in #1440, whose “mount first” fix doesn’t cover the mounted-tree live update). The linux lane passed the same SHA (482/483), and the PR touches no Code-tab/Pierre/file-ref code (all changes are pulam-web + surface), so unrelated to it. Carried by R-pulamweb-4’s vendored @pierre/trees patch. | #1568 | open | |

Logging a flake

When a lane goes red and a single-node rerun comes back green, add a row with the test name, the recipe@platform lane, the assertion or timeout, the PR it reproduced in (<PrLink pr={…} />), and status open. No investigation is needed to log it.
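The fields of a row can be modeled as a typed record, which makes the required columns and the open → fixing → fixed lifecycle explicit. The sketch below is hypothetical: `FlakeRow`, `newFlakeRow`, and `advanceStatus` are illustrative names, and the tracker itself is a hand-edited table, not generated from code like this.

```typescript
// Lifecycle of a backlog row, in order.
type FlakeStatus = "open" | "fixing" | "fixed";

interface FlakeRow {
  test: string;         // test name, with file:line where known
  lane: string;         // recipe@platform, e.g. "e2e@aarch64-darwin"
  symptom: string;      // the assertion or timeout that was observed
  reproducedIn: number; // PR number where the flake showed up
  status: FlakeStatus;
  fixedIn?: number;     // PR number of the fix, set once status is "fixed"
}

// Create a freshly logged row. New rows always start as "open".
function newFlakeRow(
  test: string,
  lane: string,
  symptom: string,
  reproducedIn: number,
): FlakeRow {
  // A lane is "<recipe>@<platform>" with exactly one "@" and no spaces.
  if (!/^[^@\s]+@[^@\s]+$/.test(lane)) {
    throw new Error(`lane must look like recipe@platform, got "${lane}"`);
  }
  return { test, lane, symptom, reproducedIn, status: "open" };
}

// Move a row one step along open -> fixing -> fixed.
// Reaching "fixed" requires the PR number that carries the fix.
function advanceStatus(row: FlakeRow, fixPr?: number): FlakeRow {
  if (row.status === "open") return { ...row, status: "fixing" };
  if (row.status === "fixing") {
    if (fixPr === undefined) {
      throw new Error("a fix PR is required to mark a row fixed");
    }
    return { ...row, status: "fixed", fixedIn: fixPr };
  }
  return row; // already fixed: nothing further to do
}
```

For example, the open createPtyHost row in the backlog would be logged as `newFlakeRow("createPtyHost — routes write() to the child and lists live PTYs", "unit@aarch64-darwin", "Test timed out in 5000ms", 1497)`.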

Keep the tracker in lock-step with your PR. Log a flake your CI surfaced in the same PR that hit it — don’t defer to a later cleanup; and a PR that fixes a flake flips its row to fixed (and strikes it) in that same PR, regenerating docs/atlas/dist/. The queue only stays trustworthy if every PR that touches a flake updates this note alongside its own diff.

Fixing routine

An agent clears the backlog by driving CI to N consecutive green runs (N = 5 by default, or as given) through the odu MCP (run the test lanes → wait_for_settle, repeated). The green streak verifies that a fix is real — it is never a way to wash a flake out.

Non-negotiable rules:

The loop: fail → streak resets to 0, root-cause and fix the test (fixing → fixed, link the PR with <PrLink pr={…} />); while there, drop any open row that no longer reproduces. Pass → streak +1. Done = N green runs back-to-back with the backlog cleared.
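The streak bookkeeping in the loop above can be sketched as a pure reducer over run results. This models only the counter; `RunResult`, `StreakState`, `applyRun`, and `replay` are hypothetical names, the odu MCP calls that actually run the lanes are not shown, and the "backlog cleared" half of the done condition is left to the caller.

```typescript
type RunResult = "green" | "red";

interface StreakState {
  streak: number;      // consecutive green runs so far
  fixesNeeded: number; // red runs seen, each one demanding a root-cause fix
  done: boolean;       // true once the streak has reached the target
}

// Apply one CI run to the streak. A red run resets the streak to 0;
// it is never skipped or rerun away.
function applyRun(
  state: StreakState,
  result: RunResult,
  target: number,
): StreakState {
  if (state.done) return state; // target already reached, nothing to update
  if (result === "red") {
    return { streak: 0, fixesNeeded: state.fixesNeeded + 1, done: false };
  }
  const streak = state.streak + 1;
  return { streak, fixesNeeded: state.fixesNeeded, done: streak >= target };
}

// Replay a sequence of run results against a target of N greens in a row.
// N defaults to 5, matching the routine's default.
function replay(results: RunResult[], target: number = 5): StreakState {
  if (!Number.isInteger(target) || target < 1) {
    throw new RangeError("target must be a positive integer");
  }
  const initial: StreakState = { streak: 0, fixesNeeded: 0, done: false };
  return results.reduce((state, r) => applyRun(state, r, target), initial);
}
```

With the default target, two greens, a red, and then five greens finish with `streak` 5, `fixesNeeded` 1, and `done` true; four greens followed by a red end at `streak` 0 and `done` false, so the count starts over after the fix.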

Common flake classes

The shapes this suite keeps throwing, and the fix that held — reach for these before inventing one (full detail in each linked PR / the fix’s code comment).