Browser Harness QA Signoff

Date: 2026-03-12
Scope: Browser Harness v1, browser-only providers, judge-provider decoupling, fixture-based manual signoff

This inventory is the canonical manual/browser QA checklist for Browser Harness. It complements automated backend, frontend, and Playwright coverage and is the required input to the final playwright-interactive signoff.

Runtime Assumptions

The app is reachable in a browser.
Browser Harness fixtures are reachable from the same app host.
The source dataset already exists in the current workspace.
At least one API provider exists for judge scoring when the dataset requires judge evaluation.

User-Visible Claims To Sign Off

The app exposes a dedicated Browser Harness route and navigation entry.
The page blocks export when the selection is incompatible with v1 constraints.
ChatGPT, Gemini, and Custom flows can each generate a script, capture responses, and import a result.
Imported captures create normal completed runs instead of a parallel browser-only result type.
Imported runs are attributed to browser-only providers rather than API providers.
Judge-required datasets can be configured with an explicit judge provider distinct from the run provider.
Imported Browser Harness runs remain visible in Runs and usable in Comparison.
The Browser Harness page fits desktop and mobile viewports without clipping or horizontal scrolling.
No console errors or page errors occur during the signoff flows.
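
The last claim can be checked mechanically during a scripted signoff pass. The sketch below is one possible way to collect errors, modeled on Playwright's `console` and `pageerror` page events. The `PageLike`, `ErrorLog`, `collectErrors`, and `isClean` names are assumptions made for illustration and are not part of Browser Harness.

```typescript
// Minimal structural types so the sketch runs without Playwright installed.
// They mirror the shape of Playwright's "console" and "pageerror" events.
interface ConsoleMessageLike {
  type(): string;
  text(): string;
}

interface PageLike {
  on(event: "console", listener: (msg: ConsoleMessageLike) => void): unknown;
  on(event: "pageerror", listener: (err: Error) => void): unknown;
}

// Hypothetical container for whatever the signoff flow observed.
interface ErrorLog {
  consoleErrors: string[];
  pageErrors: string[];
}

// Attach listeners before running a flow; the returned log fills in as events fire.
function collectErrors(page: PageLike): ErrorLog {
  const log: ErrorLog = { consoleErrors: [], pageErrors: [] };
  page.on("console", (msg) => {
    // Only console messages of type "error" count against the claim.
    if (msg.type() === "error") log.consoleErrors.push(msg.text());
  });
  page.on("pageerror", (err) => {
    log.pageErrors.push(err.message);
  });
  return log;
}

// The claim holds only when both lists are empty.
function isClean(log: ErrorLog): boolean {
  return log.consoleErrors.length === 0 && log.pageErrors.length === 0;
}
```

If the log is non-empty at the end of a flow, its contents are a natural candidate for the console/page error capture in the evidence bundle.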

Controls And States

Control / Behavior	Functional check	Visual check	Expected evidence
Dataset selector + section scope	Choose dataset, toggle sections, set `Max items`, confirm selection summary changes	Selection summary and blocked-state copy stay readable	Desktop Browser Harness screenshot
Preflight blocking	Trigger a blocked condition and confirm export is stopped with a specific issue list	Blocked issue card is legible and not clipped	Screenshot + note
Preset selector	Switch between `ChatGPT` and `Gemini`; confirm selector recipe changes without breaking pacing controls	Preset labels, origin, and pacing fields remain readable	Desktop screenshot
Custom selectors	Fill all required selectors and confirm generate is enabled only after required fields exist	Dense selector form stays readable on desktop and mobile	Desktop + mobile screenshots
Generate script	Produce both raw script and bookmarklet	Generated fields are visible and copy affordances are reachable	Screenshot
Fixture links	Open each local fixture page from the Browser Harness page	Fixture link group is visible and understandable	Screenshot
ChatGPT fixture flow	Run exported script, wait for ZIP download, import capture, confirm run lands in `Runs`	No clipped status, provider, or import feedback text	Screenshot + artifact
Gemini fixture flow	Same as above	Same as above	Screenshot + artifact
Custom fixture flow	Same as above, using manual selectors	Same as above	Screenshot + artifact
Runs attribution	Inspect imported run and confirm `Browser Harness` trigger plus browser-only provider attribution	Trigger/provider text is readable and not raw internal enum noise	Runs screenshot
Judge configuration display	Inspect run detail and confirm judge provider/model/source are visible when configured	Judge block is visible and prompt text wraps cleanly	Run detail screenshot
Comparison availability	Open `Comparison` after an imported run settles and confirm it is selectable and readable	Comparison layout remains stable	Comparison screenshot

Exploratory Checks

Try a mismatched origin and confirm the runtime refuses to run on the wrong host.
Use the JSON fallback import path and confirm the app still creates a normal run.
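
For the mismatched-origin check, it helps to be clear about what "wrong host" means. The following is a minimal sketch of an origin comparison, assuming the exported script carries an expected origin string. The function name `originMatches` and the example hosts are hypothetical; the actual runtime guard may differ.

```typescript
// Hypothetical guard: compares scheme, host, and port, and ignores the path.
function originMatches(expectedOrigin: string, currentHref: string): boolean {
  try {
    return new URL(expectedOrigin).origin === new URL(currentHref).origin;
  } catch {
    // An unparseable value is treated as a mismatch rather than allowed through.
    return false;
  }
}

// Illustrative hosts only; substitute the fixture and preset origins under test.
console.log(originMatches("https://chat.example.test", "https://chat.example.test/c/1"));
console.log(originMatches("https://chat.example.test", "https://other.example.test/"));
```

Under this definition, a different port or scheme on the same hostname also counts as a mismatch, which is worth trying as part of the exploratory pass.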

Negative Confirmations

The final signoff note must explicitly confirm:

no viewport clipping
no horizontal overflow
no wrong provider attribution
no silently unscored judge-required run
no broken deep link from Browser Harness import into Runs
no console errors
no page errors
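
Because every item above must be confirmed explicitly, a small check can flag a signoff note that omits one. This is a sketch under assumptions: the field names below are invented for illustration, and the real structure of the signoff note is not specified here.

```typescript
// The seven confirmations listed above, keyed by hypothetical field names.
const REQUIRED_CONFIRMATIONS = [
  "noViewportClipping",
  "noHorizontalOverflow",
  "noWrongProviderAttribution",
  "noSilentlyUnscoredJudgeRun",
  "noBrokenImportDeepLink",
  "noConsoleErrors",
  "noPageErrors",
] as const;

type Confirmation = (typeof REQUIRED_CONFIRMATIONS)[number];
type SignoffNote = Partial<Record<Confirmation, boolean>>;

// Returns every confirmation that is absent or not explicitly true.
function missingConfirmations(note: SignoffNote): Confirmation[] {
  return REQUIRED_CONFIRMATIONS.filter((key) => note[key] !== true);
}
```

An empty result means all seven confirmations were stated; an omitted field is treated the same as a failed one, so silence never passes.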

Evidence Contract

Store the evidence bundle under `.artifacts/manual-qa/browser-harness/`:

`browser-harness-signoff.json`
desktop screenshots
mobile screenshots
run detail screenshot
comparison screenshot
any console/page error capture if non-empty

If a claim is not covered, record the reason explicitly. “Not tested” is not acceptable for the three fixture flows, run attribution, judge display, and viewport fit.
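
The coverage rule above can be expressed as a check over per-claim results. The sketch below assumes a hypothetical record shape (`id`, `status`, `reason`) and hypothetical claim identifiers; the actual schema of `browser-harness-signoff.json` is not defined in this checklist.

```typescript
type Status = "pass" | "fail" | "not_tested";

// Hypothetical per-claim record.
interface ClaimResult {
  id: string;
  status: Status;
  reason?: string;
}

// Areas where "not tested" is not acceptable (identifiers are illustrative).
const MUST_BE_TESTED: readonly string[] = [
  "chatgpt-fixture-flow",
  "gemini-fixture-flow",
  "custom-fixture-flow",
  "run-attribution",
  "judge-display",
  "viewport-fit",
];

function coverageViolations(results: ClaimResult[]): string[] {
  const violations: string[] = [];
  // Assumption: a mandatory area with no record at all counts as uncovered.
  for (const id of MUST_BE_TESTED) {
    if (!results.some((r) => r.id === id)) violations.push(`${id}: no result recorded`);
  }
  for (const r of results) {
    if (r.status !== "not_tested") continue;
    if (MUST_BE_TESTED.includes(r.id)) {
      violations.push(`${r.id}: "not tested" is not acceptable`);
    } else if (!r.reason || r.reason.trim() === "") {
      violations.push(`${r.id}: missing reason`);
    }
  }
  return violations;
}
```

An untested optional claim passes only when it carries a non-empty reason, while the six mandatory areas fail whenever they are untested or absent.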
