Browser Harness QA Signoff

Date: 2026-03-12
Scope: Browser Harness v1, browser-only providers, judge-provider decoupling, fixture-based manual signoff

This inventory is the canonical manual/browser QA checklist for Browser Harness. It complements automated backend, frontend, and Playwright coverage and is the required input to the final playwright-interactive signoff.

Runtime Assumptions

  • The app is reachable in a browser.
  • Browser Harness fixtures are reachable from the same app host.
  • The source dataset already exists in the current workspace.
  • At least one API provider exists for judge scoring when the dataset requires judge evaluation.

User-Visible Claims To Sign Off

  • The app exposes a dedicated Browser Harness route and navigation entry.
  • The page blocks export when the selection is incompatible with v1 constraints.
  • ChatGPT, Gemini, and Custom flows can each generate a script, capture responses, and import a result.
  • Imported captures create normal completed runs instead of a parallel browser-only result type.
  • Imported runs are attributed to browser-only providers rather than API providers.
  • Judge-required datasets can be configured with an explicit judge provider distinct from the run provider.
  • Imported Browser Harness runs remain visible in Runs and usable in Comparison.
  • The Browser Harness page fits desktop and mobile viewports without clipping or requiring horizontal scrolling.
  • No console errors or page errors occur during the signoff flows.

Controls And States

| Control / Behavior | Functional check | Visual check | Expected evidence |
|---|---|---|---|
| Dataset selector + section scope | Choose dataset, toggle sections, set Max items, confirm selection summary changes | Selection summary and blocked-state copy stay readable | Desktop Browser Harness screenshot |
| Preflight blocking | Trigger a blocked condition and confirm export is stopped with a specific issue list | Blocked issue card is legible and not clipped | Screenshot + note |
| Preset selector | Switch between ChatGPT and Gemini; confirm selector recipe changes without breaking pacing controls | Preset labels, origin, and pacing fields remain readable | Desktop screenshot |
| Custom selectors | Fill all required selectors and confirm generate is enabled only after required fields exist | Dense selector form stays readable on desktop and mobile | Desktop + mobile screenshots |
| Generate script | Produce both raw script and bookmarklet | Generated fields are visible and copy affordances are reachable | Screenshot |
| Fixture links | Open each local fixture page from the Browser Harness page | Fixture link group is visible and understandable | Screenshot |
| ChatGPT fixture flow | Run exported script, wait for ZIP download, import capture, confirm run lands in Runs | No clipped status, provider, or import feedback text | Screenshot + artifact |
| Gemini fixture flow | Same as above | Same as above | Screenshot + artifact |
| Custom fixture flow | Same as above, using manual selectors | Same as above | Screenshot + artifact |
| Runs attribution | Inspect imported run and confirm Browser Harness trigger plus browser-only provider attribution | Trigger/provider text is readable and not raw internal enum noise | Runs screenshot |
| Judge configuration display | Inspect run detail and confirm judge provider/model/source are visible when configured | Judge block is visible and prompt text wraps cleanly | Run detail screenshot |
| Comparison availability | Open Comparison after an imported run settles and confirm it is selectable and readable | Comparison layout remains stable | Comparison screenshot |

Exploratory Checks

  • Try a mismatched origin and confirm the runtime refuses to run on the wrong host.
  • Use the JSON fallback import path and confirm the app still creates a normal run.
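
For the mismatched-origin check, it can help to know what kind of guard is being exercised. The following is a minimal sketch, not the Browser Harness implementation: the function names originMatches and assertOrigin and the example hosts are hypothetical, and the real runtime may compare origins differently.

```typescript
// Hypothetical origin guard; names and behavior are assumptions for illustration.
function originMatches(expectedOrigin: string, currentHref: string): boolean {
  try {
    // URL.origin covers scheme, host, and port; default ports are normalized away.
    return new URL(expectedOrigin).origin === new URL(currentHref).origin;
  } catch (_err) {
    // Unparseable input counts as a mismatch so the script fails closed.
    return false;
  }
}

function assertOrigin(expectedOrigin: string, currentHref: string): void {
  if (!originMatches(expectedOrigin, currentHref)) {
    throw new Error(
      `Refusing to run: expected ${expectedOrigin}, got ${currentHref}`
    );
  }
}
```

When exercising the check manually, a different path on the same host should still be accepted, while a different host, scheme, or non-default port should be refused.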

Negative Confirmations

The final signoff note must explicitly confirm:

  • no viewport clipping
  • no horizontal overflow
  • no wrong provider attribution
  • no silently unscored judge-required run
  • no broken deep link from Browser Harness import into Runs
  • no console errors
  • no page errors
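
The confirmations above can be checked mechanically before the note is accepted. This is a sketch under stated assumptions: the key names below are hypothetical, one per list item, and do not reflect a known schema for the signoff note.

```typescript
// Hypothetical keys, one per required negative confirmation listed above.
const REQUIRED_CONFIRMATIONS = [
  "noViewportClipping",
  "noHorizontalOverflow",
  "noWrongProviderAttribution",
  "noSilentlyUnscoredJudgeRun",
  "noBrokenDeepLinkToRuns",
  "noConsoleErrors",
  "noPageErrors",
] as const;

type ConfirmationKey = (typeof REQUIRED_CONFIRMATIONS)[number];
type Confirmations = Partial<Record<ConfirmationKey, boolean>>;

// Returns every confirmation that is absent or not explicitly true.
// An omitted key is treated the same as false: the note must be explicit.
function missingConfirmations(note: Confirmations): ConfirmationKey[] {
  return REQUIRED_CONFIRMATIONS.filter((key) => note[key] !== true);
}
```

An empty result means all seven confirmations were stated explicitly; any returned key names a confirmation the note still owes.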

Evidence Contract

Store the evidence bundle under .artifacts/manual-qa/browser-harness/:

  • browser-harness-signoff.json
  • desktop screenshots
  • mobile screenshots
  • run detail screenshot
  • comparison screenshot
  • any console/page error capture if non-empty

If a claim is not covered, record the reason explicitly. “Not tested” is not acceptable for the three fixture flows, run attribution, judge display, and viewport fit.
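
The coverage rule can be expressed as a small check over the recorded claims. The sketch below assumes a hypothetical record shape and hypothetical claim identifiers; neither is taken from the actual browser-harness-signoff.json format.

```typescript
type ClaimStatus = "pass" | "fail" | "not_tested";

// Hypothetical record shape for one signed-off claim.
interface ClaimRecord {
  claim: string;
  status: ClaimStatus;
  reason?: string;
}

// Hypothetical identifiers for the claims that may never be "not tested".
const MUST_TEST: readonly string[] = [
  "chatgpt-fixture-flow",
  "gemini-fixture-flow",
  "custom-fixture-flow",
  "run-attribution",
  "judge-display",
  "viewport-fit",
];

function coverageProblems(records: ClaimRecord[]): string[] {
  const problems: string[] = [];
  const byClaim = new Map<string, ClaimRecord>();
  for (const record of records) {
    byClaim.set(record.claim, record);
  }
  // Mandatory claims must be present and actually exercised.
  for (const claim of MUST_TEST) {
    const record = byClaim.get(claim);
    if (record === undefined || record.status === "not_tested") {
      problems.push(`${claim}: must be tested`);
    }
  }
  // Any other uncovered claim needs an explicit, non-empty reason.
  for (const record of records) {
    const hasReason = (record.reason ?? "").trim().length > 0;
    if (
      record.status === "not_tested" &&
      MUST_TEST.indexOf(record.claim) === -1 &&
      !hasReason
    ) {
      problems.push(`${record.claim}: not covered and no reason recorded`);
    }
  }
  return problems;
}
```

A failing status is deliberately not reported here: this check only covers whether each claim was exercised or explained, not whether it passed.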