# Browser Harness Development Notes
This page documents the Browser Harness v1 contracts that matter to backend, frontend, and QA contributors.
The frozen product decisions for this feature live in `specs/6_browser_harness_v1.md`.
## Scope
Browser Harness v1 is intentionally narrow:
- text-only
- primary-only
- prompt-only signed packs
- browser capture imported as a normal run
The implementation goal is reproducible browser capture without leaking answers or checker logic to the third-party page.
## Core Data Contracts

### Provider
Provider now distinguishes between API-backed and browser-only targets:
Rules:
- only `surface="api"` providers appear in provider management, smoke tests, the normal run launcher, and schedules
- only Browser Harness import creates `surface="browser"` providers
- browser-provider upsert is keyed by `preset + origin`
### Run config
Run config now supports an explicit judge provider:
Worker behavior:
- prefer `config.judge.provider_id`
- fall back to the run provider for older runs
- write judge failure detail without destroying the imported browser run
## REST Endpoints

### `POST /browser-harness/packs`
Builds a prompt-only signed pack for the selected dataset scope.
The response includes:
- dataset identity and version hash
- filtered items with `prompt_text`
- section metadata
- scoring eligibility
- `dataset_token`
The response excludes:
- answers
- checker logic
- embedded assets
- the full dataset package
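One way to keep the include/exclude lists honest is an allowlist projection: copy only prompt-side fields and never touch answer or checker fields. This is an illustrative sketch; the field names other than `prompt_text` are assumptions, not the real schema.

```python
def build_pack_item(item: dict) -> dict:
    """Project a dataset item to its prompt-only pack form.

    Allowlist copying guarantees answers and checker logic can never
    leak into the signed pack, even if the source item grows new fields.
    """
    return {
        "item_id": item["item_id"],          # hypothetical field name
        "prompt_text": item["prompt_text"],
        "section": item.get("section"),       # hypothetical field name
        "scoring_eligible": item.get("scoring_eligible", True),
    }
```

An allowlist is safer here than a denylist: a new sensitive field added to the dataset stays excluded by default.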
### `POST /browser-harness/imports`
Imports a Browser Harness capture from either:
- a `.eval752.zip` archive
- a JSON fallback payload
Import responsibilities:
- verify `dataset_token`
- verify dataset/version still matches the current workspace
- create or reuse a browser-only provider
- create a completed run with `triggered_by = browser_harness`
- persist run items, browser metadata, and logs
- queue `runs.score`
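The two verification steps can be sketched as a single guard that runs before anything is persisted. The dict shapes and field names here are assumptions for illustration only:

```python
def verify_import(payload: dict, workspace: dict) -> None:
    """Reject captures whose token or dataset/version no longer match the workspace.

    Raises ValueError so the import endpoint can return a clean 4xx
    before creating any provider or run rows.
    """
    if payload.get("dataset_token") != workspace.get("dataset_token"):
        raise ValueError("dataset_token mismatch")
    if (payload.get("dataset_id"), payload.get("version_hash")) != (
        workspace.get("dataset_id"),
        workspace.get("version_hash"),
    ):
        raise ValueError("dataset/version no longer matches the current workspace")
```

Failing fast here keeps a stale capture from ever producing a half-imported run.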
The response reports:
- `run_id`
- `provider_id`
- `provider_name`
- whether the dataset was reused
- whether scoring was queued
## ZIP Package Format
Runtime ZIP exports are store-only archives containing:
- `manifest.json`
- `browser_harness.json`
- `results.jsonl`
### `manifest.json`

### `browser_harness.json`

### `results.jsonl`
One JSON object per captured item:
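The authoritative record schema is defined by the runtime capture; as a rough illustration of the one-object-per-line shape (all field names here are hypothetical):

```python
import io
import json

# Hypothetical record shape -- the real schema is owned by the browser runtime.
record = {
    "item_id": "i1",
    "prompt_text": "What is 2 + 2?",
    "response_text": "4",
    "model_label": "gpt-x",
    "duration_ms": 1234,
    "error": None,
}

def append_result_line(fp, item: dict) -> None:
    """Write one captured item as exactly one JSON object per line (JSON Lines)."""
    fp.write(json.dumps(item, ensure_ascii=False) + "\n")

buf = io.StringIO()
append_result_line(buf, record)
```

One object per line keeps the file streamable: the importer can parse and validate items incrementally instead of loading the whole capture.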
## Frontend Runtime
The Browser Harness page generates:
- a raw self-contained script
- a bookmarklet wrapping that script
The runtime:
- validates `window.location.origin`
- shows a lightweight overlay
- clicks `new chat` before each item
- fills the composer and sends the prompt
- waits for response settle using pacing and selector rules
- records timing, model label, and errors
- downloads a ZIP, with JSON fallback if ZIP creation fails
The last capture payload is also written to `window.__EVAL752_LAST_BROWSER_HARNESS_RESULT__` for debugging.
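The "wait for response settle" step is essentially quiet-period detection: the page must report not-busy continuously for some window before the item is considered done. The actual runtime implements this in browser JavaScript with selector rules; the Python sketch below only illustrates the logic, with all parameter names invented for the example.

```python
import time

def wait_for_settle(
    is_busy,                # callable returning True while the page is generating
    poll_s: float = 0.25,   # pacing between checks
    quiet_s: float = 1.0,   # how long the page must stay quiet to count as settled
    timeout_s: float = 120.0,
) -> bool:
    """Return True once is_busy() stays False for quiet_s; False on timeout."""
    deadline = time.monotonic() + timeout_s
    quiet_since = None
    while time.monotonic() < deadline:
        if is_busy():
            quiet_since = None  # activity resets the quiet window
        else:
            if quiet_since is None:
                quiet_since = time.monotonic()
            elif time.monotonic() - quiet_since >= quiet_s:
                return True
        time.sleep(poll_s)
    return False
```

The reset-on-activity behavior matters: streaming responses toggle busy signals repeatedly, so a single not-busy sample is not enough to declare the turn finished.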
## Fixture Strategy
The repository ships three deterministic local fixtures:
- `chatgpt.html`
- `gemini.html`
- `custom.html`
Every fixture must support:
- new chat
- send
- busy / done signaling
- assistant turn rendering
- model label visibility
Use fixtures as the signoff target for:
- Playwright E2E
- `playwright-interactive`
- manual QA
- selector debugging before touching real vendor pages
## QA Expectations
Any Browser Harness change should cover:
- backend API tests for pack/import/judge-provider selection
- frontend tests for preflight blocking and selector validation
- Playwright E2E for ChatGPT, Gemini, Custom, and viewport fit
- manual browser QA using the fixture pages
The canonical manual checklist lives in `docs/testing/browser-harness-signoff.md`.
