Runs + Browser Harness REST API

The /runs and /browser-harness endpoints power the active runs board, the Browser Harness import flow, and the archive, comparison, and drill-down views.

Endpoint Summary

Method  Path                      Description
POST    /runs                     Create a new run
GET     /runs                     List runs for archive and analysis views
GET     /runs/active              Return active-run snapshots for active and recent runs
GET     /runs/{run_id}            Return run detail summary
GET     /runs/{run_id}/logs       Return run logs
GET     /runs/{run_id}/items      Return grouped item results with rich metadata
POST    /runs/{run_id}/retry      Dispatch a retry
POST    /runs/{run_id}/cancel     Cancel a pending or running run
POST    /browser-harness/packs    Build a signed Browser Harness pack
POST    /browser-harness/imports  Import Browser Harness captures as runs

GET /runs/active

This endpoint returns the read model used to hydrate the active runs board before the server-sent events (SSE) stream begins.

Query Parameters

Name   Type           Default  Notes
limit  integer, 1-24  12       Maximum number of active snapshots returned
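
As a minimal sketch of how a client might call this endpoint, the helper below builds the request URL and keeps limit inside the documented 1-24 range, falling back to the default of 12. The names buildActiveRunsUrl, fetchActiveRuns, and FetchLike are hypothetical, and clamping on the client is an assumption: how the server treats out-of-range values is not specified here.

```typescript
// Bounds and default taken from the Query Parameters table.
const MIN_LIMIT = 1;
const MAX_LIMIT = 24;
const DEFAULT_LIMIT = 12;

// Minimal shape of a fetch-like function (hypothetical), so the sketch
// does not depend on a particular runtime.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Build the GET /runs/active URL with a limit clamped to the documented range.
function buildActiveRunsUrl(baseUrl: string, limit?: number): string {
  const requested =
    limit === undefined || Number.isNaN(limit) ? DEFAULT_LIMIT : Math.trunc(limit);
  const clamped = Math.min(MAX_LIMIT, Math.max(MIN_LIMIT, requested));
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${trimmed}/runs/active?limit=${clamped}`;
}

// Fetch the snapshots; the caller decides how to validate the parsed JSON.
async function fetchActiveRuns(
  fetchImpl: FetchLike,
  baseUrl: string,
  limit?: number,
): Promise<unknown> {
  const response = await fetchImpl(buildActiveRunsUrl(baseUrl, limit));
  if (!response.ok) {
    throw new Error(`GET /runs/active failed with status ${response.status}`);
  }
  return response.json();
}
```

In a runtime that provides a global fetch, a call could look like fetchActiveRuns(fetch, baseUrl, 8), where baseUrl is whatever origin the API is served from.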

Response Shape

[
  {
    "run": {
      "id": "run-123",
      "providerName": "OpenAI",
      "datasetName": "MMLU-Pro",
      "modelName": "gpt-4.1",
      "status": "running"
    },
    "progress": {
      "completed": 14,
      "total": 100,
      "correct": 11,
      "incorrect": 3,
      "awaiting": 1,
      "faulted": 0,
      "pending": 85
    },
    "lifecycle": {
      "resumedCount": 1,
      "lastResumedAt": "2026-03-06T19:43:11Z",
      "cancelRequestedAt": null
    },
    "currentItem": {
      "sequence": 15,
      "state": "running",
      "promptText": "Which option best describes...",
      "choices": ["A", "B", "C", "D"]
    },
    "recentItems": []
  }
]
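
The sample above could be modeled on the client roughly as follows. This is a sketch inferred from the one example payload, not an authoritative schema: the interface names and the isActiveRunSnapshot guard are hypothetical, and which fields may be null or absent beyond what the sample shows is an assumption.

```typescript
interface RunSummary {
  id: string;
  providerName: string;
  datasetName: string;
  modelName: string;
  status: string; // "running" in the sample; the full set of values is not listed here
}

interface RunProgress {
  completed: number;
  total: number;
  correct: number;
  incorrect: number;
  awaiting: number;
  faulted: number;
  pending: number;
}

interface RunLifecycle {
  resumedCount: number;
  lastResumedAt: string | null; // assumption: null when the run was never resumed
  cancelRequestedAt: string | null; // null in the sample; assumed to be a timestamp when set
}

interface CurrentItem {
  sequence: number;
  state: string;
  promptText: string;
  choices: string[];
}

interface ActiveRunSnapshot {
  run: RunSummary;
  progress: RunProgress;
  lifecycle: RunLifecycle;
  currentItem: CurrentItem;
  recentItems: unknown[]; // element shape is not shown in the sample
}

const PROGRESS_KEYS = [
  "completed", "total", "correct", "incorrect", "awaiting", "faulted", "pending",
] as const;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Shallow runtime check: verifies only run.id, run.status, and the numeric
// progress counters, which is enough to render a board card safely.
function isActiveRunSnapshot(value: unknown): value is ActiveRunSnapshot {
  if (!isRecord(value)) return false;
  const run = value["run"];
  const progress = value["progress"];
  if (!isRecord(run) || typeof run["id"] !== "string" || typeof run["status"] !== "string") {
    return false;
  }
  if (!isRecord(progress)) return false;
  return PROGRESS_KEYS.every((key) => typeof progress[key] === "number");
}
```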

Notes

  • progress.awaiting counts answered items that have not been judged yet
  • progress.faulted counts terminal items that failed or were canceled
  • lifecycle is persisted server-side and used to rebuild the active board after refresh
  • currentItem.activity (not shown in the sample above) may include streaming previews and chunk metadata when available
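
To illustrate how the progress counters relate, the sketch below derives a few display values from them. It rests on a reading of the single sample payload, where completed equals correct + incorrect and the buckets correct, incorrect, awaiting, faulted, and pending add up to total; that relationship is an assumption, not a documented guarantee, and summarizeProgress is a hypothetical helper.

```typescript
interface RunProgress {
  completed: number;
  total: number;
  correct: number;
  incorrect: number;
  awaiting: number;
  faulted: number;
  pending: number;
}

interface ProgressSummary {
  judged: number; // items with a verdict
  accuracy: number | null; // null until at least one item is judged
  settledFraction: number; // judged or faulted items over total
  bucketsMatchTotal: boolean; // whether the assumed partition holds
}

function summarizeProgress(p: RunProgress): ProgressSummary {
  const judged = p.correct + p.incorrect;
  // Assumed partition of total, inferred from the sample payload.
  const accounted = judged + p.awaiting + p.faulted + p.pending;
  return {
    judged,
    accuracy: judged === 0 ? null : p.correct / judged,
    settledFraction: p.total === 0 ? 0 : (judged + p.faulted) / p.total,
    bucketsMatchTotal: accounted === p.total,
  };
}
```

A client could log a warning when bucketsMatchTotal is false instead of failing, since the assumed partition may not hold for every run state.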

Client Integration Notes

  1. useRunsQuery drives archive and comparison selectors
  2. useActiveRunsQuery hydrates the active board before SSE begins
  3. useRunEvents merges run_status, run_progress, and run_item payloads into local state
  4. on transient SSE errors, clients should invalidate the active-run caches while still allowing the stream to reconnect
  5. the Runs page treats /runs/active as the primary board source and fetches full run detail on demand
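
The merge described in step 3 can be sketched as a pure reducer over board state. Only the event names run_status, run_progress, and run_item come from the notes above; the payload fields (runId, status, progress, item), the BoardRun shape, the MAX_RECENT_ITEMS cap, and the rule for moving items into recentItems are all assumptions made for illustration.

```typescript
interface BoardItem {
  sequence: number;
  state: string;
}

interface BoardRun {
  id: string;
  status: string;
  progress: Record<string, number>;
  currentItem: BoardItem | null;
  recentItems: BoardItem[];
}

type BoardState = Readonly<Record<string, BoardRun>>;

// Hypothetical payload shapes, keyed by the documented event names.
type RunEvent =
  | { type: "run_status"; runId: string; status: string }
  | { type: "run_progress"; runId: string; progress: Record<string, number> }
  | { type: "run_item"; runId: string; item: BoardItem };

const MAX_RECENT_ITEMS = 5; // hypothetical cap for the board

function mergeRunEvent(state: BoardState, event: RunEvent): BoardState {
  const run = state[event.runId];
  // Unknown run: leave state alone and rely on the next /runs/active hydration.
  if (run === undefined) return state;

  switch (event.type) {
    case "run_status":
      return { ...state, [run.id]: { ...run, status: event.status } };
    case "run_progress":
      return {
        ...state,
        [run.id]: { ...run, progress: { ...run.progress, ...event.progress } },
      };
    case "run_item": {
      if (event.item.state === "running") {
        return { ...state, [run.id]: { ...run, currentItem: event.item } };
      }
      // Any other state: move the item to the front of recentItems, without duplicates.
      const others = run.recentItems.filter((i) => i.sequence !== event.item.sequence);
      const recentItems = [event.item, ...others].slice(0, MAX_RECENT_ITEMS);
      const currentItem =
        run.currentItem?.sequence === event.item.sequence ? null : run.currentItem;
      return { ...state, [run.id]: { ...run, currentItem, recentItems } };
    }
  }
}
```

Keeping the reducer pure means the same function can serve both the initial hydration path and the SSE handler, and a cache invalidation after a transient error simply replaces the state it operates on.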