Run Events SSE API

The /runs/events endpoint streams real-time updates about evaluation runs using Server-Sent Events (SSE). Clients subscribe once and receive run status transitions, progress counters, item-level updates, and log records.

Endpoint Summary

Method	Path	Description
`GET`	`/runs/events`	Open an SSE connection for run updates.

Authentication & CORS

The stream uses the same authentication (cookies/session headers) as other API calls. Clients should set withCredentials: true when constructing the EventSource so cookies are forwarded on cross-origin requests.
CORS is enabled by default; the backend returns the appropriate Access-Control-Allow-Origin header so long as the request originates from a trusted frontend.

Query Parameters

Name	Type	Default	Notes
`limit`	number	`null`	Optional maximum number of events to deliver before closing the stream. Useful for tests.
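
As a minimal sketch of how a test might use limit, the helper below builds the stream URL and appends the parameter only when a positive integer is supplied. The function name and the example origin are hypothetical; only the /runs/events path and the limit parameter come from this page.

```javascript
// Hypothetical helper: builds the stream URL, adding ?limit=N only when a
// positive integer is supplied. `baseUrl` is an assumed deployment origin.
function buildRunEventsUrl(baseUrl, limit) {
  const url = new URL("/runs/events", baseUrl);
  if (Number.isInteger(limit) && limit > 0) {
    url.searchParams.set("limit", String(limit));
  }
  return url.toString();
}
```

Omitting limit leaves the query string empty, which matches the documented default of null (no cap on delivered events).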

Stream Lifecycle

The server immediately sends a comment : ready to signal the connection is live.
If no events occur for 15 seconds, a keep-alive comment : ping is emitted; clients can ignore these.
The connection stays open until the client closes it, the limit is reached, or a Redis/backend error occurs.
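
A browser EventSource discards comment lines itself, so : ready and : ping never reach onmessage. Clients that read the raw stream (for example in a test harness) have to skip them manually. The sketch below is a simplified parser under stated assumptions: the buffer ends on an event boundary, and only data: fields are of interest. The function name is hypothetical.

```javascript
// Parses a complete SSE text buffer into JSON payloads.
// Simplified: assumes the buffer ends on an event boundary and ignores the
// `event:`, `id:`, and `retry:` fields.
function parseSseBuffer(buffer) {
  const payloads = [];
  // Events are separated by a blank line; accept LF or CRLF line endings.
  for (const block of buffer.split(/\r?\n\r?\n/)) {
    const dataLines = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith(":")) continue; // comment, e.g. ": ready" or ": ping"
      if (line.startsWith("data:")) {
        // SSE strips a single leading space after the colon.
        dataLines.push(line.slice(5).replace(/^ /, ""));
      }
    }
    // Multiple data lines in one event are joined with newlines.
    if (dataLines.length > 0) payloads.push(JSON.parse(dataLines.join("\n")));
  }
  return payloads;
}
```

Blocks that contain only comments produce no payload, so the ready and ping markers are dropped without special handling.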

Event Types

Each data message is a JSON object with a type property. Four event types are currently supported.

`run_status`

Emitted whenever a run changes state (for example pending → running → completed) or its lifecycle metadata changes: a retry window is scheduled, a cancellation is requested, or a worker resume is recorded.

{
  "type": "run_status",
  "runId": "run-123",
  "status": "running",
  "startedAt": "2025-10-21T08:01:00Z",
  "finishedAt": null,
  "retryCount": 1,
  "retryAfter": "2025-10-21T08:07:30Z",
  "retryRequestedAt": "2025-10-21T08:03:00Z",
  "retryReason": "Worker shutdown requested; retrying run after interruption.",
  "resumedCount": 1,
  "lastResumedAt": "2025-10-21T08:05:11Z",
  "cancelRequestedAt": null
}

status matches the REST schema (pending, running, completed, failed, canceled).
startedAt/finishedAt are ISO-8601 timestamps in UTC and may be null if not yet known.
retryCount reports how many automatic retries have been attempted (0 when never retried).
retryAfter is the next scheduled retry timestamp (UTC). It is null once the run is dispatched or finishes.
retryRequestedAt and retryReason capture the moment and reason the scheduler enqueued the latest retry.
resumedCount and lastResumedAt are present when the worker resumed an existing run after an interruption.
cancelRequestedAt is present when a user asked to cancel an in-flight run. During that phase the UI should keep the current item visible until the provider request settles.
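
The field notes above can be folded into a single display label. The following is a minimal sketch, not the SPA's implementation: the function name, the label strings, and the priority order (terminal status first, then cancellation, then retry, then resume) are all assumptions.

```javascript
// Hypothetical helper: maps a run_status payload to a short UI label.
// The label strings and their priority order are illustrative only.
function describeRunStatus(event) {
  const terminal = ["completed", "failed", "canceled"].includes(event.status);
  if (terminal) return event.status;
  // Optional keys may be absent or null, so use loose null checks.
  if (event.cancelRequestedAt != null) return "cancel requested";
  if (event.retryAfter != null) return `retry scheduled for ${event.retryAfter}`;
  if ((event.resumedCount ?? 0) > 0) return `${event.status} (resumed)`;
  return event.status;
}
```

Checking the terminal statuses first means a stale cancelRequestedAt on a finished run never overrides the final state.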

`run_progress`

Reports coarse-grained execution counts so the UI can show progress indicators.

{
  "type": "run_progress",
  "runId": "run-123",
  "completed": 12,
  "total": 32
}

completed represents the number of dataset items processed so far.
total is the expected item count for the run (inclusive of variations).
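
A progress indicator typically needs a bounded percentage. The sketch below assumes a hypothetical helper name and guards against a missing or zero total, which could occur before the item count is known.

```javascript
// Converts a run_progress payload into a whole-number percentage (0 to 100).
function progressPercent(event) {
  const { completed, total } = event;
  // Treat missing or non-positive counts as "no progress yet".
  if (!Number.isFinite(completed) || !Number.isFinite(total) || total <= 0) {
    return 0;
  }
  // Clamp so an over-count can never render as more than 100%.
  const ratio = Math.min(Math.max(completed / total, 0), 1);
  return Math.round(ratio * 100);
}
```

For the sample payload above (12 of 32 items) this returns 38.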

`run_item`

Emitted when an individual dataset item starts, reports an activity snapshot, completes, or fails. This is the event used by the active runs board to keep the current card and recent-history stack in sync.

{
  "type": "run_item",
  "runId": "run-123",
  "sequence": 15,
  "total": 100,
  "phase": "activity",
  "item": {
    "itemId": "item-15",
    "datasetItemId": "item-15",
    "sequence": 15,
    "state": "running",
    "promptText": "Which option best describes...",
    "promptPayload": null,
    "answerPayload": null,
    "choices": ["A", "B", "C", "D"],
    "assets": null,
    "section": {
      "id": "section-1",
      "name": "Core Knowledge",
      "order": 1
    }
  },
  "response": null,
  "score": null,
  "latencyMs": null,
  "activity": {
    "requestStartedAt": "2026-03-05T16:20:12Z",
    "firstChunkAt": "2026-03-05T16:20:15Z",
    "lastChunkAt": "2026-03-05T16:20:31Z",
    "requestFinishedAt": null,
    "transport": "streaming",
    "streamingRequested": true,
    "streamingFallback": false,
    "fallbackReason": null,
    "chunkCount": 17,
    "textChunkCount": 12,
    "reasoningChunkCount": 5,
    "responsePreview": "B",
    "responsePreviewTruncated": false,
    "reasoningPreview": "Comparing the answer choices...",
    "reasoningPreviewTruncated": false
  },
  "at": "2026-03-05T16:20:31Z"
}

phase is one of started, activity, completed, or failed.
sequence is the 1-based item order within the run.
item contains the stage snapshot for the current question card.
response is the current textual response snapshot and may be null until the request finishes.
score and latencyMs may be null while the item is still running.
activity is optional and carries the persisted per-request observability snapshot:
- requestStartedAt, firstChunkAt, lastChunkAt, requestFinishedAt
- transport (streaming or buffered)
- streamingRequested, streamingFallback, fallbackReason
- chunkCount, textChunkCount, reasoningChunkCount
- responsePreview, responsePreviewTruncated
- reasoningPreview, reasoningPreviewTruncated
Activity snapshots may be repeated as the request progresses and may also be echoed on the terminal completed/failed snapshot so the client can settle the final card without an extra REST fetch.
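
One way to keep a current card and a recent-history stack in sync is a small reducer over run_item events. This is a sketch under assumptions, not the board's actual code: the state shape ({ current, history }), the function name, and the history limit of 5 are hypothetical.

```javascript
// Assumed size of the recent-history stack.
const HISTORY_LIMIT = 5;

// Hypothetical per-run board state: { current, history }.
// `current` is the latest in-flight run_item event; `history` holds settled ones.
function applyRunItem(state, event) {
  const sameItem =
    state.current !== null && state.current.sequence === event.sequence;
  // Carry the last activity snapshot forward when an event omits it.
  const activity =
    event.activity ?? (sameItem ? state.current.activity : null) ?? null;
  const merged = { ...event, activity };

  if (event.phase === "completed" || event.phase === "failed") {
    return {
      // Clear the card only if it belongs to the item that just settled.
      current: sameItem ? null : state.current,
      history: [merged, ...state.history].slice(0, HISTORY_LIMIT),
    };
  }
  // started / activity: replace the current card with the newest snapshot.
  return { current: merged, history: state.history };
}
```

Starting from { current: null, history: [] }, each event yields a new state object, so the function can be used directly as a reducer in most state libraries.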

`run_log`

Streams log lines generated by the pipeline (queue events, errors, scoring summaries, etc.).

{
  "type": "run_log",
  "runId": "run-123",
  "id": "log-b5bcd9",
  "level": "info",
  "message": "Run completed successfully.",
  "data": { "durationMs": 320000 },
  "createdAt": "2025-10-21T08:06:31Z"
}

level is one of debug, info, warn, or error.
data contains structured metadata (JSON) when available and may be null.
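
Because run_log can be high-volume, a client may want to filter by severity before rendering. The sketch below uses the four documented levels; the function name, the default threshold, and the choice to show unknown levels are assumptions.

```javascript
// Documented levels in ascending severity.
const LOG_LEVELS = ["debug", "info", "warn", "error"];

// Returns true when a run_log event is at or above the minimum level.
function shouldDisplayLog(event, minLevel = "info") {
  const rank = LOG_LEVELS.indexOf(event.level);
  // Assumption: show unknown levels rather than drop them silently.
  if (rank === -1) return true;
  return rank >= LOG_LEVELS.indexOf(minLevel);
}
```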

Client Guidelines

Instantiate once: Create the EventSource when the app shell mounts and share updates via state management (TanStack Query in our SPA).
Handle recoverable SSE failures: Redis/backend interruptions may close the stream. Retry with backoff when source.onerror fires, but keep the last hydrated /runs/active snapshot so the UI remains coherent during reconnect.
Cache updates: Use run_status and run_progress events to update run lists, and use run_item for card-level UI updates; the SPA invalidates stale queries for affected runs and items when necessary.
Preserve in-flight cancellation semantics: If cancelRequestedAt is present but the current item still has an unfinished request, keep that card visible and show a cancellation-requested state instead of collapsing immediately to terminal history.
Cleanup: Always call source.close() when unmounting or when navigating away to avoid leaking sockets.
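
The reconnect and cleanup guidelines can be combined in one wrapper. This is a minimal sketch, assuming exponential backoff with a 1 s base and 30 s cap (neither value is specified by the API). The function names are hypothetical, and the EventSource is supplied through a factory so the wrapper can be exercised without a browser.

```javascript
// Exponential backoff with a cap; the 1 s base and 30 s cap are assumptions.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// `createSource` is a caller-supplied factory, for example
//   () => new EventSource("/runs/events", { withCredentials: true })
// Returns a function that stops the stream and cancels any pending retry.
function connectWithBackoff(
  createSource,
  onPayload,
  schedule = setTimeout,
  cancel = clearTimeout
) {
  let source = null;
  let timer = null;
  let attempt = 0;
  let stopped = false;

  function open() {
    source = createSource();
    // A successful connection resets the backoff sequence.
    source.onopen = () => { attempt = 0; };
    source.onmessage = (event) => {
      if (event.data) onPayload(JSON.parse(event.data));
    };
    source.onerror = () => {
      // Close first so the browser's built-in reconnect does not race ours.
      source.close();
      if (stopped) return;
      timer = schedule(open, backoffDelayMs(attempt));
      attempt += 1;
    };
  }

  open();
  return () => {
    stopped = true;
    if (timer !== null) cancel(timer);
    if (source !== null) source.close();
  };
}
```

The returned stop function is what a component would call on unmount. State hydrated from /runs/active stays untouched during the gap, since the wrapper only forwards payloads and never clears anything.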

Migration Notes

2025-11-05: run_status gained retry metadata (retryCount, retryAfter, retryRequestedAt, retryReason). Clients that do not use these keys should ignore them.
2026-03-12: run_status may also include resumedCount, lastResumedAt, and cancelRequestedAt, and run_item.phase now includes activity. Clients that assumed only terminal item phases should update their parsers and reducers accordingly.
When upgrading, confirm any SSE parsers that assume a fixed key order or shape (for example zod schemas) are updated to mark the newer keys optional.
The SPA stores the latest retry/resume/cancel snapshot alongside the run list and active board. Custom consumers can follow the same pattern: hydrate from /runs/active, merge run_status and run_item events, and fall back to a fresh read-model fetch after reconnect.
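
For consumers that validate payloads by hand rather than with a schema library, a tolerant normalizer is one way to treat the newer keys as optional. The sketch below is an assumption-laden example: the function name is hypothetical, and the default of 0 for resumedCount is a guess by analogy with retryCount, which is documented as 0 when never retried.

```javascript
// Defaults for run_status keys added on 2025-11-05 and 2026-03-12.
// resumedCount: 0 is an assumed default; the API may simply omit the key.
const RUN_STATUS_DEFAULTS = {
  retryCount: 0,
  retryAfter: null,
  retryRequestedAt: null,
  retryReason: null,
  resumedCount: 0,
  lastResumedAt: null,
  cancelRequestedAt: null,
};

// Validates the required keys and fills in any missing optional ones.
function normalizeRunStatus(payload) {
  if (payload === null || typeof payload !== "object" || payload.type !== "run_status") {
    throw new TypeError("expected a run_status payload");
  }
  if (typeof payload.runId !== "string" || typeof payload.status !== "string") {
    throw new TypeError("run_status requires string runId and status");
  }
  // Defaults first, then the payload, so unknown future keys pass through.
  return { ...RUN_STATUS_DEFAULTS, ...payload };
}
```

Because the payload is spread last, keys introduced by later releases survive normalization instead of being stripped, which keeps older clients forward compatible.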

Example (React)

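import { useEffect } from "react"

// Hypothetical hook name; call it once from the app shell.
function useRunEvents() {
  useEffect(() => {
    const source = new EventSource("/runs/events", { withCredentials: true })

    source.onmessage = (event) => {
      if (!event.data) return
      let payload
      try {
        payload = JSON.parse(event.data)
      } catch (err) {
        // Skip malformed payloads instead of crashing the handler
        console.error("Ignoring malformed SSE payload", err)
        return
      }
      switch (payload.type) {
        case "run_status":
          // update cached run data
          break
        case "run_progress":
          // update progress indicator
          break
        case "run_item":
          // update active item card / history
          break
        case "run_log":
          // append to log panel
          break
        default:
          // ignore unknown event types
          break
      }
    }

    source.onerror = (err) => {
      console.error("SSE error", err)
      // Closing also disables EventSource's built-in reconnect
      source.close()
      // optional: schedule reconnection with backoff
    }

    // Close the stream when the component unmounts
    return () => source.close()
  }, [])
}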

Troubleshooting

CORS errors: Ensure the frontend origin is allowed and that credentials are enabled in both the client (withCredentials) and the server middleware.
Flooded logs: Filter run_log events by level before displaying in the UI.
Missing updates: Confirm Redis is reachable by the API process. When Redis is unavailable, broadcasts fall back silently from the client's perspective; the failure is only logged server-side.
