Dataset Export Guide

The dataset explorer and automation scripts share the same export pipeline: a FastAPI endpoint that streams a .eval752.zip archive. This guide summarizes how to retrieve packages, how filtering impacts the archive, and how the UI exposes the workflow.

Endpoint Summary

GET /datasets/{dataset_id}/export

Query param	Type	Description
`run_id`	string (optional)	Include run metadata, `run_config.json`, and `results.jsonl` for a completed run belonging to the dataset.
`section_ids`	string (repeatable, optional)	Limit the export to specific section IDs. Repeat the `section_ids` parameter to include more than one section.

Response: 200 OK with Content-Type: application/zip and Content-Disposition: attachment; filename=...eval752.zip.

The archive contains:

manifest.json — generation metadata, section list, and counts that honor filters.
meta.json — optional description/schema metadata.
sections/*.jsonl — one JSONL file per exported section.
assets/... — decoded binary assets referenced by exported items.
When run_id is provided: run_config.json, results.jsonl, and LightEval config (if available).
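
To sanity-check a downloaded package against the layout above, a script can open it with Python's standard `zipfile` module. The following is a minimal sketch, not part of the service: the helper name `summarize_archive` is hypothetical, and it assumes `manifest.json` holds a JSON object. It does not rely on any specific manifest keys, since this guide does not define them.

```python
import io
import json
import zipfile


def summarize_archive(archive) -> dict:
    """Summarize an export archive given a path or a binary file-like object."""
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        # Assumption: manifest.json is a JSON object; only its key names are reported.
        manifest = json.loads(zf.read("manifest.json"))
        summary = {
            "manifest_keys": sorted(manifest),
            "section_files": sorted(
                n for n in names
                if n.startswith("sections/") and n.endswith(".jsonl")
            ),
            # Skip directory entries, which end with a slash.
            "asset_count": sum(
                1 for n in names
                if n.startswith("assets/") and not n.endswith("/")
            ),
            "has_run": "results.jsonl" in names,
        }
        if summary["has_run"]:
            # Each non-empty line of a JSONL file is one record.
            lines = zf.read("results.jsonl").decode("utf-8").splitlines()
            summary["result_rows"] = sum(1 for line in lines if line.strip())
        return summary
```

A `result_rows` value of 0 alongside `has_run` being true is a quick signal that a section filter excluded every item the run evaluated.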

Section Filtering

The backend accepts any number of section_ids parameters. IDs that do not belong to the dataset trigger 400 Bad Request to prevent accidental mismatches.
Filters apply to both dataset content and run artifacts. results.jsonl only includes items from the exported sections, so downstream tools never see dangling references.
When no section_ids are specified, all sections are exported.
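
For automation scripts, the repeated-parameter convention can be produced with the standard library rather than by hand-assembling query strings. This is a sketch under stated assumptions: `build_export_url` is a hypothetical helper name, and the base URL is whatever host serves the API.

```python
from urllib.parse import quote, urlencode


def build_export_url(base_url, dataset_id, section_ids=(), run_id=None):
    """Build an export URL, repeating section_ids once per requested section."""
    params = []
    if run_id is not None:
        params.append(("run_id", run_id))
    # One ("section_ids", value) pair per section yields repeated query keys.
    params.extend(("section_ids", sid) for sid in section_ids)
    path = f"{base_url.rstrip('/')}/datasets/{quote(str(dataset_id), safe='')}/export"
    query = urlencode(params)
    return f"{path}?{query}" if query else path
```

Passing an empty `section_ids` sequence omits the parameter entirely, which corresponds to the "export all sections" behavior described above.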

Example:

curl -L -o dataset-sections.eval752.zip \
  "http://localhost:8000/datasets/$DATASET_ID/export?section_ids=$SECTION_A&section_ids=$SECTION_B"

Run Selection

Passing run_id instructs the service to embed the matching run summary, run_config.json, LightEval config (without API keys), and results.jsonl. The run must belong to the dataset; otherwise the API returns 400 with a descriptive error.

curl -L -o run-packaged.eval752.zip \
  "http://localhost:8000/datasets/$DATASET_ID/export?run_id=$RUN_ID"

Combine run_id with section_ids to export a subset of sections together with the run results for the items in those sections only.
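
Because the endpoint streams the archive, a script can copy the response to disk in chunks instead of buffering it in memory. The sketch below uses only the standard library; `download_export` is a hypothetical helper, and writing to a temporary `.part` file first is a design choice of this example, not something the service requires.

```python
import os
import shutil
import urllib.error
import urllib.request


def download_export(url, dest_path, timeout=60):
    """Stream an export archive to dest_path; raise RuntimeError on HTTP errors."""
    part_path = f"{dest_path}.part"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            with open(part_path, "wb") as out:
                # Copy in chunks so large archives are never fully held in memory.
                shutil.copyfileobj(response, out)
    except urllib.error.HTTPError as exc:
        # Surface the server's error body, e.g. for a 400 on a mismatched run_id.
        detail = exc.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"export failed with HTTP {exc.code}: {detail}") from exc
    # Rename only after a complete download so readers never see a partial file.
    os.replace(part_path, dest_path)
    return dest_path
```

A call such as `download_export(url, "run-packaged.eval752.zip")` would take a URL carrying both `run_id` and one or more `section_ids` values.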

UI & UX Notes

The dataset list still exposes a one-click “Export” button for full packages.
The in-app Dataset Explorer adds an Export view control that exports the filtered sections a user is reviewing. Microcopy on the button clarifies that search keywords are for previewing only, while section filters define the export scope. This reduces surprise for users who expect WYSIWYG downloads.
Run detail panels continue to provide the most direct “export with run results” entry point.

Keep these cues aligned in future design work so that the mental model remains consistent: filters = structural subsets, runs = behavioral context.

Troubleshooting

Symptom	Likely cause	Resolution
`400 Sections do not belong to this dataset`	A `section_ids` value came from another dataset or is stale.	Refresh the dataset list and copy IDs from `GET /datasets/{dataset_id}` before retrying.
`400 Run belongs to a different dataset`	The selected run references a separate dataset.	Export from the Run detail view or verify the dataset ID in the URL.
Empty `results.jsonl` even with `run_id`	The filtered sections excluded every item the run evaluated.	Remove the section filter or include the relevant sections.
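
Scripts that call the endpoint can map these 400 responses to the resolutions in the table. The sketch below assumes the error body follows FastAPI's default `{"detail": "..."}` shape and that the detail text contains the messages listed above. Neither is confirmed by this guide, and both helper names are hypothetical.

```python
import json

# Messages and resolutions taken from the troubleshooting table above.
KNOWN_FIXES = {
    "Sections do not belong to this dataset":
        "Refresh the dataset list and copy section IDs from the dataset before retrying.",
    "Run belongs to a different dataset":
        "Export from the Run detail view or verify the dataset ID in the URL.",
}


def export_error_detail(body: bytes) -> str:
    """Extract a readable message from an error response body."""
    text = body.decode("utf-8", errors="replace")
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return text.strip()
    # Assumption: errors use the {"detail": ...} shape; otherwise fall back to raw text.
    if isinstance(payload, dict) and "detail" in payload:
        return str(payload["detail"])
    return text.strip()


def suggest_fix(detail: str):
    """Return the documented resolution for a known error message, else None."""
    for message, fix in KNOWN_FIXES.items():
        if message.lower() in detail.lower():
            return fix
    return None
```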
