Working with Datasets

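eval_752 gives you three ways to get a dataset into the system: build it in the browser with the Dataset Builder, import it from Hugging Face, or upload an .eval752.zip package.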

Pick whichever fits your situation:

Situation	Best path
"I want to create a custom benchmark"	Dataset Builder
"The dataset already exists on Hugging Face"	Hugging Face import
"Someone shared an .eval752.zip with me"	Upload
"I just want to validate the pipeline quickly"	Import a small HF slice (`test[:30]`)

Building a dataset in the browser

Use the Dataset Builder when your benchmark doesn't exist anywhere yet — it lives in your head, a document, or scattered notes.

Open Datasets → Dataset Builder → Open Builder
Step 1: Name, intent, and description
Step 2: Create sections (logical groupings for your items)
Step 3: Add items — type in prompts and answers, duplicate existing items, drag items between sections, attach local images
Step 4: Review counts and composition
Click Publish dataset

Publishing creates a formal dataset you can use in Runs. The draft stays available for future edits.

Attached images appear as real thumbnails while you edit. During runs, image assets are sent to the model if the provider supports vision.

Importing from Hugging Face

Use the import wizard when the dataset already exists on Hugging Face (HF).

Open Datasets → Start import wizard
Enter the dataset path, split, and preview limit
Optionally set a display name
Click Preview dataset
Map the detected columns:
- Prompt column (required)
- Answer column (required)
- Choices, type, metadata columns (optional)
Review and import

The wizard stores a normalized copy in your local database. After import, the dataset appears immediately in Runs.
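The column-mapping step can be pictured as a rename from source columns to normalized fields. The sketch below is illustrative only and is not eval_752's actual implementation: the function `normalize_row`, the `COLUMN_MAP` dictionary, and the source column names (`question`, `label`, `options`) are all hypothetical. Only the required/optional split comes from the wizard description above.

```python
from typing import Any

# Hypothetical mapping from normalized field names to source column names.
# The field names mirror the wizard's options; the column names are made up.
COLUMN_MAP = {
    "prompt": "question",   # required
    "answer": "label",      # required
    "choices": "options",   # optional
}

REQUIRED_FIELDS = ("prompt", "answer")


def normalize_row(row: dict[str, Any], column_map: dict[str, str]) -> dict[str, Any]:
    """Rename source columns to normalized fields, enforcing required ones."""
    for field in REQUIRED_FIELDS:
        source = column_map.get(field)
        if source is None or source not in row:
            raise ValueError(f"missing required column for {field!r}")
    # Optional fields are copied only when the source row has that column.
    return {
        field: row[source]
        for field, source in column_map.items()
        if source in row
    }


row = {"question": "2 + 2 = ?", "label": "4", "options": ["3", "4", "5"]}
normalized = normalize_row(row, COLUMN_MAP)
print(normalized)
```

A row that lacks a mapped prompt or answer column raises an error instead of producing a partial record, which matches the wizard treating those two columns as required.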

Start small

For a first import, use a small slice like test[:30]. Validating column mappings on 30 rows is much faster than on 30,000.
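If you want to look at the same slice outside the wizard, the Hugging Face `datasets` library accepts this slice notation in the `split` argument of `load_dataset`. The sketch below assumes `datasets` is installed. The helper names `sliced_split` and `preview_rows` are hypothetical, and whether eval_752 passes the slice through in exactly this way is an assumption.

```python
def sliced_split(split: str, limit: int) -> str:
    """Build a split string such as 'test[:30]' for the first `limit` rows."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"{split}[:{limit}]"


def preview_rows(path: str, split: str = "test", limit: int = 30):
    """Load only the first `limit` rows of a Hugging Face dataset split."""
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset

    return load_dataset(path, split=sliced_split(split, limit))


print(sliced_split("test", 30))
```

Calling `preview_rows` with your dataset path returns a small `Dataset` object. Its `column_names` attribute shows which columns you will need to map in the wizard.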

Uploading a package

For pre-packaged datasets, use Upload with the .eval752.zip file you received.

The package gets unpacked into dataset records. Embedded assets (images) are restored so they continue to work in runs.
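Before uploading a package someone shared, it can help to confirm that the file is intact. The sketch below assumes, based only on the file extension, that an .eval752.zip is a standard zip archive. The internal layout of the package is not described here, so the hypothetical helper `list_package_entries` only lists entry names and does not interpret them.

```python
import zipfile


def list_package_entries(package_path: str) -> list[str]:
    """Return the entry names inside a zip-based package without extracting it."""
    if not zipfile.is_zipfile(package_path):
        raise ValueError(f"{package_path} is not a valid zip archive")
    with zipfile.ZipFile(package_path) as archive:
        # testzip() returns the first corrupt member name, or None if all are fine.
        corrupt = archive.testzip()
        if corrupt is not None:
            raise ValueError(f"corrupt entry in package: {corrupt}")
        return archive.namelist()
```

If the call raises, ask the sender for a fresh export rather than attempting the upload.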

Each dataset card in the main list supports:

View — opens the explorer dialog with search, section filters, and pagination
Export — downloads the full dataset as .eval752.zip
Section chips — shows how items are grouped

Inside the explorer you can search items, filter by section, and page through the results.

This is the best way to verify a dataset before running an evaluation.

If your datasets are in a format that none of these paths accepts, convert them to .eval752.zip first using the project's packaging tools.

Use meaningful names. Dataset names show up everywhere — in Runs, Comparison, and exports. "MMLU Math Subset" is better than "Dataset 1."
Use section filters to create smaller review bundles before sharing.
If a Hugging Face dataset needs extra auth or configuration not exposed by the wizard, use a simpler public dataset for now.