Dataset Format

eval_752 stores evaluation items in JSONL (one JSON object per line). This page describes the format so you can understand what's in an import, build datasets by hand, or debug column mappings.

You don't need to memorize this

If you're building datasets through the UI (Dataset Builder or Hugging Face import), eval_752 handles the format for you. This page is a reference for when you want to inspect or manually construct items.

Item Structure

Each line in the JSONL file represents one evaluation item:

{
  "id": "q-001",
  "type": "mcq_single",
  "inputs": {
    "text": "What is the capital of France?",
    "images": [],
    "audio": [],
    "video": []
  },
  "choices": ["London", "Paris", "Berlin", "Madrid"],
  "answer": "Paris",
  "checker": null,
  "criteria": null,
  "meta": {
    "source": "geography-basics",
    "difficulty": "easy",
    "section": "Europe"
  }
}
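Because each line is an independent JSON object, a dataset file can be loaded with a few lines of standard-library Python. A minimal sketch (the function name is illustrative, not part of eval_752):

```python
import json

def load_items(path):
    """Read a JSONL dataset: one evaluation item per non-empty line."""
    items = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines between records
                items.append(json.loads(line))
    return items
```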

Field Reference

| Field | Required | Description |
|---|---|---|
| `id` | Yes | Unique identifier for this item. Must be unique within the dataset. |
| `type` | Yes | One of: `mcq_single`, `mcq_multi`, `freeform`, `code`, `judge_pairwise`. |
| `inputs.text` | Yes | The prompt text shown to the model. |
| `inputs.images` | No | Array of relative paths to image assets (e.g., `["assets/diagram.png"]`). |
| `inputs.audio` | No | Array of relative paths to audio assets (planned). |
| `inputs.video` | No | Array of relative paths to video assets (planned). |
| `choices` | MCQ only | Array of answer options. Required for MCQ types. |
| `answer` | Yes | The correct answer: a string for single-choice, an array for multi-choice. |
| `checker` | No | Custom checker specification (reserved for future use). |
| `criteria` | No | Rubric or criteria for LLM-as-judge scoring. |
| `meta` | No | Arbitrary metadata — source, difficulty, section, tags, etc. |
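The required-field rules above can be checked mechanically before running an eval. A sketch of a validator based only on the rules in this table (the function name is my own, not an eval_752 API):

```python
VALID_TYPES = {"mcq_single", "mcq_multi", "freeform", "code", "judge_pairwise"}

def validate_item(item):
    """Return a list of problems found in one item, per the field rules above."""
    problems = []
    for field in ("id", "type", "answer"):
        if field not in item:
            problems.append(f"missing required field: {field}")
    if item.get("type") not in VALID_TYPES:
        problems.append(f"unknown type: {item.get('type')!r}")
    if "text" not in item.get("inputs", {}):
        problems.append("missing required field: inputs.text")
    # choices is required only for the MCQ item types
    if item.get("type") in ("mcq_single", "mcq_multi") and not item.get("choices"):
        problems.append("choices is required for MCQ types")
    return problems
```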

Supported Item Types

  • mcq_single — One correct answer from choices. answer is a string matching one choice.
  • mcq_multi — Multiple correct answers. answer is an array of strings.
  • freeform — Open-ended text response. answer is the reference text (used by judge or regex scoring).
  • code — Code generation task. answer contains the reference solution.
  • judge_pairwise — Two-response comparison for arena mode.
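The practical difference between the two MCQ types is the shape of `answer`. Two illustrative items (contents are made up for the example):

```python
# mcq_single: answer is one string matching a choice
single = {
    "id": "q-010",
    "type": "mcq_single",
    "inputs": {"text": "Which planet is closest to the Sun?"},
    "choices": ["Venus", "Mercury", "Mars"],
    "answer": "Mercury",
}

# mcq_multi: answer is an array of strings, each matching a choice
multi = {
    "id": "q-011",
    "type": "mcq_multi",
    "inputs": {"text": "Which of these are prime?"},
    "choices": ["2", "3", "4", "5"],
    "answer": ["2", "3", "5"],
}
```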

Multi-Modal Items

For items that include images, reference them with relative paths under assets/:

{
  "id": "img-001",
  "type": "freeform",
  "inputs": {
    "text": "Describe what you see in this image.",
    "images": ["assets/photo-001.jpg"]
  },
  "answer": "A cat sitting on a windowsill.",
  "meta": { "requires": "vision" }
}

When packaged in an .eval752.zip, image files live under the assets/ directory. During a run, eval_752 sends these images to the model if the provider supports vision inputs.
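Since image paths are relative to the bundle root, a consumer of the format has to resolve them against the dataset directory before doing anything with the files. A minimal sketch, assuming the bundle has been extracted and its root path is known:

```python
from pathlib import Path

def resolve_assets(item, dataset_root):
    """Map an item's relative image paths to paths under the bundle root."""
    root = Path(dataset_root)
    return [root / rel for rel in item.get("inputs", {}).get("images", [])]
```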

The .eval752.zip Package

An .eval752.zip bundle is a self-contained dataset (or result) package:

bundle.eval752.zip
├── manifest.json        # Package format version and type
├── meta.json            # Dataset metadata (name, description, version hash)
├── sections/
│   ├── section-1.jsonl  # Items for each section
│   └── section-2.jsonl
├── assets/              # Referenced images and files
│   └── photo-001.jpg
├── run_config.json      # (Optional) Run configuration snapshot
├── results.jsonl        # (Optional) Per-item results
└── checkers/            # (Optional) Custom checker scripts
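An .eval752.zip is an ordinary zip archive, so its contents can be inspected with standard tooling. A sketch that lists the section files and assets in a bundle (the `meta.json` key used in the test is an assumption for illustration, not a documented schema):

```python
import json
import zipfile

def inspect_bundle(path):
    """Return (meta, section files, asset files) from an .eval752.zip archive."""
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        meta = json.loads(zf.read("meta.json"))  # dataset name, description, version hash
        sections = [n for n in names if n.startswith("sections/") and n.endswith(".jsonl")]
        assets = [n for n in names if n.startswith("assets/") and not n.endswith("/")]
    return meta, sections, assets
```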

This format is used for:

  • Sharing datasets between eval_752 instances
  • Exporting and archiving run results
  • Importing Browser Harness captures

For import workflows, see Working with Datasets.