Core Concepts
Before you start running evaluations, it helps to understand a few ideas that come up repeatedly.
These pages are not required reading to use eval_752 — the User Guide is task-oriented and gets you running immediately. But when you want to understand why something works a certain way, this is where to look.
In This Section
- Evaluation Types — What kinds of tasks can eval_752 score, and when does each type make sense?
- Scoring Methods — Programmatic matching vs. LLM-as-judge vs. pairwise arena — tradeoffs and when to pick each.
- Dataset Format — How prompts, answers, and metadata are structured inside eval_752.
- Reproducibility — What `.eval752.zip` bundles capture, what they don't, and how to think about evidence quality.
