User Guide

This guide covers the tasks you'll do repeatedly in eval_752: setting up providers, importing datasets, running evaluations, and comparing results.

If you're new, work through these in order — each step builds on the previous one:

  1. Providers — Connect your API endpoints and verify they work
  2. Datasets — Import or build the question sets you'll evaluate against
  3. Running Evaluations — Launch runs and monitor progress
  4. Viewing Results — Understand what the Dashboard, Runs, and Comparison pages show
  5. Exporting Results — Save and share your evidence
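The five steps above can be pictured as one loop: a provider answers questions, a dataset supplies the questions and expected answers, a run scores the replies, and comparison puts two runs side by side. The sketch below is illustrative only — eval_752's actual API is not shown in this guide, so every name here (run_eval, exact_match, the dict shapes) is hypothetical.

```python
# Illustrative sketch only -- not eval_752's real API. All names are hypothetical.

def exact_match(expected: str, actual: str) -> float:
    """Score 1.0 if the answers match exactly, else 0.0."""
    return 1.0 if expected.strip() == actual.strip() else 0.0

def run_eval(provider, dataset):
    """Step 3: send each question to a provider and score each reply."""
    scores = [exact_match(row["answer"], provider(row["question"]))
              for row in dataset]
    return sum(scores) / len(scores)  # mean accuracy for the run

# Step 1: a "provider" is anything that maps a question to an answer.
provider_a = lambda q: "4" if q == "2 + 2?" else "unknown"
provider_b = lambda q: "unknown"

# Step 2: a dataset is a list of question / expected-answer pairs.
dataset = [
    {"question": "2 + 2?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
]

# Steps 3-4: launch a run per provider, then compare the results.
results = {
    "provider_a": run_eval(provider_a, dataset),
    "provider_b": run_eval(provider_b, dataset),
}
print(results)
```

Step 5 (exporting) would then just be serializing `results` to a file. The point of the sketch is the shape of the loop, not the scoring: real evaluations swap in richer scorers and real endpoints.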

Once the core workflow feels solid, explore:

  1. Browser Harness — Evaluate models that only exist behind a web UI
  2. Scheduled Evaluations — Automate recurring checks
  3. Advanced Features — Judge scoring, variants, and what's coming next

Tip: Don't jump into schedules or the Browser Harness until you can run and compare a normal evaluation end-to-end. Debugging too many moving parts at once is frustrating.