User Guide
This guide covers the tasks you'll do repeatedly in eval_752: setting up providers, importing datasets, running evaluations, and comparing results.
Recommended reading order
If you're new, work through these in order — each step builds on the previous one:
- Providers — Connect your API endpoints and verify they work
- Datasets — Import or build the question sets you'll evaluate against
- Running Evaluations — Launch runs and monitor progress
- Viewing Results — Understand what the Dashboard, Runs, and Comparison pages show
- Exporting Results — Save and share your evidence
Once the core workflow feels solid, explore:
- Browser Harness — Evaluate models that only exist behind a web UI
- Scheduled Evaluations — Automate recurring checks
- Advanced Features — Judge scoring, variants, and what's coming next
Tip
Don't jump into Scheduled Evaluations or the Browser Harness until you can run and compare an ordinary evaluation end to end. Debugging too many moving parts at once is frustrating.
