Get Started
Everything you need to go from zero to your first evaluation run.
The recommended path takes about 5 minutes with Docker:
- Start the stack
- Add a provider and verify it works
- Import a dataset
- Launch a run
In This Section
- Quick Start: The step-by-step walkthrough from docker compose up to a completed evaluation.
- Installation: Deployment options: Docker, local development, and GHCR prebuilt images.
- Troubleshooting: Common issues with Docker, provider connectivity, and first-run problems.
Use the real model name from day one
The fastest way to waste time is to smoke test one model and evaluate another. Use the exact model name you actually plan to benchmark.
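One way to enforce the rule above is a small guard that runs before a launch. The sketch below is illustrative only and is not part of eval_752: the function name `check_model_names` and the model names are hypothetical. It compares the two strings exactly, with no normalization, because the advice above is to use the exact model name.

```python
def check_model_names(smoke_tested: str, benchmark: str) -> None:
    """Raise if the smoke-tested model is not the one about to be benchmarked.

    Hypothetical helper for illustration; eval_752 does not necessarily
    expose anything like this.
    """
    # Exact string comparison on purpose: even a suffix or case
    # difference may point at a different model.
    if smoke_tested != benchmark:
        raise ValueError(
            f"Smoke test used {smoke_tested!r} but the run targets {benchmark!r}; "
            "re-run the smoke test with the benchmark model name."
        )


# Hypothetical model names, for illustration only.
check_model_names("example-model-v2", "example-model-v2")  # passes silently
```

Calling it with two different names, for example `check_model_names("example-model-v2", "example-model-v2-mini")`, raises a `ValueError` before any evaluation time is spent.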
When you're done here
You'll have:
- A running eval_752 instance with a healthy dashboard
- A real provider that passes its smoke test
- A dataset loaded and ready
- Your first completed evaluation run
- An idea of where to go next: the User Guide for day-to-day workflows and Core Concepts for the "why" behind things
