Get Started

Everything you need to go from zero to your first evaluation run.

The recommended path takes about 5 minutes with Docker:

  1. Start the stack
  2. Add a provider and verify it works
  3. Import a dataset
  4. Launch a run

In This Section

  • Quick Start — The step-by-step walkthrough from docker compose up to a completed evaluation.
  • Installation — Deployment options: Docker, local development, and GHCR prebuilt images.
  • Troubleshooting — Common issues with Docker, provider connectivity, and first-run problems.

Use the real model name from day one

The fastest way to waste time is to smoke test one model and evaluate another. Use the exact model name you actually plan to benchmark.
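
One way to make that mismatch hard to introduce is to keep the model name in a single place and have both the smoke test and the run read from it. The sketch below is illustrative only: `MODEL_NAME`, `SmokeTest`, `EvalRun`, and `check_same_model` are hypothetical names, not part of eval_752's API, and the model and dataset values are placeholders.

```python
from dataclasses import dataclass

# Hypothetical placeholder: the exact model identifier you plan to benchmark.
MODEL_NAME = "example-model-v1"


@dataclass(frozen=True)
class SmokeTest:
    """Illustrative stand-in for a provider smoke test configuration."""
    model: str


@dataclass(frozen=True)
class EvalRun:
    """Illustrative stand-in for an evaluation run configuration."""
    model: str
    dataset: str


def check_same_model(smoke: SmokeTest, run: EvalRun) -> None:
    """Raise ValueError if the smoke-tested model differs from the evaluated one."""
    if smoke.model != run.model:
        raise ValueError(
            f"smoke test used {smoke.model!r} but the run targets {run.model!r}"
        )


# Both configurations read the same constant, so they cannot drift apart.
smoke = SmokeTest(model=MODEL_NAME)
run = EvalRun(model=MODEL_NAME, dataset="example-dataset")
check_same_model(smoke, run)
```

The check is deliberately an exact string comparison: two names that look similar may still refer to different models, so only identical names pass.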

When you're done here

You'll have:

  • A running eval_752 instance with a healthy dashboard
  • A real provider that passes its smoke test
  • A dataset loaded and ready
  • Your first completed evaluation run
  • An idea of where to go next: the User Guide for day-to-day workflows, and Core Concepts for the "why" behind things