import { withBase } from "@rspress/core/runtime";
# Quick Start
Get eval_752 running and complete your first evaluation in about 5 minutes.
## What you'll do
- Start the platform with Docker Compose
- Add a provider and smoke test it
- Import a dataset
- Launch your first evaluation run
Prerequisites: Docker installed and running, plus one API key from the provider you want to evaluate.
## Step 1 — Start the stack
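Assuming the repository ships a `docker-compose.yml` at its root (check your checkout — the file name and location are assumptions), bringing everything up typically looks like:

```shell
# Run from the repository root.
# -d detaches so the stack keeps running in the background.
docker compose up -d
```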
After a minute or so, verify everything is healthy:
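A minimal sketch using standard Compose commands (the service name in the logs example is an assumption — use whatever names your compose file defines):

```shell
# List all services and their state; healthy services show "running".
docker compose ps

# Tail logs for a specific service if something looks wrong,
# e.g. an API container (the name "backend" is hypothetical):
docker compose logs -f backend
```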
The stack includes PostgreSQL, Redis, FastAPI, Celery (worker + beat), and the React frontend.
If you see database authentication errors, your existing Docker volumes probably don't match the `.env` values. Reset with `docker compose down -v` and start again.
## Step 2 — Check the dashboard
Open http://localhost:5173.
You should see:
- The dashboard loads successfully
- Service health shows the backend as online
- Providers, Datasets, Runs, and Settings all open without errors
The workspace starts empty — that's intentional. You're about to add real data.
<img alt="Dashboard showing an empty workspace with service health cards" src={withBase("/screenshots/en/dashboard-overview.webp")} />
## Step 3 — Add a provider
Go to Providers and create the provider you want to evaluate.
You'll need:
- Name — something you'll recognize later (e.g., "OpenAI Production")
- Provider type — match the API family (OpenAI, Anthropic, Google, Custom)
- Base URL — the provider's API root (e.g., `https://api.openai.com/v1`)
- API key — your real key (encrypted at rest)
After saving, run a Smoke Test with the exact model name you plan to evaluate.
A successful smoke test means:
- A readable response comes back
- No credential errors
- The provider card shows as healthy
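If you want to verify connectivity from the command line before (or instead of) the in-app smoke test, a rough equivalent for an OpenAI-compatible provider looks like this — the base URL and key variable are placeholders for your own values:

```shell
# Lists the models your key can access. A JSON response containing a
# "data" array means both the base URL and the API key are working.
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```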
If your model server runs on your host machine (e.g., LM Studio, Ollama, vLLM), use `host.docker.internal` instead of `localhost` in the Base URL.
<img alt="Provider form with fields for name, type, base URL, and API key" src={withBase("/screenshots/en/providers-form.webp")} />
## Step 4 — Import a dataset
Go to Datasets and pick one:
- Hugging Face import — for public benchmark datasets (start with a small slice like `test[:30]`)
- Upload `.eval752.zip` — if you already have a packaged dataset
- Dataset Builder — to create a custom benchmark directly in the browser
For your first run, use something small so you can verify the whole pipeline quickly.
Once imported, check that:
- The dataset appears in the list
- Clicking View shows real question text
- The dataset is selectable from the Runs page
## Step 5 — Launch your first evaluation
Go to Runs and click Launch run.
- Pick your provider
- Enter the exact model name
- Pick your dataset
- Click Launch run
You'll see:
- The run appears on the active runs board
- Progress updates stream in via SSE
- Each item shows the question and response as it completes
When the run finishes, you can inspect item-level results and export the run as a `.eval752.zip` bundle.
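Because SSE is plain HTTP, you can also watch the raw event stream from a terminal if you're curious. The port and path below are purely illustrative — substitute whatever events endpoint the backend actually exposes:

```shell
# -N disables output buffering so events print as they arrive.
# The port (8000) and path (/api/runs/123/events) are hypothetical.
curl -N http://localhost:8000/api/runs/123/events
```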
## Step 6 (optional) — Tune runtime settings
Go to Settings if you want to adjust:
- Request timeout
- Retry count and backoff
- Run recovery policy
- LightEval executor routing
These settings are stored in the database — no need to edit `.env` for runtime tuning.
## Troubleshooting

### Database connection errors

Make sure `.env` exists and `DATABASE_URL` uses the hostname `postgres` (not `localhost`) when running through Docker Compose.
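For example, a working `.env` entry might look like the following — the username, password, and database name are placeholders, not the project's actual defaults:

```shell
# "postgres" is the Compose service name, which other containers on the
# same Compose network can resolve; "localhost" would point at the
# container itself, not the database.
DATABASE_URL=postgresql://eval:eval@postgres:5432/eval752
```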
### Smoke test fails

Double-check the base URL, model name, and API key. For local gateways, use `host.docker.internal`.
### Old secrets break after changing `ENCRYPTION_KEY`
If you changed the encryption key after already using the app, your stored provider secrets are now unreadable. Reset and start fresh:
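A destructive but reliable reset, assuming standard Compose commands — note this wipes all stored data, not just the unreadable secrets:

```shell
# Tear down containers AND delete volumes (the -v flag wipes the
# database), then bring the stack back up from a clean state.
docker compose down -v
docker compose up -d
```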
Then re-add your provider from the Providers page.
## What's next?
- User Guide — Day-to-day workflows: providers, datasets, runs, comparison, schedules
- Core Concepts — Understand evaluation types, scoring, and reproducibility
- Configuration — All environment variables and runtime settings
