import { withBase } from "@rspress/core/runtime";

Quick Start

Get eval_752 running and complete your first evaluation in about 5 minutes.

What you'll do

  1. Start the platform with Docker Compose
  2. Add a provider and smoke test it
  3. Import a dataset
  4. Launch your first evaluation run

Prerequisites: Docker installed and running, plus one API key from the provider you want to evaluate.

Step 1 — Start the stack

git clone https://github.com/t41372/eval_752.git
cd eval_752
cp .env.example .env
openssl rand -hex 32  # Paste the output into ENCRYPTION_KEY in .env
docker compose up --build -d
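If you'd rather not paste the key by hand, the key step can be scripted before you bring the stack up. This is a minimal sketch that assumes .env holds plain KEY=value lines; set_env_var is a hypothetical helper, not something shipped with eval_752.

```shell
# set_env_var FILE KEY VALUE
# Replace any existing KEY=... line in FILE, or append one if it is missing.
# KEY is used inside a grep pattern, so keep it to letters, digits, and underscores.
set_env_var() {
  env_file=$1
  env_key=$2
  env_value=$3
  tmp_file=$(mktemp)
  # Keep every line except the old assignment (grep exits 1 if nothing is left).
  grep -v "^${env_key}=" "$env_file" > "$tmp_file" || true
  printf '%s=%s\n' "$env_key" "$env_value" >> "$tmp_file"
  mv "$tmp_file" "$env_file"
}

# Usage, run from the repository root after `cp .env.example .env`:
# set_env_var .env ENCRYPTION_KEY "$(openssl rand -hex 32)"
```

Generate the key once and keep it. Changing it later makes stored provider secrets unreadable, as described under Troubleshooting.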

After a minute or so, verify everything is healthy:

docker compose ps          # All services should be "running"
curl http://localhost:8000/healthz   # Should return {"status":"ok"}

The stack includes PostgreSQL, Redis, FastAPI, Celery (worker + beat), and the React frontend.
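Instead of waiting a fixed minute, you can poll the health endpoint until it answers. The sketch below uses the /healthz URL from the check above; wait_for is a hypothetical helper, and the retry count and delay are arbitrary choices.

```shell
# wait_for TRIES DELAY CMD...
# Run CMD repeatedly until it succeeds, giving up after TRIES attempts.
wait_for() {
  tries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    if "$@" > /dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}

# Usage: up to 30 attempts, 2 seconds apart.
# wait_for 30 2 curl -fsS http://localhost:8000/healthz && echo "backend is up"
```

The -f flag makes curl exit non-zero on HTTP error responses, so the loop keeps retrying until the backend returns a successful status.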

Volume conflicts

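If you see database authentication errors, your existing Docker volumes were probably initialized with different credentials than the ones in your current .env. Reset with docker compose down -v, which deletes the volumes and any data stored in them, then start again.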

Step 2 — Check the dashboard

Open http://localhost:5173.

You should see:

  • The dashboard loads successfully
  • Service health shows the backend as online
  • Providers, Datasets, Runs, and Settings all open without errors

The workspace starts empty — that's intentional. You're about to add real data.

<img alt="Dashboard showing an empty workspace with service health cards" src={withBase("/screenshots/en/dashboard-overview.webp")} />

Step 3 — Add a provider

Go to Providers and create the provider you want to evaluate.

You'll need:

  1. Name — something you'll recognize later (e.g., "OpenAI Production")
  2. Provider type — match the API family (OpenAI, Anthropic, Google, Custom)
  3. Base URL — the provider's API root (e.g., https://api.openai.com/v1)
  4. API key — your real key (encrypted at rest)

After saving, run a Smoke Test with the exact model name you plan to evaluate.

A successful smoke test means:

  • A readable response comes back
  • No credential errors
  • The provider card shows as healthy

Docker + local model servers

If your model server runs on your host machine (e.g., LM Studio, Ollama, vLLM), use host.docker.internal instead of localhost in the Base URL.
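As an illustration of that substitution, the sketch below rewrites a localhost URL into the form the containers need. to_container_url is a hypothetical helper; the example URL assumes Ollama on its default port 11434 with its OpenAI-compatible /v1 path.

```shell
# to_container_url URL
# Swap a localhost or 127.0.0.1 host for host.docker.internal.
# Any other URL is printed unchanged.
to_container_url() {
  printf '%s\n' "$1" |
    sed -E 's#^(https?://)(localhost|127\.0\.0\.1)([:/]|$)#\1host.docker.internal\3#'
}

# Usage:
# to_container_url http://localhost:11434/v1
```

Paste the rewritten URL into the Base URL field. Hosted APIs such as https://api.openai.com/v1 need no change.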

<img alt="Provider form with fields for name, type, base URL, and API key" src={withBase("/screenshots/en/providers-form.webp")} />

Step 4 — Import a dataset

Go to Datasets and pick one:

  • Hugging Face import — for public benchmark datasets (start with a small slice like test[:30])
  • Upload .eval752.zip — if you already have a packaged dataset
  • Dataset Builder — to create a custom benchmark directly in the browser

For your first run, use something small so you can verify the whole pipeline quickly.

Once imported, check that:

  • The dataset appears in the list
  • Clicking View shows real question text
  • The dataset is selectable from the Runs page

Step 5 — Launch your first evaluation

Go to Runs and click Launch run.

  1. Pick your provider
  2. Enter the exact model name
  3. Pick your dataset
  4. Click Launch run

You'll see:

  • The run appears on the active runs board
  • Progress updates stream in via SSE
  • Each item shows the question and response as it completes

When the run finishes, you can inspect item-level results and export the run as a .eval752.zip bundle.

Step 6 (optional) — Tune runtime settings

Go to Settings if you want to adjust:

  • Request timeout
  • Retry count and backoff
  • Run recovery policy
  • LightEval executor routing

These settings are stored in the database — no need to edit .env for runtime tuning.

Troubleshooting

Database connection errors

Make sure .env exists and DATABASE_URL uses the hostname postgres (not localhost) when running through Docker Compose.
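A quick way to check this without opening the file is sketched below. It assumes DATABASE_URL is written as a single KEY=value line in the usual user:password@host form; check_database_host is a hypothetical helper, and it never prints the URL because the URL contains the password.

```shell
# check_database_host [FILE]
# Succeed only if DATABASE_URL in FILE (default: .env) uses the host "postgres".
check_database_host() {
  env_file=${1:-.env}
  # Take the last DATABASE_URL assignment; empty if the file or key is missing.
  db_url=$(grep -E '^DATABASE_URL=' "$env_file" 2> /dev/null | tail -n 1 | cut -d= -f2-) || true
  case "$db_url" in
    "")
      echo "DATABASE_URL is not set in $env_file"
      return 1
      ;;
    *@postgres:* | *@postgres/*)
      echo "ok: DATABASE_URL uses the postgres hostname"
      return 0
      ;;
    *)
      echo "DATABASE_URL does not use the postgres hostname"
      return 1
      ;;
  esac
}

# Usage, run from the repository root:
# check_database_host .env
```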

Smoke test fails

Double-check the base URL, the exact model name, and the API key. For model servers running on your host machine, use host.docker.internal instead of localhost in the base URL.
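To rule out the key and base URL independently of the app, you can call the provider directly from your host. The sketch below assumes an OpenAI-style API that lists models at GET {base URL}/models with a Bearer token; other API families use different paths and headers. models_url and list_models are hypothetical helpers.

```shell
# models_url BASE_URL
# Build the model-listing URL, tolerating one trailing slash on BASE_URL.
models_url() {
  printf '%s/models\n' "${1%/}"
}

# list_models BASE_URL API_KEY
# Request the model list; a non-zero exit means the URL or key was rejected.
list_models() {
  curl -fsS -H "Authorization: Bearer $2" "$(models_url "$1")"
}

# Usage (OPENAI_API_KEY is an assumed variable name in your shell):
# list_models https://api.openai.com/v1 "$OPENAI_API_KEY"
```

Because this runs on the host rather than inside a container, a local model server is reached at localhost here even though the provider form needs host.docker.internal. If the call succeeds, compare the returned model IDs with the model name you entered for the smoke test.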

Old secrets break after changing ENCRYPTION_KEY

If you changed the encryption key after already using the app, your stored provider secrets are now unreadable. Reset and start fresh:

docker compose down -v
docker compose up --build -d

Then re-add your provider from the Providers page.

What's next?

  • User Guide — Day-to-day workflows: providers, datasets, runs, comparison, schedules
  • Core Concepts — Understand evaluation types, scoring, and reproducibility
  • Configuration — All environment variables and runtime settings