import { withBase } from "@rspress/core/runtime";
# Quick Start
Get eval_752 running and complete your first evaluation in about 5 minutes.
## What you'll do
- Start the platform with Docker Compose
- Add a provider and smoke test it
- Import a dataset
- Launch your first evaluation run
Prerequisites: Docker installed and running, plus one API key from the provider you want to evaluate.
## Step 1 — Start the stack
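Assuming the repository ships a `docker-compose.yml` at its root (check your checkout — the file name and location are assumptions), bringing everything up typically looks like:

```shell
# Run from the repository root.
# -d detaches so the stack keeps running in the background.
docker compose up -d
```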
After a minute or so, verify everything is healthy:
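A minimal sketch using standard Compose commands (the service name in the logs example is an assumption — use whatever names your compose file defines):

```shell
# List all services and their state; healthy services show "running".
docker compose ps

# Tail logs for a specific service if something looks wrong,
# e.g. an API container (the name "backend" is hypothetical):
docker compose logs -f backend
```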
The stack includes PostgreSQL, Redis, FastAPI, Celery (worker + beat), and the React frontend.
If you see database authentication errors, your existing Docker volumes probably don't match the `.env` values. Reset with `docker compose down -v` and start again.
## Step 2 — Check the dashboard
Open http://localhost:5173.
You should see:
- The dashboard loads successfully
- Service health shows the backend as online
- Providers, Datasets, Runs, and Settings all open without errors
The workspace starts empty — that's intentional. You're about to add real data.
<img alt="Dashboard showing an empty workspace with service health cards" src={withBase("/screenshots/en/dashboard-overview.webp")} />
## Step 3 — Add a provider
Go to Providers and create the provider you want to evaluate.
You'll need:
- Name — something you'll recognize later (e.g., "OpenAI Production")
- Provider type — match the API family (OpenAI, Anthropic, Google, Custom)
- Base URL — the provider's API root (e.g., `https://api.openai.com/v1`)
- API key — your real key (encrypted at rest)
After saving, run a Smoke Test with the exact model name you plan to evaluate.
A successful smoke test means:
- A readable response comes back
- No credential errors
- The provider card shows as healthy
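If you want to verify connectivity from the command line before (or instead of) the in-app smoke test, a rough equivalent for an OpenAI-compatible provider looks like this — the base URL and key variable are placeholders for your own values:

```shell
# Lists the models your key can access. A JSON response containing a
# "data" array means both the base URL and the API key are working.
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```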
If your model server runs on your host machine (e.g., LM Studio, Ollama, vLLM), use `host.docker.internal` instead of `localhost` in the Base URL.
<img alt="Provider form with fields for name, type, base URL, and API key" src={withBase("/screenshots/en/providers-form.webp")} />
## Step 4 — Import a dataset
Go to Datasets and pick one:
- Hugging Face import — for public benchmark datasets (start with a small slice like `test[:30]`)
- Upload `.eval752.zip` — if you already have a packaged dataset
- Dataset Builder — to create a custom benchmark directly in the browser
For your first run, use something small so you can verify the whole pipeline quickly.
Once imported, check that:
- The dataset appears in the list
- Clicking View shows real question text
- The dataset is selectable from the Runs page
## Step 5 — Launch your first evaluation
Go to Runs and click Launch run.
- Pick your provider
- Enter the exact model name
- Pick your dataset
- Click Launch run
You'll see:
- The run appears on the active runs board
- Progress updates stream in via SSE
- Each item shows the question and response as it completes
When the run finishes, you can inspect item-level results and export the run as a `.eval752.zip` bundle.
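Because SSE is plain HTTP, you can also watch the raw event stream from a terminal if you're curious. The port and path below are purely illustrative — substitute whatever events endpoint the backend actually exposes:

```shell
# -N disables output buffering so events print as they arrive.
# The port (8000) and path (/api/runs/123/events) are hypothetical.
curl -N http://localhost:8000/api/runs/123/events
```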
## Step 6 (optional) — Tune runtime settings
Go to Settings if you want to adjust:
- Request timeout
- Retry count and backoff
- Run recovery policy
- LightEval executor routing
These settings are stored in the database — no need to edit `.env` for runtime tuning.
## Troubleshooting

### Database connection errors

Make sure `.env` exists and `DATABASE_URL` uses the hostname `postgres` (not `localhost`) when running through Docker Compose.
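For example, a working `.env` entry might look like the following — the username, password, and database name are placeholders, not the project's actual defaults:

```shell
# "postgres" is the Compose service name, which other containers on the
# same Compose network can resolve; "localhost" would point at the
# container itself, not the database.
DATABASE_URL=postgresql://eval:eval@postgres:5432/eval752
```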
### Smoke test fails

Double-check the base URL, model name, and API key. For local gateways, use `host.docker.internal`.
### Old secrets break after changing `ENCRYPTION_KEY`
If you changed the encryption key after already using the app, your stored provider secrets are now unreadable. Reset and start fresh:
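A destructive but reliable reset, assuming standard Compose commands — note this wipes all stored data, not just the unreadable secrets:

```shell
# Tear down containers AND delete volumes (the -v flag wipes the
# database), then bring the stack back up from a clean state.
docker compose down -v
docker compose up -d
```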
Then re-add your provider from the Providers page.
## What's next?
- User Guide — Day-to-day workflows: providers, datasets, runs, comparison, schedules
- Core Concepts — Understand evaluation types, scoring, and reproducibility
- Configuration — All environment variables and runtime settings
