eval_752 v2 Architecture
Status: current architecture snapshot · 2026-03-13
Overview
- Frontend: React SPA with Vite, TanStack Query, Tailwind, and Radix/shadcn primitives
- Backend: FastAPI API plus Celery workers on Python 3.12, managed with uv
- Model invocation: LiteLLM as the main provider abstraction
- Datasets and scoring: Hugging Face imports, dataset builder flows, direct run execution, and LightEval interoperability
- Storage: PostgreSQL via SQLAlchemy/Alembic plus Redis for queues and short-lived state
- Packaging: .eval752.zip as the portable dataset and result bundle format
Component Topology
Development and Deployment Shape
- local development commonly uses docker compose up --build
- the stack includes API, worker, beat, frontend, PostgreSQL, and Redis
- GHCR-backed deployment can use prebuilt container images instead of local builds
- runtime behavior and operator guidance are kept synchronized through the docs, specs/, and acceptance inventories
Data Layer
- SQLAlchemy models live in backend/src/eval_752/infra
- Alembic revisions live in backend/alembic/
- FastAPI dependencies and worker services share session helpers such as create_session_factory and session_scope
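The shared-helper pattern named in this section can be sketched as follows. This is a minimal illustration, not the project's implementation: only the names create_session_factory and session_scope come from this document, and their signatures and behavior here are assumptions. The real helpers build on SQLAlchemy; the sketch uses sqlite3 from the standard library as a stand-in so it runs without dependencies.

```python
import sqlite3
from contextlib import contextmanager
from typing import Callable, Iterator

# Stand-in type: the real helpers presumably hand out SQLAlchemy sessions.
SessionFactory = Callable[[], sqlite3.Connection]


def create_session_factory(database: str) -> SessionFactory:
    """Return a callable that opens a fresh connection per unit of work.

    The signature is hypothetical; only the helper name is from the docs.
    """
    def factory() -> sqlite3.Connection:
        return sqlite3.connect(database)

    return factory


@contextmanager
def session_scope(factory: SessionFactory) -> Iterator[sqlite3.Connection]:
    """Assumed semantics: commit on success, roll back on error, always close."""
    session = factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()
```

Under these assumptions, an API dependency and a worker task would both wrap their database work in `with session_scope(factory) as session:`, so transaction handling lives in one place rather than in each caller.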
Observability and Streaming
- HTTP metrics are exposed at /metrics
- run lifecycle updates fan out through Redis and the FastAPI SSE endpoint at /runs/events
- the SPA subscribes with EventSource; payload details are documented in SSE Events
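To make the wire format behind /runs/events concrete, here is a small sketch of how a server-sent event frame is written and read back. It follows the generic text/event-stream framing (event and data fields, blank line as frame terminator) in simplified form. The event names and payload fields are hypothetical; the actual payloads are the ones documented in SSE Events.

```python
import json
from typing import Any, Iterator


def format_sse(event: str, data: dict[str, Any]) -> str:
    """Serialize one event as a text/event-stream frame."""
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


def parse_sse(stream: str) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (event, data) pairs from a text/event-stream body.

    Simplified: handles only the event and data fields and assumes JSON data.
    """
    event, data_lines = "message", []
    for line in stream.splitlines():
        if line == "":
            # A blank line ends the frame; dispatch only if data was seen.
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            value = value.removeprefix(" ")
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)


# Hypothetical event name and payload, for illustration only.
frame = format_sse("run.updated", {"run_id": "r1", "status": "running"})
```

In the browser, EventSource does the parsing itself; a parser like this is mainly useful for tests or for non-browser clients that consume the same stream.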
Current Truthfulness Constraints
- the docs should describe only shipped alpha behavior, not aspirational product surfaces
- provider-first setup, dataset import, run execution, and comparison remain the main operator path
- Browser Harness is shipped for browser-captured evaluation, but still has explicit v1 scope boundaries
