eval_752 v2 Architecture

Status: current architecture snapshot · 2026-03-13

Overview

  • Frontend: React SPA with Vite, TanStack Query, Tailwind, and Radix/shadcn primitives
  • Backend: FastAPI API plus Celery workers on Python 3.12, managed with uv
  • Model invocation: LiteLLM as the main provider abstraction
  • Datasets and scoring: Hugging Face imports, dataset builder flows, direct run execution, and LightEval interoperability
  • Storage: PostgreSQL via SQLAlchemy/Alembic plus Redis for queues and short-lived state
  • Packaging: .eval752.zip as the portable dataset and result bundle format
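
Because the bundle format is a zip archive, it can be inspected with standard tooling. The following is a minimal sketch using only the Python standard library; the helper name list_bundle_members is hypothetical, and nothing here assumes a particular internal layout for the archive, which this document does not specify.

```python
from __future__ import annotations

import zipfile
from pathlib import Path

# Suffix taken from the packaging bullet above.
BUNDLE_SUFFIX = ".eval752.zip"


def list_bundle_members(path: str | Path) -> list[str]:
    """Return the sorted member names of a bundle (hypothetical helper).

    Only checks the file suffix and that the file is a readable zip
    archive; it makes no assumption about what the bundle contains.
    """
    bundle_path = Path(path)
    if not bundle_path.name.endswith(BUNDLE_SUFFIX):
        raise ValueError(f"expected a {BUNDLE_SUFFIX} file, got {bundle_path.name}")
    if not zipfile.is_zipfile(bundle_path):
        raise ValueError(f"{bundle_path} is not a valid zip archive")
    with zipfile.ZipFile(bundle_path) as bundle:
        return sorted(bundle.namelist())
```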

Component Topology

Development and Deployment Shape

  • local development commonly uses docker compose up --build
  • the stack includes API, worker, beat, frontend, PostgreSQL, and Redis
  • GHCR-backed deployment can use prebuilt container images instead of local builds
  • runtime behavior and operator guidance are kept synchronized through the docs, specs/, and acceptance inventories

Data Layer

  • SQLAlchemy models live in backend/src/eval_752/infra
  • Alembic revisions live in backend/alembic/
  • FastAPI dependencies and worker services share session helpers such as create_session_factory and session_scope
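
The shared session helpers can be pictured roughly as follows. This is a sketch, not the project's code: only the names create_session_factory and session_scope come from this document, while their signatures, the expire_on_commit setting, and the commit/rollback policy are assumptions based on the common SQLAlchemy pattern.

```python
from __future__ import annotations

from collections.abc import Iterator
from contextlib import contextmanager

from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session, sessionmaker


def create_session_factory(database_url: str) -> sessionmaker[Session]:
    """Build a session factory bound to one engine (assumed signature)."""
    engine = create_engine(database_url)
    # expire_on_commit=False is an assumption; it keeps loaded objects
    # readable after the session has committed.
    return sessionmaker(bind=engine, expire_on_commit=False)


@contextmanager
def session_scope(factory: sessionmaker[Session]) -> Iterator[Session]:
    """Yield a session; commit on success, roll back on error, always close."""
    session = factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


if __name__ == "__main__":
    # In-memory SQLite stands in for PostgreSQL in this illustration.
    factory = create_session_factory("sqlite://")
    with session_scope(factory) as session:
        print(session.execute(text("SELECT 1")).scalar_one())
```

With a shape like this, a FastAPI dependency and a Celery task can both wrap their work in session_scope, so transaction handling stays the same in the API and in the workers.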

Observability and Streaming

  • HTTP metrics are exposed at /metrics
  • run lifecycle updates fan out through Redis and the FastAPI SSE endpoint at /runs/events
  • the SPA subscribes with EventSource; payload details are documented in SSE Events
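
To make the streaming path concrete, here is a small sketch of the fan-out idea and of the text/event-stream framing that EventSource consumes. It is illustrative only: an in-process asyncio.Queue per subscriber stands in for Redis, and the class name RunEventBroker, the event name run.updated, and the payload fields are all hypothetical. The real payloads are the ones documented in SSE Events.

```python
from __future__ import annotations

import asyncio
import json


def format_sse(event: str, payload: dict, event_id: str | None = None) -> str:
    """Serialize one event as a text/event-stream frame.

    Each frame is a set of "field: value" lines ended by a blank line.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps emits a single line by default; splitlines() keeps the
    # framing valid even if a multi-line serialization were used.
    for chunk in json.dumps(payload).splitlines():
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"


class RunEventBroker:
    """In-process stand-in for the Redis fan-out (hypothetical class)."""

    def __init__(self) -> None:
        self._subscribers: set[asyncio.Queue[str]] = set()

    def subscribe(self) -> asyncio.Queue[str]:
        """Register one connected client and return its private queue."""
        queue: asyncio.Queue[str] = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue[str]) -> None:
        """Drop a client, for example when its connection closes."""
        self._subscribers.discard(queue)

    def publish(self, event: str, payload: dict) -> None:
        """Deliver the same frame to every current subscriber."""
        frame = format_sse(event, payload)
        for queue in self._subscribers:
            queue.put_nowait(frame)
```

In a sketch like this, the SSE endpoint would call subscribe() when a client connects, stream frames from the returned queue, and call unsubscribe() on disconnect. In the browser, an EventSource handler registered for the event name then receives the JSON string from the data field.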

Current Truthfulness Constraints

  • the docs should describe only shipped alpha behavior, not aspirational product surfaces
  • provider-first setup, dataset import, run execution, and comparison remain the main operator path
  • Browser Harness is shipped for browser-captured evaluation, but still has explicit v1 scope boundaries