Deployment Guide

Last updated: 2026-03-14 · Owner: DevOps

This guide consolidates the working deployment paths for the Python/FastAPI + React stack and supersedes scattered notes that previously lived under docs/docker and docs/ops. All new deployment changes should update this document first, then backfill references elsewhere as needed.

Compose deep dive

/docker/compose-v2/ serves as the low-level appendix for container topology details. Keep this page focused on the recommended deployment paths and push implementation minutiae into the appendix.

1. Topology Matrix

| Target | Recommended Stack | Compose/K8s entrypoint | Notes |
| --- | --- | --- | --- |
| Local dev / QA | Docker Compose | docker compose up --build (optionally ./scripts/tests/run_docker_integration.sh) | Uses local builds and .env.example defaults; ideal for single contributor workflows. |
| Small teams (<15 users) | Single host Docker Compose | docker compose -f docker-compose.ghcr.yml --env-file .env.prod pull && docker compose -f docker-compose.ghcr.yml --env-file .env.prod up -d | Pulls prebuilt GHCR images for backend/celery/frontend; combine with Traefik/Caddy reverse proxy. |
| Staging | Docker Compose + Playwright smoke | Same as small team, plus pnpm test:e2e-smoke gated in CI | Mirrors production secrets, enables release candidate sign-off. |
| Production | Kubernetes (Kustomize or Helm) | kubectl apply -k deploy/k8s (manifests WIP) | Spreads API + worker pods across nodes, uses managed Postgres/Redis. |

2. Pre-flight Checklist

  1. Secrets & config — Copy .env.example to .env, generate a 32-byte ENCRYPTION_KEY, and set unique Postgres credentials. For Compose deploys, commit a .env.prod.example for reproducibility but store actual secrets in your vault.
  2. Artifacts — Prefer the GHCR images published by .github/workflows/docker-publish.yml; fall back to docker compose -f docker-compose.build.yml build only when validating unpublished local changes.
  3. Database — Run uv run alembic upgrade head (Compose entrypoint already does this) and confirm migrations succeed against the target database.
  4. Health endpoints — Ensure /healthz and /metrics are reachable from your orchestrator. Prometheus scrape samples live in Monitoring.
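
Step 1 of the checklist can be sketched in Python for environments where openssl is not at hand. This is a minimal sketch, not the project's own tooling: secrets.token_hex(32) returns 64 hex characters encoding 32 random bytes, the same shape as the openssl rand -hex 32 call in section 3.2. POSTGRES_PASSWORD is the variable named in the troubleshooting table; the helper name generate_secrets and the password length are assumptions made for illustration.

```python
import secrets


def generate_secrets(password_bytes: int = 24) -> dict[str, str]:
    """Return fresh values for the secrets named in this guide.

    ENCRYPTION_KEY: 32 random bytes, hex-encoded (64 characters).
    POSTGRES_PASSWORD: URL-safe random string; its length is an assumption.
    """
    return {
        "ENCRYPTION_KEY": secrets.token_hex(32),
        "POSTGRES_PASSWORD": secrets.token_urlsafe(password_bytes),
    }


if __name__ == "__main__":
    # Print KEY=VALUE lines for pasting into a vault entry or .env.prod.
    for name, value in generate_secrets().items():
        print(f"{name}={value}")
```

Each call produces new values, so run it once per environment and store the output in your vault rather than regenerating on every deploy.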

3. Deployment Recipes

3.1 Local & QA (Docker Compose)

cp .env.example .env
docker compose up --build
  • Spins up FastAPI (backend), Celery worker/beat, PostgreSQL, Redis, and the static frontend container.
  • For backend-only validation: docker compose up backend celery-worker celery-beat postgres redis -d.
  • Integration smoke (same topology CI uses): scripts/tests/run_docker_integration.sh --full-run --fresh.
  • Additional overrides (hot reload, bind mounts) live in docker-compose.override.example.yml.
  • After bootstrap, open the UI, add a real provider from Providers, import a dataset, and tune workspace runtime policy from Settings if needed.

3.2 Production Compose (Single Host)

  1. Create .env.prod with hardened secrets (openssl rand -hex 32 for ENCRYPTION_KEY, random DB passwords).
  2. Pull the published images and start the stack with docker compose -f docker-compose.ghcr.yml --env-file .env.prod pull && docker compose -f docker-compose.ghcr.yml --env-file .env.prod up -d.
  3. The frontend container reverse-proxies /api to backend:8000; if you terminate TLS elsewhere, set BACKEND_ORIGIN so the SPA points at the correct hostname.
  4. Nightly backups: docker exec postgres pg_dump -Fc -f /backups/$(date +%F).dump eval752.
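
The nightly backup in step 4 can be wrapped in a small script for cron or a systemd timer. The sketch below assumes the container name postgres and the database name eval752 exactly as they appear in the command above; the function names are hypothetical. date.isoformat() yields the same YYYY-MM-DD string as date +%F.

```python
import datetime
import subprocess


def build_backup_command(day: datetime.date, container: str = "postgres",
                         database: str = "eval752") -> list[str]:
    """Build the docker exec / pg_dump invocation from step 4 for a given day."""
    dump_path = f"/backups/{day.isoformat()}.dump"  # same as $(date +%F).dump
    return ["docker", "exec", container,
            "pg_dump", "-Fc", "-f", dump_path, database]


def run_backup(day: datetime.date) -> None:
    """Run the dump; a non-zero exit raises CalledProcessError."""
    # check=True makes a failed dump visible to the scheduler instead of
    # letting it pass silently.
    subprocess.run(build_backup_command(day), check=True)
```

Calling run_backup(datetime.date.today()) from a scheduled job reproduces the one-liner, with the advantage that a failing dump exits non-zero and can be picked up by alerting.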

3.3 Kubernetes (Multi-node) — WIP

While the repo does not yet ship manifests, the reference topology is:

  • Deploy images via your preferred tool (Helm/Kustomize). Each workload runs as a separate Deployment (api, celery-worker, celery-beat).
  • Use managed PostgreSQL (RDS, Cloud SQL, Flexible Server) and managed Redis/KeyDB to offload persistence.
  • Provide secrets through SealedSecret/ExternalSecret referencing your vault; expose /api via Ingress with TLS.

Task P3-OPS-004 tracks authoring and publishing the official manifests; until it lands, copy the Compose environment variables into your cluster-specific tooling.
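
Until the official manifests exist, one way to carry the Compose environment variables into a cluster is to render them as a Kubernetes Secret and hand that to your sealing or external-secrets workflow. The following is a sketch under stated assumptions: the Secret name eval752-env is hypothetical, the parser handles only plain KEY=VALUE lines (no multi-line values or variable interpolation), and the output is JSON because kubectl accepts JSON manifests as well as YAML.

```python
import json


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one layer of matching quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def secret_manifest(env: dict[str, str], name: str = "eval752-env") -> str:
    """Render a v1 Secret as JSON; the name is a placeholder."""
    manifest = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "stringData": env,
    }
    return json.dumps(manifest, indent=2, sort_keys=True)
```

The rendered manifest contains plaintext secrets, so treat it as an intermediate artifact for the sealing step and never commit it.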

3.4 Cloud Building Blocks

| Cloud | Containers | Database | Cache | Networking |
| --- | --- | --- | --- | --- |
| AWS | ECS/Fargate or EKS | Aurora PostgreSQL | ElastiCache (Redis) | ALB + ACM TLS |
| Azure | Container Apps/AKS | Azure Database for PostgreSQL | Azure Cache for Redis | Application Gateway |
| GCP | Cloud Run/GKE | Cloud SQL for PostgreSQL | Memorystore | Cloud Load Balancing + Certificate Manager |

4. Validation Checklist

  1. Run pnpm test:e2e-smoke (or pnpm test:e2e for the full suite) against the deployed frontend.
  2. Hit /metrics and confirm the Prometheus target is healthy with a scrape duration under 10 seconds.
  3. Execute a sample dataset run through the UI and check that Celery updates arrive over SSE within 5 seconds.
  4. Verify backups and alerting: confirm the latest nightly dump was written, then simulate a docker compose exec postgres pg_isready failure and ensure alerting triggers.
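
The endpoint check in step 2 can be approximated from any machine that reaches the deployment. This is a rough sketch: probe is a hypothetical helper, the 10-second budget mirrors the threshold above, and the measured time is a client-side round trip, which only approximates the scrape duration Prometheus records for itself. The opener parameter exists so the function can be exercised without a network.

```python
import time
import urllib.request


def probe(url: str, max_seconds: float = 10.0, opener=urllib.request.urlopen):
    """Fetch url and return (ok, elapsed_seconds).

    ok is True only when the response status is 200 and the round trip
    finished within max_seconds.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=max_seconds) as response:
            response.read()
            status = response.status
    except OSError:
        # urllib.error.URLError and socket timeouts both derive from OSError.
        return False, time.monotonic() - start
    elapsed = time.monotonic() - start
    return status == 200 and elapsed <= max_seconds, elapsed
```

For example, probe("https://your-host/metrics") and probe("https://your-host/healthz") cover both endpoints from the pre-flight checklist; your-host is a placeholder for the deployed hostname.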

5. Troubleshooting Cheatsheet

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| password authentication failed for user "eval752" | POSTGRES_PASSWORD drifted from existing volume | Either reuse the previous password or docker compose down -v to recreate the volume. |
| Frontend 404s on /api/* | VITE_API_BASE_URL mismatch or proxy misconfigured | Rebuild frontend with correct env or configure reverse proxy to forward /api. |
| Celery never starts | redis service unhealthy or wrong credentials | Check redis:// values in .env and re-create the container with proper password. |
| GitHub Pages docs load unstyled and every JS/CSS/image request 404s | The Rspress base / siteUrl path casing does not match the exact repository name (/Eval_752/ vs /eval_752/) | Keep docs/rspress.config.ts aligned with the real repository name, ideally by deriving Pages URLs from GITHUB_REPOSITORY, then rebuild and redeploy the docs site. |
| Docs deploy pipeline misses config or asset changes | Cache key not including Rspress config, docs sources, or docs scripts | See .github/workflows/deploy-docs.yml and confirm the cache key hashes docs/**, scripts/docs/**, and the docs package.json / lockfile. |
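
The GITHUB_REPOSITORY derivation mentioned in the Pages row amounts to taking the repository name verbatim, case included, from the owner/repo string that GitHub Actions exposes. The real setting lives in docs/rspress.config.ts, which is TypeScript; the Python below only illustrates the string handling, for instance for a docs build script, and both function names are hypothetical.

```python
import os


def pages_base(repository: str) -> str:
    """Turn 'owner/Repo_Name' into the case-preserving base path '/Repo_Name/'."""
    owner, sep, name = repository.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/repo', got {repository!r}")
    return f"/{name}/"


def pages_base_from_env(default: str = "/") -> str:
    """Use GITHUB_REPOSITORY when set (as in GitHub Actions), else a default."""
    repository = os.environ.get("GITHUB_REPOSITORY")
    return pages_base(repository) if repository else default
```

Because the name is never lowercased or retyped by hand, the base path cannot drift from the repository's actual casing, which is the failure described in the table.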

See also

  • ../docker/compose-v2.md — deep dive into Compose layout and entrypoints.
  • Configuration — exhaustive environment variable reference (kept in sync with this page).
  • Security — hardening checklist for secrets, TLS, and incident response.
  • specs/3_tasks.md#PX-GOV-012 — work item tracking the documentation consolidation.