Logging & Observability Baseline
Last updated: 2026-03-13 · Related task: P1-BE-006
Phase 1 established a shared JSON logging and Prometheus baseline for the FastAPI and Celery runtime. This page documents the current logging shape, context conventions, and the main metrics you can use in Docker, Prometheus, or a broader observability stack.
Logging Stack
- `structlog` handles JSON serialization and request/task context binding through `contextvars`.
- Standard-library `logging` remains the bridge for Uvicorn, Celery, and other third-party code.
- `RequestContextMiddleware` generates or forwards `X-Request-ID` for every HTTP request.
Initialization lives in `backend/src/eval_752/app/logging.py` and runs during app startup through `configure_logging()`.
Log Shape
Field notes:
- `request_id` comes from the `X-Request-ID` header or is auto-generated.
- `run_id` is explicitly bound through `bind_run_id("run-1234")` or `with run_context("run-1234")`.
- Additional business fields can be attached with `logger.info("event", key=value)`.
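An individual log line is a single JSON object. An illustrative example (values invented; the exact field set depends on the configured processors):

```json
{
  "timestamp": "2026-03-13T09:15:02.123456Z",
  "level": "info",
  "event": "run.started",
  "request_id": "4f7c2a1e-9b0d-4c6e-8f21-3a5d7e9c1b42",
  "run_id": "run-1234"
}
```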
Since 2026-03-11, if a browser user explicitly enables onboarding telemetry in Settings, the backend also logs `onboarding_telemetry.received`. That event contains only tutorial lifecycle fields (started, step_viewed, completed, dismissed), the route, the timezone, the tutorial version, and an anonymous session id. It does not log prompts, dataset names, provider names, or free-form user text.
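As an illustration only — the exact field names below are an assumption, the source fixes just the categories above — such an event might look like:

```json
{
  "event": "onboarding_telemetry.received",
  "lifecycle": "step_viewed",
  "route": "/runs/new",
  "timezone": "Europe/Berlin",
  "tutorial_version": "3",
  "session_id": "anon-8c41d2"
}
```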
Common Usage Patterns
FastAPI route handlers
`RequestContextMiddleware` already injects `X-Request-ID`.
Celery tasks
`run_context` binds `run_id` for the block and clears it on exit.
Manual binding
- `bind_run_id(value)` persists until `bind_run_id(None)` or the surrounding request/task context ends.
- `structlog.contextvars.bind_contextvars(...)` can still be used directly for extra fields such as `dataset_id`.
Request Tracing
- Every response includes `X-Request-ID` by default.
- To provide your own request id, send `X-Request-ID: <value>`.
- The next planned improvement is having the frontend fetch layer generate request ids and pass them deeper into the Celery pipeline.
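The forward-or-generate behavior can be sketched as a pure-ASGI middleware using only the standard library (this is a stand-in for illustration, not the project's `RequestContextMiddleware`):

```python
import contextvars
import uuid

request_id_var: contextvars.ContextVar = contextvars.ContextVar(
    "request_id", default=None
)


class RequestIdMiddleware:
    """Sketch: forward an incoming X-Request-ID or mint a new one, and echo it back."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        headers = dict(scope.get("headers") or [])
        rid = headers.get(b"x-request-id", b"").decode() or str(uuid.uuid4())
        token = request_id_var.set(rid)  # visible to all logging in this task

        async def send_with_header(message):
            if message["type"] == "http.response.start":
                message.setdefault("headers", []).append(
                    (b"x-request-id", rid.encode())
                )
            await send(message)

        try:
            await self.app(scope, receive, send_with_header)
        finally:
            request_id_var.reset(token)
```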
Prometheus Metrics
`PrometheusMiddleware` in `backend/src/eval_752/app/metrics.py` tracks HTTP request counts, latency, and in-flight concurrency. The backend exposes them on `GET /metrics`.
Primary metrics:
- `eval752_http_requests_total{method, path, status_code}`
- `eval752_http_request_duration_seconds_bucket|sum|count`
- `eval752_http_requests_in_progress{method, path}`
- `eval752_build_info{version}`
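The first three families can be re-declared with `prometheus_client` as follows. This is a hypothetical reconstruction for illustration, not the project's actual metric definitions; in particular, the histogram's label set is an assumption:

```python
from prometheus_client import (
    CollectorRegistry,
    Counter,
    Gauge,
    Histogram,
    generate_latest,
)

REGISTRY = CollectorRegistry()

# Counter samples are exported with a _total suffix: eval752_http_requests_total.
HTTP_REQUESTS = Counter(
    "eval752_http_requests",
    "HTTP requests served, by route and status.",
    ["method", "path", "status_code"],
    registry=REGISTRY,
)
HTTP_LATENCY = Histogram(
    "eval752_http_request_duration_seconds",
    "HTTP request latency.",
    ["method", "path"],
    registry=REGISTRY,
)
HTTP_IN_PROGRESS = Gauge(
    "eval752_http_requests_in_progress",
    "Requests currently being handled.",
    ["method", "path"],
    registry=REGISTRY,
)
```

A `/metrics` handler then just returns `generate_latest(REGISTRY)` with the Prometheus text content type.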
Worker cancellation-polling metrics
To measure the cost of cancellation polling, Celery workers export the following metrics from `backend/src/eval_752/workers/metrics.py`:
- `runs_cancel_poll_count`
- `runs_cancel_poll_db_queries_total`
- `runs_cancel_poll_latency_seconds_bucket|sum|count`
Use these together with the cancellation check interval controls documented in
Configuration and surfaced in Settings. They help you compare DB round trips
and latency under different polling cadences.
pytest resets counters with `reset_metrics_for_testing()` to avoid cross-test bleed.
Deployment Notes
- In Docker Compose, add a Prometheus scrape target for `backend:8000/metrics`.
- In Kubernetes, wire the backend through `ServiceMonitor` or `PodMonitor`.
- For reverse-proxy or ingress deployments, make sure `/metrics` and `/healthz` remain reachable from the scraper or control plane you rely on.
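For the Compose case, a minimal Prometheus scrape job might look like this (job name and interval are arbitrary choices, not prescribed by the project):

```yaml
scrape_configs:
  - job_name: eval752-backend
    metrics_path: /metrics
    scrape_interval: 15s
    static_configs:
      - targets: ["backend:8000"]
```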
Local Verification
Watch terminal logs or any configured file target while issuing requests.
Test Coverage
- `backend/tests/app/test_logging.py` verifies JSON serialization and context binding.
- `backend/tests/test_health.py` and `backend/tests/app/test_metrics.py` verify `X-Request-ID`, Prometheus output, and request accounting.
If you change the log schema or required fields, update `logging.py`, refresh this document, and keep the tests aligned.
