Logging & Observability Baseline

Last updated: 2026-03-13 · Related task: P1-BE-006

Phase 1 established a shared JSON logging and Prometheus metrics baseline for the FastAPI and Celery runtimes. This page documents the current logging shape, context conventions, and the main metrics you can consume from Docker Compose, Prometheus, or a broader observability stack.

Logging Stack

  • structlog handles JSON serialization and request/task context binding through contextvars.
  • Standard-library logging remains the bridge for Uvicorn, Celery, and other third-party code.
  • RequestContextMiddleware generates or forwards X-Request-ID for every HTTP request.

Initialization lives in backend/src/eval_752/app/logging.py and runs during app startup through configure_logging().

Log Shape

{
  "event": "healthz.check",
  "logger": "eval_752.app.main",
  "level": "info",
  "timestamp": "2025-10-19T08:12:34.567Z",
  "request_id": "7f3d3aa7a4b444cfaf3e1f8bdbf8f6ad",
  "run_id": "run-1234",
  "extra": "value"
}

Field notes:

  • request_id comes from the X-Request-ID header or is auto-generated.
  • run_id is explicitly bound through bind_run_id("run-1234") or with run_context("run-1234").
  • Additional business fields can be attached with logger.info("event", key=value).

Since 2026-03-11, if a browser user explicitly enables onboarding telemetry in Settings, the backend also logs onboarding_telemetry.received. That event contains only tutorial lifecycle fields (started, step_viewed, completed, dismissed), route, timezone, tutorial version, and an anonymous session id. It does not log prompts, dataset names, provider names, or free-form user text.

Common Usage Patterns

FastAPI route handlers

from eval_752.app.logging import get_logger

logger = get_logger(__name__)

@router.get("/healthz")
async def healthz() -> dict[str, str]:
    logger.info("healthz.check")
    return {"status": "ok"}

RequestContextMiddleware already injects X-Request-ID.

Celery tasks

from eval_752.app.logging import get_logger, run_context

logger = get_logger(__name__)

@celery_app.task
def process_run(run_id: str) -> None:
    with run_context(run_id):
        logger.info("worker.run.start", run_id=run_id)
        ...

run_context binds run_id for the block and clears it on exit.

Manual binding

  • bind_run_id(value) persists until bind_run_id(None) is called or the surrounding request/task context ends.
  • structlog.contextvars.bind_contextvars(...) can still be used directly for extra fields such as dataset_id.

Request Tracing

  • Every response includes X-Request-ID by default.
  • To provide your own request id, send X-Request-ID: <value>.
  • The next planned improvement is having the frontend fetch layer generate request ids and pass them deeper into the Celery pipeline.

Prometheus Metrics

PrometheusMiddleware in backend/src/eval_752/app/metrics.py tracks HTTP request counts, latency, and in-flight concurrency. The backend exposes them on GET /metrics.

Primary metrics:

  • eval752_http_requests_total{method, path, status_code}
  • eval752_http_request_duration_seconds_bucket|sum|count
  • eval752_http_requests_in_progress{method, path}
  • eval752_build_info{version}
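
If you are defining similar metrics elsewhere, the prometheus_client equivalents look roughly like this. The label sets are inferred from the names above and are an assumption; the real definitions live in app/metrics.py:

```python
from prometheus_client import (
    CollectorRegistry,
    Counter,
    Gauge,
    Histogram,
    generate_latest,
)

# Isolated registry so the sketch does not pollute the process default.
registry = CollectorRegistry()

HTTP_REQUESTS = Counter(
    "eval752_http_requests_total", "Total HTTP requests handled",
    ["method", "path", "status_code"], registry=registry,
)
HTTP_DURATION = Histogram(
    "eval752_http_request_duration_seconds", "HTTP request latency in seconds",
    ["method", "path"], registry=registry,
)
IN_PROGRESS = Gauge(
    "eval752_http_requests_in_progress", "In-flight HTTP requests",
    ["method", "path"], registry=registry,
)
BUILD_INFO = Gauge(
    "eval752_build_info", "Build metadata", ["version"], registry=registry,
)

# Example observations.
HTTP_REQUESTS.labels("GET", "/healthz", "200").inc()
HTTP_DURATION.labels("GET", "/healthz").observe(0.012)
BUILD_INFO.labels("0.1.0").set(1)
```

generate_latest(registry) renders the same text exposition format GET /metrics serves.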

Worker cancellation-polling metrics

To measure the cost of cancellation polling, Celery workers export the following metrics from backend/src/eval_752/workers/metrics.py:

  • runs_cancel_poll_count
  • runs_cancel_poll_db_queries_total
  • runs_cancel_poll_latency_seconds_bucket|sum|count

Use these together with the cancellation check interval controls documented in Configuration and surfaced in Settings. They help you compare DB round trips and latency under different polling cadences.

The pytest suite resets these counters via reset_metrics_for_testing() to avoid cross-test bleed.

Deployment Notes

  • In Docker Compose, add a Prometheus scrape target for backend:8000/metrics.
  • In Kubernetes, wire the backend through ServiceMonitor or PodMonitor.
  • For reverse-proxy or ingress deployments, make sure /metrics and /healthz remain reachable from the scraper or control plane you rely on.
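
For the Docker Compose case, a minimal Prometheus scrape entry might look like this. The job name and interval are placeholders, not values from the project:

```yaml
scrape_configs:
  - job_name: "eval752-backend"   # placeholder name
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets: ["backend:8000"]
```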

Local Verification

cd backend
uv run uvicorn eval_752.app.main:app --reload
curl -i http://localhost:8000/healthz
curl -s http://localhost:8000/metrics | head

Watch terminal logs or any configured file target while issuing requests.

Test Coverage

  • backend/tests/app/test_logging.py verifies JSON serialization and context binding.
  • backend/tests/test_health.py and backend/tests/app/test_metrics.py verify X-Request-ID, Prometheus output, and request accounting.

If you change the log schema or required fields, update logging.py, refresh this document, and keep the tests aligned.