Documentation Smoke Checklist

Use this checklist whenever a product change affects a user-visible workflow, a page's information architecture (IA), or a capability boundary.

Core Journeys

  1. Run the current Quick Start end to end against the default Docker bootstrap path.
  2. Confirm the Runs page description matches the shipped active runs board, not the pre-redesign table/detail-panel model.
  3. Confirm the documented export path matches the shipped UI, and that the docs promise only .eval752.zip unless another export format is actually exposed.
  4. Confirm Comparison copy only promises the metrics and panels that are currently rendered.

Truthfulness

  1. Search docs for claims about unshipped features, such as an Arena leaderboard, direct CSV/JSON browser upload, HF Hub push, or significance testing, and for shipped features that are still described as roadmap-only.
  2. Check roadmap and current-status language: while the release is still alpha, docs must not imply general availability (GA) or describe the product as “production-ready”.
  3. If a task moved from roadmap to shipped functionality, update both docs and specs/3_tasks.md in the same change.
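Part of the first Truthfulness check can be scripted. The sketch below is a minimal phrase scan, assuming the docs are Markdown files (.md or .mdx) under a docs/ directory; the pattern list and the helper names scan_text and find_unshipped_claims are hypothetical and should be edited to track whatever is still unshipped at release time. A match only flags a line for human review: the scan cannot tell a stale claim from an accurate "not yet supported" note, and it does not find shipped features still labeled as roadmap-only.

```python
import re
from pathlib import Path

# Hypothetical phrase list: edit it to track whatever is still unshipped.
UNSHIPPED_PATTERNS = [
    r"arena leaderboard",
    r"csv/json (browser )?upload",
    r"hf hub push",
    r"significance test",
]


def scan_text(text, patterns=UNSHIPPED_PATTERNS):
    """Return (line_number, stripped_line) for each line matching any pattern, case-insensitively."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if any(rx.search(line) for rx in compiled)
    ]


def find_unshipped_claims(docs_root="docs", suffixes=(".md", ".mdx")):
    """Walk docs_root (assumed layout) and return (path, line_number, line) for every flagged line."""
    hits = []
    for path in sorted(Path(docs_root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            for lineno, line in scan_text(path.read_text(encoding="utf-8")):
                hits.append((str(path), lineno, line))
    return hits


if __name__ == "__main__":
    # Print hits in path:line: text form so editors can jump to them.
    for path, lineno, line in find_unshipped_claims():
        print(f"{path}:{lineno}: {line}")
```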

Runtime & Demo Truthfulness

  1. Verify Quick Start, provider docs, and run docs no longer describe a seeded demo provider as part of the default user path.
  2. Verify internal testing docs clearly separate the local OpenAI-compatible test gateway from real provider results.

Cross-Surface Consistency

  1. Check Providers, Schedules, Dashboard, Runs, and Comparison docs against the actual navigation labels and primary controls.
  2. If a new page is shipped, add it to the appropriate locale nav file (docs/en/_nav.json or docs/zh/_nav.json) and the nearest section _meta.json, then cross-link it from the nearest user guide index.
  3. If a page becomes localized, search its user-facing docs and the related test fixtures for leftover hard-coded English examples.

Verification

  1. Run the focused Playwright journeys for the changed flow when feasible.
  2. Rebuild or preview docs if navigation changed.
  3. Update screenshots or recorded steps only after the underlying text has been corrected.