LightEval Interoperability Test Fixtures

Status: Active (2026-03-13)

This document describes the deterministic fixture set used to exercise the contract between LightEvalConfigService, the LiteLLM factory, and the "lighteval endpoint litellm" CLI and Python entrypoints.

Objectives

  • validate generated config.yaml files for both CLI and Python LightEval paths
  • cover success, retry, and failure semantics
  • avoid external API calls in CI and local smoke runs
  • keep contract fixtures reproducible

Fixture Layout

backend/tests/lighteval/
├── fixtures/
│   ├── providers/
│   │   └── fake-openai.json
│   ├── datasets/
│   │   └── mini-mmlu.eval752.zip
│   └── configs/
│       ├── expected-config.yaml
│       └── expected-cli-stdout.txt
├── test_cli_contract.py
├── test_python_contract.py
└── utils.py

The provider fixture points at a local OpenAI-compatible test gateway. Contract tests may rewrite the base URL dynamically when they start an ephemeral local instance.
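The dynamic base-URL rewrite can be sketched as below. This is a minimal illustration, not the helper in utils.py: the function name with_base_url and the "base_url" key are assumptions about the fixture schema, which this document does not specify.

```python
import copy


def with_base_url(provider, base_url, key="base_url"):
    """Return a copy of the provider fixture with its base URL replaced.

    `key` is a hypothetical field name; adjust it to whatever
    fake-openai.json actually uses.
    """
    # Deep-copy so the loaded fixture is never mutated between tests.
    patched = copy.deepcopy(provider)
    patched[key] = base_url
    return patched
```

Returning a copy rather than patching in place keeps the on-disk fixture contents reusable across tests that each start their own ephemeral gateway on a different port.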

Test Matrix

ID       Scenario               Assertion
LEV-001  CLI happy path         config matches golden, CLI succeeds, one provider call
LEV-002  CLI retry path         fail-once model retries and then succeeds
LEV-003  Python API failure     always-fail model raises after retry budget is exhausted
LEV-004  Config drift guard     generated YAML still matches golden output
LEV-005  Multilingual contract  polyglot fixture remains stable across CLI and Python paths
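A drift guard such as LEV-004 could compare the generated YAML text against the golden file and report a readable diff on mismatch. The sketch below uses only the standard library; the function name config_drift and the diff labels are illustrative, and the real tests may compare parsed YAML instead of raw text.

```python
import difflib


def config_drift(generated, golden):
    """Return a unified diff from golden to generated text.

    The result is an empty string when the two texts are identical.
    """
    diff = difflib.unified_diff(
        golden.splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile="expected-config.yaml",
        tofile="generated-config.yaml",
    )
    return "".join(diff)
```

A test can then assert that config_drift(generated, golden) is empty and include the returned diff in the failure message, which shows exactly which keys drifted.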

Test Gateway Behavior

The local test gateway is OpenAI-compatible and keyed by model name:

  • normal model names succeed
  • names containing fail-once fail on the first call, then succeed
  • names containing always-fail fail every time

This keeps retry coverage deterministic without introducing alternate runtime execution branches in production code.
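The model-name rules above can be modelled in a few lines. This is a sketch under stated assumptions, not the actual gateway: the class FakeGateway, the helper call_with_retries, the 500 status code, and the three-attempt budget are all hypothetical and chosen only to illustrate the behavior.

```python
from collections import Counter


class FakeGateway:
    """In-memory stand-in for the local test gateway, keyed by model name."""

    def __init__(self):
        self._calls = Counter()  # number of calls seen per model name

    def handle(self, model):
        """Return an HTTP-like status code for one request to `model`."""
        self._calls[model] += 1
        if "always-fail" in model:
            return 500  # assumed error status
        if "fail-once" in model and self._calls[model] == 1:
            return 500  # only the first call fails
        return 200


def call_with_retries(gateway, model, max_attempts=3):
    """Return the attempt number that succeeded, or raise when the budget runs out."""
    for attempt in range(1, max_attempts + 1):
        if gateway.handle(model) == 200:
            return attempt
    raise RuntimeError(f"{model}: retry budget of {max_attempts} exhausted")
```

Because failure is decided purely by the model name and a per-model call counter, the same client code path serves all three scenarios, which is what keeps LEV-002 and LEV-003 deterministic.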

Fixture Regeneration

Refresh the fixture set with:

cd backend
uv run python ../scripts/testing/build_lighteval_fixture.py

The builder updates:

  • provider JSON
  • dataset package
  • golden LightEval config
  • golden CLI stdout

CI Usage

  • pytest -m lighteval --no-cov runs in CI and nightly flows
  • Docker integration remains independent but uses the same local-gateway approach
  • failed runs should upload logs and invocation metadata for debugging