LightEval Interoperability Test Fixtures

Status: Active (2026-03-13)

This document describes the deterministic fixture set used to exercise the contract between LightEvalConfigService, the LiteLLM factory, and the lighteval endpoint litellm CLI/Python entrypoints.

Objectives

validate generated config.yaml files for both CLI and Python LightEval paths
cover success, retry, and failure semantics
avoid external API calls in CI and local smoke runs
keep contract fixtures reproducible

Fixture Layout

backend/tests/lighteval/
├── fixtures/
│   ├── providers/
│   │   └── fake-openai.json
│   ├── datasets/
│   │   └── mini-mmlu.eval752.zip
│   └── configs/
│       ├── expected-config.yaml
│       └── expected-cli-stdout.txt
├── test_cli_contract.py
├── test_python_contract.py
└── utils.py

The provider fixture points at a local OpenAI-compatible test gateway. Contract tests may rewrite the base URL dynamically when they start an ephemeral local instance.

Test Matrix

ID	Scenario	Assertion
LEV-001	CLI happy path	config matches golden, CLI succeeds, one provider call
LEV-002	CLI retry path	fail-once model retries and then succeeds
LEV-003	Python API failure	always-fail model raises after retry budget is exhausted
LEV-004	Config drift guard	generated YAML still matches golden output
LEV-005	Multilingual contract	polyglot fixture remains stable across CLI and Python paths

Test Gateway Behavior

The local test gateway is OpenAI-compatible and keyed by model name:

normal model names succeed
names containing fail-once fail on the first call, then succeed
names containing always-fail fail every time

This keeps retry coverage deterministic without introducing alternate runtime execution branches in production code.

Fixture Regeneration

Refresh the fixture set with:

cd backend
uv run python ../scripts/testing/build_lighteval_fixture.py

The builder updates:

provider JSON
dataset package
golden LightEval config
golden CLI stdout

CI Usage

pytest -m lighteval --no-cov runs in CI and nightly flows
Docker integration remains independent but uses the same local-gateway approach
failures should upload logs and invocation metadata for debugging

#LightEval Interoperability Test Fixtures

#Objectives

#Fixture Layout

#Test Matrix

#Test Gateway Behavior

#Fixture Regeneration