LiteLLM Python SDK Integration Blueprint
Status: draft · Owner: Codex
This blueprint describes how the LiteLLM Python SDK fits into the FastAPI + Celery architecture. It keeps provider configuration, run execution, scoring, and LightEval interoperability aligned around one integration surface.
Objectives
- Centralize LiteLLM client initialization so FastAPI, Celery workers, and LightEval reuse the same defaults.
- Keep deterministic test coverage possible through local OpenAI-compatible gateways.
- Expose runtime controls without leaking secrets into the wrong layer.
- Preserve a thin enough adapter that a future SDK swap remains plausible.
Current Baseline
- eval_752.services.litellm_client.LiteLLMClient wraps litellm.completion.
- Celery tasks obtain clients through worker-side helper code that decrypts provider keys.
- Provider creation stores encrypted API keys and optional rate_limit JSON on the providers table.
- Unit tests use mocks or fakes; integration tests use a local OpenAI-compatible gateway.
Design Overview
Component Plan
Factory and caching
Introduce LiteLLMClientFactory under eval_752.services:
- accepts LiteLLMSettings
- caches sync clients per provider/key combination
- leaves room for future async clients when streaming needs widen
- exposes a reset() hook for tests
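A minimal sketch of the factory shape, not the actual implementation: the client constructor is injected as a callable so the per-(provider, key) caching and the reset() test hook can be shown without the SDK. The field names on the settings object are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class LiteLLMSettings:
    # Illustrative fields only; the real settings live in eval_752.
    request_timeout_s: float = 30.0
    max_retries: int = 3


class LiteLLMClientFactory:
    """Caches one sync client per (provider_id, api_key) pair."""

    def __init__(self, settings: LiteLLMSettings, make_client: Callable[..., Any]):
        self._settings = settings
        self._make_client = make_client  # e.g. the LiteLLMClient constructor
        self._cache: Dict[Tuple[str, str], Any] = {}

    def get(self, provider_id: str, api_key: str) -> Any:
        key = (provider_id, api_key)
        if key not in self._cache:
            self._cache[key] = self._make_client(
                provider_id=provider_id,
                api_key=api_key,
                timeout=self._settings.request_timeout_s,
            )
        return self._cache[key]

    def reset(self) -> None:
        """Test hook: drop all cached clients."""
        self._cache.clear()
```

Freezing the settings object keeps cached clients consistent with the configuration they were built from; a settings change should go through a new factory (or a reset()) rather than mutating in place.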
Settings split
- bootstrap-only engine selection stays in backend/src/eval_752/app/config.py
- runtime request timeout and retry policy are persisted through workspace settings
- the env/UI split must stay documented in Configuration
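One way to keep the env/UI split explicit in code (all names here are illustrative, not the actual contents of config.py): bootstrap values are read from the environment once at process start, while runtime values are loaded from workspace settings per run.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BootstrapConfig:
    """Read once from the environment at startup (config.py layer)."""
    engine: str

    @classmethod
    def from_env(cls) -> "BootstrapConfig":
        # EVAL_ENGINE is a hypothetical variable name.
        return cls(engine=os.environ.get("EVAL_ENGINE", "litellm"))


@dataclass(frozen=True)
class RuntimeSettings:
    """Persisted in workspace settings; editable from the UI."""
    request_timeout_s: float = 30.0
    max_retries: int = 3


def load_runtime_settings(workspace_row: dict) -> RuntimeSettings:
    # workspace_row stands in for a DB record; field names are assumptions.
    return RuntimeSettings(
        request_timeout_s=float(workspace_row.get("request_timeout_s", 30.0)),
        max_retries=int(workspace_row.get("max_retries", 3)),
    )
```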
Retry and rate-limit strategy
- provider metadata can describe concurrency and other rate-related fields
- retry behavior should wrap LiteLLM client calls with explicit backoff and structured logging
- logs should record provider_id, model, retry_count, latency_ms, and token usage when available
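The retry policy above could wrap each client call roughly as follows. This is a hedged sketch: the backoff schedule, logger name, and exception handling are assumptions, and real code would distinguish retryable provider errors from permanent ones.

```python
import logging
import time
from typing import Any, Callable

log = logging.getLogger("litellm.calls")  # illustrative logger name


def call_with_retry(
    fn: Callable[[], Any],
    *,
    provider_id: str,
    model: str,
    max_retries: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> Any:
    """Run fn with exponential backoff, logging one structured record on success."""
    for attempt in range(max_retries + 1):
        start = time.monotonic()
        try:
            result = fn()
        except Exception:
            if attempt == max_retries:
                raise  # budget exhausted: surface the last error
            sleep(base_delay_s * (2 ** attempt))  # 0.5s, 1s, 2s, ...
            continue
        log.info(
            "litellm call ok",
            extra={
                "provider_id": provider_id,
                "model": model,
                "retry_count": attempt,
                "latency_ms": round((time.monotonic() - start) * 1000, 1),
            },
        )
        return result
```

Injecting the sleep function keeps unit tests fast and deterministic, which matches the test-hooks goal below.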
Test hooks
- production keeps a single standard LiteLLM path
- unit tests may still stub or fake clients
- integration tests should continue using a local OpenAI-compatible gateway with deterministic model names
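For the unit-test side, a deterministic fake that mimics the completion call shape is usually enough. The response dict below approximates an OpenAI-style payload; treat the exact field names as assumptions rather than LiteLLM's guaranteed schema.

```python
class FakeCompletionClient:
    """Deterministic stand-in for the LiteLLM client in unit tests."""

    def __init__(self, canned_text: str = "stub-answer"):
        self.canned_text = canned_text
        self.calls: list = []  # recorded call kwargs, for assertions

    def completion(self, *, model: str, messages: list, **kwargs) -> dict:
        self.calls.append({"model": model, "messages": messages, **kwargs})
        return {
            "choices": [
                {"message": {"role": "assistant", "content": self.canned_text}}
            ],
            "usage": {"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2},
        }
```

Because the fake records every call, tests can assert on prompt construction as well as on how responses are consumed.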
FastAPI usage
The /providers/{id}/smoke-test endpoint should use the shared factory so smoke tests and real runs
observe the same client defaults.
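The handler itself can stay thin over the factory. Sketched here as a plain function rather than a wired FastAPI route, with a hypothetical model name and error shape, so the "same defaults as real runs" point is visible without the framework:

```python
from typing import Any, Callable


def run_smoke_test(
    provider_id: str,
    get_client: Callable[[str], Any],  # the shared factory lookup
    model: str = "smoke-test-model",   # hypothetical default
) -> dict:
    """Issue one tiny completion through the shared factory and report status."""
    client = get_client(provider_id)  # same client real runs would receive
    try:
        client.completion(model=model, messages=[{"role": "user", "content": "ping"}])
    except Exception as exc:
        return {"provider_id": provider_id, "ok": False, "error": str(exc)}
    return {"provider_id": provider_id, "ok": True}
```

In FastAPI, get_client would come in via dependency injection so the endpoint and the Celery path resolve clients identically.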
Celery usage
- worker execution should go through the same factory
- streaming-first behavior should remain preferred, with buffered fallback only when the provider rejects streaming
- structured invocation logs should be attached to the run log stream
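The streaming-first rule can be expressed as a small adapter. The chunk shape and the "reject by exception" behavior below are assumptions; real code would inspect the provider error before deciding to fall back.

```python
from typing import Any, Callable, Iterator


def complete_streaming_first(
    client: Any,
    *,
    model: str,
    messages: list,
    on_chunk: Callable[[str], None],  # e.g. forward to the run log stream
) -> str:
    """Prefer streaming; fall back to one buffered call if streaming is rejected."""
    try:
        chunks: Iterator = client.completion(model=model, messages=messages, stream=True)
    except Exception:
        # Provider rejected streaming: do a single buffered request instead.
        resp = client.completion(model=model, messages=messages, stream=False)
        text = resp["choices"][0]["message"]["content"]
        on_chunk(text)
        return text
    parts = []
    for chunk in chunks:
        piece = chunk["choices"][0]["delta"].get("content", "")
        if piece:
            on_chunk(piece)
            parts.append(piece)
    return "".join(parts)
```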
LightEval alignment
- LightEval config generation should pull the same provider base URL and key selection logic
- when use_lighteval_executor is enabled, the LightEval path should stay configuration-compatible with the direct LiteLLM path
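Keeping the two paths compatible mostly means deriving base URL and key from a single resolver. A heavily hedged sketch: the config keys emitted for LightEval below are placeholders, not its actual schema, and the provider row fields are assumptions.

```python
from typing import Callable, Tuple


def provider_endpoint(provider_row: dict, decrypt: Callable) -> Tuple[str, str]:
    """Single source of truth for (base_url, api_key) selection."""
    return provider_row["base_url"], decrypt(provider_row["encrypted_api_key"])


def lighteval_model_config(provider_row: dict, model: str, decrypt: Callable) -> dict:
    base_url, api_key = provider_endpoint(provider_row, decrypt)
    # Key names here are illustrative, not LightEval's real config schema.
    return {"model_name": model, "base_url": base_url, "api_key": api_key}


def direct_litellm_kwargs(provider_row: dict, model: str, decrypt: Callable) -> dict:
    base_url, api_key = provider_endpoint(provider_row, decrypt)
    return {"model": model, "api_base": base_url, "api_key": api_key}
```

Because both builders call provider_endpoint, a change to key selection cannot drift between the executors.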
Observability
Recommended metrics:
- litellm_requests_total
- litellm_request_latency_seconds
- litellm_tokens_total
Recommended labels:
- provider_id
- model
- status
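A stdlib-only sketch of the metric and label scheme above, useful for tests; a real deployment would more likely register these with a Prometheus client library under the same names.

```python
from collections import defaultdict


class LiteLLMMetrics:
    """In-process aggregation keyed by (provider_id, model, status)."""

    def __init__(self):
        self.requests_total = defaultdict(int)    # litellm_requests_total
        self.latency_seconds = defaultdict(list)  # litellm_request_latency_seconds
        self.tokens_total = defaultdict(int)      # litellm_tokens_total

    def observe(self, provider_id: str, model: str, status: str,
                latency_s: float, tokens: int) -> None:
        labels = (provider_id, model, status)
        self.requests_total[labels] += 1
        self.latency_seconds[labels].append(latency_s)
        self.tokens_total[labels] += tokens
```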
Open Questions
- whether day-level token budgets are needed before the Phase 3 horizon
- how provider-specific headers should be stored if richer proxy compatibility becomes necessary
- how streaming adapters should evolve without duplicating logic
Next Actions
- keep runtime controls and secret templates aligned
- continue factory hardening and integration coverage
- keep this blueprint synchronized with implementation and acceptance docs
