📄️ Input Params
Common Params
📄️ Provider-specific Params
Providers might offer params not supported by OpenAI (e.g. top_k). LiteLLM treats any non-openai param, as a provider-specific param, and passes it to the provider in the request body, as a kwarg. See Reserved Params
📄️ Structured Outputs (JSON Mode)
Quick Start
📄️ Pre-fix Assistant Messages
Supported by:
📄️ Drop Unsupported Params
Drop unsupported OpenAI params by your LLM Provider.
📄️ Prompt Formatting
LiteLLM automatically translates the OpenAI ChatCompletions prompt format, to other models. You can control this by setting a custom prompt template for a model as well.
📄️ Output
Format
📄️ Prompt Caching
For OpenAI + Anthropic + Deepseek, LiteLLM follows the OpenAI prompt caching usage object format:
📄️ Usage
LiteLLM returns the OpenAI compatible usage object across all providers.
📄️ Exception Mapping
LiteLLM maps exceptions across all providers to their OpenAI counterparts.
📄️ Streaming + Async
- Streaming Responses
📄️ Trimming Input Messages
Use litellm.trim_messages() to ensure messages does not exceed a model's token limit or specified max_tokens
📄️ Function Calling
Checking if a model supports function calling
📄️ Using Vision Models
Quick Start
📄️ Model Alias
The model name you show an end-user might be different from the one you pass to LiteLLM - e.g. Displaying GPT-3.5 while calling gpt-3.5-turbo-16k on the backend.
📄️ Batching Completion()
LiteLLM allows you to:
📄️ Mock Completion() Responses - Save Testing Costs 💰
For testing purposes, you can use completion() with mock_response to mock calling the completion endpoint.
📄️ Reliability - Retries, Fallbacks
LiteLLM helps prevent failed requests in 2 ways: