# LiteLLM Integration Guide
This guide explains how to integrate LiteLLM with our Claude CLI study implementations, allowing you to extend Claude CLI's architectural patterns to other language models.
## What is LiteLLM?
LiteLLM is an open-source library that provides a unified interface for interacting with various large language models (LLMs) from different providers, including:
- Anthropic (Claude)
- OpenAI (GPT models)
- Google (Gemini)
- Mistral AI
- Cohere
- And many others
By integrating our Claude CLI study implementations with LiteLLM, you can:
- Extend Claude CLI patterns to other language models
- Compare architecture effectiveness across different providers
- Test semantic chunking with various models
- Evaluate hybrid architecture performance with different LLMs
## Prerequisites
Before integrating LiteLLM with our Claude CLI study implementations, ensure you have:
- Our experimental implementations set up (see CLI Emulator and Multi-Provider CLI)
- Python 3.8 or higher
- API keys for the language model providers you want to use
## Installation

### 1. Install LiteLLM

First, install the LiteLLM package:

```bash
pip install litellm
```
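
To confirm the installation works, you can run a quick smoke test against LiteLLM's unified `completion` API. The model shown is only an example; use any provider whose API key you already have exported (key setup is covered in the next step):

```python
import litellm

# Smoke test: swap in any model/provider you have an API key for.
response = litellm.completion(
    model="anthropic/claude-3-sonnet-20240229",
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
)
print(response.choices[0].message.content)
```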
### 2. Configure API Keys

Set up your API keys for the different providers. You can do this in several ways:

#### Option 1: Environment Variables

```bash
# For Anthropic (Claude)
export ANTHROPIC_API_KEY=your_anthropic_api_key

# For OpenAI
export OPENAI_API_KEY=your_openai_api_key

# For Google
export GOOGLE_API_KEY=your_google_api_key

# For Mistral AI
export MISTRAL_API_KEY=your_mistral_api_key
```

#### Option 2: LiteLLM Config File

Create a `litellm-config.yaml` file:

```yaml
# litellm-config.yaml
model_list:
  - model_name: claude-3-sonnet-20240229
    litellm_params:
      model: anthropic/claude-3-sonnet-20240229
      api_key: your_anthropic_api_key
  - model_name: gpt-4-turbo
    litellm_params:
      model: openai/gpt-4-turbo
      api_key: your_openai_api_key
  - model_name: gemini-pro
    litellm_params:
      model: google/gemini-pro
      api_key: your_google_api_key
  - model_name: mistral-large
    litellm_params:
      model: mistral/mistral-large-latest
      api_key: your_mistral_api_key

general_settings:
  default_model: claude-3-sonnet-20240229
```
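
To sanity-check a config like this outside the Multi-Provider CLI, you can hand its `model_list` to LiteLLM's `Router`, which accepts the same structure. The YAML-loading glue below is an assumption about how the file maps onto the Router, not part of our implementations:

```python
import yaml  # requires pyyaml
from litellm import Router

# Load the config and pass its model_list straight to LiteLLM's Router.
with open("litellm-config.yaml") as f:
    config = yaml.safe_load(f)

router = Router(model_list=config["model_list"])

# Requests are routed by the friendly model_name defined in the config.
response = router.completion(
    model="claude-3-sonnet-20240229",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```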
### 3. Configure Multi-Provider CLI for LiteLLM
Update your Multi-Provider CLI configuration to use LiteLLM:

```json
{
  "model": {
    "provider": "litellm",
    "config_path": "/path/to/litellm-config.yaml",
    "default_model": "claude-3-sonnet-20240229",
    "fallback_models": ["gpt-4-turbo", "mistral-large"]
  },
  "chunking": {
    "semantic_chunking": true,
    "chunk_size": 8000,
    "chunk_overlap": 200
  },
  "hybrid_processing": {
    "enabled": true,
    "local_tools": ["file_read", "search", "git"]
  }
}
```
## Basic Usage

### Using Different Models
With LiteLLM integration, you can specify which model to use with our Multi-Provider CLI:

```bash
# Use Claude 3 Sonnet (default)
multi_provider_cli.py "Explain the authentication flow in this codebase"

# Use GPT-4 Turbo
multi_provider_cli.py --provider openai --model gpt-4-turbo "Explain the authentication flow in this codebase"

# Use Mistral Large
multi_provider_cli.py --provider mistral --model mistral-large "Explain the authentication flow in this codebase"
```
### Model Comparison
You can compare responses from different models with our Multi-Provider CLI:

```bash
multi_provider_cli.py --compare "What design patterns are used in this codebase?" \
  --models claude-3-sonnet-20240229 gpt-4-turbo mistral-large
```
This will display the responses from each model side by side, allowing you to compare their analysis.
## Advanced Configuration

### Model-Specific Parameters
You can configure different parameters for each model:

```yaml
# litellm-config.yaml
model_list:
  - model_name: claude-3-sonnet-20240229
    litellm_params:
      model: anthropic/claude-3-sonnet-20240229
      api_key: your_anthropic_api_key
      temperature: 0.1
      max_tokens: 4000
  - model_name: gpt-4-turbo
    litellm_params:
      model: openai/gpt-4-turbo
      api_key: your_openai_api_key
      temperature: 0.2
      max_tokens: 3000
```
### Fallback Mechanisms
Configure fallback models in case the primary model fails:

```json
{
  "model": {
    "provider": "litellm",
    "default_model": "claude-3-sonnet-20240229",
    "fallback_models": ["gpt-4-turbo", "mistral-large"],
    "fallback_strategy": "sequential",
    "max_fallback_attempts": 2
  }
}
```
With this configuration, if Claude 3 Sonnet fails, the Multi-Provider CLI will automatically try GPT-4 Turbo; if that also fails, it will try Mistral Large.
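
If you want the same behavior outside the CLI, a minimal sketch of sequential fallback on top of `litellm.completion` might look like the following. The helper and its retry order mirror the configuration above and are illustrative, not the Multi-Provider CLI's internals:

```python
import litellm

def complete_with_fallback(messages, models):
    """Try each model in order until one succeeds (sequential fallback)."""
    last_error = None
    for model in models:
        try:
            response = litellm.completion(model=model, messages=messages)
            return model, response.choices[0].message.content
        except Exception as exc:  # LiteLLM surfaces provider failures as exceptions
            last_error = exc
            print(f"{model} failed ({exc}); trying next model")
    raise RuntimeError("All models in the fallback chain failed") from last_error

model_used, answer = complete_with_fallback(
    messages=[{"role": "user", "content": "Explain the authentication flow."}],
    models=[
        "anthropic/claude-3-sonnet-20240229",  # primary
        "openai/gpt-4-turbo",                  # first fallback
        "mistral/mistral-large-latest",        # second fallback
    ],
)
```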
### Cost Optimization
Configure cost-based routing to automatically select the most cost-effective model:

```json
{
  "model": {
    "provider": "litellm",
    "routing_strategy": "cost",
    "cost_thresholds": {
      "low_complexity": {
        "model": "mistral-large",
        "max_tokens": 2000
      },
      "medium_complexity": {
        "model": "gpt-4-turbo",
        "max_tokens": 3000
      },
      "high_complexity": {
        "model": "claude-3-sonnet-20240229",
        "max_tokens": 4000
      }
    }
  }
}
```
With this configuration, the Multi-Provider CLI will automatically select the appropriate model based on the complexity of the query and the estimated cost.
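
The CLI's routing internals aren't shown here, but one way to approximate this kind of cost-based routing is a simple heuristic that maps a query to a cost tier before calling LiteLLM. Both the `estimate_complexity` heuristic and the thresholds below are assumptions for illustration:

```python
import litellm

# Cost tiers mirroring the configuration above (illustrative values).
COST_TIERS = {
    "low_complexity": {"model": "mistral/mistral-large-latest", "max_tokens": 2000},
    "medium_complexity": {"model": "openai/gpt-4-turbo", "max_tokens": 3000},
    "high_complexity": {"model": "anthropic/claude-3-sonnet-20240229", "max_tokens": 4000},
}

def estimate_complexity(query: str) -> str:
    """Crude proxy for query complexity based on length and keywords (assumption)."""
    hard_keywords = ("refactor", "architecture", "concurrency", "security")
    if len(query) > 500 or any(k in query.lower() for k in hard_keywords):
        return "high_complexity"
    if len(query) > 150:
        return "medium_complexity"
    return "low_complexity"

def route_query(query: str):
    tier = COST_TIERS[estimate_complexity(query)]
    return litellm.completion(
        model=tier["model"],
        max_tokens=tier["max_tokens"],
        messages=[{"role": "user", "content": query}],
    )
```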
### Prompt Templates for Different Models
Different models may perform better with different prompt formats. You can configure model-specific prompt templates:

```json
{
  "prompt_templates": {
    "claude-3-sonnet-20240229": {
      "code_explanation": "Human: Please analyze this {language} code and explain what it does:\n\n{code}\n\nAssistant:",
      "semantic_search": "Human: Find code in the codebase related to: {query}\n\nAssistant:"
    },
    "gpt-4-turbo": {
      "code_explanation": "You are an expert programmer. Analyze this {language} code and explain what it does:\n\n{code}",
      "semantic_search": "Find code in the codebase related to: {query}"
    },
    "mistral-large": {
      "code_explanation": "<s>[INST] Analyze this {language} code and explain what it does:\n\n{code} [/INST]",
      "semantic_search": "<s>[INST] Find code in the codebase related to: {query} [/INST]"
    }
  }
}
```
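
Applying these templates is straightforward: look up the template for the chosen model, fill in the placeholders, and send the result through LiteLLM. The helper below is a minimal sketch of that flow, not the Multi-Provider CLI's actual prompt pipeline:

```python
import litellm

# Subset of the templates above, keyed by the friendly model name.
PROMPT_TEMPLATES = {
    "gpt-4-turbo": {
        "code_explanation": (
            "You are an expert programmer. Analyze this {language} code "
            "and explain what it does:\n\n{code}"
        ),
    },
}

def explain_code(model_name: str, litellm_model: str, language: str, code: str) -> str:
    """Render the model-specific template and send it through LiteLLM."""
    template = PROMPT_TEMPLATES[model_name]["code_explanation"]
    prompt = template.format(language=language, code=code)
    response = litellm.completion(
        model=litellm_model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example (assumes OPENAI_API_KEY is set):
# explain_code("gpt-4-turbo", "openai/gpt-4-turbo", "python", "def add(a, b): return a + b")
```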
## Programmatic Usage
You can use the LiteLLM integration programmatically in your Python code with our implementation:

```python
from multi_provider_cli import MultiProviderCLI

# Initialize MultiProviderCLI with LiteLLM
cli = MultiProviderCLI(
    provider_config_path="/path/to/litellm-config.yaml",
    default_provider="anthropic",
    default_model="claude-3-sonnet-20240229"
)

# Process a file with a specific model
chunks = cli.semantic_chunker.chunk_file('main.py')

# Process a query using the default model
response = cli.process_query("What does this code do?")

# Process a query using a different model
response_gpt4 = cli.process_query(
    "What does this code do?",
    provider="openai",
    model="gpt-4-turbo"
)

# Compare responses from multiple models
responses = cli.compare_models(
    "What design patterns are used in this code?",
    models=[
        {"provider": "anthropic", "model": "claude-3-sonnet-20240229"},
        {"provider": "openai", "model": "gpt-4-turbo"},
        {"provider": "mistral", "model": "mistral-large"}
    ]
)

for provider_model, response in responses.items():
    print(f"=== {provider_model} ===")
    print(response)
    print()
```
## Model Performance Comparison
Different models have different strengths across code-related tasks. Here's a general comparison:

| Model | Code Understanding | Documentation Generation | Bug Finding | Performance |
|---|---|---|---|---|
| Claude 3 Sonnet | Excellent | Excellent | Very Good | Fast |
| GPT-4 Turbo | Excellent | Very Good | Excellent | Medium |
| Gemini Pro | Very Good | Good | Good | Fast |
| Mistral Large | Very Good | Good | Good | Fast |
| Claude 3 Opus | Outstanding | Outstanding | Outstanding | Slow |
## Example: Multi-Model Code Analysis
Here's an example of using our Multi-Provider CLI with multiple models to analyze a complex function:

```python
from multi_provider_cli import MultiProviderCLI

# Initialize MultiProviderCLI with LiteLLM
cli = MultiProviderCLI(provider_config_path="/path/to/litellm-config.yaml")

# Load a complex function
with open("complex_algorithm.py", "r") as f:
    code = f.read()

# Define the models to use
models = [
    {"provider": "anthropic", "model": "claude-3-sonnet-20240229"},
    {"provider": "openai", "model": "gpt-4-turbo"},
    {"provider": "mistral", "model": "mistral-large"}
]

analysis_results = {}

for model_config in models:
    provider = model_config["provider"]
    model = model_config["model"]
    model_key = f"{provider}/{model}"

    # Time complexity analysis
    time_analysis = cli.process_query(
        f"Analyze the time complexity of this function:\n\n{code}",
        provider=provider,
        model=model
    )

    # Bug detection
    bug_analysis = cli.process_query(
        f"Identify potential bugs or edge cases in this function:\n\n{code}",
        provider=provider,
        model=model
    )

    # Refactoring suggestions
    refactor_suggestions = cli.process_query(
        f"Suggest ways to refactor this function for better readability and performance:\n\n{code}",
        provider=provider,
        model=model
    )

    analysis_results[model_key] = {
        "time_complexity": time_analysis,
        "bugs": bug_analysis,
        "refactoring": refactor_suggestions
    }

# Compare and consolidate results
consolidated_analysis = cli.process_query(
    "I've analyzed this function with multiple models. Here are their findings:\n\n" +
    "\n\n".join([f"{model}:\n{results}" for model, results in analysis_results.items()]) +
    "\n\nPlease provide a consolidated analysis that takes the best insights from each model.",
    provider="anthropic",
    model="claude-3-sonnet-20240229"  # Use Claude for the final consolidation
)

print(consolidated_analysis)
```
## Troubleshooting

### API Key Issues
If you encounter authentication errors:
- Verify that your API keys are correct
- Check that the environment variables are properly set
- Ensure the API keys have the necessary permissions
### Model Availability
If a model is unavailable:
- Check if the model name is correct in your configuration
- Verify that you have access to the model with your API key
- Check if the model provider is experiencing any outages
### Rate Limiting

If you encounter rate limiting:
- Implement exponential backoff in your requests (a sketch follows this list)
- Consider using a different model temporarily
- Check your usage limits with the provider
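
A minimal backoff wrapper around `litellm.completion` could look like the following; catching the broad `Exception` keeps the sketch provider-agnostic, and in practice you would narrow it to the rate-limit errors your providers actually raise:

```python
import random
import time

import litellm

def completion_with_backoff(model, messages, max_retries=5):
    """Retry a LiteLLM call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return litellm.completion(model=model, messages=messages)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the last error
            time.sleep((2 ** attempt) + random.uniform(0, 1))  # 1s, 2s, 4s, ... plus jitter
```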
## Best Practices

### 1. Model Selection Guidelines
Choose the appropriate model based on the task:
- Code Explanation: Claude 3 Sonnet or GPT-4 Turbo
- Bug Finding: GPT-4 Turbo or Claude 3 Opus
- Documentation Generation: Claude 3 Sonnet
- Simple Queries: Mistral Large or Gemini Pro (more cost-effective)
### 2. Cost Management

- Use less expensive models for simpler tasks
- Implement caching to avoid redundant API calls (see the sketch after this list)
- Monitor your usage and costs regularly
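
For caching, LiteLLM ships its own caching support, but even a small hand-rolled cache keyed on the model and messages avoids paying twice for identical queries. The sketch below is illustrative rather than production-ready:

```python
import hashlib
import json

import litellm

_response_cache = {}  # in-memory only; use Redis or similar for anything shared

def cached_completion(model, messages):
    """Return a cached response for identical (model, messages) pairs."""
    key = hashlib.sha256(json.dumps([model, messages], sort_keys=True).encode()).hexdigest()
    if key not in _response_cache:
        _response_cache[key] = litellm.completion(model=model, messages=messages)
    return _response_cache[key]
```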
### 3. Prompt Engineering
- Adapt your prompts to each model's strengths
- Use model-specific prompt templates
- Include clear instructions and context
### 4. Fallback Strategies
- Configure multiple fallback models
- Implement retry logic with exponential backoff
- Log failures for analysis
## Conclusion
Integrating our Multi-Provider CLI with LiteLLM provides flexibility, reliability, and cost optimization by letting you draw on multiple language models. By following this guide, you can configure it to use the most appropriate model for each task, compare model performance, and implement fallback mechanisms for robust code analysis.
For more information on LiteLLM, visit the official LiteLLM documentation.