Skip to main content

๐Ÿ”ฅ How-To: Setup Firecrawl Integration

Goal: Enable web research capabilities for enhanced architectural analysis and comprehensive ADR generation.

When to use this guide: When you want to add web search and research capabilities to your MCP ADR Analysis Server for more comprehensive architectural decision-making.


๐ŸŽฏ What is Firecrawl?โ€‹

Firecrawl provides intelligent web scraping and content extraction, enabling the MCP ADR Analysis Server to:

  • Research Best Practices - Find current architectural patterns and recommendations
  • Gather External Context - Access technical documentation, blogs, and case studies
  • Enhance ADRs - Generate more comprehensive decision records with real-world examples
  • Intelligent Scraping - Extract relevant content from complex web pages

๐Ÿš€ Quick Setupโ€‹

Best for: Most users, easy setup, no infrastructure management

# 1. Get your API key from https://firecrawl.dev
# Sign up and get your API key (starts with "fc-")

# 2. Configure environment variables
export FIRECRAWL_ENABLED="true"
export FIRECRAWL_API_KEY="fc-your-api-key-here"

# 3. Test the integration
mcp-adr-analysis-server --test

Option 2: Self-Hostedโ€‹

Best for: Enterprise users, privacy concerns, custom configurations

# 1. Run Firecrawl locally with Docker
docker run -p 3000:3000 firecrawl/firecrawl

# 2. Configure environment variables
export FIRECRAWL_ENABLED="true"
export FIRECRAWL_BASE_URL="http://localhost:3000"

# 3. Test the integration
mcp-adr-analysis-server --test

Option 3: Disabled (Default)โ€‹

Best for: Users who don't need web search capabilities

# Firecrawl is disabled by default
# Server works perfectly without web search
# No configuration needed

๐Ÿ”ง Detailed Configurationโ€‹

Environment Variablesโ€‹

VariableRequiredDefaultDescription
FIRECRAWL_ENABLEDNofalseEnable Firecrawl integration
FIRECRAWL_API_KEYNo*-API key for cloud service
FIRECRAWL_BASE_URLNohttp://localhost:3000Self-hosted instance URL

*Required if using cloud service

Configuration Examplesโ€‹

Development Environmentโ€‹

# .env.development
FIRECRAWL_ENABLED="true"
FIRECRAWL_API_KEY="fc-dev-key-here"
LOG_LEVEL="DEBUG"

Production Environmentโ€‹

# .env.production
FIRECRAWL_ENABLED="true"
FIRECRAWL_BASE_URL="http://firecrawl:3000"
LOG_LEVEL="INFO"

CI/CD Environmentโ€‹

# .env.ci
FIRECRAWL_ENABLED="false" # Disable for CI performance
LOG_LEVEL="ERROR"

MCP Client Configurationโ€‹

{
"mcpServers": {
"adr-analysis": {
"command": "mcp-adr-analysis-server",
"env": {
"FIRECRAWL_ENABLED": "true",
"FIRECRAWL_API_KEY": "fc-your-api-key-here"
}
}
}
}

๐Ÿงช Testing Firecrawl Integrationโ€‹

Basic Testโ€‹

# Test if Firecrawl is properly configured
mcp-adr-analysis-server --test

# Look for this output:
# โœ… Firecrawl integration: ENABLED
# โœ… Firecrawl connection: SUCCESS

Advanced Testโ€‹

# Test web search functionality
mcp-adr-analysis-server --test-web-search

# Expected output:
# ๐Ÿ” Testing web search capabilities...
# โœ… Web search: SUCCESS
# โœ… Content extraction: SUCCESS
# โœ… Relevance scoring: SUCCESS

Manual Testโ€‹

// Test in your MCP client
const result = await llm_web_search({
query: "microservices architecture best practices 2024",
maxResults: 3,
includeContent: true
});

console.log(result);
// Should return relevant web content with relevance scores

๐Ÿ› ๏ธ Firecrawl-Enhanced Toolsโ€‹

When Firecrawl is enabled, these tools gain enhanced capabilities:

  • Purpose: Intelligent web search with relevance scoring
  • Enhanced with: Real-time content extraction and analysis
  • Use case: Research architectural patterns and best practices

llm_cloud_managementโ€‹

  • Purpose: Cloud provider research and recommendations
  • Enhanced with: Current pricing, features, and best practices
  • Use case: Make informed cloud architecture decisions

llm_database_managementโ€‹

  • Purpose: Database technology research and recommendations
  • Enhanced with: Performance benchmarks and real-world usage
  • Use case: Select optimal database technologies

Research Orchestratorโ€‹

  • Purpose: Multi-source research with confidence scoring
  • Enhanced with: Web search as additional research source
  • Use case: Comprehensive architectural analysis

๐Ÿšจ Troubleshootingโ€‹

Common Issuesโ€‹

"Firecrawl integration: DISABLED"โ€‹

# Check if FIRECRAWL_ENABLED is set
echo $FIRECRAWL_ENABLED

# If not set, enable it
export FIRECRAWL_ENABLED="true"

"Firecrawl connection: FAILED"โ€‹

# Check API key format
echo $FIRECRAWL_API_KEY
# Should start with "fc-" for cloud service

# Check base URL for self-hosted
echo $FIRECRAWL_BASE_URL
# Should be accessible (test with curl)
curl "$FIRECRAWL_BASE_URL/health"

"Web search: FAILED"โ€‹

# Check network connectivity
curl -I https://firecrawl.dev

# Check API key validity
curl -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
https://api.firecrawl.dev/v0/health

"Content extraction: FAILED"โ€‹

# Check if target URLs are accessible
curl -I "https://example.com"

# Check Firecrawl service status
curl "$FIRECRAWL_BASE_URL/health"

Debug Modeโ€‹

# Enable debug logging
export LOG_LEVEL="DEBUG"

# Run with verbose output
mcp-adr-analysis-server --test --verbose

# Check logs for detailed error information

Fallback Behaviorโ€‹

The server gracefully handles Firecrawl failures:

  • Web search fails: Falls back to local analysis only
  • Content extraction fails: Uses cached results if available
  • API rate limits: Implements exponential backoff
  • Network issues: Continues with local capabilities

๐Ÿ“Š Performance Considerationsโ€‹

Response Timesโ€‹

  • Web search: 2-5 seconds per query
  • Content extraction: 1-3 seconds per page
  • Relevance scoring: 0.5-1 second per result

Rate Limitsโ€‹

  • Cloud service: 100 requests/minute (free tier)
  • Self-hosted: No rate limits (depends on your infrastructure)

Cachingโ€‹

  • Research results: 5-minute TTL
  • Web content: 1-hour TTL
  • Relevance scores: 24-hour TTL

Optimization Tipsโ€‹

# Use self-hosted for high-volume usage
export FIRECRAWL_BASE_URL="http://your-firecrawl-instance:3000"

# Enable caching for better performance
export AI_CACHE_ENABLED="true"
export CACHE_TTL="3600" # 1 hour

# Limit concurrent requests
export MAX_CONCURRENT_REQUESTS="5"

๐Ÿ”’ Security Considerationsโ€‹

API Key Securityโ€‹

# Never commit API keys to version control
echo "FIRECRAWL_API_KEY=fc-*" >> .gitignore

# Use environment-specific keys
export FIRECRAWL_API_KEY="fc-dev-key" # Development
export FIRECRAWL_API_KEY="fc-prod-key" # Production

Self-Hosted Securityโ€‹

# Use HTTPS in production
export FIRECRAWL_BASE_URL="https://firecrawl.yourcompany.com"

# Implement authentication
export FIRECRAWL_AUTH_TOKEN="your-auth-token"

# Restrict network access
# Only allow access from your MCP server

Content Filteringโ€‹

# Filter sensitive domains
export FIRECRAWL_BLOCKED_DOMAINS="internal.company.com,private.*"

# Enable content sanitization
export FIRECRAWL_SANITIZE_CONTENT="true"

๐Ÿ“š Further Readingโ€‹


Need help with Firecrawl setup? โ†’ Join the Discussion

Firecrawl issues? โ†’ Check Troubleshooting