
🔥 FIRE-1 Agent
Next-Generation Web Scraping Agent powered by Firecrawl v1 API
A comprehensive web scraping solution optimized for the latest Firecrawl capabilities, featuring mobile device emulation, advanced actions, and seamless integration with agentic frameworks.
🤖 FIRE-1 Agent vs. Firecrawl MCP: Better Together
FIRE-1 Agent and Firecrawl's MCP Server are complementary tools that solve different problems:
🛠️ Firecrawl MCP (AI Integration Layer)
- Purpose: Direct AI assistant access to web scraping via Model Context Protocol
- Best for: Real-time AI interactions, conversational scraping, IDE integrations
- Use cases: "Hey Claude, scrape this website for me" or automated AI workflows
🚀 FIRE-1 Agent (Advanced Framework)
- Purpose: Sophisticated scraping automation with custom logic and workflows
- Best for: Complex scraping projects, specialized retry logic, batch operations
- Use cases: Large-scale data collection, difficult sites, custom data processing
🔥 FIRE-1 MCP Server ✅ NOW AVAILABLE!
- Purpose: Best of both worlds - Advanced FIRE-1 Agent capabilities accessible via MCP
- Best for: AI assistants needing sophisticated scraping with retry logic, mobile emulation, actions
- Use cases: "Use FIRE-1 Agent to scrape this mobile site with screenshots and data extraction"
💡 Perfect Integration
# Option 1: Use FIRE-1 Agent directly for complex automation
fire_agent = FireAgent()
data = await fire_agent.scrape_with_retries(difficult_url)
# Option 2: Use FIRE-1 MCP Server for AI assistant integration
# "Hey Claude, use FIRE-1 Agent to crawl this docs site and extract all API endpoints"
# Option 3: Use both in your workflow
# Complex scraping + AI analysis + MCP integration
🎉 Ready Now: FIRE-1 MCP Server provides 6 advanced tools accessible from any MCP-compatible AI assistant!
📁 Project Structure
FIRE-1-Agent/
├── fire_agent.py # Core FIRE-1 Agent module
├── fire_mcp_server.py # 🆕 MCP Server - AI Assistant Integration
├── setup.py # Package setup and installation
├── requirements.txt # Python dependencies (includes MCP)
├── env.example # Environment configuration template
├── mcp_config.json # MCP server configuration template
├── pytest.ini # Test configuration
├── README.md # This file
├── SETUP_GITHUB.md # GitHub setup guide
├── MCP_SETUP.md # 🆕 MCP Server setup and integration guide
│
├── examples/ # Example scrapers and use cases
│ ├── scrape_surveillance.py # Surveillance Watch scraper
│ ├── surveillance_design_scraper.py # Design inspiration scraper
│ └── surveillance_urls.txt # Target URLs
│
├── scripts/ # Utility scripts
│ ├── gemini_docs_scraper.py # Gemini API docs scraper
│ ├── run_tests.py # Test runner
│ └── activate.sh # Virtual environment helper
│
├── tests/ # Test suite
└── scraped_data/ # Output directory (auto-created)
✨ Features
🆕 Latest Firecrawl v1 Features
- 📱 Mobile Device Emulation - Scrape mobile-specific content and responsive designs
- 🎬 Advanced Actions - Click, scroll, input, wait, screenshot, and scrape in sequence
- 🖼️ Enhanced Screenshots - Full-page and element-specific screenshots
- 🔗 Advanced Iframe Scraping - Recursive iframe traversal and cross-origin handling
- ⚡ 4x Faster Markdown Parsing - Rebuilt parser for enhanced speed and reliability
Core Capabilities
- 🔍 Individual URL Scraping - Extract content from specific URLs with retry logic
- 🕷️ Website Crawling - Recursively crawl entire websites with smart limits
- 🗺️ Website Mapping - Fast discovery of all URLs on a website
- 🔬 Structured Data Extraction - LLM-powered data extraction with custom schemas
- ⚡ Actions Support - Execute complex interactions before scraping
Advanced Features
- 🤖 Agentic Framework Ready - Optimized async interfaces for AI agent integration
- 🚀 Batch Processing - High-performance concurrent processing with semaphores
- 📊 Progress Tracking - Real-time progress bars and comprehensive statistics
- 🛡️ Enterprise Error Handling - Exponential backoff retry with smart error detection
- 💾 Multiple Output Formats - Markdown, HTML, screenshots, links, structured data
- 🎯 Flexible Configuration - Environment variables and command-line options
- 📈 Analytics & Reporting - Detailed statistics and success rate tracking
Output Formats
- Markdown - Clean, LLM-ready content
- HTML - Raw HTML content with proper formatting
- Screenshots - Full-page and mobile screenshots
- Links - Extracted links and navigation structure
- Structured Data - LLM-extracted JSON with custom schemas
- JSON/CSV - Complete metadata and analytics exports
🚀 Quick Start
1. Installation
# Clone or create your project directory
mkdir fire-agent && cd fire-agent
# Install dependencies
pip install -r requirements.txt
2. Configuration
Copy the example configuration:
cp env.example .env
Edit .env and add your Firecrawl API key:
# Get your API key from https://firecrawl.dev
FIRECRAWL_API_KEY=fc-YOUR_API_KEY_HERE
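Once the key is in .env, it is typically loaded via python-dotenv, which is already in requirements.txt. A minimal sketch of that standard pattern (the agent's actual startup code may differ):
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
api_key = os.environ["FIRECRAWL_API_KEY"]  # fails fast if the key is missing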
3. Prepare URLs
Create a URLs file with your target websites, one per line:
# Create your URLs file
echo "https://example.com/page1" > my_urls.txt
echo "https://example.com/page2" >> my_urls.txt
# Comments start with #
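The agent presumably parses this file by skipping blank lines and # comments; an illustrative helper (not the project's actual function) would look like:
def load_urls(path: str) -> list[str]:
    # Return one URL per non-blank line, skipping "#" comments
    with open(path) as fh:
        stripped = (line.strip() for line in fh)
        return [line for line in stripped if line and not line.startswith("#")]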
4. Run the Agent
For specialized scraping (surveillance example):
python examples/scrape_surveillance.py
Basic Scraping (General purpose):
python fire_agent.py --urls-file my_urls.txt
Interactive Mode:
python fire_agent.py --operation interactive
📖 Usage Guide
Basic Agent (fire_agent.py)
🆕 Unified agent with ALL Firecrawl v1 capabilities - optimized for agentic frameworks:
- Default mode: Simple batch scraping with v1 API optimizations
- Advanced modes: --operation crawl|map|extract|interactive
- New features: mobile scraping, action sequences, enhanced screenshots
- Agentic ready: Async interfaces, capabilities discovery, batch processing
- Enterprise grade: Exponential backoff retry, comprehensive error handling
Default usage (simple scraping):
python fire_agent.py --urls-file your_urls.txt # Scrapes your URLs with v1 API
python fire_agent.py --urls-file your_urls.txt --concurrent 3
Advanced usage:
python fire_agent.py --operation crawl --crawl-limit 100
python fire_agent.py --operation map
python fire_agent.py --operation interactive
Agentic framework usage:
# Direct integration in your agent code (call from inside an async function)
agent = FireAgent()
result = await agent.scrape_single_url(url, {'mobile': True, 'formats': ['markdown', 'screenshot']})
Specialized Scrapers (Examples)
Surveillance Watch Design Scraper:
- Pre-configured for design inspiration scraping
- Enhanced retry logic with exponential backoff
- Optimized output formatting for design analysis
Usage:
python examples/surveillance_design_scraper.py # Use default settings
python examples/scrape_surveillance.py --concurrent 3 # Adjust concurrency
Additional Utility Scripts (scripts/)
Gemini Documentation Scraper (scripts/gemini_docs_scraper.py):
- Specialized agent optimized for API documentation scraping
- Enhanced retry logic with exponential backoff
- Optimized output formatting for documentation
- Available in the scripts directory
Usage:
python scripts/gemini_docs_scraper.py # Use default settings
python scripts/gemini_docs_scraper.py --concurrent 3 # Adjust concurrency
Configuration Options
Environment Variables (.env file):
# API Configuration
FIRECRAWL_API_KEY=fc-your-api-key
# Scraping Configuration
OUTPUT_FORMAT=markdown,html,metadata,mobile,screenshot
MAX_CONCURRENT_REQUESTS=5
DELAY_BETWEEN_REQUESTS=1.0
# Retry Configuration (NEW!)
MAX_RETRIES=3
RETRY_BASE_DELAY=2.0
# Output Configuration
OUTPUT_DIR=scraped_data
SAVE_INDIVIDUAL_FILES=true
SAVE_COMBINED_FILE=true
# Advanced Configuration
CRAWL_LIMIT=50
CRAWL_DEPTH=3
ENABLE_ACTIONS=true
ENABLE_STRUCTURED_EXTRACTION=true
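These variables map naturally onto a small configuration object. The sketch below is illustrative only; the agent's real config class and field names may differ:
import os
from dataclasses import dataclass

@dataclass
class ScrapeConfig:
    max_concurrent: int
    delay: float
    max_retries: int
    retry_base_delay: float
    output_dir: str

    @classmethod
    def from_env(cls) -> "ScrapeConfig":
        # Defaults mirror the documented values above
        return cls(
            max_concurrent=int(os.getenv("MAX_CONCURRENT_REQUESTS", "5")),
            delay=float(os.getenv("DELAY_BETWEEN_REQUESTS", "1.0")),
            max_retries=int(os.getenv("MAX_RETRIES", "3")),
            retry_base_delay=float(os.getenv("RETRY_BASE_DELAY", "2.0")),
            output_dir=os.getenv("OUTPUT_DIR", "scraped_data"),
        )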
🎯 Use Cases
1. Design Inspiration Scraping
Use the specialized surveillance scraper example:
python examples/surveillance_design_scraper.py
✅ Pre-configured for design analysis
✅ Retry logic for 100% success rate
✅ Optimized output formats for design inspiration
2. General Documentation Scraping
python fire_agent.py --urls-file your_urls.txt
3. Website Content Analysis
python fire_agent.py --operation crawl --crawl-limit 200
4. URL Discovery
python fire_agent.py --operation map
5. Structured Data Extraction
Extract specific data using custom schemas:
python fire_agent.py --operation interactive
# Then choose option 4 for structured extraction
6. Agentic Framework Integration
Perfect for AI agents and automation:
from fire_agent import FireAgent
# Initialize with capabilities discovery
agent = FireAgent()
capabilities = agent.get_agent_capabilities()
# Advanced action-based scraping
actions = [
{"type": "wait", "selector": "#content"},
{"type": "click", "selector": ".load-more"},
{"type": "screenshot"},
{"type": "scrape"}
]
result = await agent.scrape_with_actions(url, actions, ['markdown', 'screenshot'])
# High-performance batch processing
results = await agent.batch_scrape_async(urls, max_concurrent=10)
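The semaphore-bounded concurrency mentioned in the features list follows a standard asyncio pattern. A minimal sketch of that pattern, not batch_scrape_async's actual internals:
import asyncio

async def bounded_batch(scrape, urls, max_concurrent=10):
    # The semaphore caps how many scrapes are in flight at once
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        async with sem:
            return await scrape(url)

    # return_exceptions=True keeps one failed URL from aborting the batch
    return await asyncio.gather(*(bounded(u) for u in urls), return_exceptions=True)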
📁 Output Structure
The agent creates organized output in the scraped_data/ directory:
scraped_data/
├── individual_files/
│ ├── ai.google.dev_gemini-api_docs_api-key.md
│ ├── ai.google.dev_gemini-api_docs_api-key.html
│ └── ai.google.dev_gemini-api_docs_api-key_metadata.json
├── combined_scrape_20241201_143022.md
├── combined_scrape_20241201_143022.json
└── scrape_summary_20241201_143022.csv
File Types
- .md files - Clean markdown content for each URL
- .html files - Raw HTML content
- _metadata.json - Page metadata (title, description, etc.)
- combined_*.md - All content in one markdown file
- combined_*.json - Complete data with metadata
- summary_*.csv - Spreadsheet-friendly summary
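Individual filenames are derived from each URL, as the ai.google.dev example above shows. One plausible slugging scheme (illustrative; the project's actual helper may differ):
from urllib.parse import urlparse

def url_to_filename(url: str) -> str:
    # https://ai.google.dev/gemini-api/docs/api-key -> ai.google.dev_gemini-api_docs_api-key
    parsed = urlparse(url)
    slug = (parsed.netloc + parsed.path).strip("/").replace("/", "_")
    return slug or "index"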
🔧 Advanced Configuration
Custom Actions
Enable JavaScript interactions:
ENABLE_ACTIONS=true
This adds actions like:
- Wait for page load
- Scroll to load dynamic content
- Click buttons
- Fill forms
Structured Data Extraction
Enable LLM-powered data extraction:
ENABLE_STRUCTURED_EXTRACTION=true
Define custom schemas for:
- Articles (title, author, content, tags)
- Products (name, price, features, rating)
- Contacts (name, email, phone, address)
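Taking the article case as an example, a schema is usually expressed as a JSON-Schema-style dict. The field names below and the method it is passed to are assumptions for illustration, not the project's confirmed API; check fire_agent.py for the real extraction interface:
# Hypothetical article schema (field names are illustrative)
article_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "string"},
        "content": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "content"],
}
# Assumed call shape:
# result = await agent.extract_structured_data(url, schema=article_schema)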
Rate Limiting
Adjust request frequency:
MAX_CONCURRENT_REQUESTS=3
DELAY_BETWEEN_REQUESTS=2.0
🛡️ Error Handling & Retry Logic
The agent includes comprehensive error handling with an automatic retry mechanism:
Retry Configuration
MAX_RETRIES=3 # Number of retry attempts (default: 3)
RETRY_BASE_DELAY=2.0 # Base delay in seconds (default: 2.0s)
How Retries Work
- Exponential Backoff: 2s → 4s → 8s delays between retries
- Smart Retry Logic: Only retries on network/timeout errors, not content issues
- Attempt Tracking: Logs show which attempt succeeded
- Final Failure: After max retries, records detailed error information
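In code, this amounts to something like the following sketch (a minimal illustration of exponential backoff; FireAgent's internal implementation may differ in detail):
import asyncio

async def scrape_with_backoff(scrape, url, max_retries=3, base_delay=2.0):
    # Delays grow as base_delay * 2**attempt: 2s -> 4s -> 8s
    for attempt in range(max_retries + 1):
        try:
            return await scrape(url)
        except (TimeoutError, ConnectionError):  # network errors only, not content issues
            if attempt == max_retries:
                raise  # final failure: propagate after max retries
            await asyncio.sleep(base_delay * 2 ** attempt)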
Error Types Handled
- Network timeouts - Automatic retries with backoff
- Rate limiting - Respects API limits
- Invalid URLs - Graceful skipping
- Content errors - Detailed logging
- File system errors - Safe file operations
Example Retry Flow
🔍 Scraping: https://example.com (attempt 1/4)
❌ Error: Request timeout (attempt 1)
🔄 Retry attempt 1/3 for https://example.com (waiting 2s)
🔍 Scraping: https://example.com (attempt 2/4)
❌ Error: Request timeout (attempt 2)
🔄 Retry attempt 2/3 for https://example.com (waiting 4s)
🔍 Scraping: https://example.com (attempt 3/4)
✅ Successfully scraped: https://example.com (attempt 3)
View errors in the statistics report or check individual error logs.
📊 Monitoring & Analytics
Both agents provide detailed statistics:
- Success rates - Track scraping efficiency
- Processing time - Monitor performance
- Error analysis - Identify problem URLs
- Output metrics - Count pages processed
Example statistics output:
📊 FIRE-1 Agent Scraping Statistics
┌─────────────────────┬─────────────┐
│ Metric │ Value │
├─────────────────────┼─────────────┤
│ Total URLs │ 70 │
│ Successful Scrapes │ 68 │
│ Failed Scrapes │ 2 │
│ Duration │ 0:02:34 │
│ Success Rate │ 97.1% │
└─────────────────────┴─────────────┘
🤖 Interactive Mode
The unified agent includes an interactive mode for exploratory scraping:
python fire_agent.py --operation interactive
Features:
- Choose operation type - Scrape, crawl, map, or extract
- Input URLs manually - Or load from file
- Configure parameters - Set limits and options
- Real-time feedback - See results immediately
- Chain operations - Map → Scrape discovered URLs
💡 Tips & Best Practices
1. Start Small
Begin with a few URLs to test configuration:
# Test with first 5 URLs
head -5 your_urls.txt > test_urls.txt
python fire_agent.py --urls-file test_urls.txt
2. Optimize Concurrency
Balance speed vs. politeness:
- Conservative: 2-3 concurrent requests
- Moderate: 5-8 concurrent requests
- Aggressive: 10+ concurrent requests (use carefully)
3. Content Quality
Choose appropriate formats:
- Markdown: Best for LLM processing
- HTML: Preserve original formatting
- Metadata: Extract page information
4. Large Datasets
For large URL lists:
- Use batch processing
- Monitor memory usage
- Enable individual file saving
- Set appropriate delays
5. API Limits
Firecrawl has usage limits:
- Monitor your usage
- Implement proper delays
- Handle rate limiting gracefully
🔗 Firecrawl Features Used
This agent leverages Firecrawl's full capability set:
- Scraping - scrape_url() for individual pages
- Crawling - crawl_url() for recursive site scraping
- Mapping - map_url() for URL discovery
- Actions - JavaScript interactions before scraping
- Extraction - LLM-powered structured data
- Formats - Multiple output formats
🚨 Troubleshooting
Common Issues
1. API Key Error
❌ FIRECRAWL_API_KEY not found
Solution: Set your API key in .env file
2. Import Errors
❌ firecrawl-py not installed
Solution: Run pip install -r requirements.txt
3. File Not Found
❌ File your_urls.txt not found
Solution: Ensure your URLs file exists and has the correct name
4. Empty Results
⚠️ No content returned for URL
Solution: Check URL accessibility and Firecrawl limits
Debug Mode
Enable verbose logging:
python fire_agent.py --verbose
Rate Limiting
If hitting rate limits:
DELAY_BETWEEN_REQUESTS=3.0
MAX_CONCURRENT_REQUESTS=2
🔄 Updates & Maintenance
Keep your agent updated with the latest Firecrawl v1 features:
- Update dependencies: pip install --upgrade firecrawl-py rich python-dotenv pandas
- Check Firecrawl status: visit the Firecrawl status page
- Monitor API usage: check your Firecrawl dashboard
- Latest v1 features: mobile scraping, advanced actions, enhanced iframe handling
🧪 Testing & Quality Assurance
The FIRE-1 Agent includes a comprehensive test suite:
Test Categories
- Unit Tests (12 tests) - Core functionality, configuration, file operations
- Integration Tests (15 tests) - Complete workflows, error recovery, feature integration
- Scraping Operations (12 tests) - Scrape, crawl, map, extract operations
- Retry Logic (8 tests) - Exponential backoff, error handling, concurrency
- Gemini Scraper (6 tests) - Specialized documentation scraping workflows
Running Tests
Quick test (unit tests only):
python scripts/run_tests.py --mode quick
Full test suite:
python scripts/run_tests.py --mode all
Specific test categories:
python scripts/run_tests.py --mode unit # Unit tests only
python scripts/run_tests.py --mode integration # Integration tests
python scripts/run_tests.py --mode retry # Retry logic tests
python scripts/run_tests.py --mode gemini # Gemini-specific tests
Real API tests (use sparingly):
pytest tests/test_real_integration.py -m real_api -v
Test Coverage
- 53 passing tests with 98% success rate
- Professional mocking for external APIs
- Real file operations testing
- Async/await support with proper fixtures
- Coverage reporting in HTML and XML formats
View coverage report:
# After running tests
open htmlcov/index.html # View detailed coverage report
Test Infrastructure
- pytest with async support
- Rich output with colored progress
- HTML reports for CI/CD integration
- Parallel execution with pytest-xdist
- Factory patterns for test data generation
- Comprehensive mocking framework
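For flavor, a hypothetical async test in this style, reusing the scrape_with_backoff sketch from the retry section and mocking the network call:
import pytest
from unittest.mock import AsyncMock

@pytest.mark.asyncio
async def test_retry_succeeds_after_transient_failure():
    # First call times out, second returns content
    scrape = AsyncMock(side_effect=[TimeoutError(), {"markdown": "# ok"}])
    result = await scrape_with_backoff(scrape, "https://example.com", base_delay=0)
    assert result == {"markdown": "# ok"}
    assert scrape.call_count == 2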
The test suite ensures reliability and catches regressions, making FIRE-1 Agent production-ready.
📝 License
This project is open source. Use it freely for your scraping and agentic framework needs.
🤝 Contributing
Contributions welcome! Areas for improvement:
- Additional agentic framework integrations
- Enhanced mobile device profiles
- Advanced action sequences
- Performance optimizations
- More structured extraction schemas
Happy Scraping with FIRE-1 Agent v2.0! 🔥
Optimized for Firecrawl v1 API • Ready for Agentic Frameworks • Enterprise-Grade Performance
For questions or issues, please check the Firecrawl documentation or create an issue in this repository.