
🔥 FIRE-1 Agent

Next-Generation Web Scraping Agent powered by Firecrawl v1 API

A comprehensive web scraping solution optimized for the latest Firecrawl capabilities, featuring mobile device emulation, advanced actions, and seamless integration with agentic frameworks.

🤖 FIRE-1 Agent vs. Firecrawl MCP: Better Together

FIRE-1 Agent and Firecrawl's MCP Server are complementary tools that solve different problems:

🛠️ Firecrawl MCP (AI Integration Layer)

  • Purpose: Direct AI assistant access to web scraping via Model Context Protocol
  • Best for: Real-time AI interactions, conversational scraping, IDE integrations
  • Use cases: "Hey Claude, scrape this website for me" or automated AI workflows

🚀 FIRE-1 Agent (Advanced Framework)

  • Purpose: Sophisticated scraping automation with custom logic and workflows
  • Best for: Complex scraping projects, specialized retry logic, batch operations
  • Use cases: Large-scale data collection, difficult sites, custom data processing

🔥 FIRE-1 MCP Server: NOW AVAILABLE!

  • Purpose: Best of both worlds - Advanced FIRE-1 Agent capabilities accessible via MCP
  • Best for: AI assistants needing sophisticated scraping with retry logic, mobile emulation, actions
  • Use cases: "Use FIRE-1 Agent to scrape this mobile site with screenshots and data extraction"

💡 Perfect Integration

# Option 1: Use FIRE-1 Agent directly for complex automation
from fire_agent import FireAgent

fire_agent = FireAgent()
data = await fire_agent.scrape_with_retries(difficult_url)

# Option 2: Use FIRE-1 MCP Server for AI assistant integration
# "Hey Claude, use FIRE-1 Agent to crawl this docs site and extract all API endpoints"

# Option 3: Use both in your workflow
# Complex scraping + AI analysis + MCP integration

🎉 Ready Now: FIRE-1 MCP Server provides 6 advanced tools accessible from any MCP-compatible AI assistant!

📁 Project Structure

FIRE-1-Agent/
├── fire_agent.py           # Core FIRE-1 Agent module
├── fire_mcp_server.py      # 🆕 MCP Server - AI Assistant Integration
├── setup.py               # Package setup and installation
├── requirements.txt       # Python dependencies (includes MCP)
├── env.example           # Environment configuration template
├── mcp_config.json       # MCP server configuration template
├── pytest.ini           # Test configuration
├── README.md             # This file
├── SETUP_GITHUB.md       # GitHub setup guide
├── MCP_SETUP.md          # 🆕 MCP Server setup and integration guide
│
├── examples/             # Example scrapers and use cases
│   ├── scrape_surveillance.py      # Surveillance Watch scraper
│   ├── surveillance_design_scraper.py  # Design inspiration scraper
│   └── surveillance_urls.txt       # Target URLs
│
├── scripts/              # Utility scripts
│   ├── gemini_docs_scraper.py      # Gemini API docs scraper
│   ├── run_tests.py               # Test runner
│   └── activate.sh                # Virtual environment helper
│
├── tests/                # Test suite
└── scraped_data/         # Output directory (auto-created)

✨ Features

🆕 Latest Firecrawl v1 Features

  • 📱 Mobile Device Emulation - Scrape mobile-specific content and responsive designs
  • 🎬 Advanced Actions - Click, scroll, input, wait, screenshot, and scrape in sequence
  • 🖼️ Enhanced Screenshots - Full-page and element-specific screenshots
  • 🔗 Advanced Iframe Scraping - Recursive iframe traversal and cross-origin handling
  • 4x Faster Markdown Parsing - Rebuilt parser for enhanced speed and reliability

Core Capabilities

  • 🔍 Individual URL Scraping - Extract content from specific URLs with retry logic
  • 🕷️ Website Crawling - Recursively crawl entire websites with smart limits
  • 🗺️ Website Mapping - Fast discovery of all URLs on a website
  • 🔬 Structured Data Extraction - LLM-powered data extraction with custom schemas
  • Actions Support - Execute complex interactions before scraping

Advanced Features

  • 🤖 Agentic Framework Ready - Optimized async interfaces for AI agent integration
  • 🚀 Batch Processing - High-performance concurrent processing with semaphores
  • 📊 Progress Tracking - Real-time progress bars and comprehensive statistics
  • 🛡️ Enterprise Error Handling - Exponential backoff retry with smart error detection
  • 💾 Multiple Output Formats - Markdown, HTML, screenshots, links, structured data
  • 🎯 Flexible Configuration - Environment variables and command-line options
  • 📈 Analytics & Reporting - Detailed statistics and success rate tracking

Output Formats

  • Markdown - Clean, LLM-ready content
  • HTML - Raw HTML content with proper formatting
  • Screenshots - Full-page and mobile screenshots
  • Links - Extracted links and navigation structure
  • Structured Data - LLM-extracted JSON with custom schemas
  • JSON/CSV - Complete metadata and analytics exports

🚀 Quick Start

1. Installation

# Clone the repository
git clone https://github.com/bubroz/fire-1-agent
cd fire-1-agent

# Install dependencies
pip install -r requirements.txt

2. Configuration

Copy the example configuration:

cp env.example .env

Edit .env and add your Firecrawl API key:

# Get your API key from https://firecrawl.dev
FIRECRAWL_API_KEY=fc-YOUR_API_KEY_HERE

3. Prepare URLs

Create a URLs file with your target websites, one per line:

# Create your URLs file
echo "https://example.com/page1" > my_urls.txt
echo "https://example.com/page2" >> my_urls.txt
# Comments start with #
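
For reference, a minimal sketch of how such a file can be parsed; the agent's actual loader lives in fire_agent.py and may differ:

from pathlib import Path

def load_urls(path: str) -> list[str]:
    """Return one URL per non-empty line, skipping # comments (illustrative sketch)."""
    urls = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls

print(load_urls("my_urls.txt"))  # ['https://example.com/page1', 'https://example.com/page2']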

4. Run the Agent

For specialized scraping (surveillance example):

python examples/scrape_surveillance.py

Basic Scraping (General purpose):

python fire_agent.py --urls-file my_urls.txt

Interactive Mode:

python fire_agent.py --operation interactive

📖 Usage Guide

Basic Agent (fire_agent.py)

🆕 Unified agent with ALL Firecrawl v1 capabilities - optimized for agentic frameworks:

  • Default mode: Simple batch scraping with v1 API optimizations
  • Advanced modes: --operation crawl|map|extract|interactive
  • New features: Mobile scraping, action sequences, enhanced screenshots
  • Agentic ready: Async interfaces, capabilities discovery, batch processing
  • Enterprise grade: Exponential backoff retry, comprehensive error handling

Default usage (simple scraping):

python fire_agent.py --urls-file your_urls.txt    # Scrapes your URLs with v1 API
python fire_agent.py --urls-file your_urls.txt --concurrent 3

Advanced usage:

python fire_agent.py --operation crawl --crawl-limit 100
python fire_agent.py --operation map
python fire_agent.py --operation interactive

Agentic framework usage:

# Direct integration in your agent code
from fire_agent import FireAgent

agent = FireAgent()
result = await agent.scrape_single_url(url, {'mobile': True, 'formats': ['markdown', 'screenshot']})

Specialized Scrapers (Examples)

Surveillance Watch Design Scraper:

  • Pre-configured for design inspiration scraping
  • Enhanced retry logic with exponential backoff
  • Optimized output formatting for design analysis

Usage:

python examples/surveillance_design_scraper.py              # Use default settings
python examples/scrape_surveillance.py --concurrent 3       # Adjust concurrency

Additional Utility Scripts (scripts/)

Gemini Documentation Scraper (scripts/gemini_docs_scraper.py):

  • Specialized agent optimized for API documentation scraping
  • Enhanced retry logic with exponential backoff
  • Optimized output formatting for documentation
  • Available in the scripts directory

Usage:

python scripts/gemini_docs_scraper.py              # Use default settings
python scripts/gemini_docs_scraper.py --concurrent 3    # Adjust concurrency

Configuration Options

Environment Variables (.env file):

# API Configuration
FIRECRAWL_API_KEY=fc-your-api-key

# Scraping Configuration
OUTPUT_FORMAT=markdown,html,metadata,mobile,screenshot
MAX_CONCURRENT_REQUESTS=5
DELAY_BETWEEN_REQUESTS=1.0

# Retry Configuration (NEW!)
MAX_RETRIES=3
RETRY_BASE_DELAY=2.0

# Output Configuration
OUTPUT_DIR=scraped_data
SAVE_INDIVIDUAL_FILES=true
SAVE_COMBINED_FILE=true

# Advanced Configuration
CRAWL_LIMIT=50
CRAWL_DEPTH=3
ENABLE_ACTIONS=true
ENABLE_STRUCTURED_EXTRACTION=true
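
These variables can be loaded with python-dotenv (listed among the dependencies). A rough sketch, assuming the defaults shown above; the agent's actual configuration handling may differ:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

API_KEY = os.environ["FIRECRAWL_API_KEY"]  # required
MAX_CONCURRENT = int(os.getenv("MAX_CONCURRENT_REQUESTS", "5"))
DELAY = float(os.getenv("DELAY_BETWEEN_REQUESTS", "1.0"))
MAX_RETRIES = int(os.getenv("MAX_RETRIES", "3"))
OUTPUT_DIR = os.getenv("OUTPUT_DIR", "scraped_data")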

🎯 Use Cases

1. Design Inspiration Scraping

Use the specialized surveillance scraper example:

python examples/surveillance_design_scraper.py

✅ Pre-configured for design analysis
✅ Retry logic to maximize success rate
✅ Optimized output formats for design inspiration

2. General Documentation Scraping

python fire_agent.py --urls-file your_urls.txt

3. Website Content Analysis

python fire_agent.py --operation crawl --crawl-limit 200

4. URL Discovery

python fire_agent.py --operation map

5. Structured Data Extraction

Extract specific data using custom schemas:

python fire_agent.py --operation interactive
# Then choose option 4 for structured extraction

6. Agentic Framework Integration

Perfect for AI agents and automation:

from fire_agent import FireAgent

# Initialize with capabilities discovery
agent = FireAgent()
capabilities = agent.get_agent_capabilities()

# Advanced action-based scraping
actions = [
    {"type": "wait", "selector": "#content"},
    {"type": "click", "selector": ".load-more"},
    {"type": "screenshot"},
    {"type": "scrape"}
]
result = await agent.scrape_with_actions(url, actions, ['markdown', 'screenshot'])

# High-performance batch processing
results = await agent.batch_scrape_async(urls, max_concurrent=10)

📁 Output Structure

The agent creates organized output in the scraped_data/ directory:

scraped_data/
├── individual_files/
│   ├── ai.google.dev_gemini-api_docs_api-key.md
│   ├── ai.google.dev_gemini-api_docs_api-key.html
│   └── ai.google.dev_gemini-api_docs_api-key_metadata.json
├── combined_scrape_20241201_143022.md
├── combined_scrape_20241201_143022.json
└── scrape_summary_20241201_143022.csv

File Types

  • .md files - Clean markdown content for each URL
  • .html files - Raw HTML content
  • _metadata.json - Page metadata (title, description, etc.)
  • combined_*.md - All content in one markdown file
  • combined_*.json - Complete data with metadata
  • scrape_summary_*.csv - Spreadsheet-friendly summary
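
Per-URL filenames are derived from the URL itself. A hypothetical mapping consistent with the names above (the real scheme in fire_agent.py may differ):

from urllib.parse import urlparse

def url_to_stem(url: str) -> str:
    """Turn a URL into a filesystem-safe stem (illustrative sketch)."""
    parts = urlparse(url)
    stem = (parts.netloc + parts.path).strip("/").replace("/", "_")
    return stem or "index"

url_to_stem("https://ai.google.dev/gemini-api/docs/api-key")
# -> 'ai.google.dev_gemini-api_docs_api-key'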

🔧 Advanced Configuration

Custom Actions

Enable JavaScript interactions:

ENABLE_ACTIONS=true

This adds actions like:

  • Wait for page load
  • Scroll to load dynamic content
  • Click buttons
  • Fill forms

Structured Data Extraction

Enable LLM-powered data extraction:

ENABLE_STRUCTURED_EXTRACTION=true

Define custom schemas (an example follows the list) for:

  • Articles (title, author, content, tags)
  • Products (name, price, features, rating)
  • Contacts (name, email, phone, address)
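
For illustration, a hypothetical article schema in JSON Schema form; the field names are assumptions, not the project's shipped schemas:

# Hypothetical article extraction schema (illustrative only).
article_schema = {
    "type": "object",
    "properties": {
        "title":   {"type": "string"},
        "author":  {"type": "string"},
        "content": {"type": "string"},
        "tags":    {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "content"],
}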

Rate Limiting

Adjust request frequency:

MAX_CONCURRENT_REQUESTS=3
DELAY_BETWEEN_REQUESTS=2.0

🛡️ Error Handling & Retry Logic

The agent includes comprehensive error handling with an automatic retry mechanism:

Retry Configuration

MAX_RETRIES=3           # Number of retry attempts (default: 3)
RETRY_BASE_DELAY=2.0    # Base delay in seconds (default: 2.0s)

How Retries Work

  • Exponential Backoff: 2s → 4s → 8s delays between retries (see the sketch after this list)
  • Smart Retry Logic: Only retries on network/timeout errors, not content issues
  • Attempt Tracking: Logs show which attempt succeeded
  • Final Failure: After max retries, records detailed error information
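
The sketch below captures this pattern, assuming a hypothetical async scrape callable; the agent's real implementation may differ in details:

import asyncio

async def scrape_with_backoff(scrape, url, max_retries=3, base_delay=2.0):
    """Retry transient failures with exponential backoff: 2s -> 4s -> 8s.
    Illustrative sketch of the behavior described above."""
    for attempt in range(max_retries + 1):
        try:
            return await scrape(url)
        except (asyncio.TimeoutError, ConnectionError):
            if attempt == max_retries:
                raise  # final failure: the caller records error details
            await asyncio.sleep(base_delay * (2 ** attempt))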

Error Types Handled

  • Network timeouts - Automatic retries with backoff
  • Rate limiting - Respects API limits
  • Invalid URLs - Graceful skipping
  • Content errors - Detailed logging
  • File system errors - Safe file operations

Example Retry Flow

🔍 Scraping: https://example.com (attempt 1/4)
❌ Error: Request timeout (attempt 1)
🔄 Retry attempt 1/3 for https://example.com (waiting 2s)
🔍 Scraping: https://example.com (attempt 2/4)
❌ Error: Request timeout (attempt 2)  
🔄 Retry attempt 2/3 for https://example.com (waiting 4s)
🔍 Scraping: https://example.com (attempt 3/4)
✅ Successfully scraped: https://example.com (attempt 3)

View errors in the statistics report or check individual error logs.

📊 Monitoring & Analytics

Both agents provide detailed statistics:

  • Success rates - Track scraping efficiency
  • Processing time - Monitor performance
  • Error analysis - Identify problem URLs
  • Output metrics - Count pages processed

Example statistics output:

📊 FIRE-1 Agent Scraping Statistics
┌─────────────────────┬─────────────┐
│ Metric              │ Value       │
├─────────────────────┼─────────────┤
│ Total URLs          │ 70          │
│ Successful Scrapes  │ 68          │
│ Failed Scrapes      │ 2           │
│ Duration            │ 0:02:34     │
│ Success Rate        │ 97.1%       │
└─────────────────────┴─────────────┘
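
A report like this can be rendered with the rich library (one of the listed dependencies). A minimal sketch, not the project's actual reporting code:

from rich.console import Console
from rich.table import Table

def print_stats(total: int, ok: int, duration: str) -> None:
    """Render a statistics table like the one shown above (illustrative sketch)."""
    table = Table(title="📊 FIRE-1 Agent Scraping Statistics")
    table.add_column("Metric")
    table.add_column("Value")
    table.add_row("Total URLs", str(total))
    table.add_row("Successful Scrapes", str(ok))
    table.add_row("Failed Scrapes", str(total - ok))
    table.add_row("Duration", duration)
    table.add_row("Success Rate", f"{ok / total:.1%}")
    Console().print(table)

print_stats(70, 68, "0:02:34")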

🤖 Interactive Mode

The unified agent includes an interactive mode for exploratory scraping:

python fire_agent.py --operation interactive

Features:

  1. Choose operation type - Scrape, crawl, map, or extract
  2. Input URLs manually - Or load from file
  3. Configure parameters - Set limits and options
  4. Real-time feedback - See results immediately
  5. Chain operations - Map → Scrape discovered URLs

💡 Tips & Best Practices

1. Start Small

Begin with a few URLs to test configuration:

# Test with first 5 URLs
head -5 your_urls.txt > test_urls.txt
python fire_agent.py --urls-file test_urls.txt

2. Optimize Concurrency

Balance speed vs. politeness (a semaphore sketch follows the list):

  • Conservative: 2-3 concurrent requests
  • Moderate: 5-8 concurrent requests
  • Aggressive: 10+ concurrent requests (use carefully)
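
A semaphore caps in-flight requests while a delay keeps the crawl polite. A minimal sketch, assuming a hypothetical async scrape callable; the agent's batch code may differ:

import asyncio

async def bounded_scrape_all(urls, scrape, max_concurrent=5, delay=1.0):
    """Scrape all URLs with capped concurrency (illustrative sketch)."""
    sem = asyncio.Semaphore(max_concurrent)

    async def one(url):
        async with sem:
            result = await scrape(url)
            await asyncio.sleep(delay)  # politeness delay between requests
            return result

    return await asyncio.gather(*(one(u) for u in urls))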

3. Content Quality

Choose appropriate formats:

  • Markdown: Best for LLM processing
  • HTML: Preserve original formatting
  • Metadata: Extract page information

4. Large Datasets

For large URL lists:

  • Use batch processing
  • Monitor memory usage
  • Enable individual file saving
  • Set appropriate delays

5. API Limits

Firecrawl has usage limits:

  • Monitor your usage
  • Implement proper delays
  • Handle rate limiting gracefully

🔗 Firecrawl Features Used

This agent leverages Firecrawl's full capability set (a usage sketch follows the list):

  • Scraping - scrape_url() for individual pages
  • Crawling - crawl_url() for recursive site scraping
  • Mapping - map_url() for URL discovery
  • Actions - JavaScript interactions
  • Extraction - LLM-powered structured data
  • Formats - Multiple output formats
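
A minimal sketch of these calls with the firecrawl-py SDK; exact parameter names vary between SDK releases, so treat this as an approximation and check your installed version:

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-YOUR_API_KEY_HERE")

# Individual page scrape with multiple output formats
page = app.scrape_url("https://example.com", params={"formats": ["markdown", "html"]})

# Recursive site crawl with a page limit
crawl = app.crawl_url("https://example.com", params={"limit": 50})

# Fast URL discovery
site_map = app.map_url("https://example.com")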

🚨 Troubleshooting

Common Issues

1. API Key Error

❌ FIRECRAWL_API_KEY not found

Solution: Set your API key in .env file

2. Import Errors

❌ firecrawl-py not installed

Solution: Run pip install -r requirements.txt

3. File Not Found

❌ File your_urls.txt not found

Solution: Ensure your URLs file exists and has correct name

4. Empty Results

⚠️ No content returned for URL

Solution: Check URL accessibility and Firecrawl limits

Debug Mode

Enable verbose logging:

python fire_agent.py --verbose

Rate Limiting

If hitting rate limits:

DELAY_BETWEEN_REQUESTS=3.0
MAX_CONCURRENT_REQUESTS=2

🔄 Updates & Maintenance

Keep your agent updated with the latest Firecrawl v1 features:

  1. Update dependencies:

    pip install --upgrade firecrawl-py rich python-dotenv pandas
    
  2. Check Firecrawl status: Visit Firecrawl Status

  3. Monitor API usage: Check your Firecrawl Dashboard

  4. Latest v1 features: Mobile scraping, advanced actions, enhanced iframes

🧪 Testing & Quality Assurance

The FIRE-1 Agent includes a comprehensive test suite of 53 tests across five categories:

Test Categories

  • Unit Tests (12 tests) - Core functionality, configuration, file operations
  • Integration Tests (15 tests) - Complete workflows, error recovery, feature integration
  • Scraping Operations (12 tests) - Scrape, crawl, map, extract operations
  • Retry Logic (8 tests) - Exponential backoff, error handling, concurrency
  • Gemini Scraper (6 tests) - Specialized documentation scraping workflows

Running Tests

Quick test (unit tests only):

python scripts/run_tests.py --mode quick

Full test suite:

python scripts/run_tests.py --mode all

Specific test categories:

python scripts/run_tests.py --mode unit        # Unit tests only
python scripts/run_tests.py --mode integration # Integration tests
python scripts/run_tests.py --mode retry       # Retry logic tests
python scripts/run_tests.py --mode gemini      # Gemini-specific tests

Real API tests (use sparingly):

pytest tests/test_real_integration.py -m real_api -v

Test Coverage

  • 53 passing tests with 98% success rate
  • Professional mocking for external APIs
  • Real file operations testing
  • Async/await support with proper fixtures
  • Coverage reporting in HTML and XML formats

View coverage report:

# After running tests
open htmlcov/index.html  # View detailed coverage report

Test Infrastructure

  • pytest with async support
  • Rich output with colored progress
  • HTML reports for CI/CD integration
  • Parallel execution with pytest-xdist
  • Factory patterns for test data generation
  • Comprehensive mocking framework
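
As a flavor of that style, a hedged sketch of an async test with mocking; the test and helper names are assumptions, and scrape_with_backoff refers to the retry sketch earlier in this README:

import asyncio
import pytest
from unittest.mock import AsyncMock

@pytest.mark.asyncio  # requires pytest-asyncio
async def test_retry_then_success():
    # First call times out, the second returns content.
    scrape = AsyncMock(side_effect=[asyncio.TimeoutError(), {"markdown": "# ok"}])
    result = await scrape_with_backoff(scrape, "https://example.com",
                                       max_retries=3, base_delay=0.0)
    assert result == {"markdown": "# ok"}
    assert scrape.await_count == 2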

The test suite ensures reliability and catches regressions, making FIRE-1 Agent production-ready.

📝 License

This project is open source. Use it freely for your scraping and agentic framework needs.

🤝 Contributing

Contributions welcome! Areas for improvement:

  • Additional agentic framework integrations
  • Enhanced mobile device profiles
  • Advanced action sequences
  • Performance optimizations
  • More structured extraction schemas

Happy Scraping with FIRE-1 Agent v2.0! 🔥

Optimized for Firecrawl v1 API • Ready for Agentic Frameworks • Enterprise-Grade Performance

For questions or issues, please check the Firecrawl documentation or create an issue in this repository.

