
md webcrawl mcp
A Python-based MCP web crawler for extracting and saving web page content.
About This Server
Model Context Protocol (MCP): this server can be integrated with MCP-compatible AI applications, giving them tools to extract website content and save it locally.
Documentation
MD MCP Webcrawler Project
A Python-based MCP (https://modelcontextprotocol.io/introduction) web crawler for extracting and saving website content.
Features
- Extract website content and save as markdown files
- Map website structure and links
- Batch processing of multiple URLs
- Configurable output directory
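Batch processing is not described in detail here. A minimal sketch of bounded-concurrency batching, assuming an async fetch coroutine and a parallelism cap like the MAX_CONCURRENT_REQUESTS setting listed under Configuration (`crawl_batch` and `fetch` are hypothetical names, not the server's API):

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable


async def crawl_batch(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrent: int = 5,
) -> dict[str, str]:
    """Fetch all URLs, keeping at most `max_concurrent` fetches in flight.

    Hypothetical helper: `fetch` stands in for whatever download/extract
    coroutine the server actually uses.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fetch_one(url: str) -> tuple[str, str]:
        # Waits here whenever `max_concurrent` fetches are already running.
        async with semaphore:
            return url, await fetch(url)

    # gather schedules every URL at once; the semaphore throttles them.
    results = await asyncio.gather(*(fetch_one(u) for u in urls))
    return dict(results)
```

All URLs are scheduled immediately, but only `max_concurrent` of them hold the semaphore at any moment; results come back keyed by URL.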
Installation
- Clone the repository:
git clone https://github.com/yourusername/webcrawler.git
cd webcrawler
- Install dependencies:
pip install -r requirements.txt
- Optional: Configure environment variables:
export OUTPUT_PATH=./output # Set your preferred output directory
Output
Crawled content is saved in markdown format in the specified output directory.
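How files are named inside the output directory is not specified. One possible scheme, sketched under that assumption (`url_to_filename` and `save_markdown` are hypothetical helpers; the `./output` fallback mirrors the OUTPUT_PATH example under Installation):

```python
from __future__ import annotations

import os
import re
from pathlib import Path
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Map a URL to a filesystem-safe markdown file name (hypothetical scheme)."""
    parsed = urlparse(url)
    # Collapse every run of characters other than letters, digits, '-' and '_' into '_'.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", parsed.netloc + parsed.path).strip("_")
    return (slug or "index") + ".md"


def save_markdown(url: str, markdown: str, output_path: str | None = None) -> Path:
    """Write `markdown` into output_path, falling back to $OUTPUT_PATH, then ./output."""
    out_dir = Path(output_path or os.environ.get("OUTPUT_PATH", "./output"))
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / url_to_filename(url)
    target.write_text(markdown, encoding="utf-8")
    return target
```

For example, `https://example.com/docs/intro` becomes `example_com_docs_intro.md`. Query strings are ignored in this sketch, so pages that differ only by query would overwrite each other.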
Configuration
The server can be configured through environment variables:
- OUTPUT_PATH: Default output directory for saved files
- MAX_CONCURRENT_REQUESTS: Maximum parallel requests (default: 5)
- REQUEST_TIMEOUT: Request timeout in seconds (default: 30)
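A sketch of how these variables might be read with their documented defaults. `CrawlerSettings` and `load_settings` are hypothetical names, and because no default is documented for OUTPUT_PATH, the `./output` fallback is an assumption:

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class CrawlerSettings:
    output_path: str
    max_concurrent_requests: int
    request_timeout: float  # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> CrawlerSettings:
    """Read the documented environment variables, applying the documented defaults."""
    return CrawlerSettings(
        # Assumed fallback: no OUTPUT_PATH default is documented.
        output_path=env.get("OUTPUT_PATH", "./output"),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "5")),
        request_timeout=float(env.get("REQUEST_TIMEOUT", "30")),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check in isolation.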
Claude Set-Up
Install with FastMCP
fastmcp install server.py
Alternatively, use custom settings to run the server with fastmcp directly:
"Crawl Server": {
"command": "fastmcp",
"args": [
"run",
"/Users/mm22/Dev_Projekte/servers-main/src/Webcrawler/server.py"
],
"env": {
"OUTPUT_PATH": "/Users/user/Webcrawl"
}
}
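In Claude Desktop, server entries like the one above go inside the top-level "mcpServers" object of claude_desktop_config.json. A hypothetical helper that adds the entry to an already-loaded config dict (`add_crawl_server` is an illustrative name, and the paths below are placeholders):

```python
from __future__ import annotations

import json


def add_crawl_server(config: dict, server_py: str, output_path: str) -> dict:
    """Return a copy of a Claude Desktop config with a "Crawl Server" entry added.

    Hypothetical helper; other entries under "mcpServers" are kept unchanged.
    """
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["Crawl Server"] = {
        "command": "fastmcp",
        "args": ["run", server_py],
        "env": {"OUTPUT_PATH": output_path},
    }
    updated["mcpServers"] = servers
    return updated


if __name__ == "__main__":
    # Placeholder paths: substitute your own server.py and output directory.
    print(json.dumps(add_crawl_server({}, "/path/to/server.py", "/path/to/output"), indent=2))
```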
Development
Live Development
fastmcp dev server.py --with-editable .
Debug
The MCP Inspector (https://modelcontextprotocol.io/docs/tools/inspector) is helpful for debugging.
Examples
Example 1: Extract and Save Content
mcp call extract_content --url "https://example.com" --output_path "example.md"
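How extract_content turns HTML into markdown is not documented. As an illustration only, a minimal conversion with Python's standard-library html.parser that handles headings, paragraphs, list items, and links (`MarkdownConverter` and `html_to_markdown` are hypothetical names; fetching the page is left out):

```python
from __future__ import annotations

from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Tiny HTML-to-markdown converter: headings, paragraphs, list items, links."""

    HEADINGS = {f"h{i}": "#" * i + " " for i in range(1, 7)}
    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []   # finished markdown blocks
        self.current: list[str] = []  # text pieces of the block being built
        self.prefix = ""              # "# ", "- ", ... for the current block
        self.href: str | None = None  # target of the <a> currently open
        self.link_start = 0           # index in self.current where the link text begins
        self.skip_depth = 0           # > 0 while inside <script> or <style>

    def emit_block(self) -> None:
        # Normalize whitespace and store the current block if it has any text.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.current, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.emit_block()
            self.prefix = self.HEADINGS[tag]
        elif tag in ("p", "li"):
            self.emit_block()
            self.prefix = "- " if tag == "li" else ""
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.link_start = len(self.current)

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a" and self.href:
            # Replace the collected link text with [text](href).
            text = "".join(self.current[self.link_start:]).strip()
            del self.current[self.link_start:]
            self.current.append(f"[{text}]({self.href})")
            self.href = None
        elif tag in self.HEADINGS or tag in ("p", "li"):
            self.emit_block()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)


def html_to_markdown(html: str) -> str:
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    converter.emit_block()  # flush trailing text outside a closed block
    return "\n\n".join(converter.blocks)
```

A real converter would also need tables, code blocks, nested lists, and image handling; this sketch only shows the general shape of a tag-driven conversion.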
Example 2: Create Content Index
mcp call scan_linked_content --url "https://example.com" | \
mcp call create_index --content_map - --output_path "index.md"
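The format of the content map piped from scan_linked_content into create_index is not documented. A sketch assuming the scan collects same-site links from a page and the index is rendered from a {url: title} mapping (`same_site_links` and `render_index` are hypothetical helpers, not the server's tools):

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_site_links(page_url: str, html: str) -> list[str]:
    """Absolute, de-duplicated http(s) links from `html` on the same host as page_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    seen: list[str] = []
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so /a and /a#x count once.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host and absolute not in seen:
            seen.append(absolute)
    return seen


def render_index(content_map: dict[str, str]) -> str:
    """Render an assumed {url: title} mapping as a markdown index sorted by URL."""
    lines = ["# Index", ""]
    for url, title in sorted(content_map.items()):
        lines.append(f"- [{title}]({url})")
    return "\n".join(lines) + "\n"
```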
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE for more information.
Requirements
- Python 3.7+
- FastMCP (uv pip install fastmcp)
- Dependencies listed in requirements.txt
Quick Start
Clone the repository:
git clone https://github.com/jmh108/md-webcrawl-mcp
Install dependencies:
cd md-webcrawl-mcp
pip install -r requirements.txt
Follow the documentation:
Check the repository's README.md file for specific installation and usage instructions.