
md webcrawl mcp

A Python-based MCP web crawler for extracting and saving web page content.

Repository Info

  • Stars: 3
  • Forks: 5
  • Watchers: 3
  • Issues: 2
  • Language: Python
  • License: MIT License

Documentation

MD MCP Webcrawler Project

A Python-based MCP (https://modelcontextprotocol.io/introduction) web crawler for extracting and saving website content.

Features

  • Extract website content and save as markdown files
  • Map website structure and links
  • Batch processing of multiple URLs
  • Configurable output directory
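
To illustrate the link-mapping feature listed above, here is a minimal sketch that uses only the standard library. The names `LinkCollector` and `map_links` are hypothetical and are not taken from this project's `server.py`; the server's real implementation may work differently.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute link targets from <a href> tags (hypothetical helper)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def map_links(html: str, base_url: str) -> List[str]:
    """Return same-site links found in html, deduplicated in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    site = urlparse(base_url).netloc
    seen: Dict[str, None] = {}
    for link in collector.links:
        if urlparse(link).netloc == site:
            seen.setdefault(link, None)
    return list(seen)
```

Restricting the result to the starting host is a design assumption here; it keeps a crawl from wandering onto external sites.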

Installation

  1. Clone the repository:
git clone https://github.com/jmh108/md-webcrawl-mcp
cd md-webcrawl-mcp
  2. Install dependencies:
pip install -r requirements.txt
  3. Optional: Configure environment variables:
export OUTPUT_PATH=./output  # Set your preferred output directory

Output

Crawled content is saved in markdown format in the specified output directory.
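
The README does not say how output files are named. As a rough sketch of one possible scheme, the hypothetical helpers below derive a `.md` filename from the URL and write the content into the output directory. Both function names and the slug rule are assumptions made for illustration.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def markdown_path_for(url: str, output_dir: str) -> Path:
    """Map a URL to a .md file path inside output_dir (hypothetical naming scheme)."""
    parsed = urlparse(url)
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", parsed.netloc + parsed.path).strip("-")
    return Path(output_dir) / f"{slug or 'index'}.md"


def save_markdown(url: str, markdown: str, output_dir: str) -> Path:
    """Write markdown text for url under output_dir and return the file path."""
    target = markdown_path_for(url, output_dir)
    # Create the output directory on first use.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    return target
```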

Configuration

The server can be configured through environment variables:

  • OUTPUT_PATH: Default output directory for saved files
  • MAX_CONCURRENT_REQUESTS: Maximum parallel requests (default: 5)
  • REQUEST_TIMEOUT: Request timeout in seconds (default: 30)
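
A minimal sketch of how these variables could be read in Python, using the defaults listed above. `CrawlerSettings` and `load_settings` are hypothetical names, and the `./output` fallback for `OUTPUT_PATH` is an assumption borrowed from the installation example, since the README states no default for it.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CrawlerSettings:
    output_path: str
    max_concurrent_requests: int
    request_timeout: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> CrawlerSettings:
    """Build settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return CrawlerSettings(
        # Assumed fallback; the README gives no default for OUTPUT_PATH.
        output_path=env.get("OUTPUT_PATH", "./output"),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "5")),
        request_timeout=float(env.get("REQUEST_TIMEOUT", "30")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests, for example `load_settings({"REQUEST_TIMEOUT": "10"})`.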

Claude Set-Up

Install with FastMCP:

fastmcp install server.py

Or use custom settings to run with fastmcp directly:

"Crawl Server": {
  "command": "fastmcp",
  "args": [
    "run",
    "/Users/mm22/Dev_Projekte/servers-main/src/Webcrawler/server.py"
  ],
  "env": {
    "OUTPUT_PATH": "/Users/user/Webcrawl"
  }
}
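
The fragment above is a single server entry. In Claude Desktop's configuration file, such entries sit under the top-level `mcpServers` key. The sketch below shows one way to merge the entry into an existing configuration dictionary; `add_crawl_server` is a hypothetical helper, and the paths in the demo call are placeholders to replace with your own.

```python
import json
from typing import Any, Dict


def add_crawl_server(config: Dict[str, Any], server_path: str, output_path: str) -> Dict[str, Any]:
    """Return a copy of config with a "Crawl Server" entry under "mcpServers"."""
    servers = dict(config.get("mcpServers", {}))
    servers["Crawl Server"] = {
        "command": "fastmcp",
        "args": ["run", server_path],
        "env": {"OUTPUT_PATH": output_path},
    }
    updated = dict(config)
    updated["mcpServers"] = servers
    return updated


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    demo = add_crawl_server({}, "/path/to/server.py", "/path/to/output")
    print(json.dumps(demo, indent=2))
```

Existing entries under `mcpServers` are preserved, and the input dictionary is not modified.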

Development

Live Development

fastmcp dev server.py --with-editable .

Debug

The MCP Inspector (https://modelcontextprotocol.io/docs/tools/inspector) is useful for debugging.

Examples

Example 1: Extract and Save Content

mcp call extract_content --url "https://example.com" --output_path "example.md"

Example 2: Create Content Index

mcp call scan_linked_content --url "https://example.com" | \
  mcp call create_index --content_map - --output_path "index.md"
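
The README does not document the format of the content map passed to `create_index`. As an illustration only, the sketch below assumes a mapping from URL to page title and renders it as a markdown index. This standalone function shares the tool's name for readability but is hypothetical and not the server's implementation.

```python
from typing import Dict


def create_index(content_map: Dict[str, str], title: str = "Index") -> str:
    """Render a URL-to-title mapping as a markdown list of links, sorted by URL."""
    lines = [f"# {title}", ""]
    for url, page_title in sorted(content_map.items()):
        lines.append(f"- [{page_title}]({url})")
    return "\n".join(lines) + "\n"
```

Sorting by URL keeps the index stable between runs, so repeated crawls of the same site produce minimal diffs in `index.md`.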

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Requirements

  • Python 3.7+
  • FastMCP (uv pip install fastmcp)
  • Dependencies listed in requirements.txt

Quick Start

  1. Clone the repository:
git clone https://github.com/jmh108/md-webcrawl-mcp
  2. Install dependencies:
cd md-webcrawl-mcp
pip install -r requirements.txt
  3. Follow the documentation: check the repository's README.md file for specific installation and usage instructions.

Repository Details

  • Owner: jmh108
  • Repo: md-webcrawl-mcp
  • Language: Python
  • License: MIT License
  • Last fetched: 8/10/2025
