
MCP Browser Use Study
A project demonstrating different ways to run local language models using various frameworks via LangChain, including web automation tasks with browser_use.Agent.
Features
- Run local LLMs using different backends (a minimal LangChain loading sketch follows this list):
  - Use Case 1 (`uc1`): Hugging Face Transformers (`hf`), CTransformers (`ctransformers`), LlamaCpp (`llama_cpp`). (Note: `browser_use.Agent` not yet integrated.)
  - Use Case 2 (`uc2`): GPT4All with `browser_use.Agent`. (Currently experiencing issues, see Known Issues.)
  - Use Case 3 (`uc3`): Ollama with `browser_use.Agent` for general web automation tasks. (Working)
  - Use Case 4 (`uc4`): Ollama with `browser_use.Agent` specifically for attempting Google Login using `sensitive_data` and `initial_actions`. (Experimental)
- Command-line interface to select the desired use case and backend (for `uc1`).
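As a rough illustration of the three `uc1` backends, the sketch below loads each one through its LangChain community wrapper. The model IDs, GGUF file names, and parameters are placeholders chosen for illustration, not the repository's actual choices; the real runners live under `src/uc1_local_hf/` and may differ.

```python
# Minimal sketch of the three uc1 backends via LangChain community wrappers.
# Model IDs, GGUF paths, and parameters are illustrative placeholders only;
# the repository's runners in src/uc1_local_hf/ may be structured differently.
from langchain_community.llms import CTransformers, HuggingFacePipeline, LlamaCpp


def build_llm(runner: str):
    if runner == "hf":
        # Full-precision Transformers pipeline (requires torch).
        return HuggingFacePipeline.from_model_id(
            model_id="microsoft/phi-2",  # placeholder model
            task="text-generation",
            pipeline_kwargs={"max_new_tokens": 128},
        )
    if runner == "ctransformers":
        # GGUF model pulled automatically from the Hugging Face Hub.
        return CTransformers(
            model="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",  # placeholder repo
            model_file="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
            model_type="mistral",
        )
    if runner == "llama_cpp":
        # The GGUF file must already exist locally at this path.
        return LlamaCpp(model_path="models/model.Q4_K_M.gguf", n_ctx=2048)
    raise ValueError(f"unknown runner: {runner}")


if __name__ == "__main__":
    llm = build_llm("llama_cpp")
    print(llm.invoke("Say hello in one sentence."))
```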
Prerequisites
- Python >= 3.12
- `uv` (recommended for faster dependency management) or `pip`.
- The `browser_use` library and its dependencies (including Playwright browsers: run `playwright install` after installing the Python packages).
- For Use Case 1 (`hf` runner): requires `torch` (included in `pyproject.toml`).
- For Use Case 1 (`ctransformers` runner): requires `ctransformers` (included). The GGUF model file is downloaded automatically.
- For Use Case 1 (`llama_cpp` runner):
  - Requires `llama-cpp-python` (included). May require C++ build tools during installation.
  - The specific GGUF model file must already exist at the path specified in `src/uc1_local_hf/run_llama_cpp.py`.
- For Use Case 2 (GPT4All with Agent):
  - Requires `gpt4all` (included).
  - A suitable GGUF model file must be correctly specified in `src/uc2_gpt4all/uc2.py`.
  - Note: `browser_use.Agent` integration currently has issues with common GPT4All models for complex tasks (see Known Issues).
- For Use Case 3 (Ollama with Agent; a minimal wiring sketch follows this list):
  - Requires `langchain-ollama` and `browser_use` (included).
  - The Ollama server must be running locally.
  - A capable model (e.g., `qwen2:7b-instruct` or `mistral`) must be pulled via `ollama pull <model_name>`.
  - To use your existing Chrome logins/sessions with `uc3` (or potentially `uc4`), see the "Using a non-standard profile" section below.
- For Use Case 4 (Google Login with Agent - Experimental):
  - The Ollama server must be running locally.
  - A highly capable model (e.g., `qwen2:7b-instruct` or larger) must be pulled via `ollama pull <model_name>`.
  - Crucially: you must set the `GOOGLE_ID` and `GOOGLE_PASSWORD` environment variables with your Google credentials before running `uc4`.
  - A local Chrome installation path must be detectable or set via the `CHROME_EXECUTABLE_PATH` environment variable.
  - Note: Google login flows are complex and change frequently. This use case might require adjustments or fail due to CAPTCHAs, 2FA, or updated UI elements.
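For orientation, here is a minimal sketch of `uc3`-style wiring: an Ollama-served model exposed through `langchain-ollama` and handed to `browser_use.Agent`. The model name and the task string are placeholder assumptions; the repository's actual `uc3.py` may configure the agent differently.

```python
# Rough sketch of uc3-style wiring: an Ollama model via langchain-ollama
# driving browser_use.Agent. Model name and task are placeholders; the
# repository's uc3.py may differ in structure and options.
import asyncio

from browser_use import Agent
from langchain_ollama import ChatOllama


async def main() -> None:
    # Assumes a local Ollama server is running and the model has been pulled
    # beforehand (e.g. `ollama pull qwen2:7b-instruct`).
    llm = ChatOllama(model="qwen2:7b-instruct", temperature=0.0)

    agent = Agent(
        task="Open https://example.com and report the page's main heading.",
        llm=llm,
    )
    result = await agent.run()
    print(result)


if __name__ == "__main__":
    asyncio.run(main())
```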
Using a non-standard profile
To allow uc3 to leverage existing browser sessions, or to observe the process in a dedicated profile, you can launch Chrome with a specific remote debugging port and a non-standard user data directory. The browser_use agent then connects to this pre-launched instance via its CDP (Chrome DevTools Protocol) URL.
Steps:
1. Close any existing Chrome instances using the same remote debugging port. This is crucial to avoid conflicts.
2. Launch Chrome with remote debugging: open a terminal and run the command appropriate for your OS. You may need to adjust the path to `chrome.exe` or `google-chrome` if it is installed in a non-standard location. Using a non-standard profile directory (e.g., `C:\tmp\chrome_debug` or `~/tmp/chrome_debug`) is essential since Chrome 136 (see "Changes to remote debugging switches to improve security"). If the flag points at the default Chrome data directory, it is ignored and the debug port does not work. Do not use your default Chrome profile directory.
   - Windows: `"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222 --user-data-dir="C:\tmp\chrome_debug_uc3"`
   - macOS: `/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir="$HOME/tmp/chrome_debug_uc3"`
   - Linux: `google-chrome --remote-debugging-port=9222 --user-data-dir="$HOME/tmp/chrome_debug_uc3"`
3. Log in to services: in the newly opened Chrome window (launched by the command above), manually navigate to any websites the agent needs to access (e.g., Gmail, Google) and log in. Perform any necessary 2FA steps.
4. Keep this Chrome window open.
5. Run the use case: the script (`uc3.py`) attempts a connection to `http://127.0.0.1:9222` by default (this can be overridden with the `CHROME_CDP_URL` env var).
   - Fallback to cookies: if `uc3` cannot connect via CDP, it falls back to using `cookies.json` (if generated by `conf1`).
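To make the CDP connection concrete, the sketch below points `browser_use` at the pre-launched Chrome instance via the debug URL. The class names used here (`Browser`/`BrowserConfig` with a `cdp_url` parameter) belong to one `browser_use` release line and vary across versions, and the task is a placeholder; treat this as an illustration rather than the repository's actual `uc3.py`.

```python
# Sketch: attaching browser_use to a pre-launched Chrome via its CDP URL.
# Class names vary across browser_use releases (Browser/BrowserConfig shown
# here is one style); the repo's uc3.py may implement the connection differently.
import asyncio
import os

from browser_use import Agent, Browser, BrowserConfig
from langchain_ollama import ChatOllama


async def main() -> None:
    # Defaults to the port used in the launch commands above; can be overridden.
    cdp_url = os.environ.get("CHROME_CDP_URL", "http://127.0.0.1:9222")

    # Connect to the Chrome instance launched with --remote-debugging-port=9222.
    browser = Browser(config=BrowserConfig(cdp_url=cdp_url))

    agent = Agent(
        task="Open Gmail and report the subject of the newest email.",  # placeholder
        llm=ChatOllama(model="qwen2:7b-instruct"),
        browser=browser,
    )
    await agent.run()
    await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
```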
Known Issues
- UC2 (GPT4All with Agent): `browser_use.Agent` currently fails to complete tasks reliably with the tested GPT4All models, likely due to model limitations.
- UC4 (Google Login): Google login automation is inherently fragile. Changes in Google's UI, security measures (like CAPTCHAs and device verification), or model limitations can easily cause it to fail. It serves as an experimental demonstration of `sensitive_data` and `initial_actions` (see the sketch below).
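As a hedged illustration of those two parameters, the sketch below passes credentials through `sensitive_data` placeholders and opens the login page via `initial_actions`. The placeholder keys, the `go_to_url` action name, the URL, and the task wording are assumptions; the action schema accepted by `browser_use` depends on the installed version, and the repository's `uc4` code may differ.

```python
# Sketch of the sensitive_data / initial_actions pattern described for uc4.
# Placeholder keys, the action name, and the task wording are illustrative;
# browser_use versions differ in the exact action schema they accept.
import asyncio
import os

from browser_use import Agent
from langchain_ollama import ChatOllama


async def main() -> None:
    # Credentials are exposed to the model only as placeholder names; the agent
    # substitutes the real values when filling form fields.
    sensitive_data = {
        "x_google_id": os.environ["GOOGLE_ID"],
        "x_google_password": os.environ["GOOGLE_PASSWORD"],
    }

    # Actions executed before the model starts reasoning, e.g. opening the
    # login page directly instead of letting the model search for it.
    initial_actions = [
        {"go_to_url": {"url": "https://accounts.google.com/signin"}},
    ]

    agent = Agent(
        task="Log in to Google using x_google_id and x_google_password.",
        llm=ChatOllama(model="qwen2:7b-instruct"),
        sensitive_data=sensitive_data,
        initial_actions=initial_actions,
    )
    await agent.run()


if __name__ == "__main__":
    asyncio.run(main())
```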
Installation
1. Clone the repository:
   git clone https://github.com/humble92/mcp-browser_use-study
   cd mcp-browser_use-study
2. Create and activate a virtual environment:
   - Using `uv`:
     uv venv
     source .venv/bin/activate   # Linux/macOS / Git Bash
     # .\.venv\Scripts\activate  # Windows PowerShell/CMD
   - Using standard `venv`:
     python -m venv .venv
     source .venv/bin/activate        # Linux/macOS / Git Bash
     # .venv\Scripts\activate.bat     # Windows CMD
     # .\.venv\Scripts\Activate.ps1   # Windows PowerShell
3. Install dependencies:
   uv pip install -e .
   # or for pip:
   # pip install -e .
4. Install Playwright browser drivers:
   playwright install
Usage
Run the use cases via the main.py script from the project root directory.
# Show help message
uv run main.py --help
# Run Use Case 1 (Local Hugging Face Transformers)
uv run main.py uc1 hf
# Run Use Case 1 (CTransformers with GGUF)
uv run main.py uc1 ctransformers
# Run Use Case 1 (LlamaCpp with GGUF)
uv run main.py uc1 llama_cpp
# Run Use Case 2 (GPT4All with Agent - experimental)
uv run main.py uc2
# Run Use Case 3 (Ollama with Agent)
uv run main.py uc3
# Run Use Case 4 (Ollama Google Login - experimental)
# Ensure GOOGLE_ID and GOOGLE_PASSWORD env vars are set!
uv run main.py uc4
- For `uc1`, you need to specify the runner (`hf`, `ctransformers`, or `llama_cpp`) after `uc1`.
- Ensure the prerequisites for the chosen use case/runner are met (e.g., Ollama server running for `uc3`/`uc4`, environment variables set for `uc4`).
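Purely for orientation, the sketch below shows one way such a `main.py` dispatcher could be structured with `argparse` subcommands. The `run_*` helpers and the module layout are hypothetical; the repository's actual `main.py` is not reproduced here and may be organized quite differently.

```python
# Hypothetical sketch of a main.py-style dispatcher for the use cases above.
# The run_* helpers and layout are illustrative, not the repository's code.
import argparse


def run_uc1(runner: str) -> None:
    print(f"would run uc1 with the {runner!r} backend")


def run_simple(name: str) -> None:
    print(f"would run {name}")


def main() -> None:
    parser = argparse.ArgumentParser(description="Local LLM / browser_use demos")
    sub = parser.add_subparsers(dest="use_case", required=True)

    # uc1 additionally takes the runner to use.
    uc1 = sub.add_parser("uc1", help="local LLMs via LangChain")
    uc1.add_argument("runner", choices=["hf", "ctransformers", "llama_cpp"])

    # uc2-uc4 take no extra arguments.
    for name in ("uc2", "uc3", "uc4"):
        sub.add_parser(name)

    args = parser.parse_args()
    if args.use_case == "uc1":
        run_uc1(args.runner)
    else:
        run_simple(args.use_case)


if __name__ == "__main__":
    main()
```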