
md-rag-mcp

RAG to index md files accessible via MCP

Repository Info

  • Stars: 8
  • Forks: 0
  • Watchers: 8
  • Issues: 0
  • Language: Python
  • License: -

About This Server

RAG to index md files accessible via MCP

Model Context Protocol (MCP) - This server can be integrated with AI applications to provide additional context and capabilities, enabling enhanced AI interactions and functionality.

Documentation

Journal RAG System

A powerful, privacy-focused journal system with AI-powered semantic search and metadata analysis. This system allows you to query your personal journal entries using natural language and analyze patterns in your mood, habits, and thoughts over time.

✨ Features

  • 🔍 Semantic Search: Query your journal entries using natural language
  • 📊 Frontmatter Analysis: Track mood, anxiety, weight, and custom metrics over time
  • 🤖 AI Agent Integration: Connect to AI assistants such as Rovo Dev, or MCP clients in editors like VS Code
  • 🏠 Local Processing: Journal files and search indexing stay on your machine
  • 🚀 GPU Acceleration: Optional NVIDIA GPU support for faster embeddings
  • 📱 Cross-Platform: macOS, Ubuntu/Debian, Arch Linux, and Windows/WSL with automated setup

🚀 Quick Start

1. Choose Your Setup Method

macOS (Automated):

./setup_mac_environment.sh

Ubuntu/Debian (Automated):

./setup_ubuntu_environment.sh

Arch Linux (Automated):

./setup_arch_environment.sh

Windows/WSL (Automated):

# After setting up WSL with Ubuntu, run:
./setup_ubuntu_environment.sh

For detailed Windows/WSL setup, see setup_win_instructions.md.

2. Configure MCP Servers

  1. Copy the template: cp mcp.json.template mcp.json
  2. Replace ${PROJECT_ROOT} with your actual project path (this can be scripted; see the sketch after this list)
  3. Configure your AI agent to use the MCP servers
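
The placeholder substitution in step 2 can be scripted if you prefer. A minimal sketch in Python, assuming it is run from the repository root; the only project-specific detail it relies on is the ${PROJECT_ROOT} placeholder named above:

from pathlib import Path

# Assumes this runs from the repository root, where mcp.json.template lives
project_root = Path.cwd()
template = Path("mcp.json.template").read_text()
Path("mcp.json").write_text(template.replace("${PROJECT_ROOT}", str(project_root)))
print("Wrote mcp.json with PROJECT_ROOT =", project_root)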

3. Start Journaling

Add entries to the journal/ directory:

journal/2025/01/01.md
journal/2025/01/02.md

With optional frontmatter for tracking:

---
date: 2025-01-01
mood: 7
anxiety: 3
weight_kg: 70
---

# Today's thoughts
Your journal content here...
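
If you prefer to scaffold entries from a script, the following short helper mirrors the journal/YYYY/MM/DD.md layout and frontmatter shown above (it is illustrative only, not part of the project):

from datetime import date
from pathlib import Path

today = date.today()
entry = Path("journal") / f"{today:%Y/%m/%d}.md"   # e.g. journal/2025/01/01.md
entry.parent.mkdir(parents=True, exist_ok=True)
if not entry.exists():
    entry.write_text(f"---\ndate: {today}\nmood: 7\nanxiety: 3\n---\n\n# Today's thoughts\n")
print("Entry path:", entry)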

📁 Directory Structure

md-rag-mcp/
├── README.md                           # This file
├── setup_mac_environment.sh           # macOS automated setup
├── setup_ubuntu_environment.sh        # Ubuntu/Debian automated setup
├── setup_arch_environment.sh          # Arch Linux automated setup
├── setup_win_instructions.md          # Windows/WSL setup guide
├── install_instructions.md            # General setup guide
├── mcp.json.template                  # MCP configuration template
├── journal/                           # Your journal entries
│   ├── 2025/01/01.md                 # Daily entries
│   └── topics/                       # Topic-based entries
└── .tech/                            # Technical components
    ├── code/
    │   ├── mcp/                      # MCP servers
    │   │   ├── journal_rag_mcp.py    # Semantic search server
    │   │   ├── frontmatter_mcp.py    # Metadata analysis server
    │   │   ├── requirements.txt      # Python dependencies
    │   │   └── .venv/               # Virtual environment
    │   └── scripts/
    │       └── rag_search.py         # Standalone search script
    ├── data/                         # Generated data
    │   └── chroma_db/               # Vector database
    └── docs/                        # Additional documentation

🔧 System Requirements

Minimum Requirements

  • Python 3.8+
  • 4GB RAM
  • 2GB free disk space

Recommended for GPU Acceleration (optional)

  • NVIDIA GPU with CUDA support
  • 8GB+ GPU VRAM for large embedding models
  • 16GB+ system RAM

Supported Platforms

  • macOS (Intel & Apple Silicon) - Automated setup
  • Ubuntu/Debian (including WSL) - Automated setup
  • Arch Linux - Automated setup
  • Windows (via WSL2 with Ubuntu) - Automated setup

🎯 Usage

Query your journal entries using natural language:

# Via standalone script
source .tech/code/mcp/.venv/bin/activate
python .tech/code/scripts/rag_search.py

# Via AI agent (after MCP setup)
"What did I write about productivity last month?"
"Show me entries where I felt anxious"

Frontmatter Analysis

Analyze patterns in your journal metadata:

# Via AI agent (after MCP setup)
"What's my average mood this month?"
"Show me my weight trend over the last 3 months"
"Plot my anxiety levels vs sleep quality"

🤖 AI Agent Integration

Rovo Dev is a free AI agent built by Atlassian that works excellently with this journal system.

Setup Steps:

  1. Install Rovo Dev CLI: Follow the complete setup guide at the official Atlassian Community post
  2. Configure MCP Integration:
    # Open Rovo Dev's MCP configuration file
    acli rovodev mcp
    
    # Copy the contents from mcp.json.template
    # Replace ${PROJECT_ROOT} with your actual path, e.g.:
    # /Users/yourname/Documents/md-rag-mcp
    
  3. Create .agent.md: Create a custom persona that defines how Rovo Dev should interact with your journal. This persona can help with thoughtful questioning, pattern recognition across entries, balanced reflection prompts, and contextual analysis of your journaling themes and personal growth over time.
  4. Start Rovo Dev:
    acli rovodev
    
  5. Start Querying: Try commands like:
    • "What did I write about yesterday?"
    • "Show me my mood trends this month"
    • "Find entries where I mentioned productivity"

Other Agents

Cloud-based options:

  • Cursor: MCP-compatible, all conversations sent to cloud
  • GitHub Copilot: cloud-based; usable if your Copilot client supports MCP

Local privacy options:

  • Cline (VS Code extension): Can connect to local LLMs (Ollama, LM Studio)
  • Roo Code (VS Code extension): Can connect to local LLMs for complete privacy

Any MCP-compatible AI agent can connect using the provided configuration.

📊 Frontmatter Fields

Track various metrics in your journal entries:

---
date: 2025-01-01          # Entry date (required)
mood: 7                   # 1-10 scale
anxiety: 3                # 1-10 scale  
energy: 8                 # 1-10 scale
sleep_hours: 7.5          # Hours of sleep
weight_kg: 70             # Weight in kg
exercise_minutes: 30      # Exercise duration
weather: "sunny"          # Weather description
tags: ["work", "family"]  # Custom tags
---

You can customize these fields based on what you want to track.
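
Because the frontmatter is plain YAML, custom fields are easy to aggregate yourself. For example, a small sketch that counts how often each tag appears (again assuming python-frontmatter as the parser):

from collections import Counter
from pathlib import Path

import frontmatter  # assumed parser, as noted earlier

tag_counts = Counter()
for path in Path("journal").rglob("*.md"):
    tag_counts.update(frontmatter.load(path).metadata.get("tags", []))

print(tag_counts.most_common(5))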

🔍 Advanced Features

GPU Acceleration

If you have an NVIDIA GPU, the system will automatically use it for faster embedding generation:

# Check GPU status
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
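
If CUDA is available, embedding generation can be pinned to the GPU explicitly. A minimal sketch assuming sentence-transformers is the embedding library; the model name below is only an example, not necessarily what this project ships with:

import torch
from sentence_transformers import SentenceTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer("all-MiniLM-L6-v2", device=device)  # example model, not confirmed for this project
embeddings = model.encode(["Example journal sentence"])
print(device, embeddings.shape)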

Backup and Sync

Your journal is just markdown files - sync with any service:

  • Git repositories
  • Cloud storage (Dropbox, iCloud, etc.)
  • Network drives

📚 Documentation

  • Setup Instructions: install_instructions.md
  • Windows Setup: setup_win_instructions.md
  • Migration Plan: MIGRATION_PLAN.md (development)

🤝 Contributing

This is an open-source project! Contributions welcome:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

📝 License

This project is open source. See LICENSE file for details.

🆘 Support

Common Issues

"Module not found" errors:

  • Ensure virtual environment is activated
  • Verify requirements are installed in correct venv

MCP connection issues:

  • Check paths in mcp.json are absolute, not relative
  • Verify the Python executable path is correct (a quick check follows below)
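
To find the exact interpreter path to put in mcp.json, run this from inside the activated virtual environment:

import sys

# Prints the absolute path of the running interpreter,
# e.g. /path/to/md-rag-mcp/.tech/code/mcp/.venv/bin/python
print(sys.executable)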

Slow performance:

  • Check if GPU acceleration is working
  • Consider using a smaller embedding model
  • Ensure sufficient RAM available

Getting Help

  1. Check the troubleshooting section in install_instructions.md
  2. Review the setup guides for your platform
  3. Open an issue on GitHub with system details

🔒 Privacy & Data Considerations

What stays private (always local):

  • Your journal files and content
  • RAG embeddings and search indexing
  • Frontmatter analysis and statistics
  • All file processing and storage

What gets exposed with cloud-based AI agents (Rovo Dev and Cursor both run entirely in the cloud):

  • Your conversations about journal content
  • AI-generated insights and analysis
  • Personal reflections and patterns discussed
  • Any journal excerpts shared during conversations

For complete privacy: Use VS Code extensions like Cline or Roo Code with local LLMs (Ollama, LM Studio, etc.). These extensions can connect to local models while keeping all conversations on your machine.

Bottom line: Your raw journal data stays local, but if you use cloud AI agents to analyze it, those conversations are not private.

Quick Start

  1. Clone the repository: git clone https://github.com/estevaom/md-rag-mcp
  2. Install dependencies: cd md-rag-mcp, then run the automated setup script for your platform (for example, ./setup_ubuntu_environment.sh)
  3. Follow the documentation: check the repository's README.md file for specific installation and usage instructions.

Repository Details

  • Owner: estevaom
  • Repo: md-rag-mcp
  • Language: Python
  • License: -
  • Last fetched: 8/10/2025
