
vaadin documentation services
A complete solution for ingesting, indexing, and retrieving Vaadin documentation through semantic search.
Vaadin Documentation RAG Service
A sophisticated, hierarchically-aware Retrieval-Augmented Generation (RAG) system for Vaadin documentation that understands document structure, provides framework-specific filtering, and enables intelligent parent-child navigation through documentation sections.
🎯 Project Overview
This project provides an advanced RAG system with enhanced hybrid search that:
- Understands Hierarchical Structure: Navigates parent-child relationships within and across documentation files
- Enhanced Hybrid Search: Combines semantic and intelligent keyword search with native Pinecone reranking for superior relevance
- Framework Filtering: Intelligently filters content for Vaadin Flow (Java) vs Hilla (React) frameworks
- Agent-Friendly: Provides an MCP (Model Context Protocol) server for seamless IDE assistant integration
- Production Ready: Clean architecture with dependency injection, comprehensive testing, and error handling
🏗️ Architecture
vaadin-documentation-services/
├── packages/
│ ├── core-types/ # Shared TypeScript interfaces
│ ├── 1-asciidoc-converter/ # AsciiDoc → Markdown + metadata extraction
│ ├── 2-embedding-generator/ # Markdown → Vector database with hierarchical chunking
│ ├── rest-server/ # Enhanced REST API with hybrid search + reranking
│ └── mcp-server/ # MCP server with hierarchical navigation
├── package.json # Bun workspace configuration
└── PROJECT_PLAN.md # Complete project documentation
Data Flow
flowchart TD
subgraph "Step 1: Documentation Processing"
VaadinDocs["📚 Vaadin Docs<br/>(AsciiDoc)"]
Converter["🔄 AsciiDoc Converter<br/>• Framework detection<br/>• URL generation<br/>• Markdown output"]
Processor["⚡ Embedding Generator<br/>• Hierarchical chunking<br/>• Parent-child relationships<br/>• OpenAI embeddings"]
end
subgraph "Step 2: Enhanced Retrieval"
Pinecone["🗄️ Pinecone Vector DB<br/>• Rich metadata<br/>• Hierarchical relationships<br/>• Framework tags"]
RestAPI["🌐 REST API<br/>• Enhanced hybrid search<br/>• Native Pinecone reranking<br/>• Framework filtering"]
end
subgraph "Step 3: Agent Integration"
MCP["🤖 MCP Server<br/>• search_vaadin_docs<br/>• get_full_document<br/>• Full document retrieval"]
IDEs["💻 IDE Assistants<br/>• Context-aware search<br/>• Hierarchical exploration<br/>• Framework-specific help"]
end
VaadinDocs --> Converter
Converter --> Processor
Processor --> Pinecone
Pinecone <--> RestAPI
RestAPI <--> MCP
MCP <--> IDEs
classDef processing fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef storage fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
classDef api fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
classDef agent fill:#fff3e0,stroke:#e65100,stroke-width:2px
class VaadinDocs,Converter,Processor processing
class Pinecone,RestAPI storage
class MCP api
class IDEs agent
✨ Key Features
🔍 Intelligent Search
- Enhanced Hybrid Search: Combines semantic similarity with intelligent keyword extraction and scoring
- Native Pinecone Reranking: Uses Pinecone's bge-reranker-v2-m3 for optimal result ranking
- Framework Awareness: Filters Flow vs Hilla content with common content inclusion
- Query Preprocessing: Smart keyword extraction with stopword filtering for better search quality
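The query preprocessing step can be pictured with a small, dependency-free sketch. The stopword list and the `extractKeywords` helper below are hypothetical and only illustrate the idea of stopword filtering; the project's actual list and tokenizer are not shown in this README.

```typescript
// Hypothetical stopword list (assumption): the real list is not documented here.
const STOPWORDS = new Set<string>([
  "a", "an", "and", "are", "do", "for", "how", "i", "in", "is", "it",
  "of", "on", "or", "the", "to", "what", "with",
]);

// Extract distinct, lowercased keywords from a query, dropping stopwords
// and tokens shorter than two characters. Order of first appearance is kept.
function extractKeywords(query: string): string[] {
  const tokens: string[] = query.toLowerCase().match(/[a-z0-9]+/g) || [];
  const seen = new Set<string>();
  const keywords: string[] = [];
  for (const token of tokens) {
    if (token.length < 2 || STOPWORDS.has(token) || seen.has(token)) continue;
    seen.add(token);
    keywords.push(token);
  }
  return keywords;
}

// Returns ["add", "grid", "view", "flow"]
extractKeywords("How do I add a Grid to the view in Flow?");
```

The remaining keywords would then feed the keyword side of the hybrid search, while the unmodified query is still used for the semantic side.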
🌳 Hierarchical Navigation
- Parent-Child Relationships: Navigate from specific details to broader context
- Cross-File Links: Understand relationships between different documentation files
- Context Breadcrumbs: Maintain navigation context for better user experience
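As a rough illustration of how parent-child links can produce context breadcrumbs, the sketch below walks from a chunk up to its root. The `ChunkNode` shape and the `buildBreadcrumb` helper are assumptions made for this example, not the project's actual schema or API.

```typescript
// Hypothetical chunk shape (assumption): field names are illustrative only.
interface ChunkNode {
  id: string;
  title: string;
  parentId: string | null; // null marks a top-level section
}

// Walk parent links from a chunk up to the root and return titles root-first.
function buildBreadcrumb(chunks: Map<string, ChunkNode>, startId: string): string[] {
  const trail: string[] = [];
  const visited = new Set<string>(); // guards against accidental cycles
  let currentId: string | null = startId;
  while (currentId !== null && !visited.has(currentId)) {
    visited.add(currentId);
    const node = chunks.get(currentId);
    if (!node) break; // missing parent: stop with what has been collected
    trail.unshift(node.title);
    currentId = node.parentId;
  }
  return trail;
}
```

Starting from a detailed section, an agent could join the returned titles (for example with " > ") to show where a result sits in the documentation tree.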
🎛️ Developer Experience
- MCP Integration: Standardized protocol for IDE assistant integration
- TypeScript: Full type safety across all packages
- Comprehensive Testing: Unit tests, integration tests, and hierarchical workflow validation
- Clean Architecture: Dependency injection and interface-based design
🚀 Quick Start
Prerequisites
- Bun runtime
- OpenAI API key (for embeddings)
- Pinecone API key and index
Installation
# Clone and install dependencies
git clone https://github.com/vaadin/vaadin-documentation-services
cd vaadin-documentation-services
bun install
Environment Setup
# Create .env file with your API keys
echo "OPENAI_API_KEY=your_openai_api_key" > .env
echo "PINECONE_API_KEY=your_pinecone_api_key" >> .env
echo "PINECONE_INDEX=your_pinecone_index" >> .env
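Before running the pipeline it can help to confirm that all three variables are present. The following is a minimal sketch, assuming a plain KEY=value file without quoting; `parseDotEnv` and `missingKeys` are hypothetical helpers, not part of the project.

```typescript
// The three variable names come from the Environment Setup commands above.
const REQUIRED_KEYS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX"];

// Parse simple KEY=value lines; blank lines and "#" comments are skipped.
// Quoted values are not unquoted in this sketch.
function parseDotEnv(text: string): Record<string, string> {
  const values: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=value line
    values[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return values;
}

// Return the required keys that are absent or empty.
function missingKeys(env: Record<string, string>): string[] {
  return REQUIRED_KEYS.filter((key) => !env[key]);
}
```

Feeding the contents of `.env` through `parseDotEnv` and then `missingKeys` gives a list that should be empty before the converter or embedding generator is started.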
Running the System
1. Process Documentation (One-time setup)
# Convert AsciiDoc to Markdown with metadata
cd packages/1-asciidoc-converter
bun run convert
# Generate embeddings and populate vector database
cd ../2-embedding-generator
bun run generate
2. Start REST API Server
cd packages/rest-server
bun run start
# Server runs at http://localhost:3001
3. Use MCP Server with IDE Assistant
The MCP server is deployed and available remotely via HTTP transport at:
https://vaadin-mcp.fly.dev/mcp
Configure your IDE assistant to use the Streamable HTTP transport:
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";
const transport = new StreamableHTTPClientTransport(
new URL("https://vaadin-mcp.fly.dev/mcp")
);
📦 Package Details
Core Types (packages/core-types/)
Shared TypeScript interfaces used across all packages:
- DocumentChunk: Core documentation chunk structure
- RetrievalResult: Search result with relevance scoring
- Framework: Type-safe framework definitions
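To give a feel for what these shared types might look like, here is an illustrative sketch. Only the three type names come from this README; every field name, the lowercase framework values, and the `isFramework` guard are assumptions, and the real definitions in packages/core-types/ may differ.

```typescript
// Illustrative only (assumption): the actual definitions may use other fields.
type Framework = "flow" | "hilla" | "common";

interface DocumentChunk {
  id: string;
  parentId: string | null; // null for top-level sections
  framework: Framework;
  content: string;
  sourceUrl: string;
}

interface RetrievalResult extends DocumentChunk {
  relevanceScore: number; // higher means more relevant
}

// Runtime guard for values arriving as plain strings, e.g. from a request body.
function isFramework(value: string): value is Framework {
  return value === "flow" || value === "hilla" || value === "common";
}
```

Keeping such definitions in one package lets the converter, embedding generator, REST server, and MCP server agree on a single contract.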
AsciiDoc Converter (packages/1-asciidoc-converter/)
Converts Vaadin AsciiDoc documentation to Markdown with metadata:
- Framework Detection: Automatically detects Flow/Hilla/common content
- URL Generation: Creates proper Vaadin.com documentation links
- Include Processing: Handles AsciiDoc include directives
- Metadata Extraction: Preserves semantic information in frontmatter
cd packages/1-asciidoc-converter
bun run convert # Convert all documentation
bun run test # Run framework detection tests
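This README does not describe how framework detection works internally. One plausible approach is a path-based heuristic, sketched below; the directory names it looks for and the `detectFramework` helper are purely hypothetical, and the real converter may rely on other signals such as document attributes.

```typescript
type DetectedFramework = "flow" | "hilla" | "common";

// Hypothetical heuristic (assumption): classify a source file by the directory
// names in its path. Anything not clearly Flow or Hilla is treated as common.
function detectFramework(filePath: string): DetectedFramework {
  const segments = filePath.toLowerCase().split("/");
  if (segments.indexOf("flow") !== -1) return "flow";
  if (segments.indexOf("hilla") !== -1) return "hilla";
  return "common";
}
```

Whatever the real rule is, the result is what ends up as the framework tag used later for filtering.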
Embedding Generator (packages/2-embedding-generator/)
Creates vector embeddings with hierarchical relationships:
- Hierarchical Chunking: Preserves document structure and relationships
- Parent-Child Links: Creates cross-file and intra-file relationship mapping
- LangChain Integration: Uses MarkdownHeaderTextSplitter for intelligent chunking
- Batch Processing: Efficient embedding generation and Pinecone upsertion
cd packages/2-embedding-generator
bun run generate # Generate embeddings from Markdown
bun run test # Run chunking and relationship tests
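The idea behind hierarchical chunking can be shown without any library. The sketch below is not the LangChain splitter mentioned above; it is a simplified stand-in that splits Markdown on headings and records each section's parent. The `SectionChunk` shape and `chunkByHeadings` are hypothetical names for this example.

```typescript
interface SectionChunk {
  id: number;
  parentId: number | null; // null for sections with no enclosing heading
  level: number;           // 1 for "#", 2 for "##", and so on
  heading: string;
  body: string;
}

// Split Markdown into one chunk per heading and link each chunk to the
// nearest preceding heading of a shallower level.
function chunkByHeadings(markdown: string): SectionChunk[] {
  const chunks: SectionChunk[] = [];
  const stack: SectionChunk[] = []; // currently open ancestors, shallowest first
  for (const line of markdown.split("\n")) {
    const match = /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      const level = match[1].length;
      // Close sections at the same or a deeper level.
      while (stack.length > 0 && stack[stack.length - 1].level >= level) stack.pop();
      const parent = stack.length > 0 ? stack[stack.length - 1] : null;
      const chunk: SectionChunk = {
        id: chunks.length,
        parentId: parent ? parent.id : null,
        level,
        heading: match[2].trim(),
        body: "",
      };
      chunks.push(chunk);
      stack.push(chunk);
    } else if (stack.length > 0) {
      const current = stack[stack.length - 1];
      current.body += (current.body ? "\n" : "") + line;
    }
  }
  for (const chunk of chunks) chunk.body = chunk.body.trim();
  return chunks;
}
```

This sketch ignores token limits and does not skip "#" lines inside fenced code, both of which a production splitter would need to handle.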
REST Server (packages/rest-server/)
Enhanced API server with hybrid search capabilities:
- Hybrid Search: Semantic + keyword search merged with Reciprocal Rank Fusion (RRF)
- Framework Filtering: Flow/Hilla/common content filtering
- Document Navigation: /chunk/:chunkId endpoint for parent-child navigation
- Backward Compatibility: Maintains existing API contracts
cd packages/rest-server
bun run start # Start production server
bun run test # Run comprehensive test suite
bun run test:verbose # Detailed test output
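Reciprocal Rank Fusion, the merging step named in the feature list above, scores each document as the sum of 1 / (k + rank) over every ranked list it appears in. The sketch below is a generic version of that formula, not the server's actual code; the constant k = 60 is the commonly used default and is an assumption here, as is the `reciprocalRankFusion` name.

```typescript
// Fuse ranked ID lists with Reciprocal Rank Fusion:
// score(d) = sum over lists of 1 / (k + rank of d in that list), ranks 1-based.
function reciprocalRankFusion(
  rankings: string[][],
  k: number = 60, // assumption: common default, the project's value is not documented
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  const fused: Array<{ id: string; score: number }> = [];
  scores.forEach((score, id) => {
    fused.push({ id, score });
  });
  // Highest fused score first.
  return fused.sort((a, b) => b.score - a.score);
}
```

For a semantic ranking of a, b, c and a keyword ranking of b, d, a, the fused order is b, a, d, c: documents found by both searches rise above those found by only one.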
API Endpoints:
- POST /search - Hybrid search with framework filtering
- GET /chunk/:chunkId - Retrieve specific document chunk
- POST /ask - AI-generated answers (with streaming support)
- GET /health - Health check
- GET /vaadin-version - Get latest Vaadin version from GitHub releases
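The framework filtering behind the search endpoint, including common content in every framework-specific query, could be expressed as a Pinecone metadata filter. This is a sketch under assumptions: the metadata field name `framework`, its lowercase values, and the `buildFrameworkFilter` helper are hypothetical; only the `$in` operator is standard Pinecone filter syntax.

```typescript
type FrameworkChoice = "flow" | "hilla" | "";

// Build a Pinecone-style metadata filter. An empty choice means "no filter",
// otherwise results tagged with the chosen framework or with "common" match.
function buildFrameworkFilter(framework: FrameworkChoice): Record<string, unknown> | undefined {
  if (framework === "") return undefined;
  return { framework: { $in: [framework, "common"] } };
}

// Produces { framework: { $in: ["hilla", "common"] } }
buildFrameworkFilter("hilla");
```

Including "common" alongside the requested framework keeps shared topics, such as component documentation, visible to both Flow and Hilla users.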
MCP Server (packages/mcp-server/)
Model Context Protocol server for IDE assistant integration:
- Document Tools: search_vaadin_docs and get_full_document
- Full Document Retrieval: Complete documentation pages with context
- Framework Awareness: Intelligent framework detection and filtering
- Error Handling: Graceful degradation for missing content
cd packages/mcp-server
bun run build # Build for distribution
bun run test # Run document-based tests
Available Tools:
- search_vaadin_docs: Search with semantic and keyword matching
- get_full_document: Retrieve complete documentation pages
- get_vaadin_version: Get latest Vaadin version and release timestamp
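On the wire, an MCP client invokes any of these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a payload by hand to show its shape; in practice an MCP client library would construct it. The tool name comes from the list above, while the argument names `question` and `framework` and the `buildToolCall` helper are assumptions, since the tools' input schemas are not shown in this README.

```typescript
// Shape of an MCP "tools/call" request as defined by the protocol.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Build a tools/call request with a fresh numeric id.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument names below are hypothetical (assumption).
const request = buildToolCall("search_vaadin_docs", {
  question: "How do I lazy load Grid data?",
  framework: "flow",
});
```

A client can discover the real argument names at runtime by listing the server's tools, which returns each tool's input schema.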
🧪 Testing
Each package includes its own test suite:
# Test individual packages
cd packages/1-asciidoc-converter && bun run test
cd packages/2-embedding-generator && bun run test
cd packages/rest-server && bun run test
cd packages/mcp-server && bun run test
# Run REST server tests against the live endpoint
cd packages/rest-server && bun run test:server
📈 Performance & Metrics
Search Quality
- 100% Framework Detection Accuracy: Flow, Hilla, and common content correctly identified
- Enhanced Hybrid Search: Semantic + keyword search with native Pinecone reranking improves relevance over semantic search alone
- Contextual Navigation: Parent-child relationships enable better result exploration
- 4,982 Document Chunks: Complete coverage of 378 Vaadin documentation files with 5-level hierarchy
System Performance
- Parallel Processing: Semantic and keyword search executed in parallel with intelligent merging
- Native Reranking: Pinecone's bge-reranker-v2-m3 provides superior result ranking
- Query Preprocessing: Smart keyword extraction with stopword filtering improves search quality
- Efficient Chunking: Optimized token limits with intelligent content splitting
- Clean Architecture: Dependency injection enables easy performance optimization
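The parallel execution described above can be sketched with `Promise.all`. This example covers only the concurrency and a simple fallback, not the merging or reranking; `SearchFn` and `searchInParallel` are hypothetical names, and treating a failed search as an empty result list is an assumption about how graceful degradation might work.

```typescript
// A search backend takes a query and resolves to ranked document IDs.
type SearchFn = (query: string) => Promise<string[]>;

// Run both searches concurrently. If one backend rejects, its result degrades
// to an empty list instead of failing the whole request.
async function searchInParallel(
  query: string,
  semantic: SearchFn,
  keyword: SearchFn,
): Promise<{ semantic: string[]; keyword: string[] }> {
  const [semanticIds, keywordIds] = await Promise.all([
    semantic(query).catch((): string[] => []),
    keyword(query).catch((): string[] => []),
  ]);
  return { semantic: semanticIds, keyword: keywordIds };
}
```

Because both functions are passed in as parameters, either backend can be swapped for a stub in tests, which fits the dependency-injection approach the project describes.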
Production Readiness
- 100% API Backward Compatibility: All existing integrations continue to work
- Robust Error Handling: Graceful fallbacks ensure system reliability
- Fresh Data: Recently updated with complete Vaadin documentation coverage
🌐 Deployment
REST Server (fly.io)
The REST server is deployed to fly.io and available at:
- Production: https://vaadin-docs-search.fly.dev
- Health Check: https://vaadin-docs-search.fly.dev/health
MCP Server (fly.io)
The MCP server is deployed to fly.io and available at:
- Production: https://vaadin-mcp.fly.dev/mcp
- Health Check: https://vaadin-mcp.fly.dev/health
Documentation Processing
Automated via GitHub Actions:
- Daily Updates: Documentation re-processed automatically
- Manual Triggers: Can be triggered via GitHub Actions UI
- Error Notifications: Automated alerts for processing failures
🔧 Development
Workspace Structure
This project uses Bun workspaces for package management:
bun install # Install all dependencies
bun run build # Build all packages
bun run test # Test all packages
Adding New Features
- Core Types: Add interfaces to packages/core-types/
- Processing: Extend converters in packages/1-asciidoc-converter/ or packages/2-embedding-generator/
- API: Enhance search in packages/rest-server/
- Integration: Update MCP tools in packages/mcp-server/
Architecture Principles
- Single Responsibility: Each package has a clear, focused purpose
- Interface-Based Design: Clean contracts between components
- Dependency Injection: Testable and swappable implementations
- Type Safety: Full TypeScript coverage with strict configuration
📚 Documentation
- Project Plan: Complete project breakdown and progress tracking
- Project Brief: Original requirements and problem definition
- Package READMEs: Detailed documentation for each package
🏆 Project Success
This project successfully delivered:
✅ Sophisticated RAG System: Replaced naive implementation with hierarchically-aware search
✅ Enhanced User Experience: Agents can now navigate from specific details to broader context
✅ Production Quality: Clean architecture, comprehensive testing, and error handling
✅ Framework Intelligence: Accurate Flow/Hilla content separation with common content inclusion
✅ Developer Integration: Seamless IDE assistant integration via MCP
The system now provides intelligent, context-aware documentation search that understands the hierarchical structure of Vaadin documentation and enables sophisticated agent interactions.
📄 License
MIT - See license file for details.
Built with ❤️ for the Vaadin developer community