
vtt challenge

An AI-based pipeline that uses FAISS semantic search and GPT-4.1-mini to turn VTT's fragmented innovation data into a knowledge graph.

Repository Info

  • Stars: 6
  • Forks: 3
  • Watchers: 6
  • Issues: 0
  • Language: Jupyter Notebook
  • License: -


Documentation

VTT Innovation Disambiguation Pipeline

🏆 Hackathon Solution: VTT Innovation De-duplication & Aggregation Challenge

An AI-powered pipeline that transforms VTT's fragmented innovation data into a clean, canonical knowledge graph using FAISS semantic search, GPT-4.1-mini, and intelligent graph curation.

The Problem

VTT's innovation data exists as fragmented mentions across multiple sources, making it impossible to get accurate portfolio overviews or track true collaboration patterns.

Our Solution

Three-stage automated pipeline:

  1. FAISS Semantic Analysis - Identify potential duplicates using context-aware similarity
  2. LLM Agent + MCP Server - GPT-4.1-mini makes intelligent SAME/DIFFERENT decisions
  3. Canonical Graph - Clean Memgraph database with complete provenance
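
The three stages above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the repository's code: a brute-force cosine-similarity loop stands in for the FAISS index, a caller-supplied `is_same` function stands in for the GPT-4.1-mini SAME/DIFFERENT decision, and a union-find pass stands in for writing canonical entities to Memgraph. All names (`candidate_pairs`, `group_mentions`, the `0.9` threshold, the toy vectors) are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, Iterable

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def candidate_pairs(embeddings: dict[str, Vector], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Stage 1 stand-in: brute-force search for mention pairs above a similarity threshold.

    A FAISS index would replace this O(n^2) loop on real data.
    """
    return [
        (id_a, id_b)
        for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2)
        if cosine(vec_a, vec_b) >= threshold
    ]


def group_mentions(
    ids: Iterable[str],
    pairs: list[tuple[str, str]],
    is_same: Callable[[str, str], bool],
) -> list[set[str]]:
    """Stages 2 and 3 stand-in: ask `is_same` about each candidate pair, then
    merge confirmed pairs into groups with union-find."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        if is_same(a, b):  # hypothetical hook where an LLM decision would go
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for i in parent:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


# Toy data: m1 and m2 point in nearly the same direction, m3 is orthogonal.
embeddings = {
    "m1": [1.0, 0.0, 0.0],
    "m2": [0.98, 0.05, 0.0],
    "m3": [0.0, 1.0, 0.0],
}
pairs = candidate_pairs(embeddings, threshold=0.9)
groups = group_mentions(embeddings, pairs, is_same=lambda a, b: True)
```

Keeping the decision step as a plain callable means the similarity search only proposes candidates; whether two mentions are actually merged is decided separately, which matches the split between stage 1 and stage 2 described above.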

Quick Demo

cd pipeline
# Install the uv and Bun toolchains, then the project dependencies
curl -LsSf https://astral.sh/uv/install.sh | sh
curl -fsSL https://bun.sh/install | bash
uv sync && bun install

# Start services
docker compose up -d
# The MCP server runs in the foreground; leave it running and use a second terminal for the next step
uv run fastmcp run mcp/innovation_entity_server.py --transport sse --port 9000

# Run pipeline (from the pipeline directory)
uv run python scripts/innovations_analysis.py analyze
bun run agents/innovation-curator-agent.ts results.json

Key Innovation

Context-aware AI curation that preserves complete audit trails while creating canonical entities. Our "thick edge" graph architecture maintains every original mention with full provenance.
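
One possible reading of the "thick edge" idea, sketched as plain Python data classes: each canonical innovation keeps an edge to every original mention, and the edge itself carries the provenance of the merge decision. This is an assumption about the shape of the data, not the repository's schema; every class and field name here (`Mention`, `MentionEdge`, `CanonicalInnovation`, `decided_by`, `reason`) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mention:
    """One original, unmodified mention of an innovation."""
    mention_id: str
    text: str
    source: str  # where the mention was found (hypothetical field)


@dataclass
class MentionEdge:
    """A 'thick' edge: links a canonical entity to one mention and records
    who made the merge decision and why."""
    mention: Mention
    decided_by: str
    reason: str


@dataclass
class CanonicalInnovation:
    canonical_id: str
    name: str
    edges: list[MentionEdge] = field(default_factory=list)

    def attach(self, mention: Mention, decided_by: str, reason: str) -> None:
        """Attach a mention without altering or discarding its original text."""
        self.edges.append(MentionEdge(mention, decided_by, reason))

    def sources(self) -> set[str]:
        """All sources that contributed a mention to this entity."""
        return {edge.mention.source for edge in self.edges}


# Toy usage with made-up mentions.
entity = CanonicalInnovation("inn-1", "Example innovation")
entity.attach(
    Mention("m1", "Example innovation A", "source-a"),
    decided_by="curator-agent",
    reason="same name and context",
)
entity.attach(
    Mention("m2", "example innovation (A)", "source-b"),
    decided_by="curator-agent",
    reason="spelling variant of m1",
)
```

Because mentions are attached rather than overwritten, the audit trail survives de-duplication: every merge can be traced back to the original text, its source, and the recorded reason.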

Results

  • Eliminates innovation mention duplicates while preserving all source context
  • Enables semantic discovery and accurate collaboration analysis
  • Production-ready automated pipeline replacing manual processes

https://github.com/user-attachments/assets/f7600076-04a1-4dc7-8930-84706030be01

Documentation

📋 Complete Solution Details - Full technical architecture and hackathon submission

🚀 Pipeline Usage Guide - Detailed setup and usage instructions

📋 Jupyter Notebook EDA - Exploratory Data Analysis of the candidates

Services

  • Memgraph Database: localhost:7687 (Web UI: localhost:3000)
  • MCP Server: localhost:9000
  • Pipeline Scripts: Python + TypeScript automation
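
Before running the pipeline it can help to confirm that the services listed above are accepting connections. The following is a small sketch using only the Python standard library; the host and port values come from the list above, while the function names and the one-second timeout are assumptions made for illustration. It checks only that a TCP port is open, not that the service behind it is healthy.

```python
import socket

# Endpoints taken from the Services list; the labels are illustrative.
SERVICES = {
    "Memgraph database": ("localhost", 7687),
    "Memgraph web UI": ("localhost", 3000),
    "MCP server": ("localhost", 9000),
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def service_status(services: dict[str, tuple[str, int]] = SERVICES) -> dict[str, bool]:
    """Map each service label to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in service_status().items():
        print(f"{name}: {'up' if up else 'down'}")
```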

Technology Stack

  • AI: Azure OpenAI (GPT-4.1, text-embedding-3-large)
  • Search: FAISS vector similarity
  • Graph: Memgraph with native vector search
  • Orchestration: Model Context Protocol (MCP)
  • Languages: Python (uv), TypeScript (Bun)

Built for the AaltoAI Hackathon 2025

Quick Start

  1. Clone the repository

git clone https://github.com/cmakafui/vtt-challenge

  2. Install dependencies

cd vtt-challenge/pipeline
uv sync && bun install

  3. Follow the documentation

Check the repository's README.md file for specific installation and usage instructions.

Repository Details

  • Owner: cmakafui
  • Repo: vtt-challenge
  • Language: Jupyter Notebook
  • License: -
  • Last fetched: 8/10/2025
