MCP Server by amirguterman (public)

model eyes

ModelEyes: A structured UI representation protocol that enables AI models to "see" and interact with user interfaces without screenshots. It targets a ~95% reduction in context size compared to screenshots while providing semantically rich information about UI elements, their properties, and relationships. Supports web applications and Windows desktop integration.

Repository Info

Language: TypeScript
License: MIT License

Documentation

ModelEyes

ModelEyes is a Model Context Protocol (MCP) implementation that replaces traditional screenshot-based UI representation with a structured, efficient approach leveraging DOM parsing, HWND access, and UI element extraction.

Project Overview

ModelEyes serves as a bridge between AI models and desktop/web environments, providing:

Dramatic reduction in context size compared to screenshots (95% reduction target)
Improved responsiveness in AI interactions with UIs
Semantically rich information about UI states
Support for web applications and Windows desktop applications

Key Features

Core Features

Structured UI Representation: Captures UI elements with their properties and relationships
Differential Updates: Efficiently tracks UI changes with sophisticated diffing algorithms
Caching Mechanism: Improves performance by avoiding redundant processing
Element Filtering & Prioritization: Focuses on the most relevant UI elements
iframe Support: Captures elements within iframes in web applications
Token Optimization: Advanced token management for AI model integration

Installation

Quick Setup

For a quick setup and run, use one of the provided scripts:

Unix/Linux/macOS:

# Clone the repository
git clone https://github.com/amirguterman/model-eyes.git
cd model-eyes

# Make the script executable
chmod +x setup-and-run.sh

# Run the setup script
./setup-and-run.sh

Windows:

# Clone the repository
git clone https://github.com/amirguterman/model-eyes.git
cd model-eyes

# Run the setup script
setup-and-run.bat

Manual Setup

If you prefer to run commands manually:

# Clone the repository
git clone https://github.com/amirguterman/model-eyes.git
cd model-eyes

# Install dependencies
npm install

# Build the project
npm run build

Usage

Web Client

import { createWebClient, createOpenAIServer } from 'model-eyes';

// Initialize the client and server
async function initializeMCP() {
  // Create a web client
  const client = await createWebClient();
  
  // Create an OpenAI server
  const server = await createOpenAIServer('your-api-key-here');
  
  // Capture the initial UI state
  const initialState = await client.captureState();
  
  // Process the initial state on the server
  server.processInitialState(initialState);
  
  // Subscribe to state changes
  client.subscribeToStateChanges((update) => {
    server.processStateUpdate(update);
  });
  
  // Prepare context for the model
  const context = server.prepareContextForModel();
  
  // Use the context with your AI model
  console.log(`Prepared context with ${context.tokenCount} tokens`);
  
  // Generate an action based on model output
  const modelOutput = '{"action": "click", "targetId": "button-submit"}';
  const action = server.generateAction(modelOutput);
  
  // Execute the action
  const result = await client.executeAction(action.type, action.targetId, action.data);
  
  if (result.success) {
    console.log('Action executed successfully');
  } else {
    console.error(`Action execution failed: ${result.error}`);
  }
}

initializeMCP().catch(console.error);
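
The example above passes a JSON string to server.generateAction and then reads type, targetId, and data from the result. The following is a minimal sketch of how such a translation could work, not the library's actual implementation. The UIAction interface and the parseModelAction helper are hypothetical names introduced for illustration, and the mapping from the "action" field to type is an assumption based on the example.

```typescript
// Hypothetical action shape, inferred from how the example uses the result.
interface UIAction {
  type: string;
  targetId: string;
  data?: unknown;
}

// Parse and validate raw model output before acting on it.
// Throws if the output is not valid JSON or lacks the required fields.
function parseModelAction(modelOutput: string): UIAction {
  const parsed: unknown = JSON.parse(modelOutput);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Model output must be a JSON object");
  }
  const { action, targetId, data } = parsed as Record<string, unknown>;
  if (typeof action !== "string" || typeof targetId !== "string") {
    throw new Error("Model output needs string 'action' and 'targetId' fields");
  }
  // Assumption: the model's "action" field becomes the action's "type".
  return { type: action, targetId, data };
}
```

Validating before execution keeps malformed model output from reaching client.executeAction.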

Windows Desktop Client

import { createWindowsClient, createOpenAIServer } from 'model-eyes';

// Initialize the client and server
async function initializeMCP() {
  // Create a Windows client
  const client = await createWindowsClient();
  
  // Create an OpenAI server
  const server = await createOpenAIServer('your-api-key-here');
  
  // Capture the initial UI state
  const initialState = await client.captureState();
  
  // Process the initial state on the server
  server.processInitialState(initialState);
  
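  // The remaining steps (subscribing to changes, preparing context,
  // executing actions) mirror the web client example above
}

initializeMCP().catch(console.error);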

Core Components

Client-Side Components

Web Implementation: DOM traversal and element extraction
Desktop Implementation: Windows UI Automation framework integration
Common Processing: Caching, differential updates, filtering, and compression

Server-Side Components

Data Processing: Deserialization, hierarchy reconstruction, and validation
Model Integration: Context management and token optimization
Action Generation: Translating model outputs to precise element references
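
To illustrate the hierarchy reconstruction step listed under Data Processing, here is a minimal sketch that rebuilds a tree from a flat list of elements. It assumes each serialized element carries an id and a parentId. FlatNode, TreeNode, and buildTree are hypothetical names, and the actual ModelEyes wire format may differ.

```typescript
// Hypothetical flat element record as it might arrive from a client.
interface FlatNode {
  id: string;
  parentId: string | null;
  type: string;
}

interface TreeNode extends FlatNode {
  children: TreeNode[];
}

// Rebuild parent/child links in two passes: index by id, then attach.
function buildTree(flat: FlatNode[]): TreeNode[] {
  const byId = new Map<string, TreeNode>();
  for (const node of flat) {
    byId.set(node.id, { ...node, children: [] });
  }

  const roots: TreeNode[] = [];
  for (const node of byId.values()) {
    const parentNode = node.parentId === null ? undefined : byId.get(node.parentId);
    if (parentNode) {
      parentNode.children.push(node);
    } else {
      // Top-level element, or one whose parent was filtered out.
      roots.push(node);
    }
  }
  return roots;
}
```

Treating orphaned elements as roots is one possible design choice; it keeps them visible to the model when their parents have been filtered out.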

MCP Integration

ModelEyes can be used as an MCP (Model Context Protocol) server, allowing AI models to access UI state through a standardized protocol:

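// Assumption: the factory is exported from the package root, like the client factories above
import { createModelEyesMcpServer } from 'model-eyes';

// Create and start an MCP server
const server = await createModelEyesMcpServer();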

// Capture UI state from a web page
const state = await server.captureWebState('https://example.com');

// Access the UI state through MCP resources
// In an MCP client:
const uiState = await mcpClient.readResource('ui-state://current');

Running as an MCP Server

You can run ModelEyes as a standalone MCP server:

# Start the MCP server with stdio transport
npm run start:mcp-server

# Or run the example script
npm run start:mcp-example

Available MCP Resources

ui-state://current - Current UI state
ui-state://history/0 - Most recent historical UI state
ui-state://compressed - Compressed current UI state
ui-state://filtered - Filtered UI state with only interactable elements

Browser Extension

A Chrome extension is available for capturing UI states directly from web pages:

The extension allows you to:

Capture UI states from any web page
Send them directly to a ModelEyes MCP server
Choose between different capture modes (full page, viewport, element)
Configure server settings

To install the extension in development mode:

Navigate to chrome://extensions/ in Chrome
Enable "Developer mode"
Click "Load unpacked" and select the extensions/chrome directory

For more details, see the extension README.

Available MCP Tools

captureWebState - Capture UI state from a web page
captureWindowsState - Capture UI state from a Windows desktop application
executeAction - Execute an action on a UI element
findElements - Find UI elements matching specific criteria

Project Structure

mcp-structured-ui/
├── docs/                 # Documentation and protocol specification
├── src/                  # Source code
│   ├── client/           # Client-side components
│   │   ├── web/          # Web browser implementation
│   │   └── desktop/      # Desktop application implementation
│   ├── server/           # Server-side components
│   └── common/           # Shared code and utilities
├── examples/             # Example implementations and usage
└── tests/                # Test cases and benchmarks

Performance Targets

95% reduction in data size compared to screenshot-based approaches
Client-side processing completed within 100ms
Differential updates processed within 50ms
End-to-end latency reduced by 70% compared to screenshot methods
High accuracy in element detection and interaction targeting

Performance Optimizations

ModelEyes includes several optimizations to maximize performance and minimize token usage:

Differential Updates

Instead of sending the entire UI state on every update, ModelEyes computes the difference between states and sends only the changes:

// Only the changes are transmitted
const update = {
  added: { "new-button-1": { /* element properties */ } },
  modified: { "text-field-1": { text: "Updated text" } },
  removed: ["old-element-1"]
};
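
An update of this shape can be produced by comparing two snapshots. The following is a minimal sketch, assuming each state is a flat map from element ID to a plain object of properties. UIElement, UIState, StateDiff, and diffStates are hypothetical names, and the library's own diffing may be more elaborate.

```typescript
// Hypothetical shapes; the real ModelEyes types may differ.
type UIElement = Record<string, unknown>;
type UIState = Record<string, UIElement>;

interface StateDiff {
  added: Record<string, UIElement>;
  modified: Record<string, Partial<UIElement>>;
  removed: string[];
}

// Compare two flat id -> element maps and keep only what changed.
function diffStates(prev: UIState, next: UIState): StateDiff {
  const diff: StateDiff = { added: {}, modified: {}, removed: [] };

  for (const [id, element] of Object.entries(next)) {
    const before = prev[id];
    if (before === undefined) {
      diff.added[id] = element;
      continue;
    }
    // Record only properties whose serialized values differ.
    // Simplification: properties deleted from an element are not reported.
    const changed: UIElement = {};
    for (const [key, value] of Object.entries(element)) {
      if (JSON.stringify(before[key]) !== JSON.stringify(value)) {
        changed[key] = value;
      }
    }
    if (Object.keys(changed).length > 0) {
      diff.modified[id] = changed;
    }
  }

  for (const id of Object.keys(prev)) {
    if (!(id in next)) diff.removed.push(id);
  }
  return diff;
}
```

Comparing values through JSON.stringify keeps the sketch short but is sensitive to key order in nested objects; a production implementation would likely use a structural comparison.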

Element Filtering

Configure which elements to include based on relevance:

const client = await createWebClient({
  filtering: {
    maxElements: 100,
    prioritizeInteractable: true,
    excludeTypes: ['script', 'style', 'meta'],
    includeTypes: ['button', 'input', 'a']
  }
});
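
How these options interact is not spelled out above. The sketch below shows one plausible interpretation: excluded types are dropped first, includeTypes acts as an allow-list, interactable elements are moved to the front, and the result is capped at maxElements. FilterableElement, FilterOptions, and filterElements are hypothetical names, and this ordering of rules is an assumption.

```typescript
// Hypothetical element shape; option keys mirror the config above.
interface FilterableElement {
  id: string;
  type: string;
  interactable: boolean;
}

interface FilterOptions {
  maxElements: number;
  prioritizeInteractable: boolean;
  excludeTypes: string[];
  includeTypes: string[];
}

function filterElements(
  elements: FilterableElement[],
  options: FilterOptions
): FilterableElement[] {
  const excluded = new Set(options.excludeTypes);
  const included = new Set(options.includeTypes);

  // Assumption: exclusion wins, and an empty includeTypes means "no allow-list".
  const kept = elements.filter(
    (el) => !excluded.has(el.type) && (included.size === 0 || included.has(el.type))
  );

  if (options.prioritizeInteractable) {
    // Array.prototype.sort is stable, so document order is kept within each group.
    kept.sort((a, b) => Number(b.interactable) - Number(a.interactable));
  }
  return kept.slice(0, options.maxElements);
}
```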

Caching

Efficient caching reduces processing overhead:

// State and element caching is handled automatically
const state1 = await client.captureState();
// Later updates use caching for better performance
client.subscribeToStateChanges((update) => {
  // Cached elements are reused when possible
});
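
The caching itself happens inside the library. To make the idea concrete, here is a minimal sketch of a cache that reuses a processed element while its fingerprint is unchanged. ElementCache is a hypothetical class, and how ModelEyes actually keys or invalidates its cache is not specified here.

```typescript
// Hypothetical cache keyed by element id, invalidated by a content fingerprint.
class ElementCache<T> {
  private entries = new Map<string, { fingerprint: string; value: T }>();
  hits = 0;
  misses = 0;

  // Return the cached value if the fingerprint is unchanged; otherwise recompute.
  get(id: string, fingerprint: string, compute: () => T): T {
    const entry = this.entries.get(id);
    if (entry && entry.fingerprint === fingerprint) {
      this.hits++;
      return entry.value;
    }
    this.misses++;
    const value = compute();
    this.entries.set(id, { fingerprint, value });
    return value;
  }
}
```

A fingerprint could be, for example, a hash of the element's attributes and text, so that unchanged elements skip extraction work on later captures.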

Token Management

Token estimation and budgeting keep the prepared context within a model's limits:

// Prepare context with token optimization
const context = server.prepareContextForModel({
  maxTokens: 4000,
  includeFullElementDetails: false
});

// Check token usage statistics
const stats = server.getTokenUsageStats();
console.log(`Average tokens: ${stats.average}, Max: ${stats.max}`);
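
For intuition about what a maxTokens budget implies, the sketch below estimates token counts with a rough four-characters-per-token heuristic and keeps elements until the budget is spent. estimateTokens and fitToBudget are hypothetical helpers. The heuristic is an assumption and not the estimator ModelEyes uses; real tokenizers vary by model.

```typescript
// Rough heuristic: about four characters per token for English-like text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep serialized elements, in priority order, until the budget is spent.
function fitToBudget(
  serializedElements: string[],
  maxTokens: number
): { kept: string[]; tokenCount: number } {
  const kept: string[] = [];
  let tokenCount = 0;
  for (const item of serializedElements) {
    const cost = estimateTokens(item);
    if (tokenCount + cost > maxTokens) break;
    kept.push(item);
    tokenCount += cost;
  }
  return { kept, tokenCount };
}
```

Because the loop stops at the first element that does not fit, the input should already be sorted by relevance.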

Examples

Check out the examples directory for complete usage examples:

web-example.html - Example of using ModelEyes in a web application
web-example.ts - TypeScript code for the web example

Documentation

For detailed documentation, see:

Protocol Specification - Detailed specification of the ModelEyes protocol
API Documentation (coming soon)

Roadmap

Enhance Windows Desktop Integration
- Complete the Windows UI Automation integration
- Improve element detection and interaction
Future Platform Support
- Add macOS Accessibility API support
- Develop Linux AT-SPI2 connector
Advanced Features
- Semantic enrichment using local ML models
- Optimize differential update algorithms
- Develop fallback mechanisms for complex UIs
- Improve browser extension with additional features

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Repository Details

Owner: amirguterman
Repo: model-eyes
Language: TypeScript
License: MIT License
Last fetched: 8/10/2025
