
Agentic ETL
This repository contains a collection of Cursor rules, helper scripts, and documentation to power a generalized agentic ETL process with minimal human intervention. The idea is to enable an automated AI agent to:
- Locate the most accessible, highest-quality online data sources for the information you want to retrieve
- Determine the best way to extract the data using a combination of fetch requests, data APIs, sophisticated scraping systems, or LLM browser use
- Design a JSON schema for chunking, extracting, or annotating the scraped information to turn it into usable structured data (a sketch follows below)
- Use LLM tools to perform the transformation according to the schema and save the results to JSON
- Optionally upload the raw and/or structured JSON data to a database
This workflow is designed to power retrieval-augmented generation (RAG) pipelines, but it can be used for any ETL process.
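As a concrete illustration of the schema-design step, here is a minimal sketch of what an agent-designed extraction schema might look like. The field names and the schema.json filename are illustrative assumptions, not conventions defined by this repository:

```python
import json

# A hypothetical extraction schema for chunking scraped articles.
# Every field name here is an illustrative assumption.
article_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "url": {"type": "string"},
        "chunks": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "topic": {"type": "string"},
                },
                "required": ["text"],
            },
        },
    },
    "required": ["title", "chunks"],
}

# Persist the schema so the transformation step can load it later.
with open("schema.json", "w") as f:
    json.dump(article_schema, f, indent=2)
```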
Workflow Documentation
Most ETL repositories on GitHub are mechanical software tools designed for a specific use case. This one is more like a generalized cookbook: a recipe an LLM-powered agent can follow, using existing tools to support a wider range of use cases. The workflow is designed to be executed by a Claude 3.7 Sonnet-powered Cursor Agent, with Cursor rules in the .cursor/rules directory and helper scripts and documentation in the project root.
After presenting your use case to the agent, prompt it to follow these steps in order:
- Data Collection - The agent can reference the Bash or Python scraping rules for help with locating and scraping data (see the collection sketch after this list).
- Data Processing - The agent can reference the Bash or Python cleaning rules for help with transforming scraped data into clean JSON (see the processing sketch below).
- Data Upload - The agent can reference the DigitalOcean PostgreSQL setup and upload rules for help with setting up a database and uploading the data (see the upload sketch below).
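To make these steps concrete, here is a minimal Python data-collection sketch of the kind the scraping rules might steer the agent toward. The URL and CSS selector are placeholders for whatever source the agent locates:

```python
import json

import requests
from bs4 import BeautifulSoup  # pip install requests beautifulsoup4

# Placeholder source: substitute the URL the agent located.
URL = "https://example.com/articles"

response = requests.get(URL, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
# The selector is an assumption about the page's markup.
articles = [
    {"title": a.get_text(strip=True), "href": a.get("href")}
    for a in soup.select("article h2 a")
]

# Save the raw scrape so the transformation step has a stable input.
with open("raw_articles.json", "w") as f:
    json.dump(articles, f, indent=2)
```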
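The data-processing step then normalizes the raw scrape into clean JSON. A minimal sketch, assuming the raw_articles.json file produced by the collection sketch above:

```python
import json
from urllib.parse import urljoin

BASE_URL = "https://example.com"  # placeholder, matching the sketch above

with open("raw_articles.json") as f:
    raw = json.load(f)

# Normalize the records: drop empty rows, resolve relative links,
# and deduplicate on the resolved URL.
seen = set()
clean = []
for record in raw:
    title = (record.get("title") or "").strip()
    href = record.get("href")
    if not title or not href:
        continue
    url = urljoin(BASE_URL, href)
    if url in seen:
        continue
    seen.add(url)
    clean.append({"title": title, "url": url})

with open("clean_articles.json", "w") as f:
    json.dump(clean, f, indent=2)
```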
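Finally, the data-upload step pushes the structured JSON into PostgreSQL. This sketch uses psycopg2 with a connection string from the environment; the DATABASE_URL variable and the articles table are assumptions, and the repository's DigitalOcean rules cover the actual provisioning:

```python
import json
import os

import psycopg2  # pip install psycopg2-binary

# Assumption: DATABASE_URL points at your DigitalOcean (or other)
# PostgreSQL instance.
conn = psycopg2.connect(os.environ["DATABASE_URL"])

with open("clean_articles.json") as f:
    articles = json.load(f)

with conn, conn.cursor() as cur:
    # Table name "articles" is illustrative, not defined by the repo.
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS articles (
            url   TEXT PRIMARY KEY,
            title TEXT NOT NULL
        )
        """
    )
    for a in articles:
        cur.execute(
            "INSERT INTO articles (url, title) VALUES (%s, %s) "
            "ON CONFLICT (url) DO NOTHING",
            (a["url"], a["title"]),
        )
conn.close()
```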
More languages, tools, and deployment options can be supported by adding additional Cursor rules to the .cursor/rules folder.
Prerequisites
Before getting started, I highly recommend installing at least the following:
- Cursor
- Node.js
- uv
Your Cursor Agent will walk you through setting up any additional tools and resources your use case requires, or may even obtain and install them itself.
Contributing
If you'd like to contribute to this project, please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
License
MIT
Quick Start
Clone the repository:

```bash
git clone https://github.com/chriscarrollsmith/agentic-etl
```

Install dependencies:

```bash
cd agentic-etl
npm install
```

Follow the documentation: check the repository's README.md file for specific installation and usage instructions.