gpapagiannis
MCP Server · public

r-plus-x-hand2actions

Provides code for extracting robot actions from human demonstration videos, enabling zero-shot task execution.

Repository Info

Stars: 41
Forks: 3
Watchers: 41
Issues: 2
Language: Python
License: BSD 3-Clause "New" or "Revised" License

About This Server

Provides code for extracting robot actions from human demonstration videos, enabling zero-shot task execution.

Model Context Protocol (MCP) - This server can be integrated with AI applications to provide additional context and capabilities, enabling enhanced AI interactions and functionality.

Documentation

🦾 R+X: Retrieval and Execution from Everyday Human Videos

🎥 Extracting robotic actions from human video demonstrations

[Teaser figure: extracting robot actions from a human video demonstration]

This repository provides the code used to extract robot actions from a human video demonstration, for the paper R+X: Retrieval and Execution from Everyday Human Videos [paper]. R+X is divided into a Retrieval phase and an Execution phase. The execution phase leverages the actions extracted from the human videos to execute tasks zero-shot in new settings, and relies on Keypoint Action Tokens (KAT); for the KAT code please refer to: colab.

This is a simplified version: it is not optimised to process large-scale video datasets in parallel. Instead, it sequentially processes each frame in a video, runs on the CPU (apart from the hand-tracking model), and serves as a good starting point for extracting actions from human videos. We provide one video each for the "grasp fanta" and "pick up phone" tasks used in our experiments. Even though our experiments involved a human recording videos while moving, here we provide one example with a moving camera and one with a static camera, each with different intrinsics.
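As a rough illustration of this sequential, per-frame pipeline, the sketch below shows the overall control flow. It is a minimal sketch, not the repository's actual API: detect_hand and hand_to_gripper_pose are hypothetical placeholders for the HaMeR hand tracker and the hand-to-gripper mapping implemented in video_to_gripper_hamer_kpts.py.

import cv2

def detect_hand(frame):
    # Hypothetical placeholder for HaMeR hand tracking (the only GPU step).
    raise NotImplementedError

def hand_to_gripper_pose(hand_mesh):
    # Hypothetical placeholder for the heuristic hand-to-gripper mapping.
    raise NotImplementedError

def extract_actions(video_path):
    cap = cv2.VideoCapture(video_path)
    gripper_poses = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hand_mesh = detect_hand(frame)                          # per-frame hand detection
        gripper_poses.append(hand_to_gripper_pose(hand_mesh))   # one gripper pose per frame
    cap.release()
    return gripper_poses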

📋 Installation

Download repository:

git clone https://github.com/gpapagiannis/r-plus-x-hand2actions.git
cd r-plus-x-hand2actions

Create a conda environment (requires Python >= 3.10):

conda env create -f conda_env.yaml
conda activate rpx

Install the Python requirements:

pip install -r requirements.txt
  • Install HaMeR for hand tracking. Follow the instructions at https://github.com/geopavlakos/hamer
  • After installing HaMeR, go to _DATA > hamer_ckpts > model_config.yaml and set FOCAL_LENGTH = 918. This ensures compatibility with the camera intrinsics used for the provided examples; we will fix this soon so that it is set automatically from any intrinsics matrix. A scripted version of this edit is sketched after this list.
  • After installing HaMeR, replace the file hamer/hamer/utils/renderer.py with the renderer.py provided in this repository.
  • Install the PyTorch implementation of the MANO hand model: https://github.com/otaheri/MANO
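If you prefer to apply the FOCAL_LENGTH edit programmatically, the sketch below does it with PyYAML. It is a minimal sketch under assumptions: the config path matches the one given above and the file parses with yaml.safe_load; the helper simply overwrites every FOCAL_LENGTH key it finds, wherever it is nested.

import yaml

CONFIG_PATH = "_DATA/hamer_ckpts/model_config.yaml"  # path from the step above

def set_focal_length(node, value=918):
    # Recursively overwrite every FOCAL_LENGTH key, wherever it is nested.
    if isinstance(node, dict):
        for key, child in node.items():
            if key == "FOCAL_LENGTH":
                node[key] = value
            else:
                set_focal_length(child, value)
    elif isinstance(node, list):
        for child in node:
            set_focal_length(child, value)

with open(CONFIG_PATH) as f:
    cfg = yaml.safe_load(f)
set_focal_length(cfg)
with open(CONFIG_PATH, "w") as f:
    yaml.safe_dump(cfg, f)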

👟 Running the code ...

The whole method is contained in video_to_gripper_hamer_kpts.py. Just run:

python video_to_gripper_hamer_kpts.py

By default this will extract the actions for the "grasp_fanta" human video demo found here. For the pick_up_phone task (and its corresponding intrinsics), as well as to visualize the hand extraction process step by step, see the global_vars.py file.
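For orientation only, the snippet below illustrates the kind of switches global_vars.py exposes; the actual variable names and values in the repository may differ, and the intrinsics shown are hypothetical example numbers.

import numpy as np

TASK = "pick_up_phone"   # which provided demo video to process ("grasp_fanta" or "pick_up_phone")
VISUALIZE = True         # show the hand extraction process step by step
INTRINSICS = np.array([  # hypothetical pinhole intrinsics for the chosen camera
    [918.0,   0.0, 640.0],
    [  0.0, 918.0, 360.0],
    [  0.0,   0.0,   1.0],
])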

✏️ Important Note

Note: Our method is optimised for the Robotiq 2F-85 gripper, which was used in our experiments. The gripper model we used is in gripper_point_cloud_dense.npy. In principle, the heuristics used to map the gripper to the hand could be applied to most parallel jaw grippers; however, you would need to manually define the gripper points you would like to map the hand to. We have done this at line 552 of video_to_gripper_hamer_kpts.py.

Line 552:

dense_pcd_kpts = {"index_front": 517980, "thumb_front": 248802, "wrist": 246448}

The numbers correspond to the indices of points in the point cloud gripper_point_cloud_dense.npy. You can visualize the action extraction process both in our code and at the bottom of our website: https://www.robot-learning.uk/r-plus-x. If you use a different parallel jaw gripper, convert its mesh to a point cloud and select the indices you would like to map to the tip of the index finger, the tip of the thumb, and the mean point between the index MCP and the thumb DIP of the hand.
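If you do adapt this to another gripper, the sketch below shows one way to pick suitable indices: load the gripper point cloud and, for each landmark, take the index of the point nearest to a manually chosen 3D location on your gripper. It assumes the .npy file stores an (N, 3) array, and the target coordinates are hypothetical placeholders you would replace with values measured on your own gripper model.

import numpy as np

pcd = np.load("gripper_point_cloud_dense.npy")  # assumed (N, 3) gripper point cloud

# Hypothetical landmark locations in the gripper model's frame; replace with
# coordinates measured on your own gripper mesh / point cloud.
targets = {
    "index_front": np.array([0.00,  0.04, 0.14]),  # one fingertip pad
    "thumb_front": np.array([0.00, -0.04, 0.14]),  # opposing fingertip pad
    "wrist":       np.array([0.00,  0.00, 0.00]),  # base / wrist point
}

dense_pcd_kpts = {
    name: int(np.argmin(np.linalg.norm(pcd - xyz, axis=1)))
    for name, xyz in targets.items()
}
print(dense_pcd_kpts)  # indices to plug into line 552 of video_to_gripper_hamer_kpts.py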

🎞️ Visualizing the results

The notebook vizualizer.ipynb provides minimal code to visualize the hand-to-gripper action extraction results. For each task, a scene_files folder is created that contains, for each video frame, the frame's point cloud, the detected hand mesh, the gripper pose in the camera frame, and the gripper's point cloud, as well as a file hand_joints_kpts_3d.npy with the sequence of gripper poses, which can be used for further processing, e.g. to train a model.
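If you want to consume the saved actions outside the notebook, a minimal sketch for loading the pose sequence is shown below. It assumes the file sits in the task's scene_files folder and stores one gripper pose per frame; inspect the array shape before building anything on top of it.

import numpy as np

# Path assumed from the description above; point it at your task's output folder.
poses = np.load("scene_files/hand_joints_kpts_3d.npy", allow_pickle=True)

print("number of frames:", len(poses))
print("first entry shape:", np.asarray(poses[0]).shape)  # check the pose format before training on it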

Quick Start

1. Clone the repository

git clone https://github.com/gpapagiannis/r-plus-x-hand2actions

2. Install dependencies

cd r-plus-x-hand2actions
pip install -r requirements.txt

3. Follow the documentation

Check the repository's README.md file for specific installation and usage instructions.

Repository Details

Owner: gpapagiannis
Repo: r-plus-x-hand2actions
Language: Python
License: BSD 3-Clause "New" or "Revised" License
Last fetched: 8/10/2025
