Robotics: Science and Systems XXI

RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation

Kun Wu, Chengkai Hou, Jiaming Liu, Zhengping Che, Xiaozhu Ju, Zhuqin Yang, Meng Li, Yinuo Zhao, Zhiyuan Xu, Guang Yang, Shichao Fan, Xinhua Wang, Fei Liao, Zhen Zhao, Guangyu Li, Zhao Jin, Lecheng Wang, Jilei Mao, Ning Liu, Pei Ren, Qiang Zhang, Yaoxu Lyu, Mengzhen Liu, He Jingyang, Yulin Luo, Zeyu Gao, Chenxuan Li, Chenyang Gu, Yankai Fu, Di Wu, Xingyu Wang, Sixiang Chen, Zhenyu Wang, Pengju An, Siyuan Qian, Shanghang Zhang, Jian Tang

Abstract:

Developing robust and general-purpose manipulation policies is a key goal in robotics. To achieve effective generalization, it is essential to construct comprehensive datasets that encompass a large number of demonstration trajectories and diverse tasks. Unlike vision or language data, which can be sourced from the internet, robotic datasets require detailed observations and manipulation actions, necessitating significant investments in both hardware-software infrastructure and human labor. While existing works have focused on assembling various individual robot datasets, there is still a lack of a unified data collection standard and insufficient high-quality data across diverse tasks, scenarios, and robot types. In this paper, we introduce RoboMIND (Multi-embodiment Intelligence Normative Data for Robot Manipulation), a dataset containing 107k demonstration trajectories across 479 diverse tasks involving 96 object classes. RoboMIND is collected through human teleoperation and encompasses comprehensive robot-related information, including multi-view observations, proprioceptive robot state information, and linguistic task descriptions. To ensure data consistency and reliability for imitation learning, RoboMIND is built on a unified data collection platform and a standardized protocol, covering four distinct robotic embodiments: the Franka Emika Panda, the X-Humanoid Tien Kung humanoid robot with dual dexterous hands, the AgileX dual-arm robot, and the UR5e. Our dataset also includes 5k real-world failure demonstrations, each accompanied by detailed causes, enabling failure reflection and correction during policy learning. Additionally, we created a digital twin environment in the Isaac Sim simulator, replicating the real-world tasks and assets, which facilitates the low-cost collection of additional training data and enables efficient evaluation. To demonstrate the quality and diversity of our dataset, we conducted extensive experiments using various imitation learning methods for single-task settings and state-of-the-art Vision-Language-Action (VLA) models for multi-task scenarios. By leveraging RoboMIND, the VLA models achieved high manipulation success rates and demonstrated strong generalization capabilities. To the best of our knowledge, RoboMIND is the largest multi-embodiment teleoperation dataset collected on a unified platform, providing large-scale and high-quality robotic training data. Our project is at https://x-humanoidrobomind.github.io/.
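
To make the data composition described above concrete, the following is a minimal Python sketch of what a per-trajectory record might look like, covering the modalities named in the abstract: multi-view observations, proprioceptive state, a language instruction, and failure annotations. The class names, field names, and shapes are illustrative assumptions, not the actual RoboMIND file schema or loader API.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

import numpy as np


# Hypothetical per-step record: field names and shapes are assumptions
# for illustration, not the released RoboMIND format.
@dataclass
class TrajectoryStep:
    images: Dict[str, np.ndarray]        # camera name -> HxWx3 RGB frame
    proprio: np.ndarray                  # joint positions, gripper/hand state
    action: np.ndarray                   # commanded action at this step


@dataclass
class Trajectory:
    embodiment: str                      # e.g. "franka", "tien_kung", "agilex", "ur5e"
    instruction: str                     # natural-language task description
    steps: List[TrajectoryStep] = field(default_factory=list)
    success: bool = True                 # failure demonstrations carry success=False
    failure_cause: Optional[str] = None  # annotated cause for failure demonstrations


def iterate_transitions(traj: Trajectory):
    """Yield (observation, action) pairs, e.g. for behavior cloning."""
    for step in traj.steps:
        obs = {
            "images": step.images,
            "proprio": step.proprio,
            "instruction": traj.instruction,
        }
        yield obs, step.action

Keeping the embodiment tag and the success/failure annotation on each trajectory lets the same records serve single-task imitation learning, multi-embodiment VLA training, and failure-aware policy learning as described in the abstract.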

Bibtex:

@INPROCEEDINGS{WuK-RSS-25, 
    AUTHOR    = {Kun Wu AND Chengkai Hou AND Jiaming Liu AND Zhengping Che AND Xiaozhu Ju AND Zhuqin Yang AND Meng Li AND Yinuo Zhao AND Zhiyuan Xu AND Guang Yang AND Shichao Fan AND Xinhua Wang AND Fei Liao AND Zhen Zhao AND Guangyu Li AND Zhao Jin AND Lecheng Wang AND Jilei Mao AND Ning Liu AND Pei Ren AND Qiang Zhang AND Yaoxu Lyu AND Mengzhen Liu AND He Jingyang AND Yulin Luo AND Zeyu Gao AND Chenxuan Li AND Chenyang Gu AND Yankai Fu AND Di Wu AND Xingyu Wang AND Sixiang Chen AND Zhenyu Wang AND Pengju An AND Siyuan Qian AND Shanghang Zhang AND Jian Tang}, 
    TITLE     = {{RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation}}, 
    BOOKTITLE = {Proceedings of Robotics: Science and Systems}, 
    YEAR      = {2025}, 
    ADDRESS   = {Los Angeles, CA, USA}, 
    MONTH     = {June}, 
    DOI       = {10.15607/RSS.2025.XXI.152} 
}