Paper information - DexVIP: Learning Dexterous Grasping with Human Hand Pose Priors from Video


Dexterous multi-fingered robotic hands have a formidable action space, yet their morphological similarity to the human hand holds immense potential to accelerate robot learning. We propose DexVIP, an approach to learn dexterous robotic grasping from human-object interactions present in in-the-wild YouTube videos. We do this by curating grasp images from human-object interaction videos and imposing a prior over the agent’s hand pose when learning to grasp with deep reinforcement learning. A key advantage of our method is that the learned policy is able to leverage free-form in-the-wild visual data. As a result, it can easily scale to new objects, and it sidesteps the standard practice of collecting human demonstrations in a lab, a much more expensive and indirect way to capture human expertise. Through experiments on 27 objects with a 30-DoF simulated robot hand, we demonstrate that DexVIP compares favorably to existing approaches that lack a hand pose prior or rely on specialized tele-operation equipment to obtain human demonstrations, while also being faster to train.

Kristen Grauman | Priyanka Mandikal
