Learning Multimodal Contact-Rich Skills from Demonstrations Without Reward Engineering

Everyday contact-rich tasks such as peeling, cleaning, and writing demand multimodal perception for effective and precise execution. These tasks pose a challenge for robots, which lack the ability to combine multimodal stimuli when performing contact-rich manipulation. Learning-based methods have attempted to model multimodal contact-rich tasks, but they often require extensive training examples and task-specific reward functions, which limits their practicality and scope. We therefore propose a generalizable, model-free learning-from-demonstration framework that enables robots to learn contact-rich skills without explicit reward engineering. We present a novel multimodal sensor data representation that improves learning performance for contact-rich skills. We trained and evaluated the framework on a real Sawyer robot for three everyday contact-rich skills: cleaning, writing, and peeling. Notably, the framework achieves a success rate of 100% for the peeling and writing skills and 80% for the cleaning skill. This skill-learning framework can therefore be extended to learning other physical manipulation skills.
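
The abstract does not specify how the multimodal sensor data representation is built, so the following is only a minimal sketch of one plausible design: a small fusion encoder that maps an RGB camera frame, a 6-axis wrist force/torque reading, and the end-effector pose into a single state vector that a learning-from-demonstration policy could consume. The class name, layer sizes, and input dimensions are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn


class MultimodalObservationEncoder(nn.Module):
    """Hypothetical fusion of camera, force/torque, and proprioceptive inputs
    into one fixed-size state vector (all layer sizes are illustrative)."""

    def __init__(self, image_feat_dim=64, wrench_dim=6, pose_dim=7, out_dim=128):
        super().__init__()
        # Small CNN for the camera image (placeholder architecture).
        self.image_net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, image_feat_dim), nn.ReLU(),
        )
        # MLPs for the 6-axis force/torque reading and the end-effector pose.
        self.wrench_net = nn.Sequential(nn.Linear(wrench_dim, 32), nn.ReLU())
        self.pose_net = nn.Sequential(nn.Linear(pose_dim, 32), nn.ReLU())
        # Fusion layer producing the combined multimodal state representation.
        self.fusion = nn.Linear(image_feat_dim + 32 + 32, out_dim)

    def forward(self, image, wrench, pose):
        feats = torch.cat(
            [self.image_net(image), self.wrench_net(wrench), self.pose_net(pose)],
            dim=-1,
        )
        return torch.relu(self.fusion(feats))


# Example: encode one synthetic observation (batch of 1).
if __name__ == "__main__":
    encoder = MultimodalObservationEncoder()
    state = encoder(
        image=torch.rand(1, 3, 64, 64),   # RGB frame
        wrench=torch.rand(1, 6),          # force/torque at the wrist
        pose=torch.rand(1, 7),            # end-effector position + quaternion
    )
    print(state.shape)  # torch.Size([1, 128])
```

In a demonstration-driven setup like the one described, such a state vector would typically be paired with the demonstrated robot actions and used to train a policy by imitation, avoiding any hand-designed reward function; the exact pipeline used in the paper may differ.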
