Learning Multimodal Contact-Rich Skills from Demonstrations Without Reward Engineering

Everyday contact-rich tasks such as peeling, cleaning, and writing demand multimodal perception for effective and precise execution. These tasks pose a challenge for robots, which lack the ability to combine multimodal stimuli when performing contact-rich manipulation. Learning-based methods have attempted to model multimodal contact-rich tasks, but they often require extensive training examples and task-specific reward functions, which limits their practicality and scope. We therefore propose a generalizable, model-free learning-from-demonstration framework that enables robots to learn contact-rich skills without explicit reward engineering. We present a novel multimodal sensor data representation that improves learning performance for contact-rich skills. We trained and evaluated the framework on a real Sawyer robot for three everyday contact-rich skills: cleaning, writing, and peeling. Notably, the framework achieves a success rate of 100% for the peeling and writing skills and 80% for the cleaning skill. This skill-learning framework can therefore be extended to learning other physical manipulation skills.
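
The abstract does not specify how the multimodal sensor data representation is built, so the following is only a minimal sketch of one plausible design: a small fusion encoder that maps an RGB camera frame, a 6-axis wrist force/torque reading, and the end-effector pose into a single state vector that a learning-from-demonstration policy could consume. The class name, layer sizes, and input dimensions are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn


class MultimodalObservationEncoder(nn.Module):
    """Hypothetical fusion of camera, force/torque, and proprioceptive inputs
    into one fixed-size state vector (all layer sizes are illustrative)."""

    def __init__(self, image_feat_dim=64, wrench_dim=6, pose_dim=7, out_dim=128):
        super().__init__()
        # Small CNN for the camera image (placeholder architecture).
        self.image_net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, image_feat_dim), nn.ReLU(),
        )
        # MLPs for the 6-axis force/torque reading and the end-effector pose.
        self.wrench_net = nn.Sequential(nn.Linear(wrench_dim, 32), nn.ReLU())
        self.pose_net = nn.Sequential(nn.Linear(pose_dim, 32), nn.ReLU())
        # Fusion layer producing the combined multimodal state representation.
        self.fusion = nn.Linear(image_feat_dim + 32 + 32, out_dim)

    def forward(self, image, wrench, pose):
        feats = torch.cat(
            [self.image_net(image), self.wrench_net(wrench), self.pose_net(pose)],
            dim=-1,
        )
        return torch.relu(self.fusion(feats))


# Example: encode one synthetic observation (batch of 1).
if __name__ == "__main__":
    encoder = MultimodalObservationEncoder()
    state = encoder(
        image=torch.rand(1, 3, 64, 64),   # RGB frame
        wrench=torch.rand(1, 6),          # force/torque at the wrist
        pose=torch.rand(1, 7),            # end-effector position + quaternion
    )
    print(state.shape)  # torch.Size([1, 128])
```

In a demonstration-driven setup like the one described, such a state vector would typically be paired with the demonstrated robot actions and used to train a policy by imitation, avoiding any hand-designed reward function; the exact pipeline used in the paper may differ.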
