Learning Gentle Object Manipulation with Curiosity-Driven Deep Reinforcement Learning

Robots must be gentle when interacting with fragile objects, or when the robot itself is prone to wear and tear. We propose an approach that enables deep reinforcement learning to train policies that are gentle, both during exploration and task execution. In a reward-based learning environment, a natural approach is to augment the task reward with a penalty for non-gentleness, defined here as excessive impact force. However, augmenting with only this penalty impairs learning: policies get stuck in a local optimum that avoids all contact with the environment. Prior research has shown that auxiliary tasks or intrinsic rewards can stabilize and accelerate learning in sparse-reward domains, and indeed we find that introducing a surprise-based intrinsic reward does avoid the no-contact failure case. However, we show that a simple dynamics-based surprise is not as effective as a penalty-based surprise. Penalty-based surprise, computed from the error in predicting forceful contacts, has a further benefit: it encourages exploration that is contact-rich yet gentle. We demonstrate the effectiveness of the approach using a complex, tendon-powered robot hand with tactile sensors. Videos are available at this http URL.
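To make the reward structure concrete, here is a minimal sketch of how the shaped reward described above might be assembled. This is our own illustration, not the paper's exact formulation: the linear penalty form, the force threshold, the linear penalty predictor, and the coefficients `pen_coef` and `surprise_coef` are all assumptions chosen for readability.

```python
import numpy as np

def impact_penalty(contact_force, threshold=5.0):
    """Gentleness penalty: force in excess of a threshold.
    (Threshold value and linear form are illustrative assumptions.)"""
    return max(0.0, contact_force - threshold)

class PenaltyPredictor:
    """Tiny online linear regressor that predicts the next-step impact
    penalty from state/action features; its prediction error serves as
    the penalty-based surprise bonus."""
    def __init__(self, dim, lr=1e-2):
        self.w = np.zeros(dim)
        self.lr = lr

    def predict(self, features):
        return float(self.w @ features)

    def update(self, features, target):
        # One SGD step on squared prediction error; returns the error.
        err = target - self.predict(features)
        self.w += self.lr * err * features
        return err

def shaped_reward(r_task, contact_force, features, predictor,
                  pen_coef=1.0, surprise_coef=0.1):
    """Task reward, minus the gentleness penalty, plus a surprise bonus
    proportional to how badly the penalty was predicted."""
    pen = impact_penalty(contact_force)
    surprise = abs(predictor.update(features, pen))
    return r_task - pen_coef * pen + surprise_coef * surprise

# Illustrative usage with a 4-dimensional feature vector.
predictor = PenaltyPredictor(dim=4)
r = shaped_reward(r_task=1.0, contact_force=7.2,
                  features=np.array([0.3, -1.0, 0.5, 0.9]),
                  predictor=predictor)
```

The point of this sketch is the choice of prediction target: unlike dynamics-based surprise, which rewards error in predicting the full next state, the predictor here forecasts only the gentleness penalty, so the bonus is largest for contacts whose force outcome is poorly understood, while the explicit penalty term still steers the policy toward gentle interaction.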
