Combining learned skills and reinforcement learning for robotic manipulations

Manipulation tasks such as preparing a meal or assembling furniture remain highly challenging for robotics and vision. The supervised approach of imitation learning can handle short tasks but suffers from compounding errors and the need for many demonstrations on longer and more complex tasks. Reinforcement learning (RL) can find solutions beyond demonstrations but requires tedious, task-specific reward engineering for multi-step problems. In this work we address the difficulties of both methods and explore their combination. To this end, we propose RL policies that operate on pre-trained skills and can learn composite manipulations using no intermediate rewards and no demonstrations of full tasks. We also propose efficient training of basic skills from a few synthetic demonstrated trajectories by exploring recent CNN architectures and data augmentation. We show successful learning of policies for composite manipulation tasks such as making a simple breakfast. Notably, our method achieves high success rates on a real robot while using only synthetic training data.
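
To make the approach more concrete, the sketch below is a toy illustration, not the paper's implementation: a high-level policy selects among a fixed set of pre-trained skills and is trained from a single sparse reward given only when the whole composite task succeeds, with no intermediate rewards. The skill names, the stand-in environment and the REINFORCE-style update are illustrative assumptions; only the skill-selection policy is updated while the low-level skills stay frozen.

# Toy sketch: high-level RL over pre-trained skills with a sparse final reward.
# All names (SKILLS, GOAL, run_skill) are illustrative assumptions.
import math
import random

SKILLS = ["reach", "grasp", "pour", "release"]   # assumed pre-trained skills
GOAL = ["reach", "grasp", "pour"]                # hidden correct skill sequence


def run_skill(skill, step):
    """Stand-in for executing a frozen low-level skill: it 'works' only if
    it matches the step required by the composite task."""
    return skill == GOAL[step]


def episode(theta):
    """Roll out the high-level policy; return chosen skills and sparse reward."""
    chosen = []
    for step in range(len(GOAL)):
        # softmax policy over skills, one preference vector per step
        z = [math.exp(l) for l in theta[step]]
        probs = [v / sum(z) for v in z]
        skill = random.choices(range(len(SKILLS)), weights=probs)[0]
        chosen.append(skill)
        if not run_skill(SKILLS[skill], step):
            return chosen, 0.0            # task failed, no intermediate reward
    return chosen, 1.0                    # sparse reward only on full success


def train(iterations=3000, lr=0.5):
    theta = [[0.0] * len(SKILLS) for _ in GOAL]   # per-step skill preferences
    for _ in range(iterations):
        chosen, reward = episode(theta)
        # REINFORCE-style update on the skill-selection policy only;
        # the pre-trained skills themselves are never modified.
        for step, skill in enumerate(chosen):
            z = [math.exp(l) for l in theta[step]]
            probs = [v / sum(z) for v in z]
            for a in range(len(SKILLS)):
                grad = (1.0 if a == skill else 0.0) - probs[a]
                theta[step][a] += lr * reward * grad
    return theta


if __name__ == "__main__":
    theta = train()
    best = [SKILLS[max(range(len(SKILLS)), key=lambda a: t[a])] for t in theta]
    print("learned skill sequence:", best)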
