MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale

General-purpose robotic systems must master a large repertoire of diverse skills to be useful in a range of daily tasks. While reinforcement learning provides a powerful framework for acquiring individual behaviors, the time needed to acquire each skill makes the prospect of a generalist robot trained with RL daunting. In this paper, we study how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously, sharing exploration, experience, and representations across tasks. In this framework, new tasks can be continuously instantiated from previously learned tasks, improving the overall performance and capabilities of the system. To instantiate this system, we develop a scalable and intuitive framework for specifying new tasks through user-provided examples of desired outcomes, devise a multi-robot collective learning system for data collection that simultaneously collects experience for multiple tasks, and develop a scalable and generalizable multi-task deep reinforcement learning method, which we call MT-Opt. We demonstrate how MT-Opt can learn a wide range of skills, including semantic picking (i.e., picking an object from a particular category), placing into various fixtures (e.g., placing a food item onto a plate), covering, aligning, and rearranging. We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots, and demonstrate the performance of our system both in terms of its ability to generalize to structurally similar new tasks, and to acquire distinct new tasks more quickly by leveraging past experience. We recommend viewing the videos at karolhausman.github.io/mt-opt.
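The core idea of sharing experience across tasks can be illustrated with a minimal sketch (not the paper's implementation): a tiny task-conditioned Q-learner over a discrete toy environment, where every transition in a shared replay buffer is relabeled with each task's own success signal, in the spirit of MT-Opt's cross-task data sharing. The environment dynamics, state space, and per-task reward functions below are all hypothetical stand-ins for the paper's vision-based tasks and learned outcome classifiers.

```python
import random

import numpy as np

N_STATES, N_ACTIONS, N_TASKS = 5, 2, 2
GAMMA, LR = 0.9, 0.5

def reward(task, next_state):
    # Hypothetical per-task success signal: task t succeeds on reaching
    # state t + 3 (a stand-in for a learned outcome classifier).
    return 1.0 if next_state == task + 3 else 0.0

def step(state, action):
    # Toy deterministic dynamics: action 0 moves left, action 1 moves right.
    return max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))

# One Q-table per task; all tasks learn from the same stream of transitions.
Q = np.zeros((N_TASKS, N_STATES, N_ACTIONS))
replay = []

rng = random.Random(0)
for _ in range(2000):
    s = rng.randrange(N_STATES)
    a = rng.randrange(N_ACTIONS)
    s2 = step(s, a)
    replay.append((s, a, s2))
    # Relabel the single transition for every task: each task scores the
    # outcome with its own reward function and updates its own Q-table.
    for t in range(N_TASKS):
        target = reward(t, s2) + GAMMA * Q[t, s2].max()
        Q[t, s, a] += LR * (target - Q[t, s, a])

# Greedy per-task policies: each task learns to move right toward its own goal.
policy = Q.argmax(axis=2)
```

The key point of the sketch is that data collection happens once, while the relabeling loop lets every task extract supervision from it, which is what makes collective multi-task data collection sample-efficient relative to training each task in isolation.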
