A Simple Approach to Continual Learning by Transferring Skill Parameters

To be effective general-purpose machines in real-world environments, robots will need not only to adapt their existing manipulation skills to new circumstances, but also to acquire entirely new skills on the fly. A great promise of continual learning is to endow robots with this ability by drawing on the knowledge and experience accumulated from prior skills. We take a fresh look at this problem by considering a setting in which the robot is limited to storing that knowledge and experience only in the form of learned skill policies. We show that storing skill policies, pre-training them carefully, and choosing appropriately when to transfer them are sufficient to build a continual learner in the context of robotic manipulation. We analyze which conditions are needed for skill transfer in the challenging Meta-World simulation benchmark. Building on this analysis, we introduce a pairwise metric relating skills that allows us to predict the effectiveness of skill transfer between tasks, and we use it to reduce the problem of continual learning to curriculum selection. Given an appropriate curriculum, we show how to continually acquire robotic manipulation skills without forgetting, using far fewer samples than training them from scratch would require.
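To make the curriculum-selection idea concrete, below is a minimal Python sketch assuming a pairwise transfer metric has already been computed. The transfer_score table, the toy task names, and the exhaustive search are hypothetical illustrations for a three-task example, not the paper's actual metric or implementation.

```python
# Minimal sketch: reducing continual learning to curriculum selection,
# assuming a precomputed pairwise transfer metric. All names here
# (transfer_score, the toy task list) are hypothetical illustrations.
from itertools import permutations

# Hypothetical pairwise scores: transfer_score[(source, target)] estimates
# how effective it is to initialize training on `target` from the learned
# `source` skill policy (higher is better).
transfer_score = {
    ("reach", "push"): 0.9,
    ("reach", "pick-place"): 0.6,
    ("push", "pick-place"): 0.8,
    ("push", "reach"): 0.7,
    ("pick-place", "push"): 0.5,
    ("pick-place", "reach"): 0.4,
}

def curriculum_value(order):
    """Sum of transfer scores along a training order, where each new task
    is initialized from the best-matching skill policy learned so far."""
    total = 0.0
    for i, target in enumerate(order[1:], start=1):
        learned = order[:i]
        total += max(transfer_score.get((src, target), 0.0) for src in learned)
    return total

tasks = ["reach", "push", "pick-place"]
# Exhaustive search is fine for a toy task set; a larger benchmark would
# call for a greedy or spanning-tree heuristic over the transfer graph.
best = max(permutations(tasks), key=curriculum_value)
print("Best curriculum:", " -> ".join(best))
```

Under these made-up scores the search picks an ordering in which each new skill starts from a strong prior, which is the sense in which an appropriate curriculum substitutes for more general mechanisms of knowledge retention.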
