Continual World: A Robotic Benchmark For Continual Reinforcement Learning

Continual learning (CL), the ability to continuously learn and build on previously acquired knowledge, is a natural requirement for long-lived autonomous reinforcement learning (RL) agents. In building such agents, one needs to balance opposing desiderata, such as limited capacity and compute, resistance to catastrophic forgetting, and positive transfer to new tasks. Understanding the right trade-off is conceptually and computationally challenging, which we argue has led the community to focus overly on catastrophic forgetting. In response, we advocate prioritizing forward transfer and propose Continual World, a benchmark consisting of realistic and meaningfully diverse robotic tasks built on top of Meta-World [2] as a testbed. Following an in-depth empirical evaluation of existing CL methods, we pinpoint their limitations and highlight algorithmic challenges unique to the RL setting. Our benchmark aims to provide a meaningful and computationally inexpensive challenge for the community and thus help better understand existing and future solutions.
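To make the forward-transfer emphasis concrete, below is a minimal sketch of the kind of metric involved: a normalized comparison of the area under a task's success-rate training curve when learned in sequence versus a from-scratch reference run on the same task. The helper names (`auc`, `forward_transfer`) and the synthetic curves are illustrative assumptions, not the benchmark's official implementation, which may differ in detail.

```python
import numpy as np

def auc(success_rates: np.ndarray) -> float:
    """Mean success rate over training, i.e. the normalized area under
    the training curve; assumes success rates lie in [0, 1]."""
    return float(np.mean(success_rates))

def forward_transfer(cl_curve: np.ndarray, reference_curve: np.ndarray) -> float:
    """Normalized forward transfer for one task (illustrative form).

    Positive values mean the continual learner reached high success
    faster than the from-scratch reference; the denominator rescales
    by the headroom left above the reference AUC.
    """
    auc_cl = auc(cl_curve)
    auc_ref = auc(reference_curve)
    return (auc_cl - auc_ref) / (1.0 - auc_ref)

# Usage with synthetic (hypothetical) curves:
if __name__ == "__main__":
    steps = np.linspace(0.0, 1.0, 100)
    reference = np.clip(1.2 * steps, 0.0, 1.0)  # from-scratch learner
    continual = np.clip(2.0 * steps, 0.0, 1.0)  # faster, thanks to transfer
    print(f"forward transfer: {forward_transfer(continual, reference):.3f}")
```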

[1] Eugenio Culurciello, et al. Continual Reinforcement Learning in 3D Non-stationary Environments, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[2] Sergey Levine, et al. Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning, 2019, CoRL.

[3] Andrei A. Rusu, et al. Embracing Change: Continual Learning in Deep Neural Networks, 2020, Trends in Cognitive Sciences.

[4] Shie Mannor, et al. A Deep Hierarchical Approach to Lifelong Learning in Minecraft, 2016, AAAI.

[5] Marc'Aurelio Ranzato, et al. Efficient Lifelong Learning with A-GEM, 2018, ICLR.

[6] Julien Cornebise, et al. Weight Uncertainty in Neural Network, 2015, ICML.

[7] Doina Precup, et al. Towards Continual Reinforcement Learning: A Review and Perspectives, 2020, ArXiv.

[8] Wojciech Czarnecki, et al. Multi-task Deep Reinforcement Learning with PopArt, 2018, AAAI.

[9] David Filliat, et al. Don't forget, there is more than forgetting: new metrics for Continual Learning, 2018, ArXiv.

[10] Eric Eaton, et al. Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting, 2020, NeurIPS.

[11] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.

[12] Ferenc Huszár. Note on the quadratic penalties in elastic weight consolidation, 2018, Proceedings of the National Academy of Sciences.

[13] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[14] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.

[15] Marc G. Bellemare, et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.

[16] Joel Veness, et al. The Forget-me-not Process, 2016, NIPS.

[17] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[18] Tom Mitchell, et al. Jelly Bean World: A Testbed for Never-Ending Learning, 2020, ICLR.

[19] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[20] Razvan Pascanu, et al. Ray Interference: a Source of Plateaus in Deep Reinforcement Learning, 2019, ArXiv.

[21] Tinne Tuytelaars, et al. A Continual Learning Survey: Defying Forgetting in Classification Tasks, 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Davide Maltoni, et al. CORe50: a New Dataset and Benchmark for Continuous Object Recognition, 2017, CoRL.

[23] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[24] Stefan Wermter, et al. Continual Lifelong Learning with Neural Networks: A Review, 2019, Neural Networks.

[25] Ryan P. Adams, et al. On Warm-Starting Neural Network Training, 2020, NeurIPS.

[26] Martha White, et al. Meta-Learning Representations for Continual Learning, 2019, NeurIPS.

[27] Svetlana Lazebnik, et al. PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Yarin Gal, et al. Towards Robust Evaluations of Continual Learning, 2018, ArXiv.

[29] Richard E. Turner, et al. Continual Learning with Adaptive Weights (CLAW), 2020, ICLR.

[30] Tinne Tuytelaars, et al. Online Continual Learning with Maximally Interfered Retrieval, 2019, ArXiv.

[31] Demis Hassabis, et al. Improved protein structure prediction using potentials from deep learning, 2020, Nature.

[32] Richard E. Turner, et al. Variational Continual Learning, 2017, ICLR.

[33] Jeffrey Scott Vitter. Random sampling with a reservoir, 1985, TOMS.

[34] Yoshua Bengio, et al. CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning, 2020, ICLR.

[35] Jeff Donahue, et al. Large Scale GAN Training for High Fidelity Natural Image Synthesis, 2018, ICLR.

[36] M. Kenward. An Introduction to the Bootstrap, 2007.

[37] Marcus Rohrbach, et al. Memory Aware Synapses: Learning what (not) to forget, 2017, ECCV.

[38] Murray Shanahan, et al. Policy Consolidation for Continual Reinforcement Learning, 2019, ICML.

[39] Yee Whye Teh, et al. Progress & Compress: A scalable framework for continual learning, 2018, ICML.

[40] Razvan Pascanu, et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.

[41] Razvan Pascanu, et al. Adapting Auxiliary Losses Using Gradient Similarity, 2018, ArXiv.

[42] D. Hassabis, et al. Neuroscience-Inspired Artificial Intelligence, 2017, Neuron.

[43] Sergey Levine, et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.

[44] Henry Zhu, et al. Soft Actor-Critic Algorithms and Applications, 2018, ArXiv.

[45] Ilya Sutskever, et al. Zero-Shot Text-to-Image Generation, 2021, ICML.

[46] Aaron Courville, et al. Continuous Coordination As a Realistic Scenario for Lifelong Learning, 2021, ICML.

[47] Pieter Abbeel, et al. Adaptive Online Planning for Continual Lifelong Learning, 2019, ArXiv.

[48] Fan-Keng Sun, et al. LAMOL: LAnguage MOdeling for Lifelong Language Learning, 2020, ICLR.

[49] Yee Whye Teh, et al. Continual Unsupervised Representation Learning, 2019, NeurIPS.
