Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges

Continual learning (CL) enables the development of models and agents that learn from a sequence of tasks while addressing the limitations of standard deep learning approaches, such as catastrophic forgetting. In this work, we investigate the factors that contribute to the performance differences between task-agnostic CL and multi-task learning (MTL) agents. We pose two hypotheses: (1) task-agnostic methods might provide advantages in settings with limited data, limited computation, or high dimensionality, and (2) faster adaptation may be particularly beneficial in continual learning settings, helping to mitigate the effects of catastrophic forgetting. To investigate these hypotheses, we introduce a replay-based recurrent reinforcement learning (3RL) methodology for task-agnostic CL agents. We assess 3RL on a synthetic task and on the Meta-World benchmark, which includes 50 unique manipulation tasks. Our results show that 3RL outperforms baseline methods and can even surpass its multi-task equivalent in challenging, high-dimensional settings. We also show that the recurrent task-agnostic agent consistently matches or outperforms its transformer-based counterpart. These findings provide insights into the advantages of task-agnostic CL over task-aware MTL approaches and highlight the potential of task-agnostic methods in resource-constrained, high-dimensional, and multi-task environments.
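
To make the 3RL idea concrete, the minimal sketch below shows one plausible reading of a replay-based recurrent task-agnostic agent: a recurrent encoder summarizes the recent history of (observation, action, reward) transitions into a context vector that conditions the policy, with no task labels required. The module names, the choice of a GRU, the layer sizes, and the batch shapes are illustrative assumptions rather than the paper's exact implementation; the full method additionally trains actor and critic off-policy (e.g., with SAC) from an experience replay buffer.

```python
# Minimal sketch of the replay-based recurrent RL (3RL) idea, assuming a GRU context
# encoder and a context-conditioned Gaussian policy. Names and sizes are illustrative.
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Encodes a short history of (obs, action, reward) transitions into a context vector."""
    def __init__(self, obs_dim, act_dim, context_dim=32):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim + 1, context_dim, batch_first=True)

    def forward(self, obs_seq, act_seq, rew_seq):
        # obs_seq: (B, T, obs_dim), act_seq: (B, T, act_dim), rew_seq: (B, T, 1)
        x = torch.cat([obs_seq, act_seq, rew_seq], dim=-1)
        _, h = self.gru(x)               # h: (1, B, context_dim), last hidden state
        return h.squeeze(0)              # (B, context_dim)

class ContextConditionedActor(nn.Module):
    """Gaussian policy conditioned on the current observation and the inferred context."""
    def __init__(self, obs_dim, act_dim, context_dim=32, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs, context):
        h = self.net(torch.cat([obs, context], dim=-1))
        return self.mu(h), self.log_std(h).clamp(-5.0, 2.0)

# Usage: sample a batch of short transition histories from a replay buffer, infer the
# context, then condition the actor (and, analogously, the critics) on it.
if __name__ == "__main__":
    obs_dim, act_dim, B, T = 12, 4, 8, 16
    encoder = ContextEncoder(obs_dim, act_dim)
    actor = ContextConditionedActor(obs_dim, act_dim)
    obs_seq = torch.randn(B, T, obs_dim)
    act_seq = torch.randn(B, T, act_dim)
    rew_seq = torch.randn(B, T, 1)
    context = encoder(obs_seq, act_seq, rew_seq)
    mu, log_std = actor(obs_seq[:, -1], context)  # act from the latest observation
    print(mu.shape, log_std.shape)
```

In this reading, task identity is never provided: because the context is inferred online from the replayed history, the same agent can be deployed across the task sequence, which is what allows the comparison against task-aware multi-task baselines described in the abstract.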
