Sharing Lifelong Reinforcement Learning Knowledge via Modulating Masks

Lifelong learning agents aim to learn multiple tasks sequentially over a lifetime. This requires the ability to exploit previous knowledge when learning new tasks while avoiding forgetting. Modulating masks, a specific type of parameter-isolation approach, have recently shown promise in both supervised and reinforcement learning. While lifelong learning algorithms have been investigated mainly in single-agent settings, the question of how multiple agents can share lifelong learning knowledge with each other remains open. We show that the parameter-isolation mechanism used by modulating masks is particularly suitable for exchanging knowledge among agents in a distributed and decentralized system of lifelong learners. The key idea is that isolating task-specific knowledge in dedicated masks allows agents to transfer only that knowledge, on demand, resulting in robust and effective distributed lifelong learning. We assume fully distributed and asynchronous scenarios in which the number of agents and their connectivity change over time. An on-demand communication protocol lets agents query their peers for task-specific masks, which are then transferred and integrated into their policies as each task is encountered. Experiments indicate that on-demand mask communication is an effective way to implement distributed lifelong reinforcement learning and provides a lifelong learning benefit over distributed RL baselines such as DD-PPO, IMPALA, and PPO+EWC. The system is particularly robust to connection drops and learns rapidly thanks to knowledge exchange.
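To make the mechanism concrete, the sketch below illustrates the two ideas at the core of the abstract: per-task binary masks that modulate a frozen backbone shared by all agents (so learning one task cannot overwrite another), and an on-demand protocol in which an agent facing a new task first asks reachable peers for that task's mask before learning from scratch. All names here (MaskedPolicy, Agent, query_mask, etc.) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (not the paper's implementation): binary modulating masks
# over a shared frozen backbone, plus on-demand mask exchange between peers.
import numpy as np

rng = np.random.default_rng(0)
BACKBONE = rng.standard_normal((2, 4))  # frozen weights shared by all agents


class MaskedPolicy:
    """A fixed linear policy whose weights are gated by per-task binary masks."""

    def __init__(self, backbone):
        self.W = backbone   # never trained; only masks are learned
        self.masks = {}     # task_id -> binary mask (isolates task knowledge)

    def new_task(self, task_id):
        # Start dense; training would sparsify/refine the mask for this task.
        self.masks.setdefault(task_id, np.ones_like(self.W))

    def act(self, task_id, obs):
        # The mask selects which frozen weights are active for this task, so
        # learning a new mask cannot overwrite previously learned tasks.
        return (self.masks[task_id] * self.W) @ obs


class Agent:
    """One node in a decentralized collective; peers may appear or disappear."""

    def __init__(self, name, policy):
        self.name, self.policy, self.peers = name, policy, []

    def query_mask(self, task_id):
        return self.policy.masks.get(task_id)

    def on_new_task(self, task_id):
        # On-demand protocol: ask currently reachable peers for this task's
        # mask and integrate the first hit; otherwise learn from scratch.
        for peer in self.peers:
            mask = peer.query_mask(task_id)
            if mask is not None:
                self.policy.masks[task_id] = mask.copy()  # transfer mask only
                return "transferred from " + peer.name
        self.policy.new_task(task_id)
        return "learning from scratch"


# Agent B has already solved task "T1"; agent A obtains its mask on demand.
a = Agent("A", MaskedPolicy(BACKBONE))
b = Agent("B", MaskedPolicy(BACKBONE))
b.policy.new_task("T1")
a.peers = [b]
print(a.on_new_task("T1"))  # -> transferred from B
```

Because each task lives in its own mask, a dropped connection or an unreachable peer only delays transfer for that one task; the rest of an agent's knowledge is untouched, which is consistent with the robustness to connection drops reported above.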

[1] Praveen K. Pilly et al. The configurable tree graph (CT-graph): measurable problems in partially observable and distal reward environments for lifelong reinforcement learning, 2023, arXiv.

[2] Santhosh K. Ramakrishnan et al. A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems, 2023, Neural Networks.

[3] Praveen K. Pilly et al. Lifelong Reinforcement Learning with Modulating Masks, 2022, arXiv.

[4] A. Soltoggio et al. Wasserstein Task Embedding for Measuring Task Similarities, 2022, arXiv.

[5] Jorge Armando Mendez Mendez et al. Modular Lifelong Reinforcement Learning via Neural Composition, 2022, ICLR.

[6] Yuanming Shi et al. Federated Multi-Task Learning with Non-Stationary Heterogeneous Data, 2022, ICC 2022 - IEEE International Conference on Communications.

[7] Achim Rettinger et al. Signing the Supermask: Keep, Hide, Invert, 2022, ICLR.

[8] Praveen K. Pilly et al. Context Meta-Reinforcement Learning via Neuromodulation, 2021, Neural Networks.

[9] Chelsea Finn et al. Lifelong Robotic Reinforcement Learning by Retaining Experiences, 2021, CoLLAs.

[10] Stephen J. Roberts et al. Same State, Different Task: Continual Reinforcement Learning without Interference, 2021, AAAI.

[11] Qiang Yang et al. Towards Personalized Federated Learning, 2021, IEEE Transactions on Neural Networks and Learning Systems.

[12] Doina Precup et al. Towards Continual Reinforcement Learning: A Review and Perspectives, 2020, J. Artif. Intell. Res.

[13] Timothy M. Hospedales et al. Meta-Learning in Neural Networks: A Survey, 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Tinne Tuytelaars et al. A Continual Learning Survey: Defying Forgetting in Classification Tasks, 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15] Shuguang Cui et al. EFL: Elastic Federated Learning on Non-IID Data, 2022, CoLLAs.

[16] Qihao Zhou et al. Federated Reinforcement Learning: Techniques, Applications, and Open Challenges, 2021, Intelligence & Robotics.

[17] Razvan Pascanu et al. Continual World: A Robotic Benchmark For Continual Reinforcement Learning, 2021, NeurIPS.

[18] Klaus Diepold et al. Multi-agent deep reinforcement learning: a survey, 2021, Artificial Intelligence Review.

[19] Virginia Smith et al. Ditto: Fair and Robust Federated Learning Through Personalization, 2020, ICML.

[20] Jorge Armando Mendez Mendez et al. Lifelong Learning of Compositional Structures, 2020, ICLR.

[21] Jianping Gou et al. Knowledge Distillation: A Survey, 2020, International Journal of Computer Vision.

[22] Richard Nock et al. Advances and Open Problems in Federated Learning, 2019, Found. Trends Mach. Learn.

[23] Andrei A. Rusu et al. Embracing Change: Continual Learning in Deep Neural Networks, 2020, Trends in Cognitive Sciences.

[24] Laurent Itti et al. Lifelong Learning Without a Task Oracle, 2020, 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI).

[25] Ali Farhadi et al. Supermasks in Superposition, 2020, NeurIPS.

[26] Chelsea Finn et al. Deep Reinforcement Learning amidst Lifelong Non-Stationarity, 2020, arXiv.

[27] Soheil Kolouri et al. Sliced Cramer Synaptic Consolidation for Preserving Deeply Learned Representations, 2020, ICLR.

[28] Aryan Mokhtari et al. Personalized Federated Learning: A Meta-Learning Approach, 2020, arXiv.

[29] Ali Farhadi et al. What’s Hidden in a Randomly Weighted Neural Network?, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Ari S. Morcos et al. DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames, 2019, ICLR.

[31] Soheil Kolouri et al. Collaborative Learning Through Shared Collective Knowledge and Local Expertise, 2019, 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP).

[32] Jakub Konecný et al. Improving Federated Learning Personalization via Model Agnostic Meta Learning, 2019, arXiv.

[33] Maria-Florina Balcan et al. Adaptive Gradient-Based Meta-Learning Methods, 2019, NeurIPS.

[34] Jason Yosinski et al. Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask, 2019, NeurIPS.

[35] Andreas S. Tolias et al. Three scenarios for continual learning, 2019, arXiv.

[36] David Rolnick et al. Experience Replay for Continual Learning, 2018, NeurIPS.

[37] Michael L. Littman et al. Policy and Value Transfer in Lifelong Reinforcement Learning, 2018, ICML.

[38] Michael L. Littman et al. State Abstractions for Lifelong Reinforcement Learning, 2018, ICML.

[39] Pierre-Yves Oudeyer et al. How Many Random Seeds? Statistical Power Analysis in Deep Reinforcement Learning Experiments, 2018, arXiv.

[40] Yue Zhao et al. Federated Learning with Non-IID Data, 2018, arXiv.

[41] Yee Whye Teh et al. Progress & Compress: A scalable framework for continual learning, 2018, ICML.

[42] Shane Legg et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.

[43] Philip H. S. Torr et al. Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence, 2018, ECCV.

[44] Svetlana Lazebnik et al. Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights, 2018, ECCV.

[45] Sergey Levine et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018, ICML.

[46] Alexandros Karatzoglou et al. Overcoming Catastrophic Forgetting with Hard Attention to the Task, 2018, ICML.

[47] Marcus Rohrbach et al. Memory Aware Synapses: Learning what (not) to forget, 2017, ECCV.

[48] Svetlana Lazebnik et al. PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49] Sung Ju Hwang et al. Lifelong Learning with Dynamically Expandable Networks, 2017, ICLR.

[50] Sebastian Risi et al. Born to Learn: the Inspiration, Progress, and Future of Evolved Plastic Artificial Neural Networks, 2017, Neural Networks.

[51] Yusen Zhan et al. Scalable lifelong reinforcement learning, 2017, Pattern Recognit.

[52] Alec Radford et al. Proximal Policy Optimization Algorithms, 2017, arXiv.

[53] Yuval Tassa et al. Emergence of Locomotion Behaviours in Rich Environments, 2017, arXiv.

[54] Ameet Talwalkar et al. Federated Multi-Task Learning, 2017, NIPS.

[55] Surya Ganguli et al. Continual Learning Through Synaptic Intelligence, 2017, ICML.

[56] Sergey Levine et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.

[57] Razvan Pascanu et al. Overcoming catastrophic forgetting in neural networks, 2016, Proceedings of the National Academy of Sciences.

[58] Alex Graves et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.

[59] Geoffrey E. Hinton et al. Distilling the Knowledge in a Neural Network, 2015, arXiv.

[60] Alex Graves et al. Playing Atari with Deep Reinforcement Learning, 2013, arXiv.

[61] Marc G. Bellemare et al. The Arcade Learning Environment: An Evaluation Platform for General Agents, 2012, J. Artif. Intell. Res.

[62] P. Dayan et al. Reinforcement learning: The Good, The Bad and The Ugly, 2008, Current Opinion in Neurobiology.

[63] Dario Floreano et al. Evolutionary Advantages of Neuromodulated Plasticity in Dynamic, Reward-based Scenarios, 2008, ALIFE.

[64] Kenji Doya et al. Metalearning and neuromodulation, 2002, Neural Networks.

[65] Gerhard Weiß et al. Distributed reinforcement learning, 1995, Robotics Auton. Syst.

[66] Sebastian Thrun et al. A lifelong learning perspective for mobile robot control, 1994, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'94).

[67] Michael McCloskey et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, 1989, Psychology of Learning and Motivation.