Lightweight Learner for Shared Knowledge Lifelong Learning

In Lifelong Learning (LL), agents continually learn as they encounter new conditions and tasks. Most current LL is limited to a single agent that learns tasks sequentially, with dedicated LL machinery deployed to mitigate the forgetting of old tasks as new tasks are learned. This is inherently slow. We propose a new Shared Knowledge Lifelong Learning (SKILL) challenge, which deploys a decentralized population of LL agents that each sequentially learn different tasks, with all agents operating independently and in parallel. After learning their respective tasks, agents share and consolidate their knowledge over a decentralized communication network, so that, in the end, all agents can master all tasks. We present one solution to SKILL that uses Lightweight Lifelong Learning (LLL) agents, whose goal is to facilitate efficient sharing by minimizing the fraction of each agent that is specialized for any given task. Each LLL agent thus consists of a common, task-agnostic, immutable part, which holds most of the parameters, and individual task-specific modules that contain fewer parameters but are adapted to each task. Agents share their task-specific modules, plus summary information ("task anchors") representing their tasks in the common task-agnostic latent space of all agents. Receiving agents register each received task-specific module using the corresponding anchor. Thus, every agent improves its ability to solve new tasks each time it receives new task-specific modules and anchors. On a new, very challenging SKILL-102 dataset with 102 image classification tasks (5,033 classes in total; 2,041,225 training, 243,464 validation, and 243,464 test images), we achieve much higher (state-of-the-art) accuracy than 8 LL baselines, while also achieving near-perfect parallelization. Code and data can be found at https://github.com/gyhandy/Shared-Knowledge-Lifelong-Learning
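Below is a minimal sketch of the recipe the abstract describes, assuming a frozen pretrained backbone shared by all agents. The tiny linear stand-in backbone, the nearest-anchor routing rule, and all names (FrozenBackbone, TaskModule, LLLAgent, learn_task, receive, predict) are illustrative assumptions for exposition, not the authors' exact implementation.

```python
# Sketch: frozen shared backbone + lightweight task-specific modules with
# task anchors, as in the SKILL/LLL setting described above (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrozenBackbone(nn.Module):
    """Common task-agnostic part: holds most parameters, never updated."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # Stand-in for the paper's pretrained backbone (an assumption).
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim))
        for p in self.parameters():
            p.requires_grad_(False)  # immutable by construction

    def forward(self, x):
        return self.net(x)


class TaskModule(nn.Module):
    """Task-specific part: a small head plus a task anchor that summarizes
    the task in the shared latent space."""

    def __init__(self, feat_dim: int, n_classes: int):
        super().__init__()
        self.head = nn.Linear(feat_dim, n_classes)
        self.register_buffer("anchor", torch.zeros(feat_dim))


class LLLAgent:
    def __init__(self, backbone: FrozenBackbone):
        self.backbone = backbone
        self.modules = {}  # task_id -> TaskModule (own and received)

    def learn_task(self, task_id, images, labels, n_classes, epochs=50):
        """Train only the small task-specific head; the backbone stays fixed."""
        feats = self.backbone(images)  # no gradient flows into the backbone
        mod = TaskModule(feats.shape[1], n_classes)
        opt = torch.optim.Adam(mod.head.parameters(), lr=1e-2)
        for _ in range(epochs):
            opt.zero_grad()
            F.cross_entropy(mod.head(feats), labels).backward()
            opt.step()
        # Task anchor: mean latent vector of the task's training images
        # (one simple way to summarize a task in the shared space).
        mod.anchor.copy_(feats.mean(dim=0))
        self.modules[task_id] = mod
        return mod

    def receive(self, task_id, mod: TaskModule):
        """Registering a shared module is just storage: no retraining, which
        is what keeps consolidation cheap and agents fully parallel."""
        self.modules[task_id] = mod

    def predict(self, image):
        """Route the input to the task whose anchor is nearest in the shared
        latent space (a stand-in for the paper's task-mapping mechanism),
        then classify with that task's head."""
        feat = self.backbone(image.unsqueeze(0)).squeeze(0)
        tid = min(self.modules,
                  key=lambda t: torch.norm(feat - self.modules[t].anchor).item())
        return tid, self.modules[tid].head(feat).argmax().item()


# Toy usage: two agents learn different tasks independently, then exchange
# their task-specific modules; afterwards each agent can handle both tasks.
if __name__ == "__main__":
    torch.manual_seed(0)
    shared = FrozenBackbone()
    agent_a, agent_b = LLLAgent(shared), LLLAgent(shared)
    xa, ya = torch.randn(64, 3, 32, 32), torch.randint(0, 5, (64,))
    xb, yb = torch.randn(64, 3, 32, 32) + 2.0, torch.randint(0, 7, (64,))
    agent_b.receive(0, agent_a.learn_task(0, xa, ya, n_classes=5))
    agent_a.receive(1, agent_b.learn_task(1, xb, yb, n_classes=7))
    print(agent_a.predict(xb[0]))  # agent A routes the input to task 1's head
```

Because the backbone is frozen and identical across agents, each shared packet is only a small head plus one anchor vector, a tiny fraction of the total parameters, which is what makes decentralized sharing and registration cheap in this design.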
