GateON: an unsupervised method for large scale continual learning

The objective of continual learning (CL) is to learn tasks sequentially without retraining on earlier tasks. However, when subjected to CL, traditional neural networks exhibit catastrophic forgetting and limited generalization. To overcome these problems, we introduce a novel method called 'Gate and Obstruct Network' (GateON). GateON combines learnable gating of activity with online estimation of parameter relevance to safeguard crucial knowledge from being overwritten. Our method generates partially overlapping pathways between tasks that permit forward and backward transfer during sequential learning. GateON addresses network saturation after parameter fixation with a re-activation mechanism for fixed neurons, enabling large-scale continual learning. GateON is implemented in a wide range of network architectures (fully connected, CNN, Transformer), has low computational complexity, effectively learns up to 100 sequential MNIST tasks, and achieves top-tier results with a pre-trained BERT on CL-based NLP tasks.
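To make the abstract's three ingredients concrete (per-task activity gating, online parameter-relevance estimation with gradient obstruction, and re-activation of fixed neurons), below is a minimal PyTorch sketch. All names (GatedLayer, obstruct_gradients, update_relevance, reactivate_saturated), the sigmoid gate, the running |weight x grad| relevance score, and the release fraction are illustrative assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GatedLayer(nn.Module):
    """Sketch of a gated fully-connected layer for continual learning.

    Each task selects a learnable gate vector that modulates neuron
    activity; an online relevance score per weight is accumulated and
    used to shrink ("obstruct") gradients of weights deemed crucial
    for earlier tasks. Update rules here are illustrative assumptions.
    """

    def __init__(self, in_dim, out_dim, n_tasks, fix_threshold=0.9):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        # one learnable gate vector per task (activity gating)
        self.gates = nn.Parameter(torch.ones(n_tasks, out_dim))
        # online relevance estimate per weight, kept in [0, 1]
        self.register_buffer("relevance", torch.zeros(out_dim, in_dim))
        self.fix_threshold = fix_threshold

    def forward(self, x, task_id):
        # task-specific gate modulates the layer's activity
        return torch.sigmoid(self.gates[task_id]) * self.linear(x)

    @torch.no_grad()
    def obstruct_gradients(self):
        """Scale down gradients of highly relevant (near-fixed) weights."""
        if self.linear.weight.grad is not None:
            self.linear.weight.grad *= (1.0 - self.relevance)

    @torch.no_grad()
    def update_relevance(self, decay=0.99):
        """Online relevance as a running average of normalized |w * grad|
        (an assumed estimator; the paper defines its own)."""
        g = self.linear.weight.grad
        if g is None:
            return
        score = (self.linear.weight * g).abs()
        score = score / (score.max() + 1e-12)
        self.relevance.mul_(decay).add_((1.0 - decay) * score)

    @torch.no_grad()
    def reactivate_saturated(self, frac=0.01):
        """Release a small random fraction of the most relevant ('fixed')
        weights so the network does not saturate as tasks accumulate."""
        fixed = self.relevance > self.fix_threshold
        release = torch.rand_like(self.relevance) < frac
        self.relevance[fixed & release] = 0.0
```

In a training loop under these assumptions, one would call obstruct_gradients() between loss.backward() and optimizer.step(), then update_relevance(), and invoke reactivate_saturated() periodically once many weights approach fixation.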
