Lifelong Neural Predictive Coding: Sparsity Yields Less Forgetting when Learning Cumulatively

In lifelong learning systems, especially those based on artificial neural networks, one of the biggest obstacles is the inability to retain old knowledge as new information is encountered, a phenomenon known as catastrophic forgetting. In this paper, we present a new connectionist model, the Sequential Neural Coding Network, and its learning procedure, grounded in the neurocognitive theory of predictive coding. The architecture exhibits significantly less forgetting than standard neural models and outperforms a variety of previously proposed remedies when trained across multiple task datasets in a stream-like fashion. The promising performance demonstrated in our experiments suggests that directly incorporating mechanisms prominent in real neuronal systems, such as competition, sparse activation patterns, and iterative input processing, offers viable pathways toward tackling the challenge of lifelong machine learning.
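To make the ingredients named above concrete, the sketch below shows a generic predictive-coding style update: a latent state is settled iteratively to reduce input prediction error, lateral competition is approximated with a k-winners-take-all sparsity step, and weights are adjusted with a local, Hebbian-like rule. This is a minimal illustration under assumed names and hyperparameters, not the paper's actual Sequential Neural Coding Network or its learning procedure.

```python
# Minimal sketch of iterative predictive-coding inference with k-WTA sparsity.
# All function names, shapes, and hyperparameters are illustrative assumptions,
# not taken from the paper.
import numpy as np

def kwta(z, k):
    """Keep the k largest activities, zero out the rest (lateral competition)."""
    out = np.zeros_like(z)
    idx = np.argsort(z)[-k:]
    out[idx] = z[idx]
    return out

def infer_and_update(x, W, k=20, steps=30, beta=0.1, eta=0.01):
    """One input presentation: settle the latent code, then adjust weights locally.

    x : (d,) input vector
    W : (h, d) generative weights mapping the latent code to a predicted input
    """
    z = np.zeros(W.shape[0])
    for _ in range(steps):
        e = x - W.T @ z                   # prediction error at the input layer
        z = z + beta * (W @ e)            # move latent state to reduce that error
        z = kwta(np.maximum(z, 0.0), k)   # nonnegativity plus sparse competition
    e = x - W.T @ z
    W = W + eta * np.outer(z, e)          # local, Hebbian-like error-driven update
    return z, W

# Usage: present inputs one at a time, as in stream-like, task-incremental training.
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((100, 784))
x = rng.random(784)
z, W = infer_and_update(x, W)
```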
