Continual Learning of Recurrent Neural Networks by Locally Aligning Distributed Representations

Temporal models based on recurrent neural networks have proven to be quite powerful in a wide variety of applications, including language modeling and speech processing. However, training these models often relies on backpropagation through time (BPTT), which entails unfolding the network over many time steps, making credit assignment considerably more challenging. Furthermore, the nature of backpropagation itself does not permit the use of nondifferentiable activation functions and is inherently sequential, making parallelization of the underlying training process difficult. Here, we propose the parallel temporal neural coding network (P-TNCN), a biologically inspired model trained by the learning algorithm we call local representation alignment, which aims to resolve the difficulties that plague recurrent networks trained by BPTT. The architecture requires neither unrolling in time nor the derivatives of its internal activation functions. We compare our model and learning procedure with other BPTT alternatives (which also tend to be computationally expensive), including real-time recurrent learning, echo state networks, and unbiased online recurrent optimization. We show that it outperforms these on sequence-modeling benchmarks such as Bouncing MNIST, a new benchmark we denote as Bouncing NotMNIST, and Penn Treebank. Notably, our approach can, in some instances, outperform full BPTT as well as variants such as sparse attentive backtracking. Significantly, the hidden-unit correction phase of the P-TNCN allows it to adapt to new data sets even if its synaptic weights are held fixed (zero-shot adaptation) and facilitates retention of prior generative knowledge when faced with a task sequence. We present results that demonstrate the P-TNCN's ability to conduct zero-shot adaptation and online continual sequence modeling.
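To make the flavor of this training procedure concrete, the NumPy sketch below illustrates a single layer of a P-TNCN-style model updated online by a local-representation-alignment-like rule: the layer predicts the current observation from its recurrent state, measures a local error, corrects its hidden units with that error fed back through a separate set of weights, and applies Hebbian-like weight updates built from outer products. This is a minimal sketch under stated assumptions: the class name, the weight matrices W, E, and V, the hyperparameters beta and lr, and the exact update rules are illustrative choices, not the paper's precise equations. Note that the derivative of the activation function is never required and no unrolling in time is performed.

```python
import numpy as np


class PTNCNLayerSketch:
    """Illustrative single-layer sketch of local-representation-alignment-style learning.

    Names and update rules are assumptions for illustration, not the paper's exact equations.
    """

    def __init__(self, n_in, n_hid, beta=0.1, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.05 * rng.standard_normal((n_in, n_hid))   # hidden -> input prediction weights
        self.E = 0.05 * rng.standard_normal((n_hid, n_in))   # input-error -> hidden feedback weights
        self.V = 0.05 * rng.standard_normal((n_hid, n_hid))  # recurrent state-transition weights
        self.beta, self.lr = beta, lr
        self.phi = np.tanh                                    # activation; its derivative is never used

    def step(self, x_t, z_prev, learn=True):
        """One online step: predict x_t, correct the state from the error, update weights locally."""
        z = self.V @ self.phi(z_prev)            # project the previous state forward (no unrolling)
        x_hat = self.W @ self.phi(z)             # top-down prediction of the current observation
        e = x_hat - x_t                          # local error units at the input layer
        z_corr = z - self.beta * (self.E @ e)    # hidden-unit correction via fed-back error
        if learn:
            # Hebbian-like local updates: outer products of errors/corrections and activities
            self.W -= self.lr * np.outer(e, self.phi(z_corr))
            self.E -= self.lr * np.outer(self.phi(z_corr) - self.phi(z), e)
            self.V -= self.lr * np.outer(z_corr - z, self.phi(z_prev))
        return z_corr, float(np.mean(e ** 2))


# Toy usage: process a random sequence online, one step at a time, with no backpropagation through time.
model = PTNCNLayerSketch(n_in=20, n_hid=64)
z = np.zeros(64)
for x_t in np.random.default_rng(1).standard_normal((50, 20)):
    z, mse = model.step(x_t, z)
```

Calling step with learn=False corresponds loosely to the zero-shot adaptation setting described above: the hidden units are still corrected by the error signal at each step while the synaptic weights remain fixed.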
