Learning to Adapt by Minimizing Discrepancy

We explore whether useful temporal neural generative models can be learned from sequential data without back-propagation through time. We investigate the viability of a more neurocognitively grounded approach to unsupervised generative modeling of sequences. Specifically, we adapt the concept of predictive coding, which has gained influence in cognitive science, to a neural framework. To do so, we develop a novel architecture, the Temporal Neural Coding Network, and its learning algorithm, Discrepancy Reduction. The underlying directed generative model is fully recurrent: it employs both structural (layer-to-layer) feedback connections and temporal feedback connections, yielding information-propagation cycles that create local learning signals. This facilitates a unified bottom-up and top-down approach to information transfer within the architecture. Our proposed algorithm shows promise on the bouncing-balls generative modeling problem, and further experiments could probe the strengths and weaknesses of the approach.
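To make the local-learning idea concrete, the following is a minimal sketch of a generic predictive-coding-style update of the kind the abstract alludes to. It is an illustration under assumptions, not the paper's actual Temporal Neural Coding Network or Discrepancy Reduction procedure: the layer sizes, learning rate, and the `local_update` helper are all hypothetical. A higher layer predicts the lower layer's state, and the resulting prediction error (discrepancy) drives purely local adjustments.

```python
import numpy as np

# Hypothetical sketch of a predictive-coding-style local update.
# A higher layer predicts the lower layer's state; the prediction error
# (discrepancy) drives local weight and state updates, with no
# back-propagation through time.

rng = np.random.default_rng(0)
n_lower, n_higher = 8, 4                              # illustrative layer sizes
W = rng.normal(scale=0.1, size=(n_lower, n_higher))   # top-down weights

def local_update(z_lower, z_higher, W, lr=0.01):
    """One discrepancy-reduction step: adjust the top-down weights and the
    higher-layer state to shrink the local prediction error."""
    pred = W @ z_higher                # top-down prediction of the lower state
    err = z_lower - pred               # local discrepancy signal
    W += lr * np.outer(err, z_higher)  # Hebbian-style local weight update
    z_higher += lr * (W.T @ err)       # nudge the latent state toward agreement
    return W, z_higher, float(np.mean(err ** 2))

z_lo = rng.normal(size=n_lower)        # observed / lower-layer activity
z_hi = rng.normal(size=n_higher)       # latent / higher-layer state
for _ in range(100):
    W, z_hi, mse = local_update(z_lo, z_hi, W)
print(f"final discrepancy (MSE): {mse:.4f}")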
