How Auto-Encoders Could Provide Credit Assignment in Deep Networks via Target Propagation

We propose to exploit {\em reconstruction} as a layer-local training signal for deep learning. Reconstructions can be propagated in a form of target propagation that plays a role similar to back-propagation while reducing the reliance on derivatives for credit assignment across many levels of possibly strong non-linearities (which is difficult for back-propagation). A regularized auto-encoder tends to produce a reconstruction that is a more likely version of its input, i.e., a small move in the direction of higher likelihood. By generalizing gradients, target propagation may also make it possible to train deep networks with discrete hidden units. If the auto-encoder takes as input both a representation of the input and of the target (or of any side information), then its reconstruction of the input representation provides a target: a representation that is more likely, conditioned on all the side information. The decoding path of a deep auto-encoder generalizes gradient propagation in a learned way and can thus handle not just infinitesimal changes but larger, even discrete, changes, hopefully allowing credit assignment through a long chain of non-linear operations. In addition to each layer being a good auto-encoder, the encoder also learns to please the upper layers by transforming the data into a space that is easier for them to model, flattening manifolds and disentangling factors. The motivations and theoretical justifications for this approach are laid down in this paper, along with conjectures that will have to be verified either mathematically or experimentally, including a hypothesis stating that such auto-encoder-mediated target propagation could play the role of credit assignment in brains, through many non-linear, noisy and discrete transformations.
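To make the mechanism concrete, here is a minimal sketch (in PyTorch, which the paper does not use) of target propagation through a stack of layer-wise auto-encoders: each layer has an encoder f and a decoder g, targets are propagated downward through the decoders rather than through derivatives, and each layer is trained with purely layer-local losses. The names (TargetPropLayer, target_prop_step), the squared-error losses, the "difference" correction, and the choice of top-level target are illustrative assumptions, not the paper's exact algorithm.

```python
# A minimal sketch, assuming a small fully connected stack; not the paper's
# exact algorithm. Each layer is an auto-encoder (encoder f, decoder g);
# targets flow downward through the decoders instead of derivatives, and every
# loss is layer-local (gradients never cross layer boundaries).
import torch
import torch.nn as nn


class TargetPropLayer(nn.Module):
    """One layer: encoder f maps up, decoder g (approximate inverse) maps down."""

    def __init__(self, n_in, n_out):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(n_in, n_out), nn.Tanh())  # encoder
        self.g = nn.Sequential(nn.Linear(n_out, n_in), nn.Tanh())  # decoder


def forward_pass(layers, x):
    """h_0 = x, h_{i+1} = f_i(h_i)."""
    hs = [x]
    for layer in layers:
        hs.append(layer.f(hs[-1]))
    return hs


def target_prop_step(layers, x, top_target, opt):
    """One update: propagate targets down via decoders, then apply local losses."""
    hs = forward_pass(layers, x)
    targets = [None] * len(hs)
    targets[-1] = top_target  # e.g. a slightly "better" top representation
    # Propagate targets downward with the decoders (the "difference" correction,
    # as in difference target propagation, keeps targets near the activations).
    for i in reversed(range(1, len(hs) - 1)):
        targets[i] = hs[i] + layers[i].g(targets[i + 1]) - layers[i].g(hs[i + 1])
    loss = 0.0
    for i, layer in enumerate(layers):
        # Encoder loss: push f_i(h_i) toward the target for layer i+1.
        loss = loss + ((layer.f(hs[i].detach()) - targets[i + 1].detach()) ** 2).mean()
        # Decoder (auto-encoder) loss: g_i should invert f_i around h_i.
        loss = loss + ((layer.g(hs[i + 1].detach()) - hs[i].detach()) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)


if __name__ == "__main__":
    torch.manual_seed(0)
    layers = nn.ModuleList([TargetPropLayer(784, 256), TargetPropLayer(256, 64)])
    opt = torch.optim.SGD(layers.parameters(), lr=0.01)
    x = torch.rand(32, 784) * 2 - 1  # inputs assumed in [-1, 1] to match Tanh
    with torch.no_grad():
        # Stand-in top-level target: in practice it would come from a task
        # criterion (e.g. nudging the top representation toward higher likelihood).
        top_target = forward_pass(layers, x)[-1] + 0.1 * torch.randn(32, 64)
    for _ in range(3):
        print(target_prop_step(layers, x, top_target, opt))
```

The detach() calls are what keeps credit assignment layer-local in this sketch: each encoder only chases its own target and each decoder only sees its own reconstruction error, so no derivatives need to flow through the whole chain of non-linearities.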
