Conducting Credit Assignment by Aligning Local Representations

Training deep networks with back-propagation and its variants is often problematic for new users: exploding gradients, vanishing gradients, and high sensitivity to the weight initialization strategy can all make networks difficult to train. In this paper, we present Local Representation Alignment (LRA), a training procedure that is far less sensitive to poor initializations, requires no modifications to the network architecture, and can be adapted to networks with highly nonlinear and discrete-valued activation functions. Furthermore, we show that one variation of LRA can start from a null (all-zero) initialization of the network weights and still successfully train networks with a wide variety of nonlinearities, including tanh, ReLU-6, softplus, signum, and other, more biologically plausible functions. Experiments on MNIST and Fashion-MNIST validate the algorithm, showing that LRA trains networks robustly and effectively, succeeding even when back-propagation fails and outperforming alternative learning algorithms such as target propagation and feedback alignment.
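To make the idea of layer-local credit assignment concrete, the sketch below shows one generic way a hidden layer can be trained against a locally computed target rather than a back-propagated gradient. This is a minimal illustration under stated assumptions, not the authors' exact LRA procedure: the target rule (nudging the hidden activity along a fixed random feedback matrix, in the spirit of feedback alignment), the helper names (`forward`, `step`, `B1`), and all hyperparameters are hypothetical.

```python
# Illustrative sketch only: a generic layer-local target-alignment update for a
# tiny two-layer network. NOT the authors' LRA algorithm; the feedback-based
# target rule and all constants here are assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer network: x -> h1 -> y_hat.
n_in, n_hid, n_out = 784, 256, 10
W1 = np.zeros((n_in, n_hid))                    # null (all-zero) initialization
W2 = np.zeros((n_hid, n_out))
B1 = 0.1 * rng.standard_normal((n_out, n_hid))  # fixed random feedback matrix (hypothetical choice)

def forward(x):
    """Forward pass; returns the hidden representation and the output."""
    h1 = np.tanh(x @ W1)
    y_hat = h1 @ W2
    return h1, y_hat

def step(x, y, beta=0.1, lr=0.01):
    """One layer-local update: each layer chases its own local target."""
    global W1, W2
    h1, y_hat = forward(x)
    e_out = y_hat - y                        # output error under a squared-error loss
    # Local target for the hidden layer: nudge its activity in a direction,
    # carried by the fixed feedback matrix, expected to reduce the output error.
    t1 = h1 - beta * (e_out @ B1)
    # Each layer minimizes only its own local discrepancy.
    W2 -= lr * (h1.T @ e_out)                          # align the output layer with the labels
    W1 -= lr * (x.T @ ((h1 - t1) * (1.0 - h1 ** 2)))   # align h1 with its target t1 (tanh derivative)
    return float(np.mean(e_out ** 2))

# Usage on random stand-in data (shapes only; not MNIST).
x = rng.standard_normal((32, n_in))
y = np.eye(n_out)[rng.integers(0, n_out, size=32)]
for _ in range(5):
    loss = step(x, y)
```

Note that even from the all-zero initialization used here, the hidden layer receives non-zero local updates on the first step, which is the kind of behavior a null-initialization-tolerant method needs; whether LRA achieves this in the same way is not something this sketch establishes.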
