Random synaptic feedback weights support error backpropagation for deep learning

The brain processes information through multiple layers of neurons. This deep architecture is representationally powerful, but it complicates learning because it is difficult to identify the responsible neurons when a mistake is made. In machine learning, the backpropagation algorithm assigns blame by multiplying error signals by all the synaptic weights on each neuron's axon and further downstream. However, this requires a precise, symmetric backward connectivity pattern, which is thought to be impossible in the brain. Here we demonstrate that this strong architectural constraint is not required for effective error propagation. We present a surprisingly simple mechanism that assigns blame by multiplying errors by random synaptic weights. This mechanism can transmit teaching signals across multiple layers of neurons and performs as effectively as backpropagation on a variety of tasks. Our results help reopen questions about how the brain could use error signals and dispel long-held assumptions about algorithmic constraints on learning.
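The mechanism described above can be illustrated with a minimal sketch. It is not the authors' implementation: the task (a toy linear regression), the layer sizes, the learning rate, and every name in the code (`train_feedback_alignment`, `W0`, `W`, `B`, `T`) are assumptions made for illustration. The one point it tries to show is that the error is sent back to the hidden layer through a fixed random matrix `B` rather than through the transpose of the forward weights, as backpropagation would do.

```python
import numpy as np


def train_feedback_alignment(steps=3000, lr=0.01, batch=32, seed=0):
    """Train a two-layer linear network with fixed random feedback weights.

    Returns the loss on the first and on the last mini-batch.
    All sizes and hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_in, n_hid, n_out = 10, 20, 5  # hypothetical layer sizes

    # Toy target: a random linear map from inputs to outputs.
    T = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)

    W0 = rng.uniform(-0.01, 0.01, (n_hid, n_in))   # forward: input -> hidden
    W = rng.uniform(-0.01, 0.01, (n_out, n_hid))   # forward: hidden -> output
    B = rng.uniform(-0.5, 0.5, (n_hid, n_out))     # feedback: fixed, random, never trained

    losses = []
    for _ in range(steps):
        X = rng.standard_normal((n_in, batch))     # one mini-batch of inputs
        H = W0 @ X                                 # hidden activity
        E = T @ X - W @ H                          # output error
        losses.append(0.5 * np.mean(np.sum(E ** 2, axis=0)))

        # Backpropagation would compute the hidden teaching signal as W.T @ E.
        # Here the error is multiplied by the fixed random matrix B instead.
        delta_H = B @ E

        W += lr * (E @ H.T) / batch
        W0 += lr * (delta_H @ X.T) / batch

    return losses[0], losses[-1]


if __name__ == "__main__":
    first, last = train_feedback_alignment()
    print(f"loss on first batch: {first:.4f}, loss on last batch: {last:.4f}")
```

Replacing `B @ E` with `W.T @ E` turns the same loop into ordinary backpropagation for this network, which makes a side-by-side comparison straightforward.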
