Random feedback weights support learning in deep neural networks

The brain processes information through many layers of neurons. This deep architecture is representationally powerful, but it complicates learning by making it hard to identify the responsible neurons when a mistake is made. In machine learning, the backpropagation algorithm assigns blame to a neuron by computing exactly how it contributed to an error. To do this, it multiplies error signals by matrices consisting of all the synaptic weights on the neuron's axon and farther downstream. This operation requires a precisely choreographed transport of synaptic weight information, which is thought to be impossible in the brain. Here we present a surprisingly simple algorithm for deep learning, which assigns blame by multiplying error signals by random synaptic weights. We show that a network can learn to extract useful information from signals sent through these random feedback connections. In essence, the network learns to learn. We demonstrate that this new mechanism performs as quickly and accurately as backpropagation on a variety of problems and describe the principles which underlie its function. Our demonstration provides a plausible basis for how a neuron can be adapted using error signals generated at distal locations in the brain, and thus dispels long-held assumptions about the algorithmic constraints on learning in neural circuits.

[1]  Geoffrey E. Hinton,et al.  A Learning Algorithm for Boltzmann Machines , 1985, Cogn. Sci..

[2]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[3]  Stephen Grossberg,et al.  Competitive Learning: From Interactive Activation to Adaptive Resonance , 1987, Cogn. Sci..

[4]  Geoffrey E. Hinton,et al.  Learning Representations by Recirculation , 1987, NIPS.

[5]  Richard A. Andersen,et al.  A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons , 1988, Nature.

[6]  D. G. Stork,et al.  Is backpropagation biologically plausible? , 1989, International 1989 Joint Conference on Neural Networks.

[7]  Francis Crick,et al.  The recent excitement about neural networks , 1989, Nature.

[8]  Michael I. Jordan,et al.  A more biologically plausible learning rule for neural networks. , 1991, Proceedings of the National Academy of Sciences of the United States of America.

[9]  R. J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[10]  D C Van Essen,et al.  Information processing in the primate visual system: an integrated systems perspective. , 1992, Science.

[11]  J. F. Kolen,et al.  Backpropagation without weight transport , 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).

[12]  David H. Wolpert,et al.  The Lack of A Priori Distinctions Between Learning Algorithms , 1996, Neural Computation.

[13]  Randall C. O'Reilly,et al.  Biologically Plausible Error-Driven Learning Using Local Activation Differences: The Generalized Recirculation Algorithm , 1996, Neural Computation.

[14]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[15]  Alexandre Pouget,et al.  Computational approaches to sensorimotor transformations , 2000, Nature Neuroscience.

[16]  H. Seung,et al.  Learning in Spiking Neural Networks by Reinforcement of Stochastic Synaptic Transmission , 2003, Neuron.

[17]  Xiaohui Xie,et al.  Equivalence of Backpropagation and Contrastive Hebbian Learning in a Layered Network , 2003, Neural Computation.

[18]  R. Douglas,et al.  Neuronal circuits of the neocortex. , 2004, Annual review of neuroscience.

[19]  Konrad P. Körding,et al.  Supervised and Unsupervised Learning with Two Sites of Synaptic Integration , 2001, Journal of Computational Neuroscience.

[20]  Xiaohui Xie,et al.  Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks , 2003, Neural Computation.

[21]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[22]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[23]  Yoshua Bengio,et al.  Scaling learning algorithms towards AI , 2007 .

[24]  Timothy P. Lillicrap,et al.  Sensitivity Derivatives for Flexible Sensorimotor Learning , 2008, Neural Computation.

[25]  K. Harris Stability of the fittest: organizing learning through retroaxonal signals , 2008, Trends in Neurosciences.

[26]  Volodymyr Mnih,et al.  CUDAMat: a CUDA-based matrix class for Python , 2009 .

[27]  W. Senn,et al.  Reinforcement learning in populations of spiking neurons , 2008, Nature Neuroscience.

[28]  Luca Maria Gambardella,et al.  Deep, Big, Simple Neural Nets for Handwritten Digit Recognition , 2010, Neural Computation.

[29]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[30]  Yoshua Bengio,et al.  Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res..

[31]  Douglas B. Tweed,et al.  Adaptive Optimal Control Without Weight Transport , 2012, Neural Computation.

[32]  T. Lillicrap,et al.  Preference Distributions of Primary Motor Cortex Neurons Reflect Control Solutions Optimized for Limb Biomechanics , 2013, Neuron.