Activation Relaxation: A Local Dynamical Approximation to Backpropagation in the Brain

The backpropagation of error algorithm (backprop) has been instrumental in the recent success of deep learning. However, it remains an open question whether backprop can be formulated in a manner suitable for implementation in neural circuitry. The primary challenge is to ensure that any candidate formulation uses only local information, rather than relying on global signals as in standard backprop. Recently, several algorithms for approximating backprop using only local signals have been proposed. However, these algorithms typically impose other requirements that challenge biological plausibility: for example, complex and precise connectivity schemes, or multiple sequential backwards phases in which information must be stored across phases. Here, we propose a novel algorithm, Activation Relaxation (AR), which is motivated by constructing the backpropagation gradient as the equilibrium point of a dynamical system. Our algorithm converges rapidly and robustly to the correct backpropagation gradients, requires only a single type of computational unit, utilises only a single parallel backwards relaxation phase, and can operate on arbitrary computation graphs. We illustrate these properties by training deep neural networks on visual classification tasks, and describe simplifications to the algorithm which remove further obstacles to neurobiological implementation (for example, the weight-transport problem and the use of nonlinear derivatives), while preserving performance.
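
To make the "gradient as equilibrium point" idea concrete, below is a minimal NumPy sketch for a fully-connected network with layers h_{l+1} = f(W_l h_l) and a squared-error loss. It clamps the output-layer units to the loss gradient and lets the remaining units follow a leaky, locally-driven relaxation whose fixed point is the backprop gradient. The network sizes, nonlinearity, step size, and iteration count are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Illustrative sketch of the Activation Relaxation idea (assumed form, not the
# paper's exact implementation): run the backwards dynamics
#   dx[l]/dt = -x[l] + W[l]^T (f'(a[l+1]) * x[l+1])
# with the output layer clamped to dL/dh_L; the equilibrium of x[l] is dL/dh_l.

def f(x):                                  # example nonlinearity
    return np.tanh(x)

def f_prime(x):
    return 1.0 - np.tanh(x) ** 2

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]                        # layer widths (illustrative)
Ws = [rng.standard_normal((sizes[l + 1], sizes[l])) * 0.5
      for l in range(len(sizes) - 1)]

# Forward pass: store pre-activations a[l+1] = W[l] h[l] and activations h[l].
h0 = rng.standard_normal(sizes[0])
target = rng.standard_normal(sizes[-1])
hs, pre = [h0], [None]
for W in Ws:
    pre.append(W @ hs[-1])
    hs.append(f(pre[-1]))
loss_grad_out = hs[-1] - target             # dL/dh_L for L = 0.5 * ||h_L - target||^2

# Relaxation phase: one activity x[l] per layer, output layer clamped.
xs = [np.zeros(s) for s in sizes]
xs[-1] = loss_grad_out
eta = 0.1
for _ in range(500):                        # iterate the dynamics to (near) equilibrium
    for l in range(len(sizes) - 2, -1, -1):
        drive = Ws[l].T @ (f_prime(pre[l + 1]) * xs[l + 1])
        xs[l] = xs[l] + eta * (-xs[l] + drive)

# Check: the relaxed activities match the gradients computed by standard backprop.
grad = loss_grad_out
for l in range(len(sizes) - 2, -1, -1):
    grad = Ws[l].T @ (f_prime(pre[l + 1]) * grad)
    assert np.allclose(xs[l], grad, atol=1e-4)
print("relaxed activities match backprop gradients")
```

The update for each unit depends only on its own state and the activity arriving from the layer above, which is the locality property the abstract emphasises; once the relaxed activities are available, weight updates can be formed from locally available pre- and post-synaptic quantities.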
