Learning to solve the credit assignment problem

Backpropagation drives today's artificial neural networks (ANNs). However, despite extensive research, it remains unclear whether the brain implements this algorithm. Among neuroscientists, reinforcement learning (RL) algorithms are often seen as a realistic alternative: neurons can randomly introduce change and use unspecific feedback signals to observe their effect on the cost, and thus approximate their gradient. However, the convergence rate of such learning scales poorly with the number of neurons involved. Here we propose a hybrid learning approach: each neuron uses an RL-type strategy to learn how to approximate the gradients that backpropagation would provide. We prove that our approach converges to the true gradient for certain classes of networks. In both feedforward and convolutional networks, we empirically show that our approach learns to approximate the gradient and can match the performance of exact gradient-based learning. Learning feedback weights provides a biologically plausible mechanism for achieving good performance, without the need for precise, pre-specified learning rules.
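
The abstract describes the hybrid mechanism only in words; the toy NumPy sketch below illustrates one way it can be read, under stated assumptions (variable names such as W1, W2, B, sigma and the learning rates are illustrative, not taken from the paper). Each hidden neuron perturbs its activity, the resulting change in loss gives an RL-style (node-perturbation) estimate of that neuron's error signal, feedback weights B are regressed onto this estimate, and the forward weights are then trained with the B-propagated error instead of backpropagation's transposed weights. This is a minimal sketch of the idea, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the hybrid scheme described above,
# on a toy regression problem with one hidden layer and MSE loss.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 10, 20, 1
W1 = rng.normal(0, 0.3, (n_hid, n_in))   # forward weights, layer 1
W2 = rng.normal(0, 0.3, (n_out, n_hid))  # forward weights, layer 2
B = rng.normal(0, 0.3, (n_hid, n_out))   # learned feedback weights (assumption: initialized randomly)

sigma, eta_w, eta_b = 0.1, 0.01, 0.05    # perturbation scale and learning rates (illustrative values)

def relu(x):
    return np.maximum(x, 0.0)

for step in range(5000):
    x = rng.normal(0, 1, n_in)
    y = np.array([x.sum()])              # toy target: sum of inputs

    # Unperturbed forward pass.
    h = relu(W1 @ x)
    out = W2 @ h
    e = out - y                          # output error = dL/dout for 0.5*MSE
    loss = 0.5 * float(e @ e)

    # Perturbed forward pass: add noise xi to the hidden activations.
    xi = sigma * rng.normal(0, 1, n_hid)
    out_p = W2 @ (h + xi)
    loss_p = 0.5 * float((out_p - y) @ (out_p - y))

    # Node-perturbation estimate of the hidden error:
    # (loss_p - loss) * xi / sigma^2 approximates dL/dh in expectation.
    lambda_hat = (loss_p - loss) * xi / sigma**2

    # Feedback learning: regress B @ e onto the perturbation-based estimate.
    B -= eta_b * np.outer(B @ e - lambda_hat, e)

    # Forward learning: use the learned feedback weights in place of W2.T.
    delta_h = (B @ e) * (h > 0)          # hidden error propagated through B
    W2 -= eta_w * np.outer(e, h)
    W1 -= eta_w * np.outer(delta_h, x)
```

In this toy setting, the regression step tends to align B with the transpose of W2, at which point the forward updates approximate exact gradient descent; this is the intuition behind the convergence claim in the abstract.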
