Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity

The impressive lifelong learning in animal brains is primarily enabled by plastic changes in synaptic connectivity. Importantly, these changes are not passive, but are actively controlled by neuromodulation, which is itself under the control of the brain. The resulting self-modifying abilities of the brain play an important role in learning and adaptation, and are a major basis for biological reinforcement learning. Here we show for the first time that artificial neural networks with such neuromodulated plasticity can be trained with gradient descent. Extending previous work on differentiable Hebbian plasticity, we propose a differentiable formulation for the neuromodulation of plasticity. We show that neuromodulated plasticity improves the performance of neural networks on both reinforcement learning and supervised learning tasks. In particular, neuromodulated plastic LSTMs with millions of parameters outperform standard LSTMs on a benchmark language modeling task (controlling for the number of parameters). We conclude that differentiable neuromodulation of plasticity offers a powerful new framework for training neural networks.
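
To make the idea of neuromodulated plasticity concrete, the following is a minimal sketch of one forward step of a small recurrent layer. The abstract does not state the update rule, so the form used here is an assumption chosen for illustration: each connection has a fixed weight plus a plastic Hebbian trace scaled by a per-connection coefficient, and a scalar modulatory signal computed by the network itself gates how much the trace changes. The names `plastic_step`, `alpha`, `hebb`, `m_weights`, and `clip` are hypothetical. Plain Python is used so the arithmetic is visible; in practice an automatic differentiation framework would be needed to train `w`, `alpha`, and `m_weights` by gradient descent.

```python
import math


def plastic_step(x_prev, w, alpha, hebb, m_weights, clip=1.0):
    """One step of a toy recurrent layer with neuromodulated Hebbian plasticity.

    Assumed, illustrative formulation (not taken from the abstract):
      x_new[j]  = tanh(sum_i (w[i][j] + alpha[i][j] * hebb[i][j]) * x_prev[i])
      m         = tanh(sum_j m_weights[j] * x_new[j])
      hebb[i][j] <- clip(hebb[i][j] + m * x_prev[i] * x_new[j])
    """
    n = len(x_prev)

    # Effective weight = fixed component + plastic component.
    x_new = [
        math.tanh(
            sum((w[i][j] + alpha[i][j] * hebb[i][j]) * x_prev[i] for i in range(n))
        )
        for j in range(n)
    ]

    # Modulatory signal computed by the network from its own activations.
    m = math.tanh(sum(m_weights[j] * x_new[j] for j in range(n)))

    # Hebbian update gated by m, clipped to keep the trace bounded.
    new_hebb = [
        [
            max(-clip, min(clip, hebb[i][j] + m * x_prev[i] * x_new[j]))
            for j in range(n)
        ]
        for i in range(n)
    ]
    return x_new, new_hebb, m


# Usage: two units, empty initial trace, modulation driven by unit 0.
w = [[0.5, 0.0], [0.0, 0.5]]
alpha = [[1.0, 1.0], [1.0, 1.0]]
hebb0 = [[0.0, 0.0], [0.0, 0.0]]
x1, hebb1, m1 = plastic_step([1.0, 0.0], w, alpha, hebb0, [1.0, 0.0])
```

When the modulatory signal is zero the trace is left unchanged, so in this sketch the network decides when and how strongly its own connections are rewritten.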
