Adaptive learning and decision-making under uncertainty by metaplastic synapses guided by a surprise detection system

Recent experiments have shown that animals and humans have a remarkable ability to adapt their learning rate according to the volatility of the environment. Yet the neural mechanism responsible for such adaptive learning has remained unclear. To fill this gap, we investigated a biophysically inspired, metaplastic synaptic model within the context of a well-studied decision-making network, in which synapses can change their rate of plasticity in addition to their efficacy according to a reward-based learning rule. We found that our model, which assumes that synaptic plasticity is guided by a novel surprise detection system, captures a wide range of key experimental findings and performs as well as a Bayes-optimal model, with remarkably little parameter tuning. Our results further demonstrate the computational power of synaptic plasticity and provide insights into the circuit-level computation that underlies adaptive decision-making. DOI: http://dx.doi.org/10.7554/eLife.18073.001
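The general idea of a learning rate that adapts to volatility can be illustrated with a toy sketch. This is not the metaplastic synaptic model studied in the paper: it is a plain delta-rule learner whose learning rate is reset to a high value when a crude surprise signal fires, and decays otherwise. The function name `simulate`, the two error-averaging constants, the threshold, and the rate bounds are all hypothetical choices made for illustration only.

```python
def simulate(rewards, rate_min=0.02, rate_max=0.5, threshold=0.1):
    """Toy delta-rule learner with a surprise-gated learning rate.

    Illustrative assumption only; not the model from the paper.
    """
    estimate = 0.5      # current estimate of reward probability
    rate = rate_max     # current learning rate
    fast_err = 0.0      # short-timescale average of unsigned error
    slow_err = 0.0      # long-timescale average of unsigned error
    estimates, rates = [], []

    for r in rewards:
        err = abs(r - estimate)
        # Track the unsigned prediction error on two timescales.
        fast_err += 0.3 * (err - fast_err)
        slow_err += 0.03 * (err - slow_err)

        # "Surprise": recent errors clearly exceed the long-run level.
        if fast_err - slow_err > threshold:
            rate = rate_max                   # learn quickly after a change
        else:
            rate = max(rate_min, rate * 0.9)  # settle down when stable

        estimate += rate * (r - estimate)     # standard delta-rule update
        estimates.append(estimate)
        rates.append(rate)

    return estimates, rates


if __name__ == "__main__":
    # Stable block of rewards followed by an abrupt reversal.
    est, rates = simulate([1] * 200 + [0] * 50)
    print("rate before change:", rates[199])
    print("rate after change: ", rates[200])
```

In this sketch the learning rate sits at its floor during the stable block and jumps back to its ceiling on the first trial after the reversal, which is the qualitative behavior the abstract describes as adapting to volatility.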
