A cholinergic feedback circuit to regulate striatal population uncertainty and optimize reinforcement learning

Convergent evidence suggests that the basal ganglia support reinforcement learning by adjusting action values according to reward prediction errors. However, adaptive behavior in stochastic environments requires taking uncertainty into account to adjust the learning rate dynamically. We consider how cholinergic tonically active interneurons (TANs) may endow the striatum with such a mechanism in computational models spanning Marr's three levels of analysis. In the neural model, TANs modulate the excitability of spiny neurons, their population response to reinforcement, and hence the effective learning rate. Long TAN pauses facilitated robustness to spurious outcomes by increasing divergence in synaptic weights between neurons coding for alternative action values, whereas short TAN pauses facilitated stochastic behavior but increased responsiveness to change-points in outcome contingencies. A feedback control system allowed TAN pauses to be dynamically modulated by uncertainty across the spiny neuron population, allowing the system to self-tune and optimize performance across stochastic environments. DOI: http://dx.doi.org/10.7554/eLife.12029.001
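
The idea of an uncertainty-dependent learning rate can be illustrated at the algorithmic level with a minimal sketch. This is not the neural model described above: it is a plain delta-rule learner on a two-armed bandit whose learning rate is scaled by the normalized entropy of its own choice policy. The softmax policy, the linear entropy-to-learning-rate mapping, and all names and parameter values (`beta`, `lr_min`, `lr_max`, `switch_at`, the reward probabilities) are assumptions made purely for illustration.

```python
import math
import random


def softmax(values, beta=3.0):
    """Convert action values into choice probabilities (beta is an assumed gain)."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def learning_rate(probs, lr_min=0.05, lr_max=0.5):
    """Hypothetical mapping: learning rate grows linearly with normalized entropy."""
    h = entropy(probs) / math.log2(len(probs))  # 0 = certain, 1 = maximally uncertain
    return lr_min + (lr_max - lr_min) * h


def run(n_trials=400, p_reward=(0.8, 0.2), switch_at=200, seed=0):
    """Two-armed bandit whose reward contingencies reverse at trial `switch_at`."""
    rng = random.Random(seed)
    q = [0.5, 0.5]  # action values
    history = []
    for t in range(n_trials):
        contingencies = p_reward if t < switch_at else p_reward[::-1]
        policy = softmax(q)
        action = 0 if rng.random() < policy[0] else 1
        reward = 1.0 if rng.random() < contingencies[action] else 0.0
        alpha = learning_rate(policy)  # uncertainty sets the step size
        q[action] += alpha * (reward - q[action])  # delta-rule update
        history.append((action, reward, alpha))
    return q, history


if __name__ == "__main__":
    final_q, trials = run()
    print("final action values:", final_q)
    print("mean learning rate:", sum(a for _, _, a in trials) / len(trials))
```

In this sketch the step size shrinks as the policy becomes confident, which damps the effect of isolated spurious outcomes, and it can grow again if the values re-converge after the contingencies reverse. How closely this tracks the feedback circuit in the neural model is not something the sketch can establish.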
