Striatal and Tegmental Neurons Code Critical Signals for Temporal-Difference Learning of State Value in Domestic Chicks

To ensure survival, animals must update their internal representations of the environment by trial and error. Psychological studies of associative learning and neurophysiological analyses of dopaminergic neurons have suggested that this updating process involves the temporal-difference (TD) method in the basal ganglia network. However, how the component variables of the TD method are implemented at the neuronal level remains unclear. To investigate the underlying neural mechanisms, we trained domestic chicks to associate color cues with food rewards. We recorded neuronal activity from the medial striatum or tegmentum of freely behaving chicks and examined how reward omission changed neuronal firing. To compare neuronal activity with the signals assumed in the TD method, we simulated the behavioral task as a finite sequence of discrete time steps. The simulated task assumed three signals: the prediction signal, the target signal for updating, and the TD-error signal. In both the medial striatum and tegmentum, the majority of recorded neurons were categorized into three types according to their goodness of fit to three models, although these neurons tended to form a continuum without distinct differences in firing rate. Specifically, two types of striatal neurons successfully mimicked the target signal and the prediction signal. A linear summation of the activities of these two types of striatal neurons closely fit the activity of one type of tegmental neuron that mimicked the TD-error signal. The present study thus demonstrates that the striatum and tegmentum can convey the signals critically required for the TD method. Drawing on theoretical and neurophysiological studies, together with tract-tracing data, we propose a novel model to explain how the convergence of signals represented in the striatum could lead to the computation of TD error in tegmental dopaminergic neurons.
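The three TD signals named above can be made concrete with a minimal tabular TD(0) sketch. This is not the authors' simulation: the number of time steps, the step at which reward arrives, the discount factor GAMMA and the learning rate ALPHA are hypothetical values chosen for illustration. At each step t, the prediction signal is V(t), the target signal is r(t+1) + GAMMA * V(t+1), and the TD error is the target minus the prediction. After training on rewarded trials, the sketch should show a positive TD error at cue onset, no TD error at the reward, and a negative TD error when the reward is omitted. Because the TD error is by definition the difference of the other two signals, a least-squares fit recovers weights of +1 and -1, which is one simple way to read a "linear summation" of target-like and prediction-like activity.

```python
import numpy as np

# Hypothetical task parameters, chosen only for illustration.
N_STEPS = 6        # discrete time steps after cue onset
REWARD_STATE = 4   # food reward arrives on entering this state
GAMMA = 0.9        # discount factor
ALPHA = 0.3        # learning rate


def run_trial(V, rewarded, learn=True):
    """Run one trial of tabular TD(0) and return the three signals per step.

    State 0 is the pre-cue interval; its value stays at 0 because cue onset
    is unpredictable. States 1..N_STEPS follow the cue, and state
    N_STEPS + 1 is terminal with value 0.
    """
    r = np.zeros(N_STEPS + 2)
    if rewarded:
        r[REWARD_STATE] = 1.0
    prediction = np.zeros(N_STEPS + 1)
    target = np.zeros(N_STEPS + 1)
    delta = np.zeros(N_STEPS + 1)
    for t in range(N_STEPS + 1):
        prediction[t] = V[t]                     # prediction signal V(t)
        target[t] = r[t + 1] + GAMMA * V[t + 1]  # target signal for updating
        delta[t] = target[t] - prediction[t]     # TD-error signal
        if learn and t > 0:                      # never update the pre-cue state
            V[t] += ALPHA * delta[t]
    return prediction, target, delta


# Train on rewarded trials until the state values converge.
V = np.zeros(N_STEPS + 2)
for _ in range(500):
    run_trial(V, rewarded=True)

# Probe the trained values without learning: one rewarded, one omission trial.
pred_rw, targ_rw, delta_rw = run_trial(V, rewarded=True, learn=False)
pred_om, targ_om, delta_om = run_trial(V, rewarded=False, learn=False)
print("TD error, rewarded:", np.round(delta_rw, 3))  # positive at cue (t = 0)
print("TD error, omitted: ", np.round(delta_om, 3))  # negative at expected reward

# Regress the TD error on the target and prediction signals by least squares.
X = np.column_stack([np.concatenate([targ_rw, targ_om]),
                     np.concatenate([pred_rw, pred_om])])
y = np.concatenate([delta_rw, delta_om])
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
print("weights (target, prediction):", np.round(weights, 3))  # approx. [1, -1]
```

Clamping the pre-cue value at 0 is a modeling assumption: because the cue arrives at an unpredictable time, the transition into the first post-cue state is the earliest point at which reward can be anticipated, and that is where the positive TD error appears in this sketch.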
