A pallidus-habenula-dopamine pathway signals inferred stimulus values.

The reward value of a stimulus can be learned through two distinct mechanisms: reinforcement learning through repeated stimulus-reward pairings and abstract inference based on knowledge of the task at hand. The reinforcement mechanism is often identified with midbrain dopamine neurons. Here we show that a neural pathway controlling the dopamine system does not rely exclusively on either stimulus-reward pairings or abstract inference but instead uses a combination of the two. We trained monkeys to perform a reward-biased saccade task in which the reward values of two saccade targets were related in a systematic manner. Animals used each trial's reward outcome to learn the values of both targets: the target that had been presented and whose reward outcome had been experienced (experienced value) and the target that had not been presented but whose value could be inferred from the reward statistics of the task (inferred value). We then recorded from three populations of reward-coding neurons: substantia nigra dopamine neurons; a major input to dopamine neurons, the lateral habenula; and neurons that project to the lateral habenula, located in the globus pallidus. All three populations encoded both experienced values and inferred values. In some animals, neurons encoded experienced values more strongly than inferred values, and the animals showed behavioral evidence of learning faster from experience than from inference. Our data indicate that the pallidus-habenula-dopamine pathway signals reward values estimated through both experience and inference.

[1]  D R MEYER,et al.  Food deprivation and discrimination reversal learning by monkeys. , 1951, Journal of experimental psychology.

[2]  A. Parent,et al.  The origin of forebrain afferents to the habenula in rat, cat and monkey , 1981, Brain Research Bulletin.

[3]  K. Wilcox,et al.  Stimulation of the lateral habenula inhibits dopamine-containing neurons in the substantia nigra and ventral tegmental area of the rat , 1986, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[4]  M. Gluck,et al.  Hippocampal mediation of stimulus representation: A computational theory , 1993, Hippocampus.

[5]  P. Dayan,et al.  A framework for mesencephalic dopamine systems based on predictive Hebbian learning , 1996, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[6]  Jennifer A. Mangels,et al.  A Neostriatal Habit Learning System in Humans , 1996, Science.

[7]  J. D. McGaugh,et al.  Inactivation of Hippocampus or Caudate Nucleus with Lidocaine Differentially Affects Expression of Place and Response Learning , 1996, Neurobiology of Learning and Memory.

[8]  J. Hollerman,et al.  Dopamine neurons report an error in the temporal prediction of reward during learning , 1998, Nature Neuroscience.

[9]  Colin Camerer,et al.  Experience‐weighted Attraction Learning in Normal Form Games , 1999 .

[10]  K. Doya,et al.  Parallel neural networks for learning sequential procedures , 1999, Trends in Neurosciences.

[11]  E. Miller,et al.  An integrative theory of prefrontal cortex function. , 2001, Annual review of neuroscience.

[12]  Mitsuo Kawato,et al.  MOSAIC Model for Sensorimotor Learning and Control , 2001, Neural Computation.

[13]  Mitsuo Kawato,et al.  Multiple Model-Based Reinforcement Learning , 2002, Neural Computation.

[14]  Tatsuo K Sato,et al.  Correlated Coding of Motivation and Outcome of Decision by Dopamine Neurons , 2003, The Journal of Neuroscience.

[15]  R. Wise Dopamine, learning and motivation , 2004, Nature Reviews Neuroscience.

[16]  O. Hikosaka,et al.  Dopamine Neurons Can Represent Context-Dependent Prediction Error , 2004, Neuron.

[17]  O. Hikosaka,et al.  A possible role of midbrain dopamine neurons in short- and long-term adaptation of saccades to position-reward mapping. , 2004, Journal of neurophysiology.

[18]  L. Squire,et al.  Robust habit learning in the absence of awareness and independent of the medial temporal lobe , 2005, Nature.

[19]  P. Dayan,et al.  Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control , 2005, Nature Neuroscience.

[20]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 2005, IEEE Transactions on Neural Networks.

[21]  O. Hikosaka,et al.  Immediate changes in anticipatory activity of caudate neurons associated with reversal of position-reward contingency. , 2005, Journal of neurophysiology.

[22]  M. Kawato,et al.  Different neural correlates of reward expectation and reward expectation error in the putamen and caudate nucleus during stimulus-action-reward association learning. , 2006, Journal of neurophysiology.

[23]  H. Yin,et al.  The role of the basal ganglia in habit formation , 2006, Nature Reviews Neuroscience.

[24]  J. O'Doherty,et al.  The Role of the Ventromedial Prefrontal Cortex in Abstract State-Based Inference during Decision Making in Humans , 2006, The Journal of Neuroscience.

[25]  David S. Touretzky,et al.  Representation and Timing in Theories of the Dopamine System , 2006, Neural Computation.

[26]  P. Shepard,et al.  Lateral Habenula Stimulation Inhibits Rat Midbrain Dopamine Neurons through a GABAA Receptor-Mediated Mechanism , 2007, The Journal of Neuroscience.

[27]  O. Hikosaka,et al.  Lateral habenula as a source of negative reward signals in dopamine neurons , 2007, Nature.

[28]  M. Kenward,et al.  An Introduction to the Bootstrap , 2007 .

[29]  M. Roesch,et al.  Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards , 2007, Nature Neuroscience.

[30]  K. Sakai Task set and prefrontal cortex. , 2008, Annual review of neuroscience.

[31]  Simon Hong,et al.  The Globus Pallidus Sends Reward-Related Signals to the Lateral Habenula , 2008, Neuron.

[32]  I. Tsuda,et al.  Reward prediction based on stimulus categorization in primate lateral prefrontal cortex , 2008, Nature Neuroscience.

[33]  P. Dayan,et al.  Reinforcement learning: The Good, The Bad and The Ugly , 2008, Current Opinion in Neurobiology.

[34]  W. Pan,et al.  Tripartite Mechanism of Extinction Suggested by Dopamine Neuron Activity and Temporal Difference Model , 2008, The Journal of Neuroscience.

[35]  O. Hikosaka,et al.  Two types of dopamine neuron distinctly convey positive and negative motivational signals , 2009, Nature.

[36]  P. Dayan,et al.  States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning , 2010, Neuron.