Lateral Intraparietal Cortex and Reinforcement Learning during a Mixed-Strategy Game

The activity of neurons in the lateral intraparietal cortex (LIP) reflects a mixture of sensory, motor, and memory signals. Moreover, these neurons often encode signals reflecting the accumulation of sensory evidence that certain eye movements might lead to a desirable outcome. However, when the environment changes dynamically, animals must also combine information about their previously chosen actions and the outcomes of those actions to continually update the desirability of alternative actions. Here, we investigated whether LIP neurons encoded the signals necessary to update an animal's decision-making strategies adaptively during a computer-simulated matching-pennies game. Using a reinforcement learning algorithm, we estimated the value functions that best predicted the animal's choices on a trial-by-trial basis. We found that, immediately before the animal revealed its choice, ∼18% of LIP neurons changed their activity according to the difference in the value functions for the two targets. In addition, a somewhat higher fraction of LIP neurons displayed signals related to the sum of the value functions, which might correspond to the state value function or to an average rate of reward used as a reference point. Similar to neurons in the prefrontal cortex, many LIP neurons also encoded signals related to the animal's previous choices. Thus, the posterior parietal cortex might be part of the network that provides the substrate for forming appropriate associations between actions and outcomes.
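The quantities named above (value functions for the two targets, their difference, and their sum) can be illustrated with a minimal sketch. It is not the model fitted in the study: the delta-rule update, the logistic choice rule, the parameter values `alpha` and `beta`, the uniformly random opponent, and the rule that reward is delivered when the two choices match are all assumptions made here for illustration.

```python
import math
import random


def update_values(values, choice, reward, alpha=0.2):
    """Delta-rule update of the chosen target's value (assumed learning rule)."""
    new_values = list(values)
    new_values[choice] += alpha * (reward - new_values[choice])
    return new_values


def prob_choose_right(values, beta=3.0):
    """Logistic choice rule on the value difference (right minus left)."""
    return 1.0 / (1.0 + math.exp(-beta * (values[1] - values[0])))


def simulate(n_trials=200, alpha=0.2, beta=3.0, seed=0):
    """Simulate a learner against a hypothetical uniformly random opponent."""
    rng = random.Random(seed)
    values = [0.0, 0.0]  # value functions for the left (0) and right (1) targets
    history = []
    for _ in range(n_trials):
        p_right = prob_choose_right(values, beta)
        choice = 1 if rng.random() < p_right else 0
        opponent = rng.randrange(2)  # placeholder opponent, not the task's algorithm
        reward = 1.0 if choice == opponent else 0.0  # assumed payoff: reward on a match
        history.append({
            "choice": choice,
            "reward": reward,
            "value_diff": values[1] - values[0],  # difference in value functions
            "value_sum": values[0] + values[1],   # sum of value functions
        })
        values = update_values(values, choice, reward, alpha)
    return history


if __name__ == "__main__":
    trials = simulate(n_trials=10)
    for t in trials:
        print(t["choice"], t["reward"], round(t["value_diff"], 3), round(t["value_sum"], 3))
```

In an analysis of this kind, the per-trial `value_diff` and `value_sum` series would serve as regressors for neural activity, with `alpha` and `beta` chosen to best predict the observed choices rather than fixed in advance as they are here.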
