Neural Correlates of Temporal Credit Assignment in the Parietal Lobe

Empirical studies of decision making have typically assumed that value learning is governed by time, such that a reward prediction error arising at a specific time triggers temporally discounted learning for all preceding actions. In natural behavior, however, goals must be acquired through multiple actions, and each action can bear a different significance for the final outcome. As computational research has recognized, learning multi-step behaviors requires credit assignment mechanisms that focus learning on specific steps, but little is known about the neural correlates of these mechanisms. To investigate this question, we recorded neurons in the monkey lateral intraparietal area (LIP) during a serial decision task in which two consecutive eye movement decisions led to a final reward. The underlying decision trees were structured such that the two decisions had different relationships with the final reward: the optimal strategy was to learn from the final reward at one step (the "F" step) but to ignore changes in this reward at the other step (the "I" step). In two distinct contexts, the F step was either the first or the second in the sequence, controlling for effects of temporal discounting. We show that LIP neurons had the strongest value learning and the strongest post-decision responses during the transition after the F step, regardless of the serial position of this step. Thus, the neurons encode correlates of temporal credit assignment mechanisms that allocate learning to specific steps independently of temporal discounting.
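The contrast drawn above can be sketched computationally. The following is a minimal, illustrative simulation (the function names, parameters, and trial structure are assumptions for exposition, not the paper's model): under pure temporal discounting, a final reward updates the values of both steps of a two-step sequence, with the earlier step's update weighted by a discount factor; under step-specific credit assignment, only the designated F step is updated and the I step is ignored.

```python
# Two-step trial sketch: decisions at step 0 and step 1, then one reward.
# All names and parameter values here are illustrative assumptions.

def discounted_update(values, reward, alpha=0.1, gamma=0.9):
    """Time-governed learning: the reward updates BOTH steps,
    with the earlier step discounted by its delay to reward."""
    new = list(values)
    for step in (0, 1):
        delay = 1 - step          # steps remaining until reward
        weight = gamma ** delay   # temporal discounting of the update
        new[step] += alpha * weight * (reward - new[step])
    return new

def credit_assigned_update(values, reward, f_step, alpha=0.1):
    """Step-specific credit assignment: only the F step learns from
    the final reward; the I step's value is left unchanged."""
    new = list(values)
    new[f_step] += alpha * (reward - new[f_step])
    return new

# With credit assignment, repeated rewarded trials drive only the
# F step's value toward the reward; the I step stays at its prior.
v = [0.0, 0.0]
for _ in range(50):
    v = credit_assigned_update(v, reward=1.0, f_step=1)
```

The point of the contrast is that temporal discounting ties the size of each step's update to its delay from reward, whereas credit assignment ties it to the step's identity, which is the dissociation the two task contexts (F step first vs. F step second) are designed to test.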