Neural signature of fictive learning signals in a sequential investment task

Reinforcement learning models now provide principled guides for a wide range of reward learning experiments in animals and humans. One key learning (error) signal in these models is experiential and reports ongoing temporal differences between expected and experienced reward. However, these same abstract learning models also accommodate the existence of another class of learning signal that takes the form of a fictive error encoding ongoing differences between experienced returns and returns that “could-have-been-experienced” if decisions had been different. These observations suggest the hypothesis that, for all real-world learning tasks, one should expect the presence of both experiential and fictive learning signals. Motivated by this possibility, we used a sequential investment game and fMRI to probe ongoing brain responses to both experiential and fictive learning signals generated throughout the game. Using a large cohort of subjects (n = 54), we report that fictive learning signals strongly predict changes in subjects' investment behavior and correlate with fMRI signals measured in dopaminoceptive structures known to be involved in valuation and choice.
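The distinction between the two signals can be made concrete with a minimal sketch. The abstract does not give formulas, so everything below is an illustrative assumption rather than the authors' model: the function names, the idea that a subject invests a fraction `bet_fraction` of their holdings, and the choice of the best possible bet (all-in on a rising market, nothing on a falling one) as the "could-have-been-experienced" return.

```python
def experiential_error(expected_return: float, experienced_return: float) -> float:
    """Difference between the reward actually experienced and the reward expected.

    Hypothetical helper: a one-step stand-in for a temporal-difference error.
    """
    return experienced_return - expected_return


def fictive_error(bet_fraction: float, market_return: float) -> float:
    """Difference between the best achievable return and the experienced return.

    Assumption: the best alternative is to bet everything when the market
    rises and nothing when it falls. bet_fraction is in [0, 1].
    """
    if not 0.0 <= bet_fraction <= 1.0:
        raise ValueError("bet_fraction must lie in [0, 1]")
    best_bet = 1.0 if market_return > 0 else 0.0
    could_have_been = best_bet * market_return
    experienced = bet_fraction * market_return
    return could_have_been - experienced


if __name__ == "__main__":
    # A subject invests 40% and the market rises by 10%.
    print(experiential_error(expected_return=0.02, experienced_return=0.4 * 0.1))
    print(fictive_error(bet_fraction=0.4, market_return=0.1))
```

Under these assumptions the fictive error is zero only when the subject happened to make the best available bet, and it is never negative, whereas the experiential error can take either sign.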
