Learning relative values in the striatum induces violations of normative decision making

To decide optimally between available options, organisms need to learn the values associated with these options. Reinforcement learning models offer a powerful explanation of how these values are learnt from experience. However, human choices often violate normative principles. We suggest that seemingly counterintuitive decisions may arise as a natural consequence of the learning mechanisms deployed by humans. Here, using fMRI and a novel behavioural task, we show that, when suddenly switched to novel choice contexts, participants’ choices are incongruent with values learnt by standard learning algorithms. Instead, behaviour is compatible with the decisions of an agent learning how good an option is relative to an option with which it had previously been paired. Striatal activity exhibits the characteristics of a prediction error used to update such relative option values. Our data suggest that choices can be biased by a tendency to learn option values with reference to the available alternatives.
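The contrast between standard and relative value learning described above can be illustrated with a minimal sketch. It is not the authors' fitted model. The option names, reward magnitudes, learning rate, alternating sampling schedule, and the choice of the paired option's learnt value as the reference point are all assumptions made for illustration. Rewards are deterministic so that the example is reproducible.

```python
# Illustrative sketch only: a standard delta-rule learner versus a learner
# that stores each option's value relative to the option it was paired with.
# All names and numbers below are hypothetical, not taken from the study.

def simulate(pairs, rewards, alpha=0.1, n_trials=500):
    """Train both learners on fixed option pairs with deterministic rewards."""
    absolute = {option: 0.0 for option in rewards}  # standard learnt values
    relative = {option: 0.0 for option in rewards}  # values relative to the paired option
    for t in range(n_trials):
        for a, b in pairs:
            # Alternate which option is sampled so both are experienced equally
            # (an assumed schedule; no choice policy is modelled here).
            chosen, other = (a, b) if t % 2 == 0 else (b, a)
            r = rewards[chosen]
            # Reference point: current estimate of the paired alternative.
            reference = absolute[other]
            # Standard update: prediction error is outcome minus expectation.
            absolute[chosen] += alpha * (r - absolute[chosen])
            # Relative update: the outcome is first expressed relative to the alternative.
            relative[chosen] += alpha * ((r - reference) - relative[chosen])
    return absolute, relative


# Two learning contexts: A is paired with B, and C is paired with D.
rewards = {"A": 9.0, "B": 7.0, "C": 5.0, "D": 1.0}
pairs = [("A", "B"), ("C", "D")]
absolute, relative = simulate(pairs, rewards)

# Novel choice context: B versus C, which were never paired during learning.
standard_choice = max(["B", "C"], key=lambda option: absolute[option])
relative_choice = max(["B", "C"], key=lambda option: relative[option])
print("standard learner prefers:", standard_choice)
print("relative learner prefers:", relative_choice)
```

In this toy setting B pays more than C, yet B was the worse option in its original pair and C the better one in its pair. A learner that stores relative values therefore ranks C above B in the novel pairing, which is the kind of preference the abstract describes as incongruent with standard learning algorithms.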
