Learning To Minimize Efforts versus Maximizing Rewards: Computational Principles and Neural Correlates

The mechanisms of reward maximization have been extensively studied at both the computational and neural levels. By contrast, little is known about how the brain learns to choose the options that minimize action cost. In principle, the brain could have evolved a general mechanism that applies the same learning rule to the different dimensions of choice options. To test this hypothesis, we scanned healthy human volunteers while they performed a probabilistic instrumental learning task that varied in both the physical effort and the monetary outcome associated with choice options. Behavioral data showed that the same computational rule, using prediction errors to update expectations, could account for both reward maximization and effort minimization. However, these learning-related variables were encoded in partially dissociable brain areas. In line with previous findings, the ventromedial prefrontal cortex was found to positively represent expected and actual rewards, regardless of effort. A separate network, encompassing the anterior insula, the dorsal anterior cingulate, and the posterior parietal cortex, correlated positively with expected and actual efforts. These findings suggest that the same computational rule is applied by distinct brain systems, depending on the choice dimension—cost or benefit—that has to be learned.
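The shared learning rule described above — updating an expectation by a fraction of the prediction error, applied separately to the benefit and cost dimensions — can be illustrated with a minimal sketch. This is not the authors' fitted model; it assumes a standard Rescorla-Wagner delta rule with a softmax choice over net value, and the contingencies (reward and effort probabilities per option) are hypothetical values chosen for illustration.

```python
import math
import random

random.seed(0)  # for reproducibility of this illustration

def update(expectation, outcome, alpha):
    """Delta rule: move the expectation toward the outcome by learning rate alpha."""
    prediction_error = outcome - expectation
    return expectation + alpha * prediction_error

def softmax_choice(values, beta):
    """Pick option 0 or 1 with probability given by a softmax over net values."""
    p0 = 1.0 / (1.0 + math.exp(-beta * (values[0] - values[1])))
    return 0 if random.random() < p0 else 1

# One agent tracks reward and effort expectations for each of two options.
alpha_r, alpha_e, beta = 0.3, 0.3, 3.0
Qr = [0.5, 0.5]  # expected reward per option
Qe = [0.5, 0.5]  # expected effort per option

for trial in range(100):
    net = [Qr[i] - Qe[i] for i in range(2)]  # net value = expected reward - expected effort
    c = softmax_choice(net, beta)
    # Hypothetical contingencies: option 0 pays off 80% of trials and is
    # effortful on 20%; option 1 is the reverse.
    reward = 1.0 if random.random() < (0.8 if c == 0 else 0.2) else 0.0
    effort = 1.0 if random.random() < (0.2 if c == 0 else 0.8) else 0.0
    Qr[c] = update(Qr[c], reward, alpha_r)  # same rule for the benefit dimension...
    Qe[c] = update(Qe[c], effort, alpha_e)  # ...and for the cost dimension
```

The point of the sketch is that nothing in `update` distinguishes rewards from efforts; only the outcome fed into it differs, which is the behavioral hypothesis the study tests while the fMRI data ask whether the brain nonetheless implements the two updates in separate regions.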
