The modulation of savouring by prediction error and its effects on choice

When people anticipate uncertain future outcomes, they often prefer to know their fate in advance. Inspired by an idea in behavioral economics that the anticipation of rewards is itself attractive, we hypothesized that this preference for advance information arises because the reward prediction errors carried by such information can boost the level of anticipation. We designed new behavioral studies to test this proposal, and confirmed that subjects preferred advance reward information more strongly when they had to wait longer for rewards. We formulated our proposal as a reinforcement-learning model, and showed that the model could account for a wide range of existing neuronal and behavioral data without appealing to ambiguous notions such as an explicit value for information. We suggest that such boosted anticipation significantly drives risk-seeking behaviors, most pertinently in gambling. DOI: http://dx.doi.org/10.7554/eLife.13747.001
