Confirmation bias in human reinforcement learning: Evidence from counterfactual feedback processing

Previous studies suggest that factual learning, that is, learning from obtained outcomes, is biased: participants preferentially take into account positive, as compared to negative, prediction errors. However, whether prediction error valence also affects counterfactual learning, that is, learning from forgone outcomes, is unknown. To address this question, we analysed the performance of two groups of participants on reinforcement learning tasks using a computational model adapted to test whether prediction error valence influences learning. We carried out two experiments: in the factual learning experiment, participants learned from partial feedback (i.e., the outcome of the chosen option only); in the counterfactual learning experiment, participants learned from complete feedback (i.e., the outcomes of both the chosen and the unchosen option were displayed). In the factual learning experiment, we replicated previous findings of a valence-induced bias, whereby participants learned preferentially from positive, relative to negative, prediction errors. In contrast, for counterfactual learning we found the opposite valence-induced bias: negative prediction errors were preferentially taken into account, relative to positive ones. Considering valence-induced bias across both factual and counterfactual learning, it appears that people preferentially take into account information that confirms their current choice.
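To make the modelling approach concrete, the kind of valence-dependent update described above can be sketched as a Rescorla-Wagner/Q-learning rule with separate learning rates for positive and negative prediction errors, applied to both the obtained (factual) and the forgone (counterfactual) outcome. This is a minimal illustrative sketch, not the authors' code; the function and parameter names (softmax_choice, valence_dependent_update, alphas, beta) are assumptions introduced here.

```python
import numpy as np

def softmax_choice(q_values, beta=3.0):
    """Choose an option with probability proportional to exp(beta * Q)."""
    logits = beta * q_values
    p = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    p /= p.sum()
    return np.random.choice(len(q_values), p=p)

def valence_dependent_update(q, chosen, outcomes, alphas):
    """Update Q-values for one two-option trial with complete feedback.

    q        : array of Q-values, one per option
    chosen   : index of the chosen option
    outcomes : observed outcome for each option (obtained and forgone)
    alphas   : dict of learning rates for the four prediction-error types
               ('factual_pos', 'factual_neg', 'counterfactual_pos', 'counterfactual_neg')
    """
    q = q.copy()
    for option, outcome in enumerate(outcomes):
        pe = outcome - q[option]  # prediction error for this option
        if option == chosen:      # factual learning from the obtained outcome
            alpha = alphas['factual_pos'] if pe > 0 else alphas['factual_neg']
        else:                     # counterfactual learning from the forgone outcome
            alpha = alphas['counterfactual_pos'] if pe > 0 else alphas['counterfactual_neg']
        q[option] += alpha * pe
    return q
```

Under this parameterisation, the confirmation-bias pattern reported above corresponds to factual_pos > factual_neg together with counterfactual_neg > counterfactual_pos: choice-confirming information (good obtained outcomes, bad forgone outcomes) is weighted more heavily than disconfirming information.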
