Emotion-Based Reinforcement Learning

Woo-Young Ahn¹ (ahnw@indiana.edu), Olga Rass¹ (rasso@indiana.edu), Yong-Wook Shin² (shaman@amc.seoul.kr), Jerome R. Busemeyer¹ (jbusemey@indiana.edu), Joshua W. Brown¹ (jwmbrown@indiana.edu), Brian F. O'Donnell¹ (bodonnel@indiana.edu)

¹ Department of Psychological and Brain Sciences, Indiana University
² Department of Psychiatry, Ulsan University School of Medicine

Abstract

Studies have shown that counterfactual reasoning can shape human decisions. However, there is a gap in the literature between counterfactual choices in description-based and experience-based paradigms. While studies using description-based paradigms suggest that participants maximize expected subjective emotion, studies using experience-based paradigms assume that participants learn the values of options and select the one that maximizes expected utility. In this study, we used computational modeling to test 1) whether participants make emotion-based decisions in experience-based paradigms, and 2) whether the impact of regret depends on its degree of unexpectedness, as suggested by the current theory of regret. The results suggest that 1) participants make emotion-based choices even in experience-based paradigms, and 2) the impact of regret is greater when it is expected than when it is unexpected. These results challenge the current theory of regret and suggest that reinforcement learning models may need to use counterfactual value functions when full information is provided.

Keywords: Decision making; Bayesian modeling; mathematical modeling; regret; reinforcement learning.

Introduction

In our daily lives, we constantly face decisions and must assess the costs and benefits of the possible options (e.g., "Should I buy a lottery ticket or just buy a snack with this money?", "Should I buy Apple or Google stock?"). Usually we know only the outcomes of our choices. On rare occasions, we also know what would have happened if we had made different choices (e.g., in the stock market). Having complete feedback (or full information) under risk or uncertainty can evoke strong emotions, such as regret or disappointment, that are triggered by our capacity to reason counterfactually.

The effects of counterfactual reasoning have received much attention, and several theories have been proposed. A growing consensus suggests that disappointment and elation are elicited by comparisons between different states (e.g., "my grant was not funded...") whereas regret and rejoice come from comparisons between different choices (e.g., "I should have married another person..."). A unique aspect of regret is the feeling of responsibility that accompanies negative outcomes of one's own choices.

Among the theories of counterfactual decision-making, decision affect theory is regarded as one of the leading models (Mellers, Schwartz, & Ritov, 1999). Decision affect theory assumes that individuals make emotion-based choices and want to maximize subjective expected pleasure (or emotion) rather than expected return. In decision affect theory, our emotional responses (R) are based on obtained outcomes, relevant comparisons, and beliefs about the likelihood of the outcomes:

$$R = \text{Chosen Outcome Utility} + \text{Regret/Rejoice} + \text{Disappointment/Elation}$$

All counterfactual terms (regret, rejoice, disappointment, and elation) are weighted by their unexpectedness. Decision affect theory has effectively explained various experimental results (Mellers et al., 1999), and Coricelli et al. (2005) used a modified version of the theory to examine the neural correlates of regret using description-based paradigms.¹

Several studies have examined counterfactual decision-making using experience-based paradigms as well (Lohrenz, McCabe, Camerer, & Montague, 2007; Boorman, Behrens, & Rushworth, 2011; Hayden, Pearson, & Platt, 2009; Yechiam & Rakow, 2011). Although the models used in these studies differ slightly from each other, all of them are reinforcement learning models, which assume that participants learn about chosen and foregone outcomes from trial-by-trial experience and then choose the option with the highest expected value.

This study grew out of this gap in the literature. To explain choice behaviors in description-based paradigms with full information, researchers have assumed that participants make emotion-based choices. To explain choice behaviors in experience-based paradigms, researchers have assumed that participants learn the obtained and foregone payoffs and do not make emotion-based choices. We tested whether individuals make emotion-based choices in experience-based paradigms by building computational models for all competing hypotheses. This approach allowed us to compare the hypotheses quantitatively and rigorously.

Another aim of the study was to test whether regret would be weighted by its unexpectedness (i.e., surprisingness). Mellers et al. (1999) claimed that "...unexpected out-

¹ In description-based paradigms, the outcomes of all options and their probabilities are provided to participants, and participants rarely receive feedback. In experience-based paradigms, participants must learn the outcomes or their probabilities from their personal experience (Hertwig, Barron, Weber, & Erev, 2004).
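The decision affect theory equation for the emotional response R can be made concrete with a small sketch. It assumes a power utility (linear by default), linear regret/rejoice and disappointment/elation terms with hypothetical weights `w_regret` and `w_disappoint`, and "unexpectedness" defined as 1 minus the believed probability of the obtained outcome. These are illustrative assumptions, not the exact functional forms used by Mellers et al. (1999) or in this study. The hypothetical helper `expected_emotion` shows what maximizing subjective expected emotion would mean for two independent two-outcome gambles.

```python
# Hypothetical sketch of the emotional response R in decision affect theory:
#   R = chosen-outcome utility + regret/rejoice + disappointment/elation,
# with both counterfactual terms weighted by unexpectedness.
# ASSUMPTIONS: the utility function, the weights, and unexpectedness defined
# as 1 - P(obtained outcome) are illustrative, not the authors' model.


def utility(x: float, alpha: float = 1.0) -> float:
    """Assumed power utility; alpha = 1.0 makes it linear."""
    return x ** alpha if x >= 0 else -((-x) ** alpha)


def emotional_response(obtained: float, p_obtained: float,
                       foregone: float, other_state: float,
                       w_regret: float = 1.0,
                       w_disappoint: float = 1.0) -> float:
    """Emotional response R to a single outcome under full feedback.

    obtained     -- payoff of the chosen option
    p_obtained   -- believed probability of that payoff
    foregone     -- payoff the unchosen option delivered (choice comparison)
    other_state  -- the chosen option's other possible payoff (state comparison)
    """
    u = utility(obtained)
    unexpectedness = 1.0 - p_obtained
    # < 0 is regret, > 0 is rejoice (comparison between choices).
    regret_rejoice = w_regret * (u - utility(foregone)) * unexpectedness
    # < 0 is disappointment, > 0 is elation (comparison between states).
    disappoint_elation = w_disappoint * (u - utility(other_state)) * unexpectedness
    return u + regret_rejoice + disappoint_elation


def expected_emotion(chosen, unchosen):
    """Expected R of picking `chosen` over `unchosen` (hypothetical helper).

    Each gamble is a two-outcome list [(payoff, prob), (payoff, prob)],
    and the two gambles are assumed to resolve independently.
    """
    total = 0.0
    for i, (x, p) in enumerate(chosen):
        other_x = chosen[1 - i][0]  # the chosen gamble's outcome that did not occur
        for y, q in unchosen:       # what the unchosen gamble would have paid
            total += p * q * emotional_response(x, p, y, other_x)
    return total


if __name__ == "__main__":
    risky = [(10, 0.5), (0, 0.5)]
    safe = [(5, 0.5), (5, 0.5)]  # degenerate "gamble" that always pays 5
    # An emotion-based chooser picks whichever option has the larger expected R.
    print(expected_emotion(risky, safe), expected_emotion(safe, risky))
```

Because both counterfactual terms are multiplied by `1 - p_obtained`, the same regret-inducing outcome yields a more negative R when it was unlikely. That unexpectedness weighting is the assumption of the theory that this study set out to test.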

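The reinforcement learning account described in the Introduction, in which participants learn about chosen and foregone outcomes trial by trial and then favor the option with the highest expected value, can be sketched as follows. This is a generic illustration, not the model of any cited study. It assumes a Rescorla-Wagner-style delta rule applied to every option under full feedback, hypothetical learning rates `lr` and `lr_foregone`, and a softmax choice rule with an assumed inverse temperature `beta`, a common stochastic relaxation of always picking the highest-valued option.

```python
import math
import random

# Hypothetical sketch of value learning with full feedback: the chosen and
# the foregone options are both updated with a delta rule, and choices
# follow a softmax over the learned values.
# ASSUMPTIONS: learning rates, beta, and the payoff function are illustrative.


def softmax_probs(values, beta):
    """Choice probabilities; larger beta -> closer to always picking the max."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def update_values(values, chosen, outcomes, lr=0.3, lr_foregone=None):
    """Return new values after one trial of full feedback.

    outcomes[k] is what option k paid (chosen) or would have paid (foregone).
    """
    if lr_foregone is None:
        lr_foregone = lr
    new_values = list(values)
    for k, x in enumerate(outcomes):
        rate = lr if k == chosen else lr_foregone
        new_values[k] += rate * (x - values[k])  # prediction-error update
    return new_values


def simulate(payoff_fn, n_trials, n_options=2, beta=2.0, seed=0):
    """Run the learner; payoff_fn(rng) returns the payoffs of all options."""
    rng = random.Random(seed)
    values = [0.0] * n_options
    choices = []
    for _ in range(n_trials):
        probs = softmax_probs(values, beta)
        chosen = rng.choices(range(n_options), weights=probs)[0]
        outcomes = payoff_fn(rng)
        values = update_values(values, chosen, outcomes)
        choices.append(chosen)
    return values, choices


if __name__ == "__main__":
    # Hypothetical task: option 0 pays 1 with p = 0.7, option 1 with p = 0.4.
    def payoffs(rng):
        return [float(rng.random() < 0.7), float(rng.random() < 0.4)]

    final_values, choices = simulate(payoffs, n_trials=200)
    print(final_values, sum(c == 0 for c in choices) / len(choices))
```

Setting `lr_foregone=0.0` gives a learner that ignores foregone payoffs. Unlike the emotion-based account, nothing in this sketch compares the obtained payoff with the foregone one at the moment of choice; foregone payoffs only shape the learned values.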
[1] G. Schwarz. Estimating the Dimension of a Model, 1978.

[2] G. Loewenstein, et al. Risk as Feelings, 2001, Psychological Bulletin.

[3] E. U. Weber, et al. Mindful judgment and decision making, 2009, Annual Review of Psychology.

[4] D. Lunn, et al. The BUGS project: Evolution, critique and future directions, 2009, Statistics in Medicine.

[5] B. A. Mellers, et al. Decision Affect Theory: Emotional Reactions to the Outcomes of Risky Options, 1997.

[6] M. Lee. How cognitive modeling can benefit from hierarchical Bayesian models, 2011.

[7] B. A. Mellers, et al. Emotion-based choice, 1999.

[8] E. Yechiam, et al. The effect of foregone outcomes on choices from experience, 2011, Experimental Psychology.

[9] R. Hertwig, et al. Decisions from Experience and the Effect of Rare Events in Risky Choice, 2004, Psychological Science.

[10] E. Yechiam, et al. Evaluating the reliance on past choices in adaptive learning models, 2007.

[11] C. W. Lejuez, et al. The Balloon Analogue Risk Task (BART) differentiates smokers and nonsmokers, 2003, Experimental and Clinical Psychopharmacology.

[12] R. Rescorla, et al. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement, 1972.

[13] J. Kruschke. Doing Bayesian Data Analysis: A Tutorial with R and BUGS, 2010.

[14] M. Lee. Three case studies in the Bayesian analysis of cognitive models, 2008, Psychonomic Bulletin & Review.

[15] T. Lohrenz, et al. Neural signature of fictive learning signals in a sequential investment task, 2007, Proceedings of the National Academy of Sciences.

[16] G. Coricelli, et al. Regret and its avoidance: a neuroimaging study of choice behavior, 2005, Nature Neuroscience.

[17] E. D. Boorman, et al. Counterfactual Choice and Learning in a Neural Network Centered on Human Lateral Frontopolar Cortex, 2011, PLoS Biology.

[18] R. Duncan Luce. Individual Choice Behavior, 1959.

[19] B. Y. Hayden, et al. Fictive Reward Signals in the Anterior Cingulate Cortex, 2009, Science.