Learning by appraising: an emotion-based approach to intrinsic reward design

In this paper, we investigate the use of emotional information in the learning process of autonomous agents. Inspired by four dimensions commonly postulated by appraisal theories of emotion, we construct a set of reward features to guide the learning process and behaviour of a reinforcement learning (RL) agent that inhabits an environment of which it has only limited perception. Much as in biological agents, each reward feature evaluates a particular aspect of the agent's (history of) interaction with the environment, thereby replicating, in a sense, some aspects of the appraisal processes observed in humans and other animals. Our experiments in several foraging scenarios show that, by optimising the relative contributions of the reward features, the resulting “emotional” RL agents outperform standard goal-oriented agents, particularly given their inherent perceptual limitations. Our results support the claim that biological evolutionary adaptive mechanisms such as emotions can provide crucial clues for designing robust, general-purpose reward mechanisms for autonomous artificial agents, allowing them to overcome some of the challenges imposed by their inherent limitations.
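
To make the idea concrete, the following is a minimal sketch, not the authors' implementation, of a Q-learning update whose reward is a weighted combination of appraisal-inspired reward features computed from the agent's interaction history. The specific feature definitions (novelty, motivation, control, valence), the linear combination, and the weights `theta` are illustrative assumptions; in the approach described above, the relative contributions of the features are optimised per scenario.

```python
from collections import defaultdict

GAMMA, ALPHA = 0.95, 0.1

# --- Hypothetical appraisal-inspired reward features -----------------------
# Each feature appraises one aspect of the agent's (history of) interaction,
# summarised here by visit counts and the current Q-estimates.

def novelty(counts, obs, action):
    """Less-visited observation-action pairs appraise as more novel."""
    return 1.0 / (1.0 + counts[(obs, action)])

def motivation(extrinsic):
    """Goal relevance, proxied here by the extrinsic (goal) reward itself."""
    return extrinsic

def control(q, obs):
    """Spread of action values: a rough proxy for how much outcomes depend on the agent."""
    values = list(q[obs].values())
    return (max(values) - min(values)) if values else 0.0

def valence(q, obs, action, extrinsic, next_obs):
    """Signed better/worse-than-expected signal (a TD-error-like appraisal)."""
    best_next = max(q[next_obs].values(), default=0.0)
    return extrinsic + GAMMA * best_next - q[obs][action]

# --- Q-learning with a weighted appraisal-based intrinsic reward -----------

def update(q, counts, theta, obs, action, extrinsic, next_obs):
    feats = (novelty(counts, obs, action),
             motivation(extrinsic),
             control(q, obs),
             valence(q, obs, action, extrinsic, next_obs))
    # Linear combination; the weights theta are what would be optimised per scenario.
    reward = sum(w * f for w, f in zip(theta, feats))
    counts[(obs, action)] += 1
    best_next = max(q[next_obs].values(), default=0.0)
    q[obs][action] += ALPHA * (reward + GAMMA * best_next - q[obs][action])

# Minimal usage with dummy observations/actions:
q = defaultdict(lambda: defaultdict(float))
counts = defaultdict(int)
theta = (0.2, 0.5, 0.1, 0.2)  # illustrative weights only
update(q, counts, theta, obs="o1", action="forage", extrinsic=1.0, next_obs="o2")
print(q["o1"]["forage"])
```

Because the features depend only on the agent's observations, action counts, and value estimates, the same weighted-combination scheme applies unchanged when the agent's perception of the underlying state is limited.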
