An Improved Reinforcement Learning System Using Affective Factors

As a powerful machine learning paradigm, reinforcement learning (RL) has been widely applied in fields such as game theory, adaptive control, multi-agent systems, and nonlinear forecasting. Its main strength lies in balancing exploration and exploitation to find optimal or near-optimal solutions to goal-directed problems. However, when RL is applied to multi-agent systems (MASs), issues such as the "curse of dimensionality", the perceptual aliasing problem, and environmental uncertainty pose high hurdles. Moreover, although RL is inspired by behavioral psychology and relies on reward and punishment from the environment, higher mental factors such as affect, emotion, and motivation are rarely incorporated into its learning procedure. In this paper, to address agent learning in MASs, we propose a computational motivation function that adopts the two principal affective dimensions of Russell's circumplex model of affect, "Arousal" and "Pleasure", to improve the learning performance of the conventional Q-learning (QL) algorithm. Computer simulations of pursuit problems with static and dynamic preys were carried out, and compared with conventional QL, the proposed method gives agents faster and more stable learning performance.
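The general idea of augmenting Q-learning with an affective motivation signal can be sketched as reward shaping. Note this is only an illustrative toy, not the paper's method: the `motivation` function below, the one-dimensional pursuit task, and all constants (`ALPHA`, `GAMMA`, `EPS`, the 0.1 weights) are assumptions made for the sketch; here "Arousal" is approximated by state novelty and "Pleasure" by progress toward the prey.

```python
import random

# Tabular Q-learning on a toy 1-D pursuit task: the agent starts at cell 0
# and the (static) prey sits at cell N-1. A hypothetical affective term is
# added to the environmental reward as shaping.

N = 10                      # number of cells; prey at cell N-1
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
ACTIONS = (-1, +1)          # step left / step right

def motivation(visits, s, s_next):
    """Illustrative affective shaping term (not the paper's function)."""
    # "Arousal": higher for rarely visited states (novelty).
    arousal = 1.0 / (1.0 + visits[s_next])
    # "Pleasure": positive when the agent moves closer to the prey.
    pleasure = 1.0 if (N - 1 - s_next) < (N - 1 - s) else -1.0
    return 0.1 * arousal + 0.1 * pleasure

def train(episodes=500, shaped=True, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N)]   # Q[state][action]
    visits = [0] * N
    for _ in range(episodes):
        s = 0
        while s != N - 1:                # episode ends when prey is caught
            # epsilon-greedy action selection
            if rng.random() < EPS:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: Q[s][i])
            s_next = min(max(s + ACTIONS[a], 0), N - 1)
            r = 1.0 if s_next == N - 1 else 0.0   # environmental reward
            if shaped:
                r += motivation(visits, s, s_next)
            visits[s_next] += 1
            # standard Q-learning update
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q

Q = train()
```

After training, the greedy policy derived from `Q` steps toward the prey from every non-terminal cell; comparing `train(shaped=True)` against `train(shaped=False)` over early episodes is one way to probe whether the shaping term speeds up learning in this toy setting.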
