LEARNING TO AVOID RISKY ACTIONS

When a reinforcement learning agent executes actions that frequently damage it, Q-learning lets it learn not to execute those actions again. Other actions, however, cause damage only once in a while: risky actions such as parachuting. These actions may result in punishment for the agent and, depending on its personality, it may be better to avoid them. Standard Q-learning cannot learn to avoid them, because their outcome can be positive on average. This article introduces an additional mechanism for Q-learning, inspired by the emotion of fear, that deals with such risky actions by taking their worst outcomes into account. A daring factor adjusts how much weight the risk receives. The mechanism is implemented on an autonomous agent living in a virtual environment, and the results show the agent's performance for different degrees of daring.
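The idea can be illustrated with a minimal sketch. The abstract does not give the update rule, so everything here is an assumption made for illustration: the class name `RiskAwareQLearner`, the worst-case memory `q_worst` (the lowest one-step target seen so far), the linear blend controlled by `daring`, and the reward figures in `train` are all hypothetical and should not be read as the authors' formulation.

```python
from collections import defaultdict


class RiskAwareQLearner:
    """Tabular Q-learning plus a worst-case estimate per (state, action).

    Illustrative only: the blending rule below is an assumption,
    not the formulation used in the article.
    """

    def __init__(self, alpha=0.1, gamma=0.9, daring=0.5):
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.daring = daring          # assumed range [0, 1]; 1 ignores risk
        self.q = defaultdict(float)   # usual running estimate of the return
        self.q_worst = {}             # lowest one-step target seen so far

    def update(self, state, action, reward, next_state, next_actions):
        key = (state, action)
        # Standard Q-learning target; empty next_actions marks a terminal step.
        best_next = max(
            (self.q[(next_state, a)] for a in next_actions), default=0.0
        )
        target = reward + self.gamma * best_next
        self.q[key] += self.alpha * (target - self.q[key])
        # Assumed "fear" memory: remember the worst target ever observed.
        self.q_worst[key] = min(self.q_worst.get(key, target), target)

    def value(self, state, action):
        # Blend the average-style estimate with the worst observed outcome.
        key = (state, action)
        worst = self.q_worst.get(key, self.q[key])
        return self.daring * self.q[key] + (1.0 - self.daring) * worst

    def choose(self, state, actions):
        # Greedy choice on the risk-adjusted value.
        return max(actions, key=lambda a: self.value(state, a))


def train(daring, cycles=10):
    """Hypothetical scenario: jumping pays +10, but one jump in ten costs -50."""
    agent = RiskAwareQLearner(daring=daring)
    jump_rewards = [-50.0] + [10.0] * 9
    for _ in range(cycles):
        for r in jump_rewards:
            agent.update("plane", "jump", r, "ground", ())
            agent.update("plane", "stay", 2.0, "ground", ())
    return agent
```

In this toy setup, `train(daring=1.0).choose("plane", ["jump", "stay"])` selects `"jump"`, since the plain Q-value of jumping ends up higher than that of staying, while `train(daring=0.5)` selects `"stay"` because the remembered -50 outcome dominates the blended value. A lower `daring` therefore plays the role of a more fearful personality.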
