The Effects of Intrinsic Motivation Signals on Reinforcement Learning Strategies

The use of neurobiological and psychological models in robotics and machine learning has attracted growing interest in recent years. Whereas common algorithms in the reinforcement learning framework tend to get stuck in local maxima while exploring the environment, intrinsic motivation modules can extend these algorithms and push the reinforcement learning agent out of its equilibrium, much like a human who grows bored of a task after performing it many times or after failing to make progress on it. This thesis gives an overview of models of intrinsic motivation grounded in neurobiology and psychology, before presenting a computational view of extending reinforcement learning algorithms with intrinsic motivation models. Several existing theoretical models and related work are presented that outperform classic algorithms with respect to the exploration/exploitation trade-off and that drive the autonomous learning of an agent. Three of these models are implemented: maximizing incompetence motivation (IM), maximizing competence motivation (CM), and competence progress motivation (CPM). In these models, competence is defined by the number of primitive actions an agent needs to reach a terminal state, and a negative intrinsic reward is added for reaching this terminal state, which increases or decreases in proportion to the agent's competence. The models are evaluated on four simulated scenarios and compared with the classic reinforcement learning algorithm SARSA and a time-decreasing-epsilon (TDE) modification of it. At best, CM achieves faster convergence towards the same terminal state as SARSA, whereas IM and CPM push the agent out of the equilibrium of local maxima and lead to more exploration, while still maximizing the expected external reward. An agent using these models is able to learn skills that an agent using classic SARSA would never explore.
The presented related work and the implemented models show that combining intrinsic motivation models with reinforcement learning algorithms yields well-performing behavior on tasks where classic algorithms fail or get stuck in local maxima, and thus provides a useful foundation for future work towards a fully autonomous learning system.
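The mechanism described above can be sketched as a tabular SARSA learner whose terminal reward is modified by an intrinsic term. This is a minimal illustration, not the thesis's implementation: the names (`ChainWorld`, `sarsa`, `beta`), the toy environment, and the exact form of the intrinsic penalty are assumptions for illustration only. Setting `beta = 0` recovers classic SARSA.

```python
import random
from collections import defaultdict

class ChainWorld:
    """Toy chain MDP: states 0..n-1; action 0 moves left, action 1 moves
    right; an external reward of 1.0 is given on reaching state n-1."""
    def __init__(self, n=6):
        self.n = n
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):
        self.s = max(0, self.s - 1) if a == 0 else min(self.n - 1, self.s + 1)
        done = self.s == self.n - 1
        return self.s, (1.0 if done else 0.0), done

def sarsa(env, episodes=300, alpha=0.1, gamma=0.95, epsilon=0.1, beta=0.0):
    """Tabular SARSA with an optional intrinsic reward on termination.

    The intrinsic term is scaled by `beta` and by the episode length,
    a proxy for competence understood as the number of primitive actions
    the agent needs to reach the terminal state. beta = 0 recovers
    classic SARSA; beta > 0 makes reaching the terminal state less
    attractive, pushing the agent towards more exploration."""
    Q = defaultdict(float)

    def pick(s):
        # epsilon-greedy action selection with a random tie-break
        if random.random() < epsilon or Q[(s, 0)] == Q[(s, 1)]:
            return random.randrange(2)
        return 0 if Q[(s, 0)] > Q[(s, 1)] else 1

    for _ in range(episodes):
        s, done, steps = env.reset(), False, 0
        a = pick(s)
        while not done and steps < 100:
            s2, r, done = env.step(a)
            steps += 1
            if done:
                # illustrative intrinsic penalty, proportional to the
                # number of primitive actions the episode took
                r -= beta * steps / 100.0
                target = r
            else:
                a2 = pick(s2)
                target = r + gamma * Q[(s2, a2)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            if not done:
                s, a = s2, a2
    return Q
```

With `beta = 0` the agent simply learns the shortest path to the terminal state; increasing `beta` penalizes settling into the same terminal state, which is one plausible reading of how the intrinsic reward models above push an agent out of its equilibrium.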
