Incremental Purposive Behavior Acquisition based on Modular Learning System

Abstract. A simple, straightforward application of reinforcement learning methods to real robot tasks is considerably difficult because the exploration space grows exponentially with the number of sensors, and recent robots tend to carry many kinds of sensors. One potential solution is the "mixture of experts" architecture proposed by Jacobs et al. [1], which decomposes the whole state space into a number of areas so that each expert module can produce good performance in its assigned small area. This idea is very general and has a wide range of applications; however, one must still decide how to decompose the space into small regions, assign each region to a learning module (an expert), and define a goal for each. To cope with this issue, this paper presents a method of self task decomposition for a modular learning system based on self-interpretation of instructions given by a coach. Unlike conventional approaches, the system decomposes a long-term task into short-term subtasks so that a single learning module with limited computational resources can acquire a purposive behavior for each subtask. Since the instructions are given from the viewpoint of a coach who has no idea how the system learns, the learner interprets them to find candidates for subgoals. Finally, the top layer of the hierarchical reinforcement learning system coordinates the lower learning modules to accomplish the whole task. The method is applied to a simple soccer situation in the context of RoboCup.