Incremental Purposive Behavior Acquisition based on Modular Learning System

Abstract. A simple, straightforward application of reinforcement learning methods to real robot tasks is considerably difficult because the exploration space grows exponentially with the number of sensors, and recent robots tend to carry many kinds of sensors. One potential solution is the "mixture of experts" architecture proposed by Jacobs et al. [1], which decomposes the whole state space into a number of areas so that each expert module can produce good performance in its assigned small area. This idea is very general and has a wide range of applications; however, one must still decide how to decompose the space into small regions, assign each region to a learning module (an expert), and define a goal for each. To cope with this issue, this paper presents a method of self task decomposition for a modular learning system based on self-interpretation of instructions given by a coach. Unlike conventional approaches, the system decomposes a long-term task into short-term subtasks so that a single learning module with limited computational resources can acquire a purposive behavior for each subtask. Since the instructions are given from the viewpoint of a coach who has no idea how the system learns, the learner interprets them to find candidates for subgoals. Finally, the top layer of the hierarchical reinforcement learning system coordinates the lower learning modules to accomplish the whole task. The method is applied to a simple soccer situation in the context of RoboCup.