Adaptive organization of generalized behavioral concepts for autonomous robots: schema-based modular reinforcement learning

In this paper, we introduce a reinforcement learning method that lets autonomous robots acquire generalized behavioral concepts. Reinforcement learning is a well-formulated framework through which autonomous robots can learn new behavioral concepts on their own. However, concepts learned in this way cannot be applied to environments that differ from the one in which they were acquired. In contrast, human beings can apply their behavioral concepts to different environments, objects, and situations. We attribute this ability to a memory structure such as the schema system originally proposed by J. Piaget. We previously proposed a modular learning method called the Dual-Schemata model; in this paper, we add a reinforcement learning mechanism to it. Equipped with this structure, autonomous robots become able to acquire new, generalized behavioral concepts by themselves. We also show that this kind of structure enables autonomous robots to behave appropriately even in a novel, socially interactive environment.
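
The abstract gives only a high-level description, so below is a minimal Python sketch of how a schema-based modular reinforcement learning loop of this kind could be organized: each schema pairs a forward model with its own action-value table, assimilation reuses the schema that best predicts an observed transition, and accommodation creates a new schema when no existing one predicts well enough. The class names, the count-based forward model, the tabular Q-learning update, and the novelty threshold are illustrative assumptions, not the authors' actual formulation of the Dual-Schemata model.

```python
import numpy as np

class Schema:
    """One behavioral schema: a count-based forward model paired with its own Q-table."""
    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.9):
        # Transition counts with a small uniform prior, so probabilities are always defined.
        self.counts = np.full((n_states, n_actions, n_states), 1e-3)
        self.q = np.zeros((n_states, n_actions))
        self.lr, self.gamma = lr, gamma

    def prediction_error(self, s, a, s_next):
        """How poorly this schema predicts the observed transition (0 = perfect)."""
        p = self.counts[s, a] / self.counts[s, a].sum()
        return 1.0 - p[s_next]

    def update(self, s, a, r, s_next):
        self.counts[s, a, s_next] += 1.0                        # refine the forward model
        td = r + self.gamma * self.q[s_next].max() - self.q[s, a]
        self.q[s, a] += self.lr * td                            # one-step Q-learning backup


class SchemaBasedRL:
    """Pool of schemata managed by assimilation/accommodation (illustrative sketch only)."""
    def __init__(self, n_states, n_actions, novelty_threshold=0.9, epsilon=0.1):
        self.n_states, self.n_actions = n_states, n_actions
        self.novelty_threshold, self.epsilon = novelty_threshold, epsilon
        self.schemata = [Schema(n_states, n_actions)]
        self.active = 0

    def select_action(self, s):
        if np.random.rand() < self.epsilon:                     # epsilon-greedy exploration
            return np.random.randint(self.n_actions)
        return int(self.schemata[self.active].q[s].argmax())

    def observe(self, s, a, r, s_next):
        # Assimilation: reuse the schema whose forward model best explains the transition.
        errors = [sch.prediction_error(s, a, s_next) for sch in self.schemata]
        best = int(np.argmin(errors))
        if errors[best] > self.novelty_threshold:
            # Accommodation: no existing schema fits, so differentiate a new one.
            self.schemata.append(Schema(self.n_states, self.n_actions))
            best = len(self.schemata) - 1
        self.active = best
        self.schemata[best].update(s, a, r, s_next)


# Minimal usage in a toy 5-state, 2-action world with random stand-in dynamics.
if __name__ == "__main__":
    agent = SchemaBasedRL(n_states=5, n_actions=2)
    s = 0
    for _ in range(200):
        a = agent.select_action(s)
        s_next = np.random.randint(5)       # placeholder for the robot's real environment
        r = 1.0 if s_next == 4 else 0.0
        agent.observe(s, a, r, s_next)
        s = s_next
    print(f"{len(agent.schemata)} schema(ta) after training")
```

In this sketch the prediction error of each schema's forward model acts as the gating signal between assimilation and accommodation; the paper's own model presumably uses its learned forward models and selection criterion in a more elaborate way.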

[1] Kenji Doya, et al., Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.

[2] Mitsuo Kawato, et al., Multiple Model-Based Reinforcement Learning, 2002, Neural Computation.

[3] J. Flavell, The Developmental Psychology of Jean Piaget, 1963.

[4] D. M. Wolpert, et al., Multiple paired forward and inverse models for motor control, 1998, Neural Networks.

[5] T. Sawaragi, et al., Design and performance of symbols self-organized within an autonomous agent interacting with varied environments, 2004, RO-MAN 2004, 13th IEEE International Workshop on Robot and Human Interactive Communication.

[6] Jun Morimoto, et al., Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning, 2000, Robotics Auton. Syst.

[7] Stephen Grossberg, et al., A massively parallel architecture for a self-organizing neural pattern recognition machine, 1988, Comput. Vis. Graph. Image Process.

[8] Tetsuo Sawaragi, et al., Self-organization of inner symbols for chase: symbol organization and embodiment, 2004, IEEE International Conference on Systems, Man and Cybernetics.

[9] Shigenobu Kobayashi, et al., An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function, 1998, ICML.

[10] R. Schmidt, Motor and action perspectives on motor behaviour, 1988.

[11] Tetsuo Sawaragi, et al., Assimilation and accommodation for self-organizational learning of autonomous robots: proposal of dual-schemata model, 2003, Proceedings 2003 IEEE International Symposium on Computational Intelligence in Robotics and Automation.