Mutual development of behavior acquisition and recognition based on value system

Both a self-learning architecture (embedded structure) and explicit/implicit teaching from other agents (an environmental design issue) are necessary not only for learning a single behavior but, more importantly, for lifetime behavior learning. This paper presents a method by which a robot comes to understand unfamiliar behavior shown by others through collaboration between behavior acquisition and recognition of observed behavior, in which the state value plays an important role not only in behavior acquisition (reinforcement learning) but also in behavior recognition (observation). That is, state value updates can be accelerated by observation without real trial and error, while the learned values enrich the recognition system, since recognition is based on estimating the state value of the observed behavior. The validity of the proposed method is demonstrated by applying it to a dynamic environment in which two robots play soccer.
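The coupling described above can be sketched in miniature: learn state values for one's own task, then score an observed state sequence by how consistently those values rise toward the goal. This is a minimal illustration of value-based recognition, not the paper's implementation; the task (a one-dimensional corridor), the threshold-free scoring rule, and the names `value_iteration` and `recognize` are all assumptions made for the sake of the example.

```python
import numpy as np

# Hypothetical 1-D corridor task: reward 1 at the rightmost state.
N_STATES = 6
GOAL = N_STATES - 1
GAMMA = 0.9

def value_iteration(n_states=N_STATES, goal=GOAL, gamma=GAMMA, iters=100):
    """Compute state values V(s) for a deterministic left/right corridor."""
    V = np.zeros(n_states)
    for _ in range(iters):
        for s in range(n_states):
            if s == goal:
                V[s] = 1.0  # reward at the goal
                continue
            # actions: move left or right, clipped at the walls
            succ = [max(s - 1, 0), min(s + 1, n_states - 1)]
            V[s] = gamma * max(V[sp] for sp in succ)
    return V

def recognize(trajectory, V):
    """Score an observed state sequence by the fraction of steps
    on which the estimated state value increases toward the goal."""
    steps = list(zip(trajectory[:-1], trajectory[1:]))
    gains = [V[s2] - V[s1] for s1, s2 in steps]
    return sum(g > 0 for g in gains) / len(gains)

V = value_iteration()
goal_directed = [0, 1, 2, 3, 4, 5]  # heads straight for the goal
wandering = [2, 1, 2, 1, 0, 1]      # mostly moves away from the goal
print(recognize(goal_directed, V))  # 1.0: value rises on every step
print(recognize(wandering, V))      # 0.4: value rises on 2 of 5 steps
```

In the same spirit, the observer's own value updates could be driven by the states visited in the observed trajectory, which is the "acceleration by observation without real trial and error" the abstract refers to.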
