Human instruction recognition and self behavior acquisition based on state value

A robot working with humans or other robots is expected to adapt to changes in its environment. Reinforcement learning has been widely studied for motor skill learning, robot behavior acquisition, and adaptation of behavior to environmental changes. However, it is not practical for a robot to learn and adapt its behavior from scratch solely through trial and error, because doing so requires extensive exploration. Fortunately, it is common for predecessors to be present in the environment, and it is reasonable for the robot to learn from observing their behavior. To learn various behaviors from observation, the robot must segment the observed behavior according to a criterion that is meaningful to itself and feed the resulting data back into its own behavior learning. This paper presents a case study of a robot coming to understand unfamiliar behavior demonstrated by a human instructor through the interplay between behavior acquisition and recognition of observed behavior, in which the state value plays an important role not only in behavior acquisition (reinforcement learning) but also in behavior recognition (observation). The validity of the proposed method is demonstrated by applying it to a dynamic environment in which one robot and one human play soccer.
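As a concrete illustration of the dual role of the state value, the following minimal Python sketch (not the authors' implementation; the behavior names, the one-dimensional states, and the value functions are all hypothetical) shows one way learned state-value functions could drive recognition: an observed trajectory is attributed to the behavior whose value function rises most consistently along it, since a demonstrator pursuing that behavior should be climbing its own value landscape toward the goal.

import numpy as np

def recognize_behavior(observed_states, value_functions):
    """Return the behavior whose state-value function increases most
    steadily along the observed state trajectory.

    observed_states : list of states estimated from observation
    value_functions : dict mapping behavior name -> callable V(s)
    """
    scores = {}
    for name, V in value_functions.items():
        values = np.array([V(s) for s in observed_states])
        diffs = np.diff(values)
        # Fraction of observed transitions in which this behavior's
        # state value increases; higher means a better match.
        scores[name] = float(np.mean(diffs > 0)) if len(diffs) > 0 else 0.0
    return max(scores, key=scores.get)

# Illustrative usage with hypothetical behaviors on a 1-D state space.
value_functions = {
    "approach_ball": lambda s: -abs(s - 1.0),  # value peaks at the ball
    "shoot_to_goal": lambda s: -abs(s - 5.0),  # value peaks at the goal
}
trajectory = [4.0, 4.3, 4.7, 4.9]  # observed states moving toward the goal
print(recognize_behavior(trajectory, value_functions))  # -> "shoot_to_goal"

The same value functions that score observed trajectories here would, in the acquisition phase, be the ones estimated by reinforcement learning, which is what allows recognition and learning to share a single representation.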
