Mutual Development of Behavior Acquisition and Recognition based on Value System

This paper proposes a method in which behavior understanding develops efficiently and stably through a cycle of behavior acquisition and recognition of others' behaviors, both grounded in state values from reinforcement learning. Behavior acquisition by reinforcement learning suffers from the problem that, when a learner relies only on its own trial-and-error experience, the required exploration space and learning time grow enormously as the target behavior becomes more complex. Behavior learning can be accelerated by observing another agent, estimating the state value of the behavior to be learned, and feeding that estimate back into the learner's own behavior learning. To exploit observed behaviors in this way, however, the learner must recognize which behavior the other agent is performing. Conversely, previous studies have shown that others' behaviors can be recognized robustly on the basis of the state values of the learner's own behaviors. By alternating behavior learning and recognition of others' behaviors, behavior understanding through behavior acquisition therefore proceeds efficiently and stably. To validate the proposed method, we apply it to a simulator modeled on robots competing in the RoboCup Middle Size League and to a real robot, and demonstrate its effectiveness.
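The abstract describes two coupled steps: recognizing the other agent's behavior from one's own learned state values, and mixing the state value estimated from observation into one's own value learning. The Python sketch below illustrates both steps under loose assumptions; the behavior names, the scalar state, the mixing weight beta, and the generic TD(0) update are all hypothetical stand-ins, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical tabular value functions for two already-acquired behaviors.
# In the paper's setting these would be learned by reinforcement learning;
# here they are stubbed as simple functions of a scalar state.
value_functions = {
    "approach_ball": lambda s: -abs(s - 1.0),
    "shoot":         lambda s: -abs(s - 5.0),
}

def recognize_behavior(observed_states, value_functions):
    """Pick the behavior whose own state-value function increases most
    consistently along the observed state sequence (state-value-based
    recognition, as described in the abstract)."""
    best_name, best_score = None, float("-inf")
    for name, V in value_functions.items():
        # Mean increase of this behavior's state value along the trajectory.
        deltas = [V(s2) - V(s1)
                  for s1, s2 in zip(observed_states, observed_states[1:])]
        score = float(np.mean(deltas))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

def guided_td_update(V, s, r, s_next, V_observed,
                     alpha=0.1, gamma=0.9, beta=0.5):
    """One TD(0) update whose target is mixed with the state value
    estimated from observing the other agent; beta is a hypothetical
    mixing weight, and V / V_observed are dicts keyed by state."""
    target = r + gamma * V[s_next]
    mixed = (1.0 - beta) * target + beta * V_observed[s]
    V[s] += alpha * (mixed - V[s])

# Example: the observed trajectory moves toward s = 5, so the state value
# of "shoot" rises along it and that behavior is recognized.
print(recognize_behavior([0.0, 2.0, 4.0, 5.0], value_functions))
```

In this sketch the recognized behavior's value function would then play the role of V_observed in guided_td_update, closing the acquisition-recognition loop the abstract describes.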
