Acquisition of Competitive Behaviors in Multi-Agent System Based on a Modular Learning System

Existing reinforcement learning approaches suffer from policy alternation by other agents in dynamic multi-agent environments, which can cause sudden changes in the state transition probabilities whose constancy is required for behavior learning to converge. A typical example is a RoboCup competition, where the behaviors of the other agents change the state transition probabilities. A modular learning system can solve this problem if each module can be assigned to one situation in which it may regard the state transition probabilities as constant. Scheduling of learning is introduced to avoid the complexity of autonomous situation assignment. Furthermore, the introduction of macro actions reduces the exploration space and enables agents to learn competitive behaviors simultaneously in such an adversarial environment. This paper presents a method of modular learning in a multi-agent environment in which the learning agents can learn their behaviors and adapt themselves to the situations that result from the others' behaviors.
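The abstract describes the approach only at a high level; the following is a minimal, hypothetical sketch of how such a modular learner might be organized: one Q-learning module per situation (e.g., per opponent policy), a fixed learning schedule in place of autonomous situation assignment, and macro actions as the action set. All class names, the `env` interface, and the macro-action definitions are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (not the authors' implementation): a modular Q-learning agent
# that keeps one table per assumed situation (e.g., per opponent policy),
# trains modules according to a fixed learning schedule instead of autonomous
# situation assignment, and acts through macro actions (fixed primitive
# sequences) to shrink the exploration space. All names are illustrative.
import random
from collections import defaultdict

MACRO_ACTIONS = {            # hypothetical macro actions for a soccer-like task
    "approach_ball": ["forward", "forward", "forward"],
    "shoot":         ["forward", "kick"],
    "block":         ["turn_left", "forward"],
}

class Module:
    """One Q-learning module, valid while the state transition
    probabilities it experiences can be regarded as constant."""
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)       # (state, macro_action) -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select(self, state):
        if random.random() < self.epsilon:
            return random.choice(list(MACRO_ACTIONS))
        return max(MACRO_ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        best_next = max(self.q[(s_next, a2)] for a2 in MACRO_ACTIONS)
        td = r + self.gamma * best_next - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td

class ModularAgent:
    """Holds one module per situation; the learning schedule decides
    which situation (and hence which module) is trained in each phase."""
    def __init__(self, situations):
        self.modules = {sit: Module() for sit in situations}

    def train_phase(self, situation, env, episodes=100):
        module = self.modules[situation]
        for _ in range(episodes):
            s = env.reset(situation)      # hypothetical env: fixes the opponent's policy
            done = False
            while not done:
                a = module.select(s)
                r, s_next, done = env.run_macro(MACRO_ACTIONS[a])
                module.update(s, a, r, s_next)
                s = s_next

# Example schedule: each situation is trained in its own phase.
# schedule = ["opponent_idle", "opponent_defensive", "opponent_offensive"]
# agent = ModularAgent(schedule)
# for situation in schedule:
#     agent.train_phase(situation, env)
```

Assigning one module per scheduled situation keeps the transition probabilities seen by each module approximately stationary, which is the convergence condition the abstract refers to.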
