Team-partitioned, opaque-transition reinforcement learning

We present a novel multi-agent learning paradigm called team-partitioned, opaque-transition reinforcement learning (TPOT-RL). TPOT-RL introduces action-dependent features to generalize the state space; in our work, a learned action-dependent feature space supports higher-level reinforcement learning. TPOT-RL enables a team of agents to learn to cooperate toward a specific goal. It adapts traditional RL methods to complex, non-Markovian, multi-agent domains with large state spaces and limited training opportunities. TPOT-RL is fully implemented and has been tested in robotic soccer, a complex multi-agent domain. This paper presents the algorithmic details of TPOT-RL as well as empirical results demonstrating the effectiveness of the multi-agent learning approach with learned features.

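To make the core ideas concrete, the sketch below shows one plausible shape for a TPOT-RL learner: each agent keeps a small Q-table indexed by an action-dependent feature e(s, a) together with the action itself, and, because transitions are opaque (the agent cannot observe how the state evolves after it acts), each update moves a Q-value toward an observed reward rather than a bootstrapped successor estimate. This is a minimal sketch under those assumptions, not the paper's implementation; the class name TPOTRL, the per-action features dictionary, and the reward signal are all illustrative.

```python
from collections import defaultdict
import random


class TPOTRL:
    """Illustrative TPOT-RL-style learner: one Q-value per (feature, action) pair."""

    def __init__(self, actions, alpha=0.1, epsilon=0.1):
        self.actions = list(actions)  # e.g. pass directions in robotic soccer
        self.alpha = alpha            # learning rate
        self.epsilon = epsilon        # exploration probability
        # Indexing by action-dependent features keeps this table far
        # smaller than a table over the raw state space would be.
        self.q = defaultdict(float)

    def select_action(self, features):
        """features[a] is the action-dependent feature value e(s, a)."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(features[a], a)])

    def update(self, features, action, reward):
        # Opaque transitions: no successor state is observed, so the
        # target is the (possibly delayed) reward itself rather than a
        # bootstrapped estimate of future value.
        key = (features[action], action)
        self.q[key] += self.alpha * (reward - self.q[key])
```

In a robotic-soccer-style setting, the actions might be pass directions and e(s, a) a learned, coarse estimate of whether a pass in that direction is likely to succeed; partitioning the task so that each teammate learns only over the states in which it acts keeps every agent's table small.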