Evolving policy geometry for scalable multiagent learning

A major challenge for traditional approaches to multiagent learning is training teams that scale easily to include additional agents. The problem is that such approaches typically encode each agent's policy separately. This separation means that computational complexity explodes as the number of agents in the team increases, and it also leads to the problem of reinvention: skills that should be shared among agents must be rediscovered separately for each agent. To address this problem, this paper presents an alternative evolutionary approach to multiagent learning called multiagent HyperNEAT, which encodes the team as a pattern of related policies rather than as a set of individual agents. To capture this pattern, a policy geometry is introduced to describe the relationship between each agent's policy and its canonical geometric position within the team. Because policy geometry can encode variations of a shared skill across all of the policies it represents, the problem of reinvention is avoided. Furthermore, because the policy geometry of a particular team can be sampled at any resolution, it acts as a heuristic for generating policies for teams of any size, producing a powerful new capability for multiagent learning. In this paper, multiagent HyperNEAT is tested in predator-prey and room-clearing domains. In both domains, the results are effective teams that scale successfully to larger team sizes without any further training.
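The idea of sampling a policy geometry at different resolutions can be illustrated with a minimal sketch. It is not the multiagent HyperNEAT implementation: the function `policy_geometry` is a hand-written, hypothetical stand-in for an evolved pattern, and the parameter names `turn_bias` and `speed`, the one-dimensional position range [-1, 1], and the helper names are all assumptions made for illustration.

```python
import math


def agent_positions(team_size):
    """Return evenly spaced canonical positions in [-1, 1], one per agent.

    Assumption: agents are laid out along a single axis.
    """
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    if team_size == 1:
        return [0.0]
    step = 2.0 / (team_size - 1)
    return [-1.0 + i * step for i in range(team_size)]


def policy_geometry(z):
    """Hypothetical stand-in for an evolved pattern.

    Maps an agent's position z to its policy parameters, so that
    neighbouring agents receive related (not independent) policies.
    """
    return {
        "turn_bias": math.tanh(2.0 * z),  # varies smoothly from left to right
        "speed": 1.0 - 0.5 * abs(z),      # symmetric about the team's centre
    }


def build_team(team_size):
    """Sample the same policy geometry at the requested resolution."""
    return [policy_geometry(z) for z in agent_positions(team_size)]


if __name__ == "__main__":
    # The same function yields teams of different sizes with no retraining.
    for size in (3, 5, 9):
        team = build_team(size)
        print(size, [round(p["turn_bias"], 2) for p in team])
```

Because every policy is read from one shared function of position, a change to that function alters all agents at once, and asking for a larger team only means sampling more positions.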
