Scalable multiagent learning through indirect encoding of policy geometry

Multiagent systems pose many challenging real-world problems for artificial intelligence. Because it is difficult to engineer the behaviors of multiple cooperating agents by hand, multiagent learning has become a popular approach to their design. While there are a variety of traditional approaches to multiagent learning, many suffer from computational costs that grow with team size and from the problem of reinvention (that is, the inability to recognize that certain skills are shared by some or all team members). This paper presents an alternative approach to multiagent learning called multiagent HyperNEAT that represents the team as a pattern of policies rather than as a set of individual agents. The main idea is that an agent’s location within a canonical team layout (which can be physical, such as positions on a sports team, or conceptual, such as an agent’s relative speed) tends to dictate its role within that team. This paper introduces the term policy geometry to describe this relationship between role and position on the team. Such patterns effectively represent up to an infinite number of multiagent policies, which can be sampled from the policy geometry as needed. This allows training very large teams or, in some cases, scaling up the size of a team without additional learning. In this paper, multiagent HyperNEAT is compared to a traditional learning method, multiagent Sarsa(λ), in a predator–prey domain, where it demonstrates its ability to train large teams.
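The idea of sampling policies from a policy geometry can be illustrated with a minimal sketch. It is not the multiagent HyperNEAT implementation: the function `policy_pattern` is a hypothetical, hand-written stand-in for an evolved pattern, and the one-dimensional layout on [-1, 1], the names `team_positions` and `sample_team`, and the four-weight policy are all assumptions made only for illustration. The point is that a single function of team position can be queried at as many positions as the team has agents.

```python
import math

def policy_pattern(x, n_weights=4):
    """Hypothetical stand-in for an evolved pattern (an assumption, not the
    paper's encoding): maps a team position x in [-1, 1] to the weight
    vector of the policy for the agent at that position."""
    return [math.tanh(math.sin((k + 1) * x) + 0.5 * x) for k in range(n_weights)]

def team_positions(n_agents):
    """Evenly spaced positions across an assumed canonical layout [-1, 1]."""
    if n_agents == 1:
        return [0.0]
    return [-1.0 + 2.0 * i / (n_agents - 1) for i in range(n_agents)]

def sample_team(n_agents):
    """Sample one policy per agent from the same underlying pattern."""
    return [policy_pattern(x) for x in team_positions(n_agents)]

# The same pattern yields a team of any size without further learning.
small_team = sample_team(3)
large_team = sample_team(9)
```

In this sketch, agents that occupy the same position in teams of different sizes receive the same policy, and agents at nearby positions receive related but distinct policies, which is one way to picture how role can vary with position.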
