Open-ended Learning in Symmetric Zero-sum Games

Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them 'winner' and 'loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective: we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives.
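The rectified Nash weighting behind PSRO_rN can be illustrated on rock-paper-scissors, the canonical strategic cycle mentioned above. The sketch below is illustrative, not the paper's implementation: it hardcodes the (uniform) Nash equilibrium of rock-paper-scissors rather than solving for it, and the function name `rectified_objective` is an assumed helper. The idea it demonstrates is that each agent with support under Nash is trained against the Nash-weighted mixture of opponents it beats, with payoffs against opponents it loses to rectified to zero.

```python
# A[i][j] is the payoff to agent i when playing agent j.
# The matrix is antisymmetric (A[i][j] == -A[j][i]), as in any
# symmetric zero-sum game.
A = [
    [0, -1, 1],   # rock:     loses to paper, beats scissors
    [1, 0, -1],   # paper:    beats rock, loses to scissors
    [-1, 1, 0],   # scissors: beats paper, loses to rock
]

# The Nash equilibrium of rock-paper-scissors is uniform; for a
# general empirical payoff matrix it would be computed by solving
# a linear program over the population.
nash = [1 / 3, 1 / 3, 1 / 3]

def rectified_objective(i):
    """Nash-weighted payoff of agent i, counting only the
    opponents it beats: losses are rectified to zero, so agent i's
    training objective amplifies its strengths instead of averaging
    them away against the whole population."""
    return sum(p * max(0.0, A[i][j]) for j, p in enumerate(nash))

objectives = [rectified_objective(i) for i in range(3)]
```

In this maximally cyclic game every agent has the same rectified objective (each beats exactly one opponent, weighted 1/3), which is the niching effect in miniature: no agent is pushed to imitate another, and the population retains all three strategies rather than collapsing onto one.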
