Open-ended Learning in Symmetric Zero-sum Games

Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them 'winner' and 'loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective: we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives.
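The rectified Nash weighting behind PSRO_rN can be illustrated on rock-paper-scissors, the canonical strategic cycle mentioned above. The sketch below is illustrative, not the paper's implementation: it hardcodes the (uniform) Nash equilibrium of rock-paper-scissors rather than solving for it, and the function name `rectified_objective` is an assumed helper. The idea it demonstrates is that each agent with support under Nash is trained against the Nash-weighted mixture of opponents it beats, with payoffs against opponents it loses to rectified to zero.

```python
# A[i][j] is the payoff to agent i when playing agent j.
# The matrix is antisymmetric (A[i][j] == -A[j][i]), as in any
# symmetric zero-sum game.
A = [
    [0, -1, 1],   # rock:     loses to paper, beats scissors
    [1, 0, -1],   # paper:    beats rock, loses to scissors
    [-1, 1, 0],   # scissors: beats paper, loses to rock
]

# The Nash equilibrium of rock-paper-scissors is uniform; for a
# general empirical payoff matrix it would be computed by solving
# a linear program over the population.
nash = [1 / 3, 1 / 3, 1 / 3]

def rectified_objective(i):
    """Nash-weighted payoff of agent i, counting only the
    opponents it beats: losses are rectified to zero, so agent i's
    training objective amplifies its strengths instead of averaging
    them away against the whole population."""
    return sum(p * max(0.0, A[i][j]) for j, p in enumerate(nash))

objectives = [rectified_objective(i) for i in range(3)]
```

In this maximally cyclic game every agent has the same rectified objective (each beats exactly one opponent, weighted 1/3), which is the niching effect in miniature: no agent is pushed to imitate another, and the population retains all three strategies rather than collapsing onto one.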
