A Limited-Capacity Minimax Theorem for Non-Convex Games or: How I Learned to Stop Worrying about Mixed-Nash and Love Neural Nets

Adversarial training, a special case of multi-objective optimization, is an increasingly prevalent machine learning technique: some of its most notable applications include GAN-based generative modeling and self-play techniques in reinforcement learning, which have been applied to complex games such as Go and poker. In practice, a \emph{single} pair of networks is typically trained to find an approximate equilibrium of a highly nonconcave-nonconvex adversarial problem. However, while a classic result in game theory states that such an equilibrium exists in concave-convex games, there is no analogous guarantee if the payoff is nonconcave-nonconvex. Our main contribution is to provide an approximate minimax theorem for a large class of games in which the players pick neural networks, including WGAN, StarCraft II, and the Blotto game. Our findings rely on the fact that, despite being nonconcave-nonconvex with respect to the neural networks' parameters, these games are concave-convex with respect to the actual models (e.g., functions or distributions) represented by these neural networks.
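The contrast the abstract draws can be made concrete with a toy sketch (not from the paper): a WGAN-style payoff is linear, hence trivially concave-convex, as a functional of the discriminator $f$ itself, yet nonconvex-nonconcave in the parameters of a network representing $f$. Here `f_theta(x) = theta1 * tanh(theta2 * x)` is a hypothetical one-neuron "discriminator", and the functional evaluated is simply $f \mapsto f(1)$:

```python
import math

def payoff(theta):
    """Payoff as a function of network parameters: L(theta) = f_theta(1.0),
    where f_theta(x) = theta1 * tanh(theta2 * x). The map f -> f(1.0) is
    linear in f, but L is nonconvex-nonconcave in theta."""
    theta1, theta2 = theta
    return theta1 * math.tanh(theta2)

# Convexity would require payoff(midpoint) <= average of endpoint payoffs;
# concavity would require the reverse inequality. Both fail here, so L,
# viewed in parameter space, is neither convex nor concave.
a, b = (2.0, 1.0), (0.0, 3.0)          # midpoint is (1.0, 2.0)
print(payoff((1.0, 2.0)) > 0.5 * (payoff(a) + payoff(b)))   # True: not convex

c, d = (1.0, 1.0), (-1.0, -1.0)        # midpoint is (0.0, 0.0)
print(payoff((0.0, 0.0)) < 0.5 * (payoff(c) + payoff(d)))   # True: not concave
```

This is why mixing over (or taking limits of) networks can recover a concave-convex structure even though gradient dynamics on the raw parameters face a nonconvex-nonconcave landscape.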
