A Limited-Capacity Minimax Theorem for Non-Convex Games or: How I Learned to Stop Worrying about Mixed-Nash and Love Neural Nets

Adversarial training, a special case of multi-objective optimization, is an increasingly prevalent machine learning technique: some of its most notable applications include GAN-based generative modeling and self-play techniques in reinforcement learning, which have been applied to complex games such as Go and poker. In practice, a \emph{single} pair of networks is typically trained to find an approximate equilibrium of a highly nonconcave-nonconvex adversarial problem. However, while a classic result in game theory guarantees that such an equilibrium exists in concave-convex games, there is no analogous guarantee when the payoff is nonconcave-nonconvex. Our main contribution is an approximate minimax theorem for a large class of games in which the players pick neural networks, a class that includes WGAN training, StarCraft II, and the Colonel Blotto game. Our findings rely on the fact that, despite being nonconcave-nonconvex with respect to the neural networks' parameters, these games are concave-convex with respect to the actual models (e.g., functions or distributions) that the networks represent.
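
To make the lifting concrete, here is a minimal sketch of the shape of such a statement; the notation ($f$, $F$, $G_\theta$, $D_\phi$, $\Theta$, $\Phi$, $\varepsilon$) is illustrative and not taken from the paper. Write the parametric payoff as a composition
\[
  f(\theta, \phi) \;=\; F(G_\theta, D_\phi), \qquad \theta \in \Theta,\ \phi \in \Phi,
\]
where $F$ is convex in the minimizing player's model $G_\theta$ and concave in the maximizing player's model $D_\phi$ over the underlying model classes, even though $f$ is nonconcave-nonconvex in $(\theta, \phi)$. If the network classes can $\varepsilon$-approximate the models attaining the value of the lifted concave-convex game (to which a classical minimax theorem such as Sion's applies), one expects a bound of the form
\[
  0 \;\le\; \min_{\theta \in \Theta} \max_{\phi \in \Phi} f(\theta, \phi)
  \;-\; \max_{\phi \in \Phi} \min_{\theta \in \Theta} f(\theta, \phi)
  \;\le\; \varepsilon,
\]
where the left inequality is weak duality and $\varepsilon$ shrinks as network capacity grows.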
