A Limited-Capacity Minimax Theorem for Non-Convex Games or: How I Learned to Stop Worrying about Mixed-Nash and Love Neural Nets

Adversarial training, a special case of multi-objective optimization, is an increasingly prevalent machine learning technique: some of its most notable applications include GAN-based generative modeling and self-play techniques in reinforcement learning, which have been applied to complex games such as Go and poker. In practice, a \emph{single} pair of networks is typically trained to find an approximate equilibrium of a highly nonconcave-nonconvex adversarial problem. However, while a classic result in game theory guarantees that such an equilibrium exists in concave-convex games, there is no analogous guarantee when the payoff is nonconcave-nonconvex. Our main contribution is an approximate minimax theorem for a large class of games in which the players pick neural networks, a class that includes WGAN training, StarCraft II, and the Colonel Blotto game. Our findings rely on the fact that, despite being nonconcave-nonconvex with respect to the neural networks' parameters, these games are concave-convex with respect to the actual models (e.g., functions or distributions) that the networks represent.
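
To make the lifting concrete, here is a minimal sketch of the shape of such a statement; the notation ($f$, $F$, $G_\theta$, $D_\phi$, $\Theta$, $\Phi$, $\varepsilon$) is illustrative and not taken from the paper. Write the parametric payoff as a composition
\[
  f(\theta, \phi) \;=\; F(G_\theta, D_\phi), \qquad \theta \in \Theta,\ \phi \in \Phi,
\]
where $F$ is convex in the minimizing player's model $G_\theta$ and concave in the maximizing player's model $D_\phi$ over the underlying model classes, even though $f$ is nonconcave-nonconvex in $(\theta, \phi)$. If the network classes can $\varepsilon$-approximate the models attaining the value of the lifted concave-convex game (to which a classical minimax theorem such as Sion's applies), one expects a bound of the form
\[
  0 \;\le\; \min_{\theta \in \Theta} \max_{\phi \in \Phi} f(\theta, \phi)
  \;-\; \max_{\phi \in \Phi} \min_{\theta \in \Theta} f(\theta, \phi)
  \;\le\; \varepsilon,
\]
where the left inequality is weak duality and $\varepsilon$ shrinks as network capacity grows.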
