A Limited-Capacity Minimax Theorem for Non-Convex Games or: How I Learned to Stop Worrying about Mixed-Nash and Love Neural Nets

Adversarial training, a special case of multi-objective optimization, is an increasingly prevalent machine learning technique: some of its most notable applications include GAN-based generative modeling and self-play techniques in reinforcement learning, which have been applied to complex games such as Go and poker. In practice, a \emph{single} pair of networks is typically trained to find an approximate equilibrium of a highly nonconcave-nonconvex adversarial problem. However, while a classic result in game theory states that such an equilibrium exists in concave-convex games, there is no analogous guarantee if the payoff is nonconcave-nonconvex. Our main contribution is to provide an approximate minimax theorem for a large class of games in which the players pick neural networks, including WGAN, StarCraft II, and the Blotto game. Our findings rely on the fact that, despite being nonconcave-nonconvex with respect to the neural networks' parameters, these games are concave-convex with respect to the actual models (e.g., functions or distributions) represented by these neural networks.
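The contrast the abstract draws can be made concrete with a toy sketch (not from the paper): a WGAN-style payoff is linear, hence trivially concave-convex, as a functional of the discriminator $f$ itself, yet nonconvex-nonconcave in the parameters of a network representing $f$. Here `f_theta(x) = theta1 * tanh(theta2 * x)` is a hypothetical one-neuron "discriminator", and the functional evaluated is simply $f \mapsto f(1)$:

```python
import math

def payoff(theta):
    """Payoff as a function of network parameters: L(theta) = f_theta(1.0),
    where f_theta(x) = theta1 * tanh(theta2 * x). The map f -> f(1.0) is
    linear in f, but L is nonconvex-nonconcave in theta."""
    theta1, theta2 = theta
    return theta1 * math.tanh(theta2)

# Convexity would require payoff(midpoint) <= average of endpoint payoffs;
# concavity would require the reverse inequality. Both fail here, so L,
# viewed in parameter space, is neither convex nor concave.
a, b = (2.0, 1.0), (0.0, 3.0)          # midpoint is (1.0, 2.0)
print(payoff((1.0, 2.0)) > 0.5 * (payoff(a) + payoff(b)))   # True: not convex

c, d = (1.0, 1.0), (-1.0, -1.0)        # midpoint is (0.0, 0.0)
print(payoff((0.0, 0.0)) < 0.5 * (payoff(c) + payoff(d)))   # True: not concave
```

This is why mixing over (or taking limits of) networks can recover a concave-convex structure even though gradient dynamics on the raw parameters face a nonconvex-nonconcave landscape.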
