Minimax Theorem for Latent Games or: How I Learned to Stop Worrying about Mixed-Nash and Love Neural Nets

Adversarial training, a special case of multi-objective optimization, is an increasingly useful tool in machine learning. For example, two-player zero-sum games are central to generative modeling (GANs) and to mastering games such as Go and Poker via self-play. A classic result in game theory states that players must in general mix strategies, since pure equilibria may not exist. Surprisingly, machine learning practitioners typically train a \emph{single} pair of agents -- rather than a pair of mixtures -- contrary to Nash's principle. Our main contribution is a notion of limited-capacity equilibrium for which, as capacity grows, optimal agents -- not mixtures -- can learn increasingly expressive and realistic behaviors. We define \emph{latent games}, a new class of games in which agents are mappings that transform latent distributions. Examples include generators in GANs, which transform Gaussian noise into distributions over images, and StarCraft II agents, which transform sampled build orders into policies. We show that minimax equilibria in latent games can be approximated by a \emph{single} pair of dense neural networks. Finally, we apply our latent-game approach to solve differentiable Blotto, a game with an infinite strategy space.
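To make the definition concrete, one way to write the latent-game minimax problem is the following (the notation here is ours for illustration and may differ from the paper's formalism). Each player is a map from a fixed latent distribution to a strategy of an underlying zero-sum game, and the networks optimize the expected payoff of the induced strategies:

\[
\min_{\theta \in \Theta} \; \max_{\phi \in \Phi} \;
\mathbb{E}_{z \sim p, \, z' \sim p'} \big[ \varphi\big( G_\theta(z), \, D_\phi(z') \big) \big],
\]

where $p$ and $p'$ are fixed latent distributions (e.g.\ Gaussian noise), $G_\theta$ and $D_\phi$ are neural networks mapping latent samples to strategies, and $\varphi$ is the payoff of the underlying game. In a GAN, $G_\theta$ plays this role by transforming noise into images; in Blotto, each network transforms noise into a troop allocation. Because each network's output is a stochastic function of the latent sample, a single pair of networks already induces a distribution over pure strategies, which is the sense in which latent games sidestep explicit mixing.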
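As a concrete illustration of the last point, below is a minimal sketch of differentiable Blotto as a latent game, written in PyTorch. Everything in it (the sigmoid relaxation of the win condition, the network sizes, the simultaneous gradient ascent-descent updates) is our own illustrative choice, not the paper's exact construction: each player is a single dense network that transforms Gaussian noise into a softmax allocation of a unit budget over battlefields.

import torch
import torch.nn as nn

N_FIELDS, LATENT_DIM, BATCH = 5, 8, 64

def make_player():
    # A single dense net mapping latent noise to a unit-budget allocation.
    return nn.Sequential(
        nn.Linear(LATENT_DIM, 32), nn.ReLU(),
        nn.Linear(32, N_FIELDS), nn.Softmax(dim=-1),
    )

def payoff(a, b, temp=10.0):
    # Smooth relaxation of Blotto: sigmoid(temp * (a - b)) is close to 1 when
    # player 1 out-allocates player 2 on a battlefield and close to 0 otherwise.
    # Summing over battlefields gives the expected number of fields player 1 wins.
    return torch.sigmoid(temp * (a - b)).sum(dim=-1).mean()

p1, p2 = make_player(), make_player()
opt1 = torch.optim.Adam(p1.parameters(), lr=1e-3)
opt2 = torch.optim.Adam(p2.parameters(), lr=1e-3)

for step in range(2000):
    z1 = torch.randn(BATCH, LATENT_DIM)  # latent samples for player 1
    z2 = torch.randn(BATCH, LATENT_DIM)  # latent samples for player 2
    u = payoff(p1(z1), p2(z2))           # player 1 maximizes u, player 2 minimizes it
    opt1.zero_grad(); opt2.zero_grad()
    u.backward()
    for p in p1.parameters():            # flip player 1's gradients: ascent, not descent
        p.grad.neg_()
    opt1.step(); opt2.step()             # simultaneous gradient ascent-descent

The smoothing temperature temp controls how closely the relaxed payoff approximates the discrete "win if you allocate more" rule; the game is constant-sum, so minimizing player 1's payoff is equivalent to player 2 maximizing its own.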
