Differentiable Game Mechanics

Deep learning is built on the foundational guarantee that gradient descent on an objective function converges to local minima. Unfortunately, this guarantee fails in settings, such as generative adversarial nets, that exhibit multiple interacting losses. The behavior of gradient-based methods in games is not well understood -- and is becoming increasingly important as adversarial and multi-objective architectures proliferate. In this paper, we develop new tools to understand and control the dynamics in n-player differentiable games. The key result is to decompose the game Jacobian into two components. The first, symmetric component, is related to potential games, which reduce to gradient descent on an implicit function. The second, antisymmetric component, relates to Hamiltonian games, a new class of games that obey a conservation law akin to conservation laws in classical mechanical systems. The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in differentiable games. Basic experiments show SGA is competitive with recently proposed algorithms for finding stable fixed points in GANs -- while at the same time being applicable to, and having guarantees in, much more general cases.
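The decomposition described in the abstract can be sketched on a toy example. The snippet below is a minimal illustration, not the authors' implementation. It assumes a two-player zero-sum game in which player 1 minimizes x*y over x and player 2 minimizes -x*y over y, so the simultaneous gradient is ξ(x, y) = (y, -x) and the game Jacobian J is constant. J is split into a symmetric part S = (J + Jᵀ)/2 and an antisymmetric part A = (J - Jᵀ)/2, and the adjusted update uses ξ + λ·Aᵀξ, the form of the symplectic gradient adjustment given in the SGA paper. The helper names, the step size `lr`, the weight `lam`, and the starting point are hypothetical choices made for illustration.

```python
import math


def transpose(m):
    """Return the transpose of a square matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def decompose(jac):
    """Split J into symmetric S = (J + J^T)/2 and antisymmetric A = (J - J^T)/2."""
    jt = transpose(jac)
    n = len(jac)
    sym = [[0.5 * (jac[i][j] + jt[i][j]) for j in range(n)] for i in range(n)]
    anti = [[0.5 * (jac[i][j] - jt[i][j]) for j in range(n)] for i in range(n)]
    return sym, anti


# Assumed toy game: losses l1 = x*y (player 1 controls x), l2 = -x*y (player 2 controls y).
# Simultaneous gradient xi(x, y) = (y, -x) is linear in w = (x, y): xi(w) = JAC w.
JAC = [[0.0, 1.0], [-1.0, 0.0]]


def run(steps, lr, lam):
    """Iterate w <- w - lr * (xi + lam * A^T xi) and return the final distance to the origin.

    lam = 0 gives plain simultaneous gradient descent.
    """
    _, anti = decompose(JAC)
    anti_t = transpose(anti)
    w = [1.0, 1.0]  # hypothetical starting point
    for _ in range(steps):
        xi = matvec(JAC, w)          # simultaneous gradient at w
        adjust = matvec(anti_t, xi)  # adjustment term A^T xi
        w = [wi - lr * (g + lam * a) for wi, g, a in zip(w, xi, adjust)]
    return math.hypot(w[0], w[1])


plain_norm = run(steps=100, lr=0.1, lam=0.0)  # no adjustment
sga_norm = run(steps=100, lr=0.1, lam=1.0)    # with adjustment
print("plain:", plain_norm, "adjusted:", sga_norm)
```

In this game the symmetric part vanishes, so the dynamics are purely rotational: the unadjusted iterates spiral away from the fixed point at the origin, while the adjusted iterates contract toward it. Games with a nonzero symmetric part, or with neural-network players, would need the Jacobian-vector products computed by automatic differentiation rather than an explicit matrix.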

Thore Graepel | Karl Tuyls | Jakob N. Foerster | David Balduzzi | Sébastien Racanière | James Martens | Alistair Letcher

[1] J. Nash. Equilibrium Points in N-Person Games. , 1950, Proceedings of the National Academy of Sciences of the United States of America.

[2] J. Goodman. Note on Existence and Uniqueness of Equilibrium Points for Concave N-Person Games , 1965 .

[3] James M. Ortega,et al. Iterative solution of nonlinear equations in several variables , 2014, Computer science and applied mathematics.

[4] R. Rosenthal. A class of games possessing pure-strategy Nash equilibria , 1973 .

[5] V. Arnold. Mathematical Methods of Classical Mechanics , 1974 .

[6] Loring W. Tu,et al. Differential forms in algebraic topology , 1982, Graduate texts in mathematics.

[7] S. Sternberg,et al. Symplectic Techniques in Physics , 1984 .

[8] M. Shub. Global Stability of Dynamical Systems , 1986 .

[9] Xiaoyun Lu,et al. Hamiltonian games , 1992, J. Comb. Theory, Ser. B.

[10] Yishay Mansour,et al. Nash Convergence of Gradient Dynamics in General-Sum Games , 2000, UAI.

[11] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.

[12] Michael H. Bowling,et al. Convergence and No-Regret in Multiagent Learning , 2004, NIPS.

[13] Yurii Nesterov,et al. Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[14] Paul W. Goldberg,et al. The complexity of computing a Nash equilibrium , 2006, STOC '06.

[15] Gábor Lugosi,et al. Learning correlated equilibria in games with compact sets of strategies , 2007, Games Econ. Behav..

[16] Victor R. Lesser,et al. A Multiagent Reinforcement Learning Algorithm with Non-linear Dynamics , 2008, J. Artif. Intell. Res..

[17] Yoav Shoham,et al. Multiagent Systems - Algorithmic, Game-Theoretic, and Logical Foundations , 2009 .

[18] Francisco Facchinei,et al. Generalized Nash Equilibrium Problems , 2010, Ann. Oper. Res..

[19] Francisco Facchinei,et al. Convex Optimization, Game Theory, and Variational Inequality Theory , 2010, IEEE Signal Processing Magazine.

[20] Yuan Yao,et al. Statistical ranking and combinatorial Hodge theory , 2008, Math. Program..

[21] Asuman E. Ozdaglar,et al. Flows and Decompositions of Games: Harmonic and Potential Games , 2010, Math. Oper. Res..

[22] S. Hart,et al. Simple Adaptive Strategies: From Regret-matching To Uncoupled Dynamics , 2013 .

[23] Karthik Sridharan,et al. Optimization, Learning, and Games with Predictable Sequences , 2013, NIPS.

[24] L. F. Abbott,et al. Hierarchical Control Using Networks Trained with Higher-Level Forward Models , 2014, Neural Computation.

[25] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.

[26] Haipeng Luo,et al. Fast Convergence of Regularized Learning in Games , 2015, NIPS.

[27] Zhengyuan Zhou,et al. Learning in games with continuous action sets and unknown payoff functions , 2016, Mathematical Programming.

[28] David Pfau,et al. Connecting Generative Adversarial Networks and Actor-Critic Methods , 2016, ArXiv.

[29] Sridhar Mahadevan,et al. Online Monotone Optimization , 2016, ArXiv.

[30] Wojciech Zaremba,et al. Improved Techniques for Training GANs , 2016, NIPS.

[31] Christos H. Papadimitriou,et al. From Nash Equilibria to Chain Recurrent Sets: Solution Concepts and Topology , 2016, ITCS.

[32] Michael I. Jordan,et al. Gradient Descent Converges to Minimizers , 2016, ArXiv.

[33] Marek Petrik,et al. Proximal Gradient Temporal Difference Learning Algorithms , 2016, IJCAI.

[34] Tom Schaul,et al. FeUdal Networks for Hierarchical Reinforcement Learning , 2017, ICML.

[35] David Balduzzi,et al. Strongly-Typed Agents are Guaranteed to Interact Safely , 2017, ICML.

[36] J. Zico Kolter,et al. Gradient descent GAN optimization is locally stable , 2017, NIPS.

[37] Alexei A. Efros,et al. Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[38] Sepp Hochreiter,et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[39] Alex Graves,et al. Decoupled Neural Interfaces using Synthetic Gradients , 2016, ICML.

[40] Sridhar Mahadevan,et al. Online Monotone Games , 2017, ArXiv.

[41] Sebastian Nowozin,et al. The Numerics of GANs , 2017, NIPS.

[42] David Pfau,et al. Unrolled Generative Adversarial Networks , 2016, ICLR.

[43] Razvan Pascanu,et al. Imagination-Augmented Agents for Deep Reinforcement Learning , 2017, NIPS.

[44] Alexei A. Efros,et al. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[45] Fei Xia,et al. Understanding GANs: the LQG Setting , 2017, ArXiv.

[46] 拓海杉山,et al. “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[47] Aleksander Madry,et al. A Classification-Based Study of Covariate Shift in GAN Distributions , 2017, ICML.

[48] Sridhar Mahadevan,et al. Global Convergence to the Equilibrium of GANs using Variational Inequalities , 2018, ArXiv.

[49] Shimon Whiteson,et al. Learning with Opponent-Learning Awareness , 2017, AAMAS.

[50] Thore Graepel,et al. The Mechanics of n-Player Differentiable Games , 2018, ICML.

[51] Constantinos Daskalakis,et al. Training GANs with Optimism , 2017, ICLR.

[52] Sebastian Nowozin,et al. Which Training Methods for GANs do actually Converge? , 2018, ICML.

[53] Christos H. Papadimitriou,et al. From Nash Equilibria to Chain Recurrent Sets: An Algorithmic Solution Concept for Game Theory , 2018, Entropy.

[54] Christos H. Papadimitriou,et al. Cycles in adversarial regularized learning , 2017, SODA.

[55] Thore Graepel,et al. Re-evaluating evaluation , 2018, NeurIPS.

[56] Chuan-Sheng Foo,et al. Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile , 2018, ICLR.

[57] Michael I. Jordan,et al. First-order methods almost always avoid saddle points: The case of vanishing step-sizes , 2019, NeurIPS.

[58] Ioannis Mitliagkas,et al. Negative Momentum for Improved Game Dynamics , 2018, AISTATS.

[59] Gauthier Gidel,et al. A Variational Inequality Perspective on Generative Adversarial Networks , 2018, ICLR.

[60] Zhengyuan Zhou,et al. Learning in games with continuous action sets and unknown payoff functions , 2019, Math. Program..

[61] Georgios Piliouras,et al. Multi-Agent Learning in Network Zero-Sum Games is a Hamiltonian System , 2019, AAMAS.

[62] Shimon Whiteson,et al. Stable Opponent Shaping in Differentiable Games , 2018, ICLR.