Stable Opponent Shaping in Differentiable Games

A growing number of learning methods are, at their core, differentiable games whose players optimise multiple, interdependent objectives in parallel -- from GANs and intrinsic curiosity to multi-agent RL. Opponent shaping is a powerful approach to improving learning dynamics in these games, accounting for the influence of each player's update on the learning of the others. Learning with Opponent-Learning Awareness (LOLA) is a recent algorithm that exploits this learning response and leads to cooperation in settings like the Iterated Prisoner's Dilemma. Although LOLA is experimentally successful, we show that its agents can exhibit 'arrogant' behaviour directly at odds with convergence. In fact, remarkably few algorithms have theoretical guarantees that apply across all (n-player, non-convex) games. In this paper we present Stable Opponent Shaping (SOS), a new method that interpolates between LOLA and a stable variant named LookAhead. We prove that LookAhead converges locally to equilibria and avoids strict saddles in all differentiable games. SOS inherits these essential guarantees, while also shaping the learning of opponents and consistently either matching or outperforming LOLA experimentally.
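
To make the distinction concrete, below is a minimal sketch in JAX of the three update directions for player 1 in a toy two-player differentiable game. The losses L1 and L2, the step size alpha, and the fixed interpolation weight p are illustrative assumptions and not the paper's formulation or experimental setup; in particular, SOS chooses its interpolation coefficient adaptively at each step rather than fixing p, and the paper works with a first-order expansion of the opponent's step rather than the exact surrogate differentiated here.

import jax
import jax.numpy as jnp

# A toy two-player differentiable game (purely illustrative losses).
def L1(th1, th2):
    return 0.5 * jnp.sum(th1 ** 2) + jnp.dot(th1, th2)

def L2(th1, th2):
    return 0.5 * jnp.sum(th2 ** 2) - jnp.dot(th1, th2)

alpha = 0.1  # assumed look-ahead / learning rate

def lookahead_grad_1(th1, th2):
    # Player 1 anticipates a naive gradient step by player 2 but treats
    # that step as a constant, so no gradient flows back through it.
    step2 = -alpha * jax.grad(L2, argnums=1)(th1, th2)
    return jax.grad(lambda t1: L1(t1, th2 + step2))(th1)

def lola_grad_1(th1, th2):
    # Player 1 differentiates through the opponent's anticipated step,
    # i.e. actively shapes the opponent's learning.
    def surrogate(t1):
        step2 = -alpha * jax.grad(L2, argnums=1)(t1, th2)
        return L1(t1, th2 + step2)
    return jax.grad(surrogate)(th1)

def interpolated_grad_1(th1, th2, p=0.5):
    # Interpolation in the spirit of SOS: p = 0 recovers LookAhead,
    # p = 1 recovers LOLA. A fixed p is an illustration only; SOS picks
    # its coefficient adaptively so as to retain convergence guarantees.
    la = lookahead_grad_1(th1, th2)
    lo = lola_grad_1(th1, th2)
    return (1.0 - p) * la + p * lo

# One update for player 1 (player 2 is symmetric and omitted).
th1, th2 = jnp.ones(2), jnp.ones(2)
th1_new = th1 - alpha * interpolated_grad_1(th1, th2)

The intuition matching the abstract: LookAhead's stability comes from treating the opponent's anticipated step as fixed, whereas the extra differentiation through that step is exactly the shaping behaviour of LOLA; interpolating between the two directions is what allows a method like SOS to retain shaping while recovering convergence guarantees.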
