Consensus Multiplicative Weights Update: Learning to Learn using Projector-based Game Signatures

Recently, Optimistic Multiplicative Weights Update (OMWU) was proven to be the first constant step-size algorithm in the online no-regret framework to enjoy last-iterate convergence to Nash equilibria in the constrained zero-sum bimatrix case, where weights represent the probabilities of playing pure strategies. We introduce the second such algorithm, Consensus MWU (CMWU), for which we prove local convergence and show empirically that it enjoys faster and more robust convergence than OMWU. Our algorithm highlights the importance of a new object, the simplex Hessian, as well as of the interaction of the game with the (eigen)space of vectors summing to zero, which we believe future research can build on. Like OMWU, CMWU has convergence guarantees in the zero-sum case only, but Cheung and Piliouras (2020) recently showed that OMWU and MWU display opposite convergence properties depending on whether the game is zero-sum or cooperative. Inspired by this work and by the recent literature on learning to optimize for single functions, we extend CMWU to non-zero-sum games by introducing a new framework for online learning in games, in which the update rule's gradient and Hessian coefficients along a trajectory are learnt by a reinforcement learning policy conditioned on the nature of the game: the game signature. We construct the latter using a new canonical decomposition of two-player games into eight components corresponding to commutative projection operators, generalizing and unifying recent game concepts studied in the literature. We show empirically that our new learning policy is able to exploit the game signature across a wide range of game types.
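As background for the dynamics discussed above, the baseline MWU step on the probability simplex, and the classical split of a bimatrix game (A, B) into zero-sum and cooperative parts, can be sketched as follows. This is a minimal illustration only: the two-component split is a simplified analogue of the paper's eight-component projector decomposition, and the function names are illustrative, not from the paper.

```python
import numpy as np

def mwu_step(p, payoff, eta=0.1):
    # Multiplicative Weights Update: reweight each pure strategy by
    # exp(eta * payoff for that strategy), then renormalize so the
    # weights remain a probability distribution on the simplex.
    w = p * np.exp(eta * payoff)
    return w / w.sum()

def zero_sum_cooperative_split(A, B):
    # Classical split of a bimatrix game (A, B) into a zero-sum part
    # (C, -C) and a cooperative part (D, D), with A = C + D, B = -C + D.
    C = (A - B) / 2.0
    D = (A + B) / 2.0
    return (C, -C), (D, D)

# Rock-paper-scissors: a zero-sum game on which plain MWU is known to
# cycle rather than converge in the last iterate, motivating variants
# such as OMWU and CMWU.
A = np.array([[0., 1., -1.],
              [-1., 0., 1.],
              [1., -1., 0.]])
x = np.ones(3) / 3  # row player's mixed strategy
y = np.ones(3) / 3  # column player's mixed strategy
for _ in range(200):
    # Simultaneous update: both payoffs use the previous iterates.
    x, y = mwu_step(x, A @ y), mwu_step(y, -A.T @ x)
```

After any number of steps, both iterates remain valid mixed strategies (nonnegative, summing to one), since the update is a softmax-style reweighting.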

[1] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.

[2] Constantinos Daskalakis, et al. Training GANs with Optimism, 2017, ICLR.

[3] Panayotis Mertikopoulos, et al. On the convergence of single-call stochastic extra-gradient methods, 2019, NeurIPS.

[4] Asuman E. Ozdaglar, et al. Flows and Decompositions of Games: Harmonic and Potential Games, 2010, Math. Oper. Res.

[5] G. Evans, et al. Learning to Optimize, 2008.

[6] Christos H. Papadimitriou, et al. Cycles in adversarial regularized learning, 2017, SODA.

[7] Ioannis Mitliagkas, et al. Negative Momentum for Improved Game Dynamics, 2018, AISTATS.

[8] Scott Lundberg, et al. A Unified Approach to Interpreting Model Predictions, 2017, NIPS.

[9] David Silver, et al. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning, 2017, NIPS.

[10] Wotao Yin, et al. Learning A Minimax Optimizer: A Pilot Study, 2021, ICLR.

[11] Thore Graepel, et al. Differentiable Game Mechanics, 2019, J. Mach. Learn. Res.

[12] Chuan-Sheng Foo, et al. Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile, 2018, ICLR.

[13] Yun Kuen Cheung, et al. Chaos, Extremism and Optimism: Volume Analysis of Learning in Games, 2020, NeurIPS.

[14] Michael I. Jordan, et al. RLlib: Abstractions for Distributed Reinforcement Learning, 2017, ICML.

[15] Max Jaderberg, et al. Open-ended Learning in Symmetric Zero-sum Games, 2019, ICML.

[16] Shiyu Chang, et al. Training Stronger Baselines for Learning to Optimize, 2020, NeurIPS.

[17] Charles R. Johnson, et al. Matrix analysis, 1985.

[18] Constantinos Daskalakis, et al. Last-Iterate Convergence: Zero-Sum Games and Constrained Min-Max Optimization, 2018, ITCS.

[19] Haipeng Luo, et al. Linear Last-iterate Convergence in Constrained Saddle-point Optimization, 2020, ICLR.

[20] Sebastian Nowozin, et al. The Numerics of GANs, 2017, NIPS.

[21] Thore Graepel, et al. The Mechanics of n-Player Differentiable Games, 2018, ICML.

[22] Manuela M. Veloso, et al. Rational and Convergent Learning in Stochastic Games, 2001, IJCAI.

[23] Sung-Ha Hwang, et al. Strategic Decompositions of Normal Form Games: Zero-sum Games and Potential Games, 2016, Games Econ. Behav.

[24] Ioannis Mitliagkas, et al. A Tight and Unified Analysis of Gradient-Based Methods for a Whole Spectrum of Differentiable Games, 2020, AISTATS.

[25] Xiao Wang, et al. Last iterate convergence in no-regret learning: constrained min-max optimization for convex-concave landscapes, 2020, AISTATS.