Paper Information - Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games

Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games

In this paper, we consider multi-agent learning via online gradient descent (OGD) in a class of games called $\lambda$-cocoercive games, a fairly broad class of games that admits many Nash equilibria and that properly includes unconstrained strongly monotone games. We characterize the finite-time last-iterate convergence rate for joint OGD learning on $\lambda$-cocoercive games; further, building on this result, we develop a fully adaptive OGD learning algorithm that does not require any knowledge of the problem parameters (e.g., the cocoercive constant $\lambda$) and show, via a novel double-stopping-time technique, that this adaptive algorithm achieves the same finite-time last-iterate convergence rate as its non-adaptive counterpart. Subsequently, we extend OGD learning to the noisy gradient feedback case and establish last-iterate convergence results (first qualitative almost sure convergence, then quantitative finite-time convergence rates), all under non-decreasing step-sizes. To our knowledge, these are the first results to fill several gaps in the existing multi-agent online learning literature, where three aspects (finite-time convergence rates, non-decreasing step-sizes, and fully adaptive algorithms) had not been explored before.
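As a rough illustration of what joint OGD learning means, the sketch below runs simultaneous gradient steps for two players with a constant step size. It is not the paper's implementation: the two-player quadratic game, its cost functions, the starting point, the step size of 0.2, and the function names are all assumptions chosen for this example. The game is unconstrained and strongly monotone (the symmetric part of the joint gradient's Jacobian is twice the identity), so it falls in the class the abstract describes, and its unique Nash equilibrium is at (0.2, 0.6). The squared norm of the joint gradient at each iterate is recorded as one simple way to watch last-iterate progress.

```python
def joint_gradient(x1, x2):
    """Return each player's gradient of its own cost in its own action.

    Hypothetical costs (assumed for this sketch only):
      player 1: c1(x1, x2) = x1^2 + x1*x2 - x1
      player 2: c2(x1, x2) = x2^2 - x1*x2 - x2
    """
    g1 = 2.0 * x1 + x2 - 1.0  # partial derivative of c1 with respect to x1
    g2 = 2.0 * x2 - x1 - 1.0  # partial derivative of c2 with respect to x2
    return g1, g2


def run_joint_ogd(x1, x2, step_size, num_rounds):
    """Run joint OGD: every player takes a gradient step at the same time."""
    grad_norms_sq = []
    for _ in range(num_rounds):
        g1, g2 = joint_gradient(x1, x2)
        # Squared norm of the joint gradient at the current iterate.
        grad_norms_sq.append(g1 * g1 + g2 * g2)
        # Simultaneous update; no projection because the game is unconstrained.
        x1 -= step_size * g1
        x2 -= step_size * g2
    return (x1, x2), grad_norms_sq


# Assumed starting point and constant step size for the illustration.
final_point, grad_norms_sq = run_joint_ogd(
    x1=5.0, x2=-3.0, step_size=0.2, num_rounds=100
)
print("last iterate:", final_point)
print("first and last squared gradient norms:", grad_norms_sq[0], grad_norms_sq[-1])
```

In this toy game the last iterate itself, not an average of iterates, approaches the equilibrium, which is the type of convergence the abstract refers to. The constant step size is kept fixed across rounds to mirror the non-decreasing step-size setting; the adaptive and noisy-feedback variants mentioned above are not sketched here.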
