Meta-Learning in Games

In the literature on game-theoretic equilibrium finding, the focus has mainly been on solving a single game in isolation. In practice, however, strategic interactions -- ranging from routing problems to online advertising auctions -- evolve dynamically, giving rise to many similar games that must be solved. To address this gap, we introduce meta-learning for equilibrium finding and learning to play games. We establish the first meta-learning guarantees for a variety of fundamental and well-studied classes of games, including two-player zero-sum games, general-sum games, and Stackelberg games. In particular, we obtain rates of convergence to different game-theoretic equilibria that depend on natural notions of similarity across the sequence of games encountered, while at the same time recovering the known single-game guarantees when the sequence of games is arbitrary. Along the way, we prove a number of new results in the single-game regime through a simple and unified framework, which may be of independent interest. Finally, we evaluate our meta-learning algorithms on endgames faced by the poker agent Libratus against top human professionals. The experiments show that games with varying stack sizes can be solved significantly faster using our meta-learning techniques than by solving them separately, often by an order of magnitude.
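To make the warm-starting idea concrete in the simplest setting the abstract mentions, the sketch below solves a sequence of similar two-player zero-sum matrix games with optimistic multiplicative-weights updates, initializing each game's dynamics from the previous game's approximate equilibrium instead of from scratch. This is a minimal illustration, not the paper's algorithm or implementation: the perturbed game family, the solver choice, and all parameters (n, eta, tol) are illustrative assumptions.

import numpy as np

def duality_gap(A, x, y):
    """Exploitability of (x, y) in the zero-sum game A, where the row
    player minimizes x^T A y and the column player maximizes it; the gap
    is nonnegative and zero exactly at a Nash equilibrium."""
    return float(np.max(x @ A) - np.min(A @ y))

def solve_zero_sum(A, x0, y0, eta=0.05, tol=1e-3, max_iters=200_000):
    """Run simultaneous optimistic MWU from (x0, y0); return the average
    strategies and the iteration count at which their gap drops below tol."""
    x, y = x0.copy(), y0.copy()
    gx_prev, gy_prev = A @ y, x @ A      # "predictions" for the optimistic step
    x_sum, y_sum = np.zeros_like(x), np.zeros_like(y)
    for t in range(1, max_iters + 1):
        x_sum += x
        y_sum += y
        if duality_gap(A, x_sum / t, y_sum / t) < tol:
            return x_sum / t, y_sum / t, t
        gx, gy = A @ y, x @ A            # current loss/payoff vectors
        x = x * np.exp(-eta * (2 * gx - gx_prev))   # row player minimizes
        y = y * np.exp(+eta * (2 * gy - gy_prev))   # column player maximizes
        x, y = x / x.sum(), y / y.sum()
        gx_prev, gy_prev = gx, gy
    return x_sum / max_iters, y_sum / max_iters, max_iters

rng = np.random.default_rng(0)
n = 20
base = rng.uniform(-1.0, 1.0, size=(n, n))   # structure shared by all games
uniform = np.ones(n) / n
x_warm, y_warm = uniform.copy(), uniform.copy()
for k in range(5):
    # Each game is a small perturbation of the shared base game, mimicking
    # a sequence of similar games (e.g., endgames with varying stack sizes).
    A = base + 0.05 * rng.uniform(-1.0, 1.0, size=(n, n))
    _, _, t_cold = solve_zero_sum(A, uniform, uniform)
    x_warm, y_warm, t_warm = solve_zero_sum(A, x_warm, y_warm)
    print(f"game {k}: cold start {t_cold} iters, warm start {t_warm} iters")

On such a sequence, the warm-started runs typically need far fewer iterations once the games are similar, because the initialization is already close to the new game's equilibrium; this is the qualitative effect that the meta-learning guarantees quantify, with bounds that degrade gracefully back to the single-game rates when consecutive games are unrelated.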
