Learning in Games: Robustness of Fast Convergence

We show that learning algorithms satisfying a $\textit{low approximate regret}$ property experience fast convergence to approximate optimality in a large class of repeated games. Our property, which simply requires that each learner has small regret compared to a $(1+\epsilon)$-multiplicative approximation to the best action in hindsight, is ubiquitous among learning algorithms; it is satisfied even by the vanilla Hedge forecaster. Our results improve upon recent work of Syrgkanis et al. [SALS15] in a number of ways. We require only that players observe payoffs under other players' realized actions, as opposed to expected payoffs. We further show that convergence occurs with high probability, and show convergence under bandit feedback. Finally, we improve upon the speed of convergence by a factor of $n$, the number of players. Both the scope of settings and the class of algorithms for which our analysis provides fast convergence are considerably broader than in previous work. Our framework applies to dynamic population games via a low approximate regret property for shifting experts. Here we strengthen the results of Lykouris et al. [LST16] in two ways: We allow players to select learning algorithms from a larger class, which includes a minor variant of the basic Hedge algorithm, and we increase the maximum churn in players for which approximate optimality is achieved. In the bandit setting we present a new algorithm which provides a "small loss"-type bound with improved dependence on the number of actions in utility settings, and is both simple and efficient. This result may be of independent interest.

[1]  Elad Hazan,et al.  Extracting certainty from uncertainty: regret bounded by variation in costs , 2008, Machine Learning.

[2]  Ran El-Yaniv,et al.  How to Better Use Expert Advice , 2004, Machine Learning.

[3]  Gilles Stoltz Incomplete information and internal regret in prediction of individual sequences , 2005 .

[4]  Jean-Yves Audibert,et al.  Regret Bounds and Minimax Policies under Partial Monitoring , 2010, J. Mach. Learn. Res..

[5]  Gábor Lugosi,et al.  Prediction, learning, and games , 2006 .

[6]  Adam Tauman Kalai,et al.  Playing games with approximation algorithms , 2007, STOC '07.

[7]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[8]  Yishay Mansour,et al.  Improved second-order bounds for prediction with expert advice , 2006, Machine Learning.

[9]  Tim Roughgarden,et al.  The Price of Anarchy in Auctions , 2016, J. Artif. Intell. Res..

[10]  Tim Roughgarden,et al.  How bad is selfish routing? , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[11]  Seshadhri Comandur,et al.  Efficient learning algorithms for changing environments , 2009, ICML '09.

[12]  Peter Auer,et al.  Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring , 2006, ALT.

[13]  Constantinos Daskalakis,et al.  Near-optimal no-regret algorithms for zero-sum games , 2011, SODA '11.

[14]  Elad Hazan,et al.  Introduction to Online Convex Optimization , 2016, Found. Trends Optim..

[15]  Haipeng Luo,et al.  Fast Convergence of Regularized Learning in Games , 2015, NIPS.

[16]  Éva Tardos,et al.  Learning and Efficiency in Games with Dynamic Population , 2015, SODA.

[17]  Christos H. Papadimitriou,et al.  Worst-case Equilibria , 1999, STACS.

[18]  Thomas P. Hayes,et al.  High-Probability Regret Bounds for Bandit Online Linear Optimization , 2008, COLT.

[19]  Elad Hazan,et al.  Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization , 2008, COLT.

[20]  Elias Koutsoupias,et al.  The price of anarchy of finite congestion games , 2005, STOC '05.

[21]  Karthik Sridharan,et al.  Online Learning with Predictable Sequences , 2012, COLT.

[22]  Claudio Gentile,et al.  Adaptive and Self-Confident On-Line Learning Algorithms , 2000, J. Comput. Syst. Sci..

[23]  Karthik Sridharan,et al.  Optimization, Learning, and Games with Predictable Sequences , 2013, NIPS.

[24]  Wouter M. Koolen,et al.  Second-order Quantile Methods for Experts and Combinatorial Games , 2015, COLT.

[25]  Karthik Sridharan,et al.  Adaptive Online Learning , 2015, NIPS.

[26]  Percy Liang,et al.  Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm , 2014, ICML.

[27]  Haipeng Luo,et al.  Achieving All with No Parameters: AdaNormalHedge , 2015, COLT.

[28]  Gergely Neu,et al.  First-order regret bounds for combinatorial semi-bandits , 2015, COLT.

[29]  Éva Tardos,et al.  Composable and efficient mechanisms , 2012, STOC '13.