Learning and Efficiency in Games with Dynamic Population

We study the quality of outcomes in repeated games when the population of players changes dynamically and participants use learning algorithms to adapt to the changing environment. The price of anarchy was originally introduced to study the Nash equilibria of one-shot games, yet many games studied in computer science, such as packet routing or ad auctions, are played repeatedly. Given the computational hardness of Nash equilibria, an attractive alternative in repeated-game settings is for players to use no-regret learning algorithms. The price of total anarchy considers the quality of such learning outcomes, but it assumes a steady environment and player population, which is rarely the case in online settings. In this paper we analyze the efficiency of repeated games in dynamically changing environments. An important trait of learning behavior is its versatility in changing environments, provided that the learning method used is adaptive, i.e., does not rely too heavily on experience from the distant past. We show that, in large classes of games, if players choose their strategies in a way that guarantees low adaptive regret, high social welfare is ensured, even under very frequent changes. A main technical tool for our analysis is the existence of a solution to the welfare maximization problem that is both close to optimal and relatively stable over time. Such a solution serves as a benchmark in the efficiency analysis of learning outcomes. We show that such a stable and close-to-optimal solution exists for many problems, even in cases where the exact optimal solution can be very unstable. We further show that a sufficient condition for the existence of stable outcomes is the existence of a differentially private algorithm for the welfare maximization problem. Hence, we draw a strong connection between differential privacy and high efficiency of learning outcomes in frequently changing repeated games.
We demonstrate our techniques by focusing on two classes of games as examples: independent item auctions and congestion games. In both applications we show that adaptive learning guarantees high social welfare even with surprisingly high churn in the player population.
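
To make the notion of an adaptive learner concrete, the following is a minimal sketch of one well-known adaptive method, the Fixed Share variant of multiplicative weights. It is an illustration only and is not claimed to be the algorithm analyzed in the paper. The function name `fixed_share_weights`, the learning rate `eta`, and the sharing rate `alpha` are assumptions chosen for this example. After each exponential update, a small fraction of the weight is redistributed uniformly, so no action's probability ever falls below `alpha / n` and the learner can recover quickly when the best action changes.

```python
import math


def fixed_share_weights(losses, eta=0.5, alpha=0.1):
    """Return the distribution played in each round by Fixed Share.

    losses: list of per-round loss vectors, one entry per action, in [0, 1].
    eta:    learning rate of the exponential update (illustrative value).
    alpha:  fraction of weight shared uniformly each round (illustrative value);
            alpha = 0 reduces to plain multiplicative weights (Hedge).
    """
    n = len(losses[0])
    w = [1.0 / n] * n          # start from the uniform distribution
    played = []
    for loss in losses:
        played.append(list(w))  # distribution used in this round
        # Exponential (multiplicative weights) update, then renormalize.
        v = [wi * math.exp(-eta * li) for wi, li in zip(w, loss)]
        total = sum(v)
        v = [vi / total for vi in v]
        # Share step: mix with uniform so old experience is partly forgotten.
        w = [(1.0 - alpha) * vi + alpha / n for vi in v]
    return played


if __name__ == "__main__":
    # Action 0 is best for 50 rounds, then action 1 becomes best.
    losses = [[0.0, 1.0]] * 50 + [[1.0, 0.0]] * 50
    adaptive = fixed_share_weights(losses)
    plain = fixed_share_weights(losses, alpha=0.0)
    print("Fixed Share, final weight on action 1:", round(adaptive[-1][1], 3))
    print("Plain Hedge, final weight on action 1:", round(plain[-1][1], 3))
```

In this toy run the sharing step lets the learner move most of its weight to the newly best action within a few rounds of the switch, whereas the non-adaptive update with `alpha = 0` still favors the old action at the end, because it weighs the distant past as heavily as recent rounds.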
