Penalty-Regulated Dynamics and Robust Learning Procedures in Games

Starting from a heuristic learning scheme for strategic N-person games, we derive a new class of continuous-time learning dynamics consisting of a replicator-like drift adjusted by a penalty term that renders the boundary of the game’s strategy space repelling. These penalty-regulated dynamics are equivalent to players keeping an exponentially discounted aggregate of their ongoing payoffs and then using a smooth best response to pick an action based on these performance scores. Owing to this inherent duality, the proposed dynamics satisfy a variant of the folk theorem of evolutionary game theory and they converge to (arbitrarily precise) approximations of Nash equilibria in potential games. Motivated by applications to traffic engineering, we exploit this duality further to design a discrete-time, payoff-based learning algorithm that retains these convergence properties and only requires players to observe their in-game payoffs. Moreover, the algorithm remains robust in the presence of stochastic perturbations and observation errors, and it does not require any synchronization between players.
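The score-and-choice mechanism described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a logit (softmax) map as the smooth best response, a fixed step size, a 2x2 common-interest game (which is a potential game), and, for determinism, that each player sees the expected payoff of every action rather than only the realized in-game payoff. The names `logit_choice`, `update_scores`, `run`, `discount` and `step` are hypothetical.

```python
import math


def logit_choice(scores):
    """Smooth best response (assumed here to be the logit map)."""
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def update_scores(scores, payoffs, discount, step):
    """Euler step of y' = u - discount * y: an exponentially
    discounted aggregate of the payoff stream."""
    return [y + step * (u - discount * y) for y, u in zip(scores, payoffs)]


def run(payoff, discount=0.1, step=0.1, rounds=2000):
    """Two players with identical payoffs payoff[a][b] for row action a
    and column action b. Returns both players' mixed strategies."""
    y1 = [0.1, 0.0]  # small initial bias toward action 0 (assumption)
    y2 = [0.0, 0.0]
    n = len(payoff)
    for _ in range(rounds):
        x1, x2 = logit_choice(y1), logit_choice(y2)
        # Expected payoff of each pure action against the opponent's mix.
        u1 = [sum(payoff[a][b] * x2[b] for b in range(n)) for a in range(n)]
        u2 = [sum(payoff[a][b] * x1[a] for a in range(n)) for b in range(n)]
        y1 = update_scores(y1, u1, discount, step)
        y2 = update_scores(y2, u2, discount, step)
    return logit_choice(y1), logit_choice(y2)


if __name__ == "__main__":
    coordination = [[1.0, 0.0], [0.0, 1.0]]
    x1, x2 = run(coordination)
    print(x1, x2)
```

In this sketch the discount term keeps the scores bounded, so the resulting strategies stay strictly inside the simplex while approaching the pure equilibrium in which both players choose action 0. A smaller `discount` moves the rest point closer to that equilibrium, which is one way to read "arbitrarily precise approximations".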
