Learning in Games via Reinforcement and Regularization

We investigate a class of reinforcement learning dynamics in which players adjust their strategies based on their actions' cumulative payoffs over time; specifically, they play the mixed strategies that maximize expected cumulative payoff minus a regularization term. A widely studied example is exponential reinforcement learning, a process induced by an entropic regularization term, under which mixed strategies evolve according to the replicator dynamics. However, in contrast to the class of regularization functions used to define smooth best responses in models of stochastic fictitious play, the functions used in this paper need not be infinitely steep at the boundary of the simplex; in fact, dropping this requirement gives rise to an important dichotomy between steep and nonsteep cases. In this general framework, we extend several properties of exponential learning, including the elimination of dominated strategies, the asymptotic stability of strict Nash equilibria, and the convergence of time-averaged trajectories in zero-sum games with an interior Nash equilibrium.
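The scheme described above can be sketched numerically. With the entropic regularizer, the map from cumulative payoffs ("scores") to the regularized best response is the logit (softmax) choice map, giving the familiar exponential-weights process. The sketch below runs these dynamics in Rock-Paper-Scissors, a zero-sum game with interior equilibrium (1/3, 1/3, 1/3), and tracks the time-averaged strategy; the step-size schedule, horizon, and initial scores are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(y):
    """Logit choice map: the entropic-regularized best response to scores y."""
    z = np.exp(y - y.max())  # subtract max for numerical stability
    return z / z.sum()

# Rock-Paper-Scissors payoff matrix for player 1 (zero-sum,
# unique interior Nash equilibrium at (1/3, 1/3, 1/3)).
A = np.array([[0., -1.,  1.],
              [1.,  0., -1.],
              [-1., 1.,  0.]])

T = 50000
y1 = np.array([1.0, 0.0, 0.0])  # slightly asymmetric start (assumed)
y2 = np.zeros(3)
avg = np.zeros(3)               # running sum of player 1's mixed strategies

for t in range(T):
    eta = 1.0 / np.sqrt(t + 1.0)            # decreasing step (a standard choice)
    x1, x2 = softmax(eta * y1), softmax(eta * y2)
    avg += x1
    y1 = y1 + A @ x2                         # accumulate player 1's payoffs
    y2 = y2 - A.T @ x1                       # zero-sum: player 2 gets the negative

avg /= T
print(np.round(avg, 3))  # time average near the interior equilibrium
```

Individual trajectories cycle around the equilibrium rather than converging, but, as in the time-average convergence result stated above, the empirical average of play approaches the interior equilibrium.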
