Penalty-Regulated Dynamics and Robust Learning Procedures in Games

Starting from a heuristic learning scheme for strategic N-person games, we derive a new class of continuous-time learning dynamics consisting of a replicator-like drift adjusted by a penalty term that renders the boundary of the game’s strategy space repelling. These penalty-regulated dynamics are equivalent to players keeping an exponentially discounted aggregate of their ongoing payoffs and then using a smooth best response to pick an action based on these performance scores. Owing to this inherent duality, the proposed dynamics satisfy a variant of the folk theorem of evolutionary game theory and they converge to (arbitrarily precise) approximations of Nash equilibria in potential games. Motivated by applications to traffic engineering, we exploit this duality further to design a discrete-time, payoff-based learning algorithm that retains these convergence properties and only requires players to observe their in-game payoffs. Moreover, the algorithm remains robust in the presence of stochastic perturbations and observation errors, and it does not require any synchronization between players.
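The score-based reading of these dynamics can be illustrated with a small numerical sketch. It is not the paper's algorithm: it assumes that the "exponentially discounted aggregate" follows the update y <- y + step * (u - discount * y), that the smooth best response is the logit (softmax) map, and that players see expected payoffs rather than noisy in-game payoffs. All names (`logit_choice`, `run_score_dynamics`, `discount`, `step`) and the 2x2 coordination game are hypothetical choices made for illustration.

```python
import math

def logit_choice(scores):
    """Assumed smooth best response: softmax of the performance scores."""
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def expected_payoffs(payoff_matrix, opponent_mix):
    """Expected payoff of each own action against the opponent's mixed strategy."""
    return [sum(p * q for p, q in zip(row, opponent_mix)) for row in payoff_matrix]

def run_score_dynamics(payoff_a, payoff_b, discount=0.1, step=0.05,
                       iterations=4000, scores_a=(0.2, 0.0), scores_b=(0.1, 0.0)):
    """Euler steps of the assumed score dynamics dy/dt = u(x) - discount * y.

    Each payoff matrix is indexed [own action][opponent action].
    Returns the two players' final mixed strategies.
    """
    ya, yb = list(scores_a), list(scores_b)
    for _ in range(iterations):
        xa, xb = logit_choice(ya), logit_choice(yb)
        ua = expected_payoffs(payoff_a, xb)
        ub = expected_payoffs(payoff_b, xa)
        # Discounted aggregation: old scores decay at rate `discount`.
        ya = [y + step * (u - discount * y) for y, u in zip(ya, ua)]
        yb = [y + step * (u - discount * y) for y, u in zip(yb, ub)]
    return logit_choice(ya), logit_choice(yb)

# Hypothetical 2x2 coordination game (a potential game):
# both players earn 1 when their actions match and 0 otherwise.
COORDINATION = [[1.0, 0.0], [0.0, 1.0]]
mix_a, mix_b = run_score_dynamics(COORDINATION, COORDINATION)
print(mix_a, mix_b)
```

With these illustrative settings, both players should end up placing nearly all probability on the first action, while the discount keeps the scores bounded and the strategies strictly inside the simplex, which is one way to picture the "approximations of Nash equilibria" mentioned above. A payoff-based variant in the spirit of the abstract would replace `expected_payoffs` with an estimate built from each player's observed in-game payoff.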

Pierre Coucheney | Bruno Gaujal | Panayotis Mertikopoulos
