Mean-Field Learning: a Survey

In this paper we study iterative procedures for computing stationary equilibria in games with a large number of players. Most learning algorithms for games with continuous action spaces are limited to best-reply maps that are strict contractions, for which the Banach-Picard iteration converges at a geometric rate. When the best-reply map is not a contraction, Ishikawa-based learning is proposed. The algorithm is shown to behave well for Lipschitz continuous and pseudo-contractive maps. However, its convergence rate is still unsatisfactory, so several acceleration techniques are presented. We explain how cognitive users can improve the convergence rate using only a few measurements. The methodology has useful properties in mean-field games where the payoff function depends only on the player's own action and the mean of the mean-field (first-moment mean-field games). A learning framework that exploits the structure of such games, called mean-field learning, is proposed. The framework is suitable not only for games but also for non-convex global optimization problems. We then introduce mean-field learning without feedback and examine convergence to equilibria in beauty contest games, which have interesting applications in financial markets. Finally, we provide a fully distributed mean-field learning scheme and its accelerated versions for satisfactory solutions in wireless networks. We illustrate the improvement in convergence rate with numerical examples.
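To make the contrast between the iterations named in the abstract concrete, the following is a minimal sketch, not the paper's own algorithm. It assumes a hypothetical scalar best-reply map f(x) = 1 - x (nonexpansive but not a contraction, with fixed point 0.5), constant Ishikawa step sizes of 0.5, and Aitken's delta-squared formula as one possible stand-in for the acceleration techniques. All function names are illustrative.

```python
def picard(f, x0, steps):
    """Banach-Picard iteration x_{n+1} = f(x_n); returns the full trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs


def ishikawa(f, x0, steps, alpha=0.5, beta=0.5):
    """Ishikawa iteration with constant step sizes (an assumption of this sketch):
    y_n = (1 - beta) x_n + beta f(x_n),  x_{n+1} = (1 - alpha) x_n + alpha f(y_n)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        y = (1 - beta) * x + beta * f(x)
        xs.append((1 - alpha) * x + alpha * f(y))
    return xs


def aitken(x0, x1, x2):
    """Aitken delta-squared extrapolation from three consecutive iterates."""
    denom = x2 - 2 * x1 + x0
    if denom == 0:  # sequence is already stationary; nothing to extrapolate
        return x2
    return x0 - (x1 - x0) ** 2 / denom


def best_reply(x):
    """Hypothetical best-reply map: nonexpansive, not a contraction, fixed point 0.5."""
    return 1.0 - x


picard_path = picard(best_reply, 0.2, 6)      # oscillates between about 0.2 and 0.8
ishikawa_path = ishikawa(best_reply, 0.2, 6)  # error halves at every step
accelerated = aitken(*ishikawa_path[:3])      # uses only three measurements
```

In this toy case the Picard iterates never approach the fixed point, the Ishikawa iterates approach it geometrically, and the extrapolation recovers the limit from three iterates because the error sequence is exactly geometric. Real best-reply maps will generally not be this well behaved.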
