Q-Learning in Regularized Mean-field Games

In this paper, we introduce a regularized mean-field game and study learning in this game under an infinite-horizon discounted reward criterion. The game is defined by adding a regularization function to the one-stage reward function of the classical mean-field game model. We develop a value-iteration-based learning algorithm for this regularized mean-field game using fitted Q-learning. In general, the regularization term makes the reinforcement learning algorithm more robust and improves exploration. Moreover, it enables us to carry out an error analysis of the learning algorithm without imposing the restrictive convexity assumptions on the system components that are needed in the absence of regularization.

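To convey the flavor of the approach, the following Python sketch shows entropy-regularized fitted Q-iteration for a single representative agent facing a fixed population distribution; it is a minimal illustration, not the paper's exact algorithm. The choice of entropy as the regularizer, the temperature tau, the one-hot features, the random transition model, and the least-squares regressor are all assumptions made for this sketch.

```python
# Illustrative sketch (not the paper's algorithm): entropy-regularized fitted
# Q-iteration for a representative agent against a *fixed* mean-field term mu.
# All names and numerical values here are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, gamma, tau = 10, 3, 0.9, 0.5   # tau: regularization weight

# Hypothetical one-stage reward r(s, a, mu) and transition kernel, with a
# fixed population distribution mu baked in for this sketch.
mu = np.ones(n_states) / n_states
reward = rng.normal(size=(n_states, n_actions)) + mu @ np.arange(n_states)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']

def features(s, a):
    """One-hot state-action features; any regressor could be used instead."""
    phi = np.zeros(n_states * n_actions)
    phi[s * n_actions + a] = 1.0
    return phi

# Sample a batch of transitions (s, a, r, s') from the hypothetical model.
batch = []
for _ in range(5000):
    s, a = rng.integers(n_states), rng.integers(n_actions)
    s_next = rng.choice(n_states, p=P[s, a])
    batch.append((s, a, reward[s, a], s_next))

theta = np.zeros(n_states * n_actions)           # linear Q-function parameters
Q = lambda s, a: features(s, a) @ theta

for _ in range(50):                              # fitted Q-iteration loop
    X, y = [], []
    for s, a, r, s_next in batch:
        q_next = np.array([Q(s_next, b) for b in range(n_actions)])
        # Entropy-regularized ("soft") Bellman target: tau * log-sum-exp
        # replaces the hard max of standard Q-iteration.
        v_next = tau * np.log(np.exp(q_next / tau).sum())
        X.append(features(s, a))
        y.append(r + gamma * v_next)
    theta, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)

# The regularized (softmax) policy induced by the learned Q-function.
q0 = np.array([Q(0, a) for a in range(n_actions)])
print("softmax policy at state 0:", np.exp(q0 / tau) / np.exp(q0 / tau).sum())
```

In a full mean-field learning scheme one would additionally update the population distribution mu to be consistent with the softmax policy and iterate; the sketch above only shows the inner regularized Q-iteration step for a fixed mu.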