Approximately Solving Mean Field Games via Entropy-Regularized Deep Reinforcement Learning

The recent mean field game (MFG) formalism facilitates otherwise intractable computation of approximate Nash equilibria in many-agent settings. In this paper, we consider discrete-time finite MFGs subject to finite-horizon objectives. We show that all discrete-time finite MFGs with non-constant fixed point operators fail to be contractive, contrary to what is typically assumed in existing MFG literature, so the usual contraction-based guarantee of convergence via fixed point iteration is unavailable. Instead, we incorporate entropy regularization and Boltzmann policies into the fixed point iteration. As a result, we obtain provable convergence to approximate fixed points where existing methods fail, and reach the original goal of approximate Nash equilibria. All proposed methods are evaluated with respect to their exploitability, both on instructive examples with tractable exact solutions and on high-dimensional problems where exact methods become intractable. In high-dimensional scenarios, we apply established deep reinforcement learning methods and empirically combine fictitious play with our approximations.
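To make the idea of a fixed point iteration with Boltzmann policies concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy two-state, two-action game that is not taken from the paper, with transitions independent of the mean field, a hypothetical crowd-averse reward, and a temperature chosen by hand. Each iteration propagates the mean field forward under the current policy, computes optimal Q-values against that mean field by backward induction, and replaces the policy with a softmax of those Q-values.

```python
import numpy as np

def boltzmann_policy(q, temperature):
    """Softmax over the action axis of q, scaled by 1 / temperature."""
    z = (q - q.max(axis=-1, keepdims=True)) / temperature
    w = np.exp(z)
    return w / w.sum(axis=-1, keepdims=True)

def mean_field_flow(policy, mu0, P):
    """Forward pass: state distributions mu[t] induced by `policy`.
    policy: (T, S, A), mu0: (S,), P: (A, S, S) with P[a, s, s_next]."""
    T = policy.shape[0]
    mu = np.empty((T, mu0.size))
    mu[0] = mu0
    for t in range(T - 1):
        # Mass in state s choosing action a moves to n with prob. P[a, s, n].
        mu[t + 1] = np.einsum("s,sa,asn->n", mu[t], policy[t], P)
    return mu

def q_values(mu, P, reward):
    """Backward induction: optimal Q-function against a fixed flow mu."""
    T, S = mu.shape
    q = np.zeros((T, S, P.shape[0]))
    v_next = np.zeros(S)
    for t in reversed(range(T)):
        q[t] = reward(mu[t]) + (P @ v_next).T  # (P @ v_next) has shape (A, S)
        v_next = q[t].max(axis=-1)
    return q

def boltzmann_iteration(mu0, P, reward, T, temperature, iters=200):
    """Iterate policy -> mean field -> Q-values -> Boltzmann policy."""
    S, A = mu0.size, P.shape[0]
    policy = np.full((T, S, A), 1.0 / A)  # start from the uniform policy
    for _ in range(iters):
        mu = mean_field_flow(policy, mu0, P)
        policy = boltzmann_policy(q_values(mu, P, reward), temperature)
    return policy, mean_field_flow(policy, mu0, P)

# Hypothetical toy game: action a moves the agent to state a deterministically.
P = np.zeros((2, 2, 2))
P[0, :, 0] = 1.0
P[1, :, 1] = 1.0
bonus = np.array([0.2, 0.0])  # assumed small preference for state 0

def reward(mu_t):
    # Crowd aversion: reward falls with the mass in the agent's current state.
    return np.repeat((bonus - mu_t)[:, None], 2, axis=1)

mu0 = np.array([0.5, 0.5])
temperature = 1.0  # assumed value; very low temperatures may not converge
policy, mu = boltzmann_iteration(mu0, P, reward, T=3, temperature=temperature)
```

In this toy setting the temperature trades off two effects: a higher temperature makes the iteration better behaved but moves the resulting policy further from a best response, which is why the abstract speaks of approximate fixed points and evaluates methods by exploitability.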
