Deep Mean Field Games for Learning Optimal Behavior Policy of Large Populations

We consider the problem of representing a large population's behavior policy, which drives the evolution of the population distribution over a discrete state space. A discrete-time mean field game (MFG) is motivated as an interpretable, game-theoretic model for understanding the aggregate effect of individual actions and for predicting the temporal evolution of population distributions. We achieve a synthesis of MFGs and Markov decision processes (MDPs) by showing that a special class of MFGs is reducible to an MDP. This reduction broadens the scope of mean field game theory and lets us infer MFG models of large real-world systems via deep inverse reinforcement learning. Our method learns both the reward function and the forward dynamics of an MFG from real data, and we report the first empirical test of a mean field game model of a real-world social media population.
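The core reduction can be illustrated concretely. A minimal sketch, under assumptions not taken from the paper: the MDP state is the population distribution mu on the simplex over S discrete states, the action is a row-stochastic matrix P (the population's behavior policy), the forward dynamics are mu_{t+1} = mu_t P, and the reward function below is an illustrative placeholder rather than the one learned by the authors' inverse RL procedure.

```python
import numpy as np

S = 4  # number of discrete states (e.g., topics in a social network)

def step(mu, P):
    """Mean field forward dynamics: population mass flows along P."""
    assert np.allclose(P.sum(axis=1), 1.0), "P must be row-stochastic"
    return mu @ P

def reward(mu, P):
    """Placeholder population reward: an entropy-style congestion term.
    In the paper this function is learned from data, not hand-specified."""
    return -float(np.sum(mu * np.log(mu + 1e-12)))

rng = np.random.default_rng(0)
mu = np.full(S, 1.0 / S)           # uniform initial distribution
P = rng.dirichlet(np.ones(S), S)   # a random row-stochastic policy (S x S)

total_reward = 0.0
for t in range(10):
    total_reward += reward(mu, P)
    mu = step(mu, P)               # mu stays on the probability simplex
```

Under this view, choosing the policy P to maximize cumulative reward is a standard MDP control problem, which is what makes (inverse) reinforcement learning machinery applicable to the MFG.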
