Maximum Entropy Inverse Reinforcement Learning for Mean Field Games

Mean field games (MFG) make reinforcement learning (RL) in large-scale multi-agent systems (MAS) tractable by reducing the interactions among agents to those between a single representative agent and the population as a whole. However, RL agents are notoriously prone to unexpected behaviours caused by reward misspecification, and the problem worsens as MAS grow in scale. Inverse reinforcement learning (IRL) provides a framework for automatically acquiring suitable reward functions from expert demonstrations. Extending IRL to MFG, however, is challenging because of the complex notion of mean-field-type equilibria and the coupling between agent-level and population-level dynamics. To this end, we propose mean field inverse reinforcement learning (MFIRL), a novel model-free IRL framework for MFG. We derive the algorithm from a new equilibrium concept that incorporates entropy regularization, combined with the maximum entropy IRL framework. Experiments in simulated environments, with comparisons against the state-of-the-art method, demonstrate that MFIRL is sample efficient and can accurately recover the ground-truth reward functions.
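The abstract combines two ingredients: an entropy-regularized equilibrium, in which each agent plays a softmax (Boltzmann) response to the population distribution, and maximum entropy IRL. The sketch below illustrates both on a hypothetical one-shot crowd-aversion game with three actions. The reward form r(a, μ) = θ_a − c·μ_a, the damped fixed-point solver, and the simplification of holding the mean field at the expert's empirical action distribution during IRL are all assumptions made for illustration; this is not the MFIRL algorithm itself.

```python
import math


def softmax(values, temperature):
    """Numerically stable softmax of values / temperature."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def reward(theta, mu, crowd_cost):
    """Hypothetical reward r(a, mu) = theta[a] - crowd_cost * mu[a]:
    each action has a base value, penalised by the share of the
    population (mu[a]) that also chooses it."""
    return [t - crowd_cost * m for t, m in zip(theta, mu)]


def regularized_equilibrium(theta, crowd_cost, temperature, iters=2000, damping=0.1):
    """Damped fixed-point iteration for mu = softmax(r(., mu) / temperature),
    i.e. the population distribution equals the entropy-regularized
    (softmax) response of the representative agent to that distribution."""
    n = len(theta)
    mu = [1.0 / n] * n
    for _ in range(iters):
        pi = softmax(reward(theta, mu, crowd_cost), temperature)
        mu = [(1 - damping) * m + damping * p for m, p in zip(mu, pi)]
    return mu


def maxent_irl(expert_mu, crowd_cost, temperature, lr=0.5, iters=5000):
    """Gradient ascent on the expected log-likelihood of expert actions under
    a softmax policy, with the mean field held fixed at expert_mu (assumed
    simplification). The gradient is (expert_mu[b] - pi[b]) / temperature."""
    theta = [0.0] * len(expert_mu)
    for _ in range(iters):
        pi = softmax(reward(theta, expert_mu, crowd_cost), temperature)
        theta = [t + lr * (e - p) / temperature for t, e, p in zip(theta, expert_mu, pi)]
    return theta


def centered(xs):
    """Subtract the mean; softmax rewards are identifiable only up to a shift."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


if __name__ == "__main__":
    true_theta = [1.0, 0.5, 0.0]
    mu_star = regularized_equilibrium(true_theta, crowd_cost=2.0, temperature=0.5)
    theta_hat = maxent_irl(mu_star, crowd_cost=2.0, temperature=0.5)
    print("equilibrium mean field:", [round(m, 3) for m in mu_star])
    print("true (centered):       ", [round(x, 3) for x in centered(true_theta)])
    print("recovered (centered):  ", [round(x, 3) for x in centered(theta_hat)])
```

In this toy setting the recovered base rewards match the true ones only after centering, because a softmax policy is unchanged when a constant is added to every reward. The damping term matters: with a strong crowd cost relative to the temperature, the undamped update mu ← softmax(r(·, mu)/T) can oscillate instead of settling at the equilibrium.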
