Multi-Agent Adversarial Inverse Reinforcement Learning

Reinforcement learning agents are prone to undesired behaviors due to reward mis-specification. Finding a set of reward functions that properly guides agent behaviors is particularly challenging in multi-agent scenarios. Inverse reinforcement learning provides a framework for automatically acquiring suitable reward functions from expert demonstrations. Its extension to multi-agent settings, however, is difficult because of the more complex notions of rational behavior. In this paper, we propose MA-AIRL, a new framework for multi-agent inverse reinforcement learning that is effective and scalable for Markov games with high-dimensional state-action spaces and unknown dynamics. We derive our algorithm from a new solution concept and maximum pseudolikelihood estimation within an adversarial reward learning framework. In our experiments, we demonstrate that MA-AIRL recovers reward functions that are highly correlated with the ground-truth ones, and that it significantly outperforms prior methods in policy imitation.
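To make the adversarial reward learning step concrete, the sketch below shows a per-agent AIRL-style discriminator whose logit is f_i(s, a_i) - log pi_i(a_i|s), so the recovered reward log D - log(1 - D) reduces to f_i - log pi_i. This is a minimal sketch under assumed names and shapes (AIRLDiscriminator, obs_dim, act_dim, and the batch layout are illustrative), written in PyTorch; it is not the authors' reference implementation.

```python
# Minimal per-agent AIRL-style discriminator sketch (illustrative, not the
# paper's reference code). One such discriminator is kept per agent.
import torch
import torch.nn as nn


class AIRLDiscriminator(nn.Module):
    """D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + pi(a|s)) for one agent."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        # f(s, a): learned reward surrogate; actions assumed one-hot if discrete.
        self.f = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def logits(self, obs, act, log_pi):
        # log D - log(1 - D) simplifies to f(s, a) - log pi(a|s).
        return self.f(torch.cat([obs, act], dim=-1)).squeeze(-1) - log_pi

    def reward(self, obs, act, log_pi):
        # Recovered reward signal handed to the policy optimizer.
        with torch.no_grad():
            return self.logits(obs, act, log_pi)


def discriminator_loss(disc, expert_batch, policy_batch):
    """Binary cross-entropy: expert transitions labeled 1, policy samples 0.

    Each batch is a tuple (obs, act, log_pi) of tensors with matching
    leading batch dimension.
    """
    bce = nn.BCEWithLogitsLoss()
    exp_logits = disc.logits(*expert_batch)
    pol_logits = disc.logits(*policy_batch)
    return (bce(exp_logits, torch.ones_like(exp_logits))
            + bce(pol_logits, torch.zeros_like(pol_logits)))
```

Training would alternate between fitting one such discriminator per agent on expert-versus-policy transitions and updating the policies (e.g., with a multi-agent actor-critic) against the recovered rewards.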
