Multi-Agent Adversarial Inverse Reinforcement Learning

Reinforcement learning agents are prone to undesired behaviors due to reward mis-specification. Finding a set of reward functions that properly guides agent behaviors is particularly challenging in multi-agent scenarios. Inverse reinforcement learning provides a framework for automatically acquiring suitable reward functions from expert demonstrations. Its extension to multi-agent settings, however, is difficult because of the more complex notions of rational behavior. In this paper, we propose MA-AIRL, a new framework for multi-agent inverse reinforcement learning that is effective and scalable for Markov games with high-dimensional state-action spaces and unknown dynamics. We derive our algorithm from a new solution concept and maximum pseudolikelihood estimation within an adversarial reward learning framework. In our experiments, we demonstrate that MA-AIRL recovers reward functions that are highly correlated with the ground-truth ones, and that it significantly outperforms prior methods in policy imitation.
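To make the adversarial reward learning step concrete, the sketch below shows a per-agent AIRL-style discriminator whose logit is f_i(s, a_i) - log pi_i(a_i|s), so the recovered reward log D - log(1 - D) reduces to f_i - log pi_i. This is a minimal sketch under assumed names and shapes (AIRLDiscriminator, obs_dim, act_dim, and the batch layout are illustrative), written in PyTorch; it is not the authors' reference implementation.

```python
# Minimal per-agent AIRL-style discriminator sketch (illustrative, not the
# paper's reference code). One such discriminator is kept per agent.
import torch
import torch.nn as nn


class AIRLDiscriminator(nn.Module):
    """D(s, a) = exp(f(s, a)) / (exp(f(s, a)) + pi(a|s)) for one agent."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        # f(s, a): learned reward surrogate; actions assumed one-hot if discrete.
        self.f = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def logits(self, obs, act, log_pi):
        # log D - log(1 - D) simplifies to f(s, a) - log pi(a|s).
        return self.f(torch.cat([obs, act], dim=-1)).squeeze(-1) - log_pi

    def reward(self, obs, act, log_pi):
        # Recovered reward signal handed to the policy optimizer.
        with torch.no_grad():
            return self.logits(obs, act, log_pi)


def discriminator_loss(disc, expert_batch, policy_batch):
    """Binary cross-entropy: expert transitions labeled 1, policy samples 0.

    Each batch is a tuple (obs, act, log_pi) of tensors with matching
    leading batch dimension.
    """
    bce = nn.BCEWithLogitsLoss()
    exp_logits = disc.logits(*expert_batch)
    pol_logits = disc.logits(*policy_batch)
    return (bce(exp_logits, torch.ones_like(exp_logits))
            + bce(pol_logits, torch.zeros_like(pol_logits)))
```

Training would alternate between fitting one such discriminator per agent on expert-versus-policy transitions and updating the policies (e.g., with a multi-agent actor-critic) against the recovered rewards.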
