Generative Adversarial Imitation Learning

Consider learning a policy from example expert behavior, without interaction with the expert or access to a reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks. From this connection we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods when imitating complex behaviors in large, high-dimensional environments.
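To make the GAN analogy concrete, the following is a minimal PyTorch sketch of an adversarial imitation loop in the spirit of this framework: a discriminator is trained to separate policy-generated state-action pairs from expert pairs, and its output is used as a cost signal for the policy. The toy one-step environment, the synthetic "expert" data, and the names Discriminator, Policy, and rollout are illustrative assumptions, not the paper's setup; the paper optimizes the policy with TRPO, whereas a plain REINFORCE update is substituted here for brevity.

```python
# Minimal adversarial-imitation sketch (GAIL-style), assuming a toy one-step
# environment and synthetic "expert" demonstrations. Illustrative, not the
# paper's algorithm: the policy step uses REINFORCE instead of TRPO.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical

torch.manual_seed(0)
STATE_DIM, N_ACTIONS = 2, 2  # hypothetical sizes for the toy problem


class Discriminator(nn.Module):
    """D(s, a): probability that a state-action pair was produced by the policy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + N_ACTIONS, 32),
                                 nn.Tanh(), nn.Linear(32, 1))

    def forward(self, s, a):
        x = torch.cat([s, F.one_hot(a, N_ACTIONS).float()], dim=-1)
        return torch.sigmoid(self.net(x))


class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 32),
                                 nn.Tanh(), nn.Linear(32, N_ACTIONS))

    def forward(self, s):
        return Categorical(logits=self.net(s))


def rollout(policy, n=64):
    """Sample state-action pairs from the current policy (one-step episodes)."""
    s = torch.randn(n, STATE_DIM)
    dist = policy(s)
    a = dist.sample()
    return s, a, dist.log_prob(a)


# Synthetic "expert": picks action 1 exactly when the first state coordinate is
# negative (a stand-in for real demonstrations, which are assumed to be given).
expert_s = torch.randn(256, STATE_DIM)
expert_a = (expert_s[:, 0] < 0).long()

disc, pi = Discriminator(), Policy()
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
opt_pi = torch.optim.Adam(pi.parameters(), lr=1e-3)

for step in range(200):
    # Discriminator step: push D toward 1 on policy pairs and 0 on expert pairs.
    s_pi, a_pi, _ = rollout(pi)
    d_pi, d_ex = disc(s_pi, a_pi), disc(expert_s, expert_a)
    loss_d = -(torch.log(d_pi + 1e-8).mean() + torch.log(1.0 - d_ex + 1e-8).mean())
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Policy step: treat log D(s, a) as a cost and follow the score-function
    # (REINFORCE) gradient to minimize it, pushing the policy toward the expert.
    s_pi, a_pi, logp = rollout(pi)
    with torch.no_grad():
        cost = torch.log(disc(s_pi, a_pi) + 1e-8).squeeze(-1)
    loss_pi = (logp * cost).mean()
    opt_pi.zero_grad()
    loss_pi.backward()
    opt_pi.step()
```

As the discriminator improves, the cost log D(s, a) rises on state-action pairs that look unlike the expert's, so the policy is driven toward the expert occupancy without ever recovering an explicit cost function first.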
