Task-Relevant Adversarial Imitation Learning

We show that a critical problem in adversarial imitation from high-dimensional sensory data is the tendency of discriminator networks to distinguish agent and expert behaviour using task-irrelevant features beyond the control of the agent. We analyze this problem in detail and propose a solution, as well as several baselines that outperform standard Generative Adversarial Imitation Learning (GAIL). Our proposed solution, Task-Relevant Adversarial Imitation Learning (TRAIL), uses a constrained optimization objective to overcome task-irrelevant features. Comprehensive experiments show that TRAIL can solve challenging manipulation tasks from pixels by imitating human operators, where other agents such as behaviour cloning (BC), standard GAIL, improved GAIL variants (including our newly proposed baselines), and Deterministic Policy Gradients from Demonstrations (DPGfD) fail to find solutions, even when those agents have access to the task reward.
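The abstract only names the constrained objective, so the following is a hedged, pure-Python sketch of how such a constraint could be expressed. It assumes (this is not confirmed by the summary above) that the constraint caps the discriminator's accuracy at chance level (0.5) on a hypothetical "invariant set" of observations, i.e. frames where agent and expert behaviour cannot yet differ, and that the constraint is handled by a Lagrange multiplier updated with projected dual ascent. The toy linear discriminator, the soft-accuracy surrogate and the names `gail_loss`, `constrained_objective` and `update_multiplier` are illustrative assumptions, not the paper's implementation.

```python
import math
from dataclasses import dataclass


def sigmoid(x: float) -> float:
    # Logistic function mapping a logit to a probability.
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class LinearDiscriminator:
    """Hypothetical toy discriminator D(x) = sigmoid(w . x + b),
    read as the probability that feature vector x came from the expert."""
    w: list[float]
    b: float

    def prob_expert(self, x: list[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)


def gail_loss(d, expert, agent, eps=1e-8):
    # Standard GAIL discriminator loss: binary cross-entropy, expert=1, agent=0.
    le = -sum(math.log(d.prob_expert(x) + eps) for x in expert) / len(expert)
    la = -sum(math.log(1.0 - d.prob_expert(x) + eps) for x in agent) / len(agent)
    return le + la


def soft_accuracy(d, expert, agent):
    # Differentiable surrogate for accuracy (an assumption of this sketch):
    # average probability assigned to the correct label on each side.
    pe = sum(d.prob_expert(x) for x in expert) / len(expert)
    pa = sum(1.0 - d.prob_expert(x) for x in agent) / len(agent)
    return 0.5 * (pe + pa)


def constrained_objective(d, expert, agent, expert_inv, agent_inv, lam, target=0.5):
    # Lagrangian of: minimise L_D subject to acc_I <= target,
    # i.e. L_D + lam * (acc_I - target). Returns (value, constraint violation).
    violation = soft_accuracy(d, expert_inv, agent_inv) - target
    return gail_loss(d, expert, agent) + lam * violation, violation


def update_multiplier(lam, violation, lr=1.0):
    # Projected dual ascent: raise lam while the constraint is violated,
    # lower it otherwise, never letting it drop below zero.
    return max(0.0, lam + lr * violation)


# Feature 0 is task-relevant; feature 1 is a task-irrelevant cue (e.g. lighting)
# that differs between expert and agent data even on the invariant set.
expert, agent = [[1.0, 1.0]], [[-1.0, -1.0]]
expert_inv, agent_inv = [[0.0, 1.0]], [[0.0, -1.0]]
irrelevant = LinearDiscriminator(w=[0.0, 5.0], b=0.0)
relevant = LinearDiscriminator(w=[5.0, 0.0], b=0.0)
_, v_irr = constrained_objective(irrelevant, expert, agent, expert_inv, agent_inv, lam=0.0)
_, v_rel = constrained_objective(relevant, expert, agent, expert_inv, agent_inv, lam=0.0)
print(v_irr, update_multiplier(0.0, v_irr))
print(v_rel, update_multiplier(0.0, v_rel))
```

In this toy setup, the discriminator that keys on the task-irrelevant feature is far above chance on the invariant set, so the violation is positive and the multiplier grows, penalising that discriminator more on later steps. The discriminator that relies only on the task-relevant feature sits exactly at chance there and leaves the multiplier at zero. The soft-accuracy surrogate is swapped in because hard accuracy has no useful gradient; a real implementation would minimise the Lagrangian over the discriminator's parameters with a deep-learning framework.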
