Hindsight Generative Adversarial Imitation Learning

Compared to reinforcement learning, imitation learning (IL) is a powerful paradigm for training agents to learn control policies efficiently from expert demonstrations. In most cases, however, obtaining demonstration data is costly and laborious, which poses a significant challenge in many scenarios. A promising alternative is to train agents via imitation learning without expert demonstrations, which would greatly broaden the range of problems imitation learning can address. To this end, we propose the Hindsight Generative Adversarial Imitation Learning (HGAIL) algorithm, which aims to perform imitation learning without any demonstrations. By combining the hindsight idea with the generative adversarial imitation learning (GAIL) framework, HGAIL makes imitation learning feasible when expert demonstration data are unavailable. Experiments show that the proposed method trains policies whose performance is comparable to that of current imitation learning methods. Furthermore, HGAIL inherently provides a curriculum learning mechanism, which is critical for learning effective policies.
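
To make the combination concrete, the sketch below illustrates one plausible way to pair hindsight goal relabeling with a GAIL-style discriminator in a goal-conditioned task: the agent's own rollouts, relabeled with the goals they actually achieved, stand in for expert demonstrations. This is a minimal sketch under assumptions of our own; the class and function names (Discriminator, hindsight_relabel, discriminator_loss, gail_reward) and the PyTorch framing are illustrative, not the authors' reference implementation.

# Hypothetical sketch: hindsight-relabeled rollouts as "expert" data for a GAIL discriminator.
# Assumes a goal-conditioned environment; all names here are illustrative.
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Scores (state, action, goal) tuples; higher output means more 'expert-like'."""

    def __init__(self, obs_dim, act_dim, goal_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + goal_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act, goal):
        return self.net(torch.cat([obs, act, goal], dim=-1))


def hindsight_relabel(trajectory):
    """Replace the desired goal with the goal actually achieved at the end of the
    rollout, so the agent's own trajectory looks like a successful demonstration."""
    achieved = trajectory[-1]["achieved_goal"]
    return [dict(step, desired_goal=achieved) for step in trajectory]


def discriminator_loss(disc, agent_batch, hindsight_batch):
    """Standard GAIL binary-classification loss, with hindsight-relabeled data in
    the role of expert data and raw rollout data in the role of the policy.
    Each batch is a dict of stacked tensors: obs, act, desired_goal."""
    bce = nn.BCEWithLogitsLoss()
    expert_logits = disc(hindsight_batch["obs"], hindsight_batch["act"],
                         hindsight_batch["desired_goal"])
    agent_logits = disc(agent_batch["obs"], agent_batch["act"],
                        agent_batch["desired_goal"])
    return (bce(expert_logits, torch.ones_like(expert_logits)) +
            bce(agent_logits, torch.zeros_like(agent_logits)))


def gail_reward(disc, obs, act, goal):
    """Surrogate reward derived from the discriminator, as in GAIL, to be fed to
    an on-policy RL update of the policy."""
    with torch.no_grad():
        d = torch.sigmoid(disc(obs, act, goal))
    return -torch.log(1.0 - d + 1e-8)

In such a setup, the surrogate reward would drive an ordinary policy-gradient update (for example PPO), mirroring the usual GAIL training loop while the "expert" buffer is continually refilled by hindsight relabeling of the agent's own experience.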
