Robust Adversarial Imitation Learning via Adaptively-Selected Demonstrations

The agent in imitation learning (IL) is expected to mimic the behavior of an expert, so its performance depends heavily on the quality of the given expert demonstrations. However, the assumption that collected demonstrations are optimal does not always hold in real-world tasks, and imperfect demonstrations can seriously degrade the learned agent. In this paper, we propose a robust method within the framework of Generative Adversarial Imitation Learning (GAIL) to address the imperfect-demonstration issue: good demonstrations are adaptively selected for training while bad ones are discarded. Specifically, a binary weight is assigned to each expert demonstration to indicate whether it is selected for training, and the reward function in GAIL is used to determine this weight (i.e., a higher reward yields a higher weight). Unlike existing solutions that require auxiliary information about these weights, we establish a connection between the weights and the model, so that GAIL and the latent weights can be optimized jointly. Besides hard binary weighting, we also propose a soft weighting scheme. Experiments on MuJoCo demonstrate that the proposed method outperforms other GAIL-based methods when dealing with imperfect demonstrations.
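The abstract does not spell out the exact weighting rule, but a minimal sketch of the core idea might look like the following (NumPy; the function name, the top-fraction thresholding for the hard scheme, and the sigmoid form of the soft scheme are all illustrative assumptions, not the paper's formulation):

```python
import numpy as np

def demo_weights(rewards, top_frac=0.5, temperature=1.0, hard=True):
    """Assign selection weights to expert demonstrations from their
    current GAIL reward estimates (higher reward -> higher weight).

    rewards:     per-demonstration rewards from the discriminator
    top_frac:    fraction of demonstrations kept when hard=True (assumed)
    temperature: softness of the soft weighting scheme (assumed)
    hard:        binary 0/1 selection vs. soft weights in (0, 1)
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    if hard:
        # Hard scheme: keep the top `top_frac` demonstrations, drop the rest.
        k = max(1, int(np.ceil(top_frac * len(rewards))))
        threshold = np.partition(rewards, -k)[-k]  # k-th largest reward
        return (rewards >= threshold).astype(np.float64)
    # Soft scheme: squash standardized rewards through a sigmoid so that
    # clearly good demonstrations get weights near 1 and bad ones near 0.
    z = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return 1.0 / (1.0 + np.exp(-z / temperature))

# Toy usage: six demonstrations, two of them poor.
r = [2.1, 1.8, 0.2, 2.4, -0.5, 1.9]
print(demo_weights(r, hard=True))   # binary selection mask
print(demo_weights(r, hard=False))  # soft weights in (0, 1)
```

In the joint-optimization view described above, such weights would be recomputed from the discriminator's reward as training proceeds, so the selected subset of demonstrations adapts alongside the model rather than being fixed by outside supervision.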
