Combating False Negatives in Adversarial Imitation Learning

In adversarial imitation learning, a discriminator is trained to distinguish agent episodes from expert demonstrations of the desired behavior. However, as the policy being trained becomes more successful, the negative examples (those produced by the agent) grow increasingly similar to the expert ones. Even when an agent trajectory successfully accomplishes the task, the discriminator is still trained to output low values for it. We hypothesize that this inconsistent training signal can impede the discriminator's learning and, in turn, degrade the agent's overall performance. Our first contribution is experimental evidence supporting this hypothesis: these 'false negatives' (i.e., successful agent episodes) significantly hinder adversarial imitation learning. We then propose a method to alleviate the impact of false negatives and evaluate it on the BabyAI environment, where it consistently improves sample efficiency over the baselines by at least an order of magnitude.
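To make the false-negative issue concrete, below is a minimal PyTorch sketch of a GAIL-style discriminator update in which agent transitions coming from successful episodes are simply excluded from the negative class rather than being pushed toward the 'non-expert' label. The network architecture, feature dimension, and the names `Discriminator`, `discriminator_loss`, and `agent_success_mask` are illustrative assumptions; this is one plausible way to avoid training on false negatives, not the paper's exact method.

```python
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Hypothetical discriminator mapping a (state, action) feature vector to a logit."""

    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Higher logit = "more expert-like".
        return self.net(x).squeeze(-1)


def discriminator_loss(disc, expert_batch, agent_batch, agent_success_mask):
    """GAIL-style binary cross-entropy loss, except that agent transitions from
    successful episodes (potential false negatives) are dropped from the
    negative class. Illustrative sketch only, not the authors' method."""
    bce = nn.BCEWithLogitsLoss()

    # Expert transitions are always treated as positives.
    expert_logits = disc(expert_batch)
    loss = bce(expert_logits, torch.ones_like(expert_logits))

    # Only agent transitions from *unsuccessful* episodes are kept as negatives.
    negatives = agent_batch[~agent_success_mask]
    if negatives.shape[0] > 0:
        agent_logits = disc(negatives)
        loss = loss + bce(agent_logits, torch.zeros_like(agent_logits))
    return loss


if __name__ == "__main__":
    torch.manual_seed(0)
    disc = Discriminator()
    expert = torch.randn(16, 32)        # placeholder expert features
    agent = torch.randn(16, 32)         # placeholder agent features
    success = torch.rand(16) > 0.5      # which agent transitions came from successful episodes
    print(discriminator_loss(disc, expert, agent, success).item())
```

In a full training loop, the success mask would come from the environment's task-completion signal, and the discriminator's output would serve as the reward for the policy update, as is standard in adversarial imitation learning.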
