Adversarial Behavioral Cloning

ABSTRACT Imitation learning (IL) has been widely applied to autonomous robot control. A popular IL approach is apprenticeship learning (AL), which alternates reinforcement learning (RL) and inverse reinforcement learning (IRL). AL fundamentally requires a large number of environment interactions and therefore takes a long time to train. We believe that IL algorithms would be more applicable to real-world problems if the number of interactions could be reduced as close to zero as possible. In this paper, we propose an IL algorithm which we call Adversarial Behavioral Cloning (ABC). Experimental results on the MuJoCo physics simulator show that our algorithm achieves results competitive with a state-of-the-art AL algorithm, generative adversarial imitation learning (GAIL), even without any environment interactions.
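The abstract does not spell out the training procedure, but the name suggests a GAN-style objective applied to behavioral cloning: a discriminator separates expert state-action pairs from pairs produced by the cloned policy on the same demonstrated states, so training uses only the recorded demonstrations and requires no environment interaction. Below is a minimal, hypothetical PyTorch sketch of one such loop; the network sizes, state/action dimensions, losses, and function names are illustrative assumptions, not the authors' implementation.

    # Hypothetical adversarial behavioral-cloning loop (illustrative, not the paper's code).
    # The policy maps demonstrated states to actions; the discriminator tries to tell
    # expert (state, action) pairs from (state, policy(state)) pairs. Everything runs on
    # the demonstration dataset alone, so no environment interaction is needed.
    import torch
    import torch.nn as nn

    STATE_DIM, ACTION_DIM = 17, 6   # assumed sizes, e.g. a MuJoCo locomotion task

    policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(),
                           nn.Linear(64, 64), nn.Tanh(),
                           nn.Linear(64, ACTION_DIM))
    discriminator = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(),
                                  nn.Linear(64, 1))

    opt_pi = torch.optim.Adam(policy.parameters(), lr=3e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=3e-4)
    bce = nn.BCEWithLogitsLoss()

    def train_step(expert_states, expert_actions):
        """One adversarial update on a batch of expert (state, action) pairs."""
        # Discriminator update: expert pairs -> label 1, cloned pairs -> label 0.
        with torch.no_grad():
            fake_actions = policy(expert_states)
        d_expert = discriminator(torch.cat([expert_states, expert_actions], dim=-1))
        d_fake = discriminator(torch.cat([expert_states, fake_actions], dim=-1))
        d_loss = bce(d_expert, torch.ones_like(d_expert)) + bce(d_fake, torch.zeros_like(d_fake))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Policy update: produce actions on the same demonstrated states that the
        # discriminator classifies as expert-like.
        g_logits = discriminator(torch.cat([expert_states, policy(expert_states)], dim=-1))
        g_loss = bce(g_logits, torch.ones_like(g_logits))
        opt_pi.zero_grad(); g_loss.backward(); opt_pi.step()
        return d_loss.item(), g_loss.item()

    # Usage with stand-in data; real code would iterate over recorded expert trajectories.
    states = torch.randn(128, STATE_DIM)
    actions = torch.randn(128, ACTION_DIM)
    print(train_step(states, actions))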

[1]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[2]  Shie Mannor,et al.  End-to-End Differentiable Adversarial Imitation Learning , 2017, ICML.

[3]  Doina Precup,et al.  Off-policy Learning with Options and Recognizers , 2005, NIPS.

[4]  Sergey Levine,et al.  A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models , 2016, ArXiv.

[5]  Katsushi Ikeuchi,et al.  Modeling manipulation interactions by hidden Markov models , 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[6]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[7]  Stefano Ermon,et al.  Generative Adversarial Imitation Learning , 2016, NIPS.

[8]  J. Andrew Bagnell,et al.  Efficient Reductions for Imitation Learning , 2010, AISTATS.

[9]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[10]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[11]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[12]  Sergey Levine,et al.  Nonlinear Inverse Reinforcement Learning with Gaussian Processes , 2011, NIPS.

[13]  Marcin Andrychowicz,et al.  Hindsight Experience Replay , 2017, NIPS.

[14]  Ilya Kostrikov,et al.  Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning , 2018, ICLR.

[15]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[16]  Yoshihiko Nakamura,et al.  Embodied Symbol Emergence Based on Mimesis Theory , 2004, Int. J. Robotics Res.

[17]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[18]  Martha White,et al.  Linear Off-Policy Actor-Critic , 2012, ICML.

[19]  Andrew Y. Ng,et al.  Algorithms for Inverse Reinforcement Learning , 2000, ICML.

[20]  Sergey Levine,et al.  Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization , 2016, ICML.

[21]  Pieter Abbeel,et al.  Apprenticeship learning via inverse reinforcement learning , 2004, ICML.

[22]  Tetsuya Yohira,et al.  Sample Efficient Imitation Learning for Continuous Control , 2018, ICLR.

[23]  Michael H. Bowling,et al.  Apprenticeship learning using linear programming , 2008, ICML '08.

[24]  Sergey Levine,et al.  Trust Region Policy Optimization , 2015, ICML.

[25]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[26]  Dean Pomerleau,et al.  Efficient Training of Artificial Neural Networks for Autonomous Navigation , 1991, Neural Computation.

[27]  Yuval Tassa,et al.  Learning Continuous Control Policies by Stochastic Value Gradients , 2015, NIPS.

[28]  Anind K. Dey,et al.  Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.

[29]  Gaurav S. Sukhatme,et al.  Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets , 2017, NIPS.

[30]  Jun Morimoto,et al.  Learning Stylistic Dynamic Movement Primitives from multiple demonstrations , 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[31]  Wojciech Zaremba,et al.  OpenAI Gym , 2016, ArXiv.

[32]  Andrew L. Maas,et al.  Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013, ICML.

[33]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[34]  Sergey Levine,et al.  Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow , 2018, ICLR.

[35]  Jun Nakanishi,et al.  Learning Attractor Landscapes for Learning Motor Primitives , 2002, NIPS.