Imitating Unknown Policies via Exploration

Behavioral cloning is an imitation learning technique that teaches an agent how to behave through expert demonstrations. Recent approaches use self-supervision over fully observable, unlabeled snapshots of states to decode state pairs into actions. However, the iterative learning scheme employed by these techniques is prone to getting trapped in bad local minima. We address these limitations by incorporating a two-phase model into the original framework, which learns from unlabeled observations via exploration, substantially improving traditional behavioral cloning by exploiting (i) a sampling mechanism to prevent bad local minima, (ii) a second sampling mechanism to improve exploration, and (iii) self-attention modules to capture global features. The resulting technique outperforms the previous state of the art in four different environments by a large margin.
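
To make the core idea concrete, below is a minimal PyTorch sketch of decoding state pairs into actions with a self-attention module, as the abstract describes. This is not the authors' implementation: the names (SelfAttention2d, InverseDynamicsModel), the 84x84 grayscale inputs, and the six discrete actions are all illustrative assumptions, and the two sampling mechanisms and the exploration phase are not shown.

# Illustrative sketch only, not the paper's code. Assumed (hypothetical)
# setup: grayscale 84x84 frames, a discrete action space, and a
# SAGAN-style self-attention block to capture global features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """Self-attention over a feature map (SAGAN-style)."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, hw, c/8)
        k = self.key(x).flatten(2)                    # (b, c/8, hw)
        v = self.value(x).flatten(2)                  # (b, c, hw)
        attn = F.softmax(q @ k, dim=-1)               # (b, hw, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                   # residual connection

class InverseDynamicsModel(nn.Module):
    """Decodes a state pair (s_t, s_{t+1}) into the action linking them."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=8, stride=4), nn.ReLU(),
            SelfAttention2d(32),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, s_t, s_next):
        # Stack the two frames channel-wise and predict action logits.
        x = torch.cat([s_t, s_next], dim=1)
        return self.head(self.encoder(x))

# Usage: infer pseudo-labels (actions) for unlabeled state pairs.
model = InverseDynamicsModel(n_actions=6)
s_t = torch.randn(4, 1, 84, 84)
s_next = torch.randn(4, 1, 84, 84)
logits = model(s_t, s_next)            # shape (4, 6)
pseudo_actions = logits.argmax(dim=-1)

In a behavioral-cloning-from-observation pipeline, such pseudo-labeled actions would then supervise a policy network; the abstract's sampling mechanisms govern which samples feed each phase of that loop.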
