Imitating Unknown Policies via Exploration

Behavioral cloning is an imitation learning technique that teaches an agent how to behave through expert demonstrations. Recent approaches use self-supervision over fully observable, unlabeled snapshots of states to decode state pairs into actions. However, the iterative learning scheme employed by these techniques is prone to getting trapped in bad local minima. We address these limitations by incorporating a two-phase model into the original framework, which learns from unlabeled observations via exploration, substantially improving traditional behavioral cloning by exploiting (i) a sampling mechanism to prevent bad local minima, (ii) a second sampling mechanism to improve exploration, and (iii) self-attention modules to capture global features. The resulting technique outperforms the previous state of the art in four different environments by a large margin.
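
To make the core idea concrete, below is a minimal PyTorch sketch of decoding state pairs into actions with a self-attention module, as the abstract describes. This is not the authors' implementation: the names (SelfAttention2d, InverseDynamicsModel), the 84x84 grayscale inputs, and the six discrete actions are all illustrative assumptions, and the two sampling mechanisms and the exploration phase are not shown.

# Illustrative sketch only, not the paper's code. Assumed (hypothetical)
# setup: grayscale 84x84 frames, a discrete action space, and a
# SAGAN-style self-attention block to capture global features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """Self-attention over a feature map (SAGAN-style)."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, hw, c/8)
        k = self.key(x).flatten(2)                    # (b, c/8, hw)
        v = self.value(x).flatten(2)                  # (b, c, hw)
        attn = F.softmax(q @ k, dim=-1)               # (b, hw, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                   # residual connection

class InverseDynamicsModel(nn.Module):
    """Decodes a state pair (s_t, s_{t+1}) into the action linking them."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=8, stride=4), nn.ReLU(),
            SelfAttention2d(32),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, s_t, s_next):
        # Stack the two frames channel-wise and predict action logits.
        x = torch.cat([s_t, s_next], dim=1)
        return self.head(self.encoder(x))

# Usage: infer pseudo-labels (actions) for unlabeled state pairs.
model = InverseDynamicsModel(n_actions=6)
s_t = torch.randn(4, 1, 84, 84)
s_next = torch.randn(4, 1, 84, 84)
logits = model(s_t, s_next)            # shape (4, 6)
pseudo_actions = logits.argmax(dim=-1)

In a behavioral-cloning-from-observation pipeline, such pseudo-labeled actions would then supervise a policy network; the abstract's sampling mechanisms govern which samples feed each phase of that loop.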
