Relational Mimic for Visual Adversarial Imitation Learning

In this work, we introduce a new method for imitation learning from video demonstrations. Our method, Relational Mimic (RM), improves on previous visual imitation learning methods by combining generative adversarial networks and relational learning. RM is flexible and can be used in conjunction with other recent advances in generative adversarial imitation learning to better address the need for more robust and sample-efficient approaches. In addition, we introduce a new neural network architecture that improves upon the previous state-of-the-art in reinforcement learning and illustrate how increasing the relational reasoning capabilities of the agent enables the latter to achieve increasingly higher performance in a challenging locomotion task with pixel inputs. Finally, we study the effects and contributions of relational learning in policy evaluation, policy improvement and reward learning through ablation studies.

[1]  J. Andrew Bagnell,et al.  Efficient Reductions for Imitation Learning , 2010, AISTATS.

[2]  Zongqing Lu,et al.  Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation , 2018, ArXiv.

[3]  Marco Pavone,et al.  Risk-Sensitive Generative Adversarial Imitation Learning , 2018, AISTATS.

[4]  Dean Pomerleau,et al.  Rapidly Adapting Artificial Neural Networks for Autonomous Navigation , 1990, NIPS.

[5]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[6]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[7]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[8]  Philip Bachman,et al.  Deep Reinforcement Learning that Matters , 2017, AAAI.

[9]  Andrew Y. Ng,et al.  Pharmacokinetics of a novel formulation of ivermectin after administration to goats , 2000, ICML.

[10]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Pieter Abbeel,et al.  Apprenticeship learning via inverse reinforcement learning , 2004, ICML.

[12]  Peter Stone,et al.  Behavioral Cloning from Observation , 2018, IJCAI.

[13]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[14]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[15]  Anca D. Dragan,et al.  Inverse Reward Design , 2017, NIPS.

[16]  Razvan Pascanu,et al.  Interaction Networks for Learning about Objects, Relations and Physics , 2016, NIPS.

[17]  Yuval Tassa,et al.  Learning human behaviors from motion capture by adversarial imitation , 2017, ArXiv.

[18]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[19]  Andrew M. Dai,et al.  Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step , 2017, ICLR.

[20]  Taehoon Kim,et al.  Quantifying Generalization in Reinforcement Learning , 2018, ICML.

[21]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[22]  Mario Lucic,et al.  Are GANs Created Equal? A Large-Scale Study , 2017, NeurIPS.

[23]  Anind K. Dey,et al.  Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.

[24]  Peter Stone,et al.  Generative Adversarial Imitation from Observation , 2018, ArXiv.

[25]  Ashish Vaswani,et al.  Stand-Alone Self-Attention in Vision Models , 2019, NeurIPS.

[26]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[27]  Kyoungmin Lee,et al.  Scalable muscle-actuated human simulation and control , 2019, ACM Trans. Graph..

[28]  Quoc V. Le,et al.  Attention Augmented Convolutional Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  Razvan Pascanu,et al.  A simple neural network module for relational reasoning , 2017, NIPS.

[30]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[31]  Lei Shi,et al.  Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Tom Schaul,et al.  Dueling Network Architectures for Deep Reinforcement Learning , 2015, ICML.

[33]  Razvan Pascanu,et al.  Deep reinforcement learning with relational inductive biases , 2018, ICLR.

[34]  Siddhartha S. Srinivasa,et al.  Imitation learning for locomotion and manipulation , 2007, 2007 7th IEEE-RAS International Conference on Humanoid Robots.

[35]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[36]  Wojciech Zaremba,et al.  OpenAI Gym , 2016, ArXiv.

[37]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[38]  J A Bagnell,et al.  An Invitation to Imitation , 2015 .

[39]  Yedid Hoshen,et al.  VAIN: Attentional Multi-agent Predictive Modeling , 2017, NIPS.

[40]  Robert E. Schapire,et al.  A Reduction from Apprenticeship Learning to Classification , 2010, NIPS.

[41]  Dean Pomerleau,et al.  ALVINN, an autonomous land vehicle in a neural network , 2015 .

[42]  Laurent Orseau,et al.  AI Safety Gridworlds , 2017, ArXiv.

[43]  Xiaohua Zhai,et al.  The GAN Landscape: Losses, Architectures, Regularization, and Normalization , 2018, ArXiv.

[44]  Jean-Michel Morel,et al.  A non-local algorithm for image denoising , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[45]  Peter Stone,et al.  Stochastic Grounded Action Transformation for Robot Learning in Simulation , 2017, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[46]  Gaurav S. Sukhatme,et al.  Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets , 2017, NIPS.

[47]  András György,et al.  The adversarial stochastic shortest path problem with unknown transition probabilities , 2012, AISTATS.

[48]  Sanja Fidler,et al.  NerveNet: Learning Structured Policy with Graph Neural Networks , 2018, ICLR.

[49]  Pietro Liò,et al.  Graph Attention Networks , 2017, ICLR.

[50]  Robert E. Schapire,et al.  A Game-Theoretic Approach to Apprenticeship Learning , 2007, NIPS.

[51]  R. Zemel,et al.  Neural Relational Inference for Interacting Systems , 2018, ICML.

[52]  Xin Zhang,et al.  End to End Learning for Self-Driving Cars , 2016, ArXiv.

[53]  Alexandros Kalousis,et al.  Sample-Efficient Imitation Learning via Generative Adversarial Nets , 2018, AISTATS.

[54]  Lei Shi,et al.  Skeleton-Based Action Recognition With Directed Graph Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Razvan Pascanu,et al.  Relational inductive biases, deep learning, and graph networks , 2018, ArXiv.

[56]  Oriol Vinyals,et al.  Synthesizing Programs for Images using Reinforced Adversarial Learning , 2018, ICML.

[57]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[58]  Léon Bottou,et al.  Wasserstein GAN , 2017, ArXiv.

[59]  Brett Browning,et al.  A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..

[60]  Stefano Ermon,et al.  Generative Adversarial Imitation Learning , 2016, NIPS.

[61]  Stefano Ermon,et al.  InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations , 2017, NIPS.

[62]  José García Rodríguez,et al.  TactileGCN: A Graph Convolutional Network for Predicting Grasp Stability with Tactile Sensors , 2019, 2019 International Joint Conference on Neural Networks (IJCNN).

[63]  Shane Legg,et al.  IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.

[64]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[65]  John Schulman,et al.  Concrete Problems in AI Safety , 2016, ArXiv.

[66]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[67]  Michael H. Bowling,et al.  Apprenticeship learning using linear programming , 2008, ICML '08.

[68]  Jakub W. Pachocki,et al.  Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..

[69]  Stefano Ermon,et al.  Model-Free Imitation Learning with Policy Optimization , 2016, ICML.

[70]  Han Zhang,et al.  Self-Attention Generative Adversarial Networks , 2018, ICML.

[71]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[72]  Shane Legg,et al.  Scalable agent alignment via reward modeling: a research direction , 2018, ArXiv.