Imitation Learning from Video by Leveraging Proprioception

Classically, imitation learning algorithms have been developed for idealized settings: demonstrations are typically required to be collected in the exact same environment as the imitator, and they usually include the demonstrator's actions. Recently, however, the research community has begun to address some of these shortcomings by offering algorithms that enable imitation from observation (IfO), i.e., learning to perform a task from visual demonstrations that may come from a different environment and do not include action information. Motivated by the fact that agents often also have access to their own internal states (i.e., proprioception), we propose and study an IfO algorithm that leverages this information during policy learning. The proposed architecture learns policies over proprioceptive state representations and compares the resulting trajectories visually to the demonstration data. We experimentally evaluate the proposed technique on several MuJoCo domains and show that it outperforms other IfO algorithms by a large margin.
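
The abstract leaves the architecture at a high level; below is a minimal, hypothetical sketch of one plausible instantiation, assuming an adversarial (GAIL-style) setup in which the policy acts on proprioceptive state while a discriminator compares rendered visual frames of the imitator's trajectories against the video demonstrations. All class names, network sizes, and frame shapes are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn


class ProprioceptivePolicy(nn.Module):
    """Gaussian policy conditioned on the agent's internal (proprioceptive) state."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        mean = self.net(state)
        return torch.distributions.Normal(mean, self.log_std.exp())


class VisualDiscriminator(nn.Module):
    """Scores stacked image frames: demonstration-like (high) vs. imitator (low)."""

    def __init__(self, in_channels=6):  # assumption: two stacked RGB frames
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, frames):
        return self.net(frames)  # raw logit


def imitation_reward(discriminator, frames):
    """Reward the imitator for visual behavior the discriminator takes for the demo."""
    with torch.no_grad():
        return torch.nn.functional.logsigmoid(discriminator(frames)).squeeze(-1)


# Illustrative usage: the reward would drive an RL update (e.g., PPO or TRPO),
# while the discriminator is trained to separate demonstration frames from imitator frames.
policy = ProprioceptivePolicy(state_dim=11, action_dim=3)  # Hopper-like dimensions (assumed)
disc = VisualDiscriminator()
action = policy(torch.randn(1, 11)).sample()
frames = torch.randn(1, 6, 64, 64)                         # two stacked 64x64 RGB frames (assumed)
reward = imitation_reward(disc, frames)

The point of the sketch is the division of labor the abstract emphasizes: the policy consumes only proprioception, while the comparison to the demonstration data happens purely in the visual observation space through the reward signal.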
