Third-Person Imitation Learning

Reinforcement learning (RL) makes it possible to train agents capable of achieving sophisticated goals in complex and uncertain environments. A key difficulty in reinforcement learning is specifying a reward function for the agent to optimize. Traditionally, imitation learning has been used to overcome this problem. Unfortunately, imitation learning methods to date tend to require that demonstrations are supplied in the first person: the agent is provided with a sequence of states and a specification of the actions that it should have taken. While powerful, this kind of imitation learning is limited by the relatively hard problem of collecting first-person demonstrations. Humans address this problem by learning from third-person demonstrations: they observe other humans perform tasks, infer the task, and accomplish the same task themselves. In this paper, we present a method for unsupervised third-person imitation learning. Here, third-person refers to training an agent to correctly achieve a simple goal in a simple environment when it is provided a demonstration of a teacher achieving the same goal but from a different viewpoint, and unsupervised refers to the fact that the agent receives only these third-person demonstrations and is not provided a correspondence between teacher states and student states. Our method's primary insight is that recent advances in domain confusion can be used to produce domain-agnostic features, which are crucial during the training process. To validate our approach, we report successful experiments on learning from third-person demonstrations in a pointmass domain, a reacher domain, and an inverted pendulum domain.
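To make the domain-confusion idea concrete, the sketch below illustrates one common way to obtain domain-agnostic features: a gradient-reversal layer in the spirit of unsupervised domain adaptation by backpropagation. This is a hedged illustration, not the paper's actual architecture; the module names (`GradientReversal`, `DomainConfusionFeatures`), the layer sizes, and the scaling coefficient `lam` are hypothetical, and PyTorch is used only for concreteness.

```python
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing into the feature extractor.
        return -ctx.lam * grad_output, None


class DomainConfusionFeatures(nn.Module):
    """Hypothetical feature extractor trained so that a domain classifier cannot
    tell teacher-viewpoint observations from student-viewpoint observations."""

    def __init__(self, obs_dim, feat_dim=64, lam=1.0):
        super().__init__()
        self.lam = lam
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, feat_dim), nn.ReLU(),
        )
        # Binary classifier: which viewpoint (domain) did this observation come from?
        self.domain_head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 2),
        )

    def forward(self, obs):
        feats = self.encoder(obs)
        # Reversed gradients push the encoder toward domain-agnostic features.
        domain_logits = self.domain_head(GradientReversal.apply(feats, self.lam))
        return feats, domain_logits
```

In this sketch, the encoder's features would feed both a downstream imitation objective and the domain head; because gradients through the domain head are reversed, training the domain classifier simultaneously pushes the encoder toward features that cannot distinguish the teacher's viewpoint from the student's.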
