Recent Advances in Imitation Learning from Observation

Imitation learning is the process by which one agent tries to learn how to perform a certain task using information generated by another, often more expert, agent performing that same task. Conventionally, the imitator has access to both state and action information generated by an expert performing the task (e.g., the expert may provide a kinesthetic demonstration of object placement using a robotic arm). However, requiring the action information prevents imitation learning from exploiting a large number of existing valuable learning resources, such as online videos of humans performing tasks. To overcome this issue, the specific problem of imitation from observation (IfO), in which the imitator only has access to the state information (e.g., video frames) generated by the expert, has recently garnered a great deal of attention. In this paper, we provide a literature review of methods developed for IfO, and then point out some open research problems and potential future work.
