Paper Information - Recent Advances in Imitation Learning from Observation

Recent Advances in Imitation Learning from Observation

Imitation learning is the process by which one agent tries to learn how to perform a certain task using information generated by another, often more-expert agent performing that same task. Conventionally, the imitator has access to both state and action information generated by an expert performing the task (e.g., the expert may provide a kinesthetic demonstration of object placement using a robotic arm). However, requiring the action information prevents imitation learning from exploiting a large number of valuable existing learning resources, such as online videos of humans performing tasks. To overcome this issue, the specific problem of imitation from observation (IfO), in which the imitator only has access to the state information (e.g., video frames) generated by the expert, has recently garnered a great deal of attention. In this paper, we provide a literature review of methods developed for IfO, and then point out some open research problems and potential future work.
