Zero-Shot Visual Imitation

Imitating expert demonstration is a powerful mechanism for learning to perform tasks from raw sensory observations. The current dominant paradigm in learning from demonstration (LfD) [3,16,19,20] requires the expert to either manually move the robot joints (i.e., kinesthetic teaching) or teleoperate the robot to execute the desired task. The expert typically provides multiple demonstrations of a task at training time, and this generates data in the form of observation-action pairs from the agent's point of view. The agent then distills this data into a policy for performing the task of interest. Such a heavily supervised approach, where it is necessary to provide demonstrations by controlling the robot, is incredibly tedious for the human expert. Moreover, for every new task that the robot needs to execute, the expert is required to provide a new set of demonstrations.

[1]  Pieter Abbeel,et al.  Third-Person Imitation Learning , 2017, ICLR.

[2]  Rüdiger Dillmann,et al.  Teaching and learning of robot tasks via observation of human performance , 2004, Robotics Auton. Syst..

[3]  Sergey Levine,et al.  Unsupervised Perceptual Rewards for Imitation Learning , 2016, Robotics: Science and Systems.

[4]  Anind K. Dey,et al.  Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.

[5]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2015, ICLR.

[6]  Honglak Lee,et al.  Action-Conditional Video Prediction using Deep Networks in Atari Games , 2015, NIPS.

[7]  Sergey Levine,et al.  One-Shot Visual Imitation Learning via Meta-Learning , 2017, CoRL.

[8]  Pierre-Yves Oudeyer,et al.  Intrinsic Motivation Systems for Autonomous Mental Development , 2007, IEEE Transactions on Evolutionary Computation.

[9]  Brett Browning,et al.  A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..

[10]  Stefano Ermon,et al.  Generative Adversarial Imitation Learning , 2016, NIPS.

[11]  Dean Pomerleau,et al.  ALVINN: An Autonomous Land Vehicle in a Neural Network , 1988, NIPS.

[12]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[13]  Marcin Andrychowicz,et al.  Hindsight Experience Replay , 2017, NIPS.

[14]  Michael I. Jordan,et al.  Forward Models: Supervised Learning with a Distal Teacher , 1992, Cogn. Sci..

[15]  Yi Li,et al.  Robot Learning Manipulation Action Plans by "Watching" Unconstrained Videos from the World Wide Web , 2015, AAAI.

[16]  Jitendra Malik,et al.  Combining self-supervised learning and imitation for vision-based rope manipulation , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[17]  Sergey Levine,et al.  Time-Contrastive Networks: Self-Supervised Learning from Video , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[18]  Stefan Schaal,et al.  Is imitation learning the route to humanoid robots? , 1999, Trends in Cognitive Sciences.

[19]  Abhinav Gupta,et al.  Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[20]  Tom Schaul,et al.  Universal Value Function Approximators , 2015, ICML.

[21]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[22]  Marcin Andrychowicz,et al.  One-Shot Imitation Learning , 2017, NIPS.

[23]  Martin A. Riedmiller,et al.  Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images , 2015, NIPS.

[24]  Alexei A. Efros,et al.  Learning Dense Correspondence via 3D-Guided Cycle Consistency , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  David W. Murray,et al.  Mobile Robot Localisation Using Active Vision , 1998, ECCV.

[26]  Sergey Levine,et al.  Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection , 2016, Int. J. Robotics Res..

[27]  Peter K. Allen,et al.  Active, uncalibrated visual servoing , 1994, Proceedings of the 1994 IEEE International Conference on Robotics and Automation.

[28]  Sergey Levine,et al.  Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[29]  Rahul Sukthankar,et al.  Cognitive Mapping and Planning for Visual Navigation , 2017, International Journal of Computer Vision.

[30]  C. Breazeal,et al.  Robots that imitate humans , 2002, Trends in Cognitive Sciences.

[31]  William J. Wilson,et al.  Relative end-effector control using Cartesian position based visual servoing , 1996, IEEE Trans. Robotics Autom..

[32]  Peter Dayan,et al.  Structure in the Space of Value Functions , 2002, Machine Learning.

[33]  Sergey Levine,et al.  Self-Supervised Visual Planning with Temporal Skip Connections , 2017, CoRL.

[34]  Alexei A. Efros,et al.  Curiosity-Driven Exploration by Self-Supervised Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[35]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2014, ICLR.

[36]  Juan D. Tardós,et al.  ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras , 2017, IEEE Transactions on Robotics.

[37]  Martin A. Riedmiller,et al.  Acquiring visual servoing reaching and grasping skills using neural reinforcement learning , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).

[38]  Éric Marchand,et al.  Photometric visual servoing for omnidirectional cameras , 2013, Auton. Robots.

[39]  Sergey Levine,et al.  Learning Visual Servoing with Deep Features and Fitted Q-Iteration , 2017, ICLR.

[40]  Michael I. Jordan,et al.  An internal model for sensorimotor integration. , 1995, Science.

[41]  Masayuki Inaba,et al.  Learning by watching: extracting reusable task knowledge from visual observation of human performance , 1994, IEEE Trans. Robotics Autom..

[42]  Katsushi Ikeuchi,et al.  Toward an assembly plan from observation. I. Task recognition with polyhedral objects , 1994, IEEE Trans. Robotics Autom..

[43]  Anand Rangarajan,et al.  A new point matching algorithm for non-rigid registration , 2003, Comput. Vis. Image Underst..

[44]  Koichi Hashimoto,et al.  Visual Servoing: Real-Time Control of Robot Manipulators Based on Visual Sensory Feedback , 1993 .

[45]  Jitendra Malik,et al.  Learning to Poke by Poking: Experiential Learning of Intuitive Physics , 2016, NIPS.

[46]  Jürgen Schmidhuber,et al.  A possibility for implementing curiosity and boredom in model-building neural controllers , 1991 .

[47]  Andrew Y. Ng,et al.  Algorithms for Inverse Reinforcement Learning , 2000, ICML.

[48]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Sergey Levine,et al.  Learning Hand-Eye Coordination for Robotic Grasping with Large-Scale Data Collection , 2016, ISER.

[50]  Pieter Abbeel,et al.  Apprenticeship learning via inverse reinforcement learning , 2004, ICML.