AVID: Learning Multi-Stage Tasks via Pixel-Level Translation of Human Videos

Robotic reinforcement learning (RL) holds the promise of enabling robots to learn complex behaviors through experience. However, realizing this promise for long-horizon tasks in the real world requires mechanisms that reduce the human burden of defining the task and scaffolding the learning process. In this paper, we study how these challenges can be alleviated with an automated robotic learning framework in which multi-stage tasks are defined simply by providing videos of a human demonstrator and then learned autonomously by the robot from raw image observations. A central challenge in imitating human videos is the difference in appearance between the human and the robot, which typically requires manually specified correspondences. We instead take an automated approach and perform pixel-level image translation via CycleGAN to convert the human demonstration into a video of a robot, which is then used to construct a reward function for a model-based RL algorithm. The robot learns the task one stage at a time, automatically learning how to reset each stage so it can be retried multiple times without human-provided resets. This makes the learning process largely automatic, from intuitive task specification via a video to automated training with minimal human intervention. We demonstrate that our approach is capable of learning complex tasks, such as operating a coffee machine, directly from raw image observations, requiring only 20 minutes of human demonstrations and about 180 minutes of robot interaction.

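To make the translation-based reward concrete, the following is a minimal PyTorch sketch of the idea described above: two CycleGAN generators map between the human and robot image domains, trained with the standard least-squares adversarial loss plus cycle consistency on unpaired frames, and the translated demonstration frames then serve as goal images for a simple negative-pixel-distance reward. This is an illustrative sketch, not the paper's implementation: the toy network architectures, the names (G_h2r, train_step, stage_reward), and the choice of an L2 image-distance reward are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def generator():
    # Tiny fully convolutional stand-in for a real CycleGAN generator.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
    )

def discriminator():
    # PatchGAN-style discriminator: a grid of per-patch real/fake scores.
    return nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(32, 1, 4, stride=2, padding=1),
    )

G_h2r, G_r2h = generator(), generator()   # human -> robot, robot -> human
D_r, D_h = discriminator(), discriminator()

g_opt = torch.optim.Adam(list(G_h2r.parameters()) + list(G_r2h.parameters()), lr=2e-4)
d_opt = torch.optim.Adam(list(D_r.parameters()) + list(D_h.parameters()), lr=2e-4)

def train_step(human, robot, lam=10.0):
    """One CycleGAN update on unpaired human/robot frames (LSGAN + cycle loss)."""
    fake_robot, fake_human = G_h2r(human), G_r2h(robot)

    # Generators: fool both discriminators and satisfy cycle consistency,
    # i.e. translating to the other domain and back reconstructs the input.
    pr, ph = D_r(fake_robot), D_h(fake_human)
    g_loss = (F.mse_loss(pr, torch.ones_like(pr))
              + F.mse_loss(ph, torch.ones_like(ph))
              + lam * F.l1_loss(G_r2h(fake_robot), human)
              + lam * F.l1_loss(G_h2r(fake_human), robot))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

    # Discriminators: push real frames toward 1, translated frames toward 0.
    d_loss = 0.0
    for D, real, fake in ((D_r, robot, fake_robot), (D_h, human, fake_human)):
        p_real, p_fake = D(real), D(fake.detach())
        d_loss = (d_loss + F.mse_loss(p_real, torch.ones_like(p_real))
                         + F.mse_loss(p_fake, torch.zeros_like(p_fake)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

@torch.no_grad()
def translate_demo(human_frames):
    """Translate a human demonstration (N, 3, H, W) into the robot domain."""
    return G_h2r(human_frames)

@torch.no_grad()
def stage_reward(obs, goal):
    """Assumed reward: negative pixel distance between the robot's current
    observation and the translated demo frame that completes the stage."""
    return -F.mse_loss(obs, goal).item()

# Smoke test on random unpaired batches of 64x64 RGB frames in [-1, 1].
human = torch.rand(4, 3, 64, 64) * 2 - 1
robot = torch.rand(4, 3, 64, 64) * 2 - 1
train_step(human, robot)
goals = translate_demo(human)               # translated demonstration frames
print(stage_reward(robot[:1], goals[-1:]))  # reward against the final frame
```

Under this sketch, the translated demonstration would be segmented at stage boundaries, with the frame that completes each stage serving as that stage's goal image; the model-based RL algorithm would then optimize one stage's reward at a time, consistent with the stage-wise training and automated resets described in the abstract.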