Paper Information - TRASS: Time Reversal as Self-Supervision

A longstanding challenge in robot learning for manipulation tasks has been the ability to generalize to varying initial conditions, diverse objects, and changing objectives. Learning-based approaches have shown promise in producing robust policies, but they require heavy supervision and a large number of environment interactions, especially when learning from visual inputs. We propose a novel self-supervision technique that uses time reversal to provide high-level supervision for reaching goals. In particular, we introduce the time-reversal model (TRM), a self-supervised model which explores outward from a set of goal states and learns to predict these trajectories in reverse. This provides a high-level plan towards goals, allowing us to learn complex manipulation tasks with no demonstrations or exploration at test time. We test our method on the domain of assembly, specifically the mating of Tetris-style block pairs. Using our method operating atop visual model predictive control, we are able to assemble Tetris blocks on a KUKA IIWA-7 using only uncalibrated RGB camera input, and generalize to unseen block pairs. Project page: https://sites.google.com/view/time-reversal
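
The time-reversal idea in the abstract (explore outward from a goal, then learn from the trajectory played backwards) can be illustrated with a toy sketch. This is not the authors' TRM, which works from camera images with a learned model. The 2D point state, the random-walk exploration, and the function names `explore_outward`, `reverse_for_supervision`, and `replay` are all assumptions made for illustration only.

```python
import random
from typing import List, Tuple

State = Tuple[float, float]  # hypothetical 2D state; the paper uses RGB images


def explore_outward(goal: State, steps: int, rng: random.Random) -> List[State]:
    """Random walk that starts at the goal state and drifts away from it."""
    states = [goal]
    x, y = goal
    for _ in range(steps):
        x += rng.uniform(-1.0, 1.0)
        y += rng.uniform(-1.0, 1.0)
        states.append((x, y))
    return states


def reverse_for_supervision(states: List[State]) -> List[Tuple[State, State]]:
    """Turn an outward rollout into (state, step_toward_goal) training pairs."""
    pairs = []
    # Walk the rollout backwards: from the farthest state down to the goal.
    for i in range(len(states) - 1, 0, -1):
        cur, prev = states[i], states[i - 1]
        pairs.append((cur, (prev[0] - cur[0], prev[1] - cur[1])))
    return pairs


def replay(pairs: List[Tuple[State, State]]) -> State:
    """Apply the supervised steps in order, starting from the first state."""
    x, y = pairs[0][0]
    for _, (dx, dy) in pairs:
        x += dx
        y += dy
    return (x, y)


if __name__ == "__main__":
    rng = random.Random(0)
    rollout = explore_outward((0.0, 0.0), steps=5, rng=rng)
    training_pairs = reverse_for_supervision(rollout)
    print("end of replay:", replay(training_pairs))
```

Replaying the reversed pairs from the farthest state leads back to the goal (up to floating-point error), which is the sense in which a reversed exploration trajectory can act as supervision without demonstrations. In this toy version the labels are exact displacements, whereas a learned model would have to predict them from observations.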
