QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

In this paper, we study the problem of learning vision-based dynamic manipulation skills using a scalable reinforcement learning approach. We study this problem in the context of grasping, a longstanding challenge in robotic manipulation. In contrast to static learning behaviors that choose a grasp point and then execute the desired grasp, our method enables closed-loop vision-based control, whereby the robot continuously updates its grasp strategy based on the most recent observations to optimize long-horizon grasp success. To that end, we introduce QT-Opt, a scalable self-supervised vision-based reinforcement learning framework that can leverage over 580k real-world grasp attempts to train a deep neural network Q-function with over 1.2M parameters to perform closed-loop, real-world grasping that generalizes to 96% grasp success on unseen objects. Aside from attaining a very high success rate, our method exhibits behaviors that are quite distinct from more standard grasping systems: using only RGB vision-based perception from an over-the-shoulder camera, our method automatically learns regrasping strategies, probes objects to find the most effective grasps, learns to reposition objects and perform other non-prehensile pre-grasp manipulations, and responds dynamically to disturbances and perturbations.

[1]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[2]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[3]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[4]  Philip Bachman,et al.  Deep Reinforcement Learning that Matters , 2017, AAAI.

[5]  Marlos C. Machado,et al.  Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents , 2017, J. Artif. Intell. Res..

[6]  Kuan-Ting Yu,et al.  Realtime State Estimation with Tactile and Visual Sensing. Application to Planar Manipulation , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[7]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[8]  Sergey Levine,et al.  Continuous Deep Q-Learning with Model-based Acceleration , 2016, ICML.

[9]  Sergey Levine,et al.  Collective robot reinforcement learning with distributed asynchronous guided policy search , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[10]  Nikhil Chavan Dafle,et al.  Stable Prehensile Pushing: In-Hand Manipulation with Alternating Sticking Contacts , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[11]  Shane Legg,et al.  IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.

[12]  Dirk P. Kroese,et al.  Cross‐Entropy Method , 2011 .

[13]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[14]  Kate Saenko,et al.  Grasp Pose Detection in Point Clouds , 2017, Int. J. Robotics Res..

[15]  Eduardo F. Morales,et al.  An Introduction to Reinforcement Learning , 2011 .

[16]  David Budden,et al.  Distributed Prioritized Experience Replay , 2018, ICLR.

[17]  Danica Kragic,et al.  Data-Driven Grasp Synthesis—A Survey , 2013, IEEE Transactions on Robotics.

[18]  Stefan Schaal,et al.  Learning force control policies for compliant manipulation , 2011, 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[19]  Peter Corke,et al.  Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach , 2018, Robotics: Science and Systems.

[20]  David Silver,et al.  Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.

[21]  Peter I. Corke,et al.  Cartman: The Low-Cost Cartesian Manipulator that Won the Amazon Robotics Challenge , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[22]  Abhinav Gupta,et al.  Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[23]  Sergey Levine,et al.  Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[24]  Gerald Tesauro,et al.  TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.

[25]  Sergey Levine,et al.  Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[26]  Lei Xu,et al.  Input Convex Neural Networks : Supplementary Material , 2017 .

[27]  Stefan Schaal,et al.  2008 Special Issue: Reinforcement learning of motor skills with policy gradients , 2008 .

[28]  Pieter Abbeel,et al.  Accelerated Methods for Deep Reinforcement Learning , 2018, ArXiv.

[29]  Honglak Lee,et al.  Deep learning for detecting robotic grasps , 2013, Int. J. Robotics Res..

[30]  Kate Saenko,et al.  Learning a visuomotor controller for real world robotic grasping using simulated depth images , 2017, CoRL.

[31]  Sergey Levine,et al.  Learning Hand-Eye Coordination for Robotic Grasping with Large-Scale Data Collection , 2016, ISER.

[32]  Boris Polyak,et al.  Acceleration of stochastic approximation by averaging , 1992 .

[33]  Shane Legg,et al.  Massively Parallel Methods for Deep Reinforcement Learning , 2015, ArXiv.

[34]  Alberto Rodriguez,et al.  Learning Synergies Between Pushing and Grasping with Self-Supervised Deep Reinforcement Learning , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[35]  Danica Kragic,et al.  Deep predictive policy training using reinforcement learning , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[36]  Gaurav S. Sukhatme,et al.  Regrasping Using Tactile Perception and Supervised Policy Learning , 2017, AAAI Spring Symposia.

[37]  Xinyu Liu,et al.  Dex-Net 3.0: Computing Robust Robot Suction Grasp Targets in Point Clouds using a New Analytic Model and Deep Learning , 2017, ArXiv.

[38]  Sergey Levine,et al.  End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[39]  Peter K. Allen,et al.  Pose error robust grasping from contact wrench space metrics , 2012, 2012 IEEE International Conference on Robotics and Automation.

[40]  Pieter Abbeel,et al.  Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.

[41]  Martin A. Riedmiller,et al.  Reinforcement learning in feedback control , 2011, Machine Learning.

[42]  Hado van Hasselt,et al.  Double Q-learning , 2010, NIPS.