Paper information - QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

In this paper, we study the problem of learning vision-based dynamic manipulation skills using a scalable reinforcement learning approach. We study this problem in the context of grasping, a longstanding challenge in robotic manipulation. In contrast to static learning behaviors that choose a grasp point and then execute the desired grasp, our method enables closed-loop vision-based control, whereby the robot continuously updates its grasp strategy based on the most recent observations to optimize long-horizon grasp success. To that end, we introduce QT-Opt, a scalable self-supervised vision-based reinforcement learning framework that can leverage over 580k real-world grasp attempts to train a deep neural network Q-function with over 1.2M parameters to perform closed-loop, real-world grasping that generalizes to 96% grasp success on unseen objects. Aside from attaining a very high success rate, our method exhibits behaviors that are quite distinct from more standard grasping systems: using only RGB vision-based perception from an over-the-shoulder camera, our method automatically learns regrasping strategies, probes objects to find the most effective grasps, learns to reposition objects and perform other non-prehensile pre-grasp manipulations, and responds dynamically to disturbances and perturbations.
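The closed-loop control described in the abstract can be illustrated with a small sketch. It assumes the robot picks each action by approximately maximizing a Q-function with a sampling-based optimizer (here a simple cross-entropy method); the abstract itself does not spell out this step. Everything in the block is a hypothetical stand-in: `toy_q` replaces the learned image-based Q-network, the 2-D point "gripper" replaces the robot, and the names `cem_argmax` and `closed_loop_episode` are invented for illustration, not taken from the authors' code.

```python
import random


def cem_argmax(q_fn, state, dim, iters=3, samples=64, elites=6, seed=0):
    """Approximately maximize q_fn(state, action) over actions in [-1, 1]^dim."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim
    best_action, best_q = None, float("-inf")
    for _ in range(iters):
        # Sample candidate actions from the current Gaussian, clipped to bounds.
        batch = [
            [max(-1.0, min(1.0, rng.gauss(m, s))) for m, s in zip(mean, std)]
            for _ in range(samples)
        ]
        # Rank candidates by Q-value and keep the top few as elites.
        top = sorted(batch, key=lambda a: q_fn(state, a), reverse=True)[:elites]
        top_q = q_fn(state, top[0])
        if top_q > best_q:
            best_action, best_q = top[0], top_q
        # Refit the Gaussian to the elites (std floored to avoid collapse).
        mean = [sum(a[i] for a in top) / elites for i in range(dim)]
        std = [
            max(1e-3, (sum((a[i] - mean[i]) ** 2 for a in top) / elites) ** 0.5)
            for i in range(dim)
        ]
    return best_action


def toy_q(state, action):
    """Hypothetical Q-function: higher when the move brings the gripper closer."""
    gripper, obj = state
    return -sum((g + a - o) ** 2 for g, a, o in zip(gripper, action, obj))


def closed_loop_episode(steps=10):
    """Re-observe and re-plan at every step, so a mid-episode shift is corrected."""
    gripper = [0.0, 0.0]
    obj = [1.5, -0.8]
    for t in range(steps):
        if t == 4:
            obj = [0.5, 0.9]  # simulated disturbance: the object is moved
        action = cem_argmax(toy_q, (gripper, obj), dim=2, seed=t)
        gripper = [g + a for g, a in zip(gripper, action)]
    return gripper, obj
```

Because the action is recomputed from the latest observation at every step, the toy gripper still ends up next to the object after it is moved at step 4. An open-loop plan fixed at step 0 would keep heading for the old position.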
