End-to-end nonprehensile rearrangement with deep reinforcement learning and simulation-to-reality transfer

Abstract: Nonprehensile rearrangement is the problem of controlling a robot to interact with objects through pushing actions in order to reconfigure the objects into a predefined goal pose. In this work, we rearrange one object at a time in an environment with obstacles using an end-to-end policy that maps raw pixel input directly to control actions, without any form of engineered feature extraction. To reduce the amount of training data that must be collected on a real robot, we propose a simulation-to-reality transfer approach. In the first step, we model the nonprehensile rearrangement task in simulation and use deep reinforcement learning to learn a suitable rearrangement policy, which requires on the order of hundreds of thousands of example actions for training. We then collect a small dataset of only 70 episodes of real-world actions as supervised examples for adapting the learned rearrangement policy to real-world input data. In this process, we make use of newly proposed strategies for improving the reinforcement learning process, such as heuristic exploration and the curation of a balanced set of replay experiences. We evaluate our method both in simulation and in a real-world setting using a Baxter robot, showing that the proposed approach effectively improves the training process in simulation and efficiently adapts the learned policy to the real world, even when the camera pose differs from simulation. Additionally, we show that the learned system not only provides adaptive behavior for handling unforeseen events during execution, such as distractor objects and sudden changes in the positions of objects and obstacles, but also generalizes to obstacle shapes that were not present during training.
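The abstract does not specify the network architecture or the details of the adaptation step, so the following PyTorch sketch only illustrates the two-stage idea it describes: a pixels-to-actions policy is first trained in simulation, then adapted to real camera images using a small supervised dataset. The class name `PushPolicy`, the 84x84 input size, the layer dimensions, the choice to freeze the control head while fine-tuning the perception layers, and all hyperparameters are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn as nn

# Hypothetical end-to-end policy mapping raw 84x84 RGB pixels to
# Q-values over a discrete set of pushing actions (architecture assumed).
class PushPolicy(nn.Module):
    def __init__(self, n_actions: int = 8):
        super().__init__()
        # Perception: convolutional feature extractor applied to raw pixels.
        self.perception = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Control: fully connected head producing one Q-value per push action;
        # in stage one this would be trained with deep RL in simulation.
        self.control = nn.Sequential(
            nn.Linear(64 * 9 * 9, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, pixels: torch.Tensor) -> torch.Tensor:
        return self.control(self.perception(pixels))

# Stage two: adapt the simulation-trained policy to real images with a small
# supervised dataset (the abstract reports ~70 real episodes). Freezing the
# control head and fine-tuning only the perception layers is one plausible
# scheme, assumed here for illustration.
policy = PushPolicy()
for p in policy.control.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(policy.perception.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

real_images = torch.rand(16, 3, 84, 84)    # stand-in for real camera frames
real_actions = torch.randint(0, 8, (16,))  # stand-in for demonstrated actions

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(policy(real_images), real_actions)
    loss.backward()
    optimizer.step()
```

Keeping the control head fixed reflects the intuition that the task dynamics learned in simulation remain valid, and only the visual representation must bridge the gap between rendered and real images.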
