Policy transfer via modularity and reward guiding

Non-prehensile manipulation, such as pushing, is an important way for robots to move objects and is sometimes preferred as an alternative to grasping. However, because frictional forces are unknown, pushing has proven to be a difficult task for robots. We explore the use of reinforcement learning to train a robot to push an object robustly. To cope with the sample complexity of training such a method, we train the pushing policy in simulation and then transfer it to the real world. To ease the transfer from simulation, we propose using modularity to separate the learned policy from the raw inputs and outputs; rather than training “end-to-end,” we decompose our system into modules and train only a subset of these modules in simulation. We further demonstrate that we can incorporate prior knowledge about the task into the state space and the reward function to speed up convergence. Finally, we introduce “reward guiding” to modify the reward function and further reduce the training time. We demonstrate, in both simulation and real-world experiments, that such an approach can be used to reliably push an object from many initial positions and orientations. Videos are available at https://goo.gl/B7LtY3.
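The sketch below illustrates, under stated assumptions, how such a modular decomposition and a shaped reward might be structured in code. It is not the authors' implementation: the class names (PoseEstimator, PushPolicy), the reward terms, and their weights are hypothetical placeholders for illustration only.

```python
# Minimal sketch (not the authors' code) of a modular pushing pipeline in
# which only the policy module is trained with RL in simulation, plus an
# illustrative shaped reward. All names and reward terms are assumptions.
import numpy as np

class PoseEstimator:
    """Perception module: maps raw observations to an object pose.
    Trained or engineered separately from the policy, so the policy
    never consumes raw sensor data directly."""
    def estimate(self, observation):
        # Placeholder: return (x, y, theta) of the pushed object.
        return np.zeros(3)

class PushPolicy:
    """Learned module: maps a low-dimensional state (e.g. object pose,
    goal pose, end-effector position) to a pushing action. Only this
    module is trained with reinforcement learning in simulation."""
    def act(self, state):
        # Placeholder for a trained neural-network policy.
        return np.zeros(2)

def shaped_reward(object_pose, goal_pose, reached_goal,
                  w_dist=1.0, goal_bonus=10.0):
    """Illustrative shaped reward: penalize the object-to-goal distance and
    add a bonus on success. The paper's 'reward guiding' further modifies
    the reward during training to reduce training time; the exact form used
    here is an assumption, not the published definition."""
    dist = np.linalg.norm(np.asarray(object_pose)[:2] - np.asarray(goal_pose)[:2])
    return -w_dist * dist + (goal_bonus if reached_goal else 0.0)
```

The intent of the decomposition, as described in the abstract, is that the learned module only ever sees the low-dimensional state produced by the perception module, so replacing simulated perception with real perception leaves the policy's inputs unchanged and eases transfer to the physical robot.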
