Sample-Efficient Learning of Nonprehensile Manipulation Policies via Physics-Based Informed State Distributions

This paper proposes a sample-efficient yet simple approach to learning closed-loop policies for nonprehensile manipulation. Although reinforcement learning (RL) can learn closed-loop policies without requiring access to underlying physics models, it suffers from poor sample complexity on challenging tasks. To overcome this problem, we leverage rearrangement planning to provide an informative physics-based prior on the environment's optimal state-visitation distribution. Specifically, we present a new technique, Learning with Planned Episodic Resets (LeaPER), that resets the environment's state to one informed by the prior during the learning phase. We experimentally show that LeaPER significantly outperforms traditional RL approaches by a factor of up to 5X on simulated rearrangement tasks. Further, we relax the dynamics model from quasi-static to welded contacts to illustrate that LeaPER is robust to the use of simpler physics models. Finally, LeaPER's closed-loop policies significantly improve task success rates relative to both open-loop execution of planned paths and simple feedback controllers that track open-loop trajectories. We demonstrate the performance and behavior of LeaPER on a physical 7-DOF manipulator at this https URL.
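To make the episodic-reset idea concrete, the following is a minimal sketch of a training loop that, with some probability, resets each episode to a state sampled from the planned path rather than to the task's default start state. The env.set_state hook, the policy object, and the reset_prob parameter are illustrative assumptions, not the paper's implementation; any standard RL update could be plugged in.

```python
import random

def leaper_train(env, policy, planned_states, num_episodes, reset_prob=0.5):
    """Minimal sketch of Learning with Planned Episodic Resets (LeaPER).

    planned_states: a list of states along a rearrangement plan, serving as
    the physics-based prior on the optimal state-visitation distribution.
    env.set_state() and policy.update() are hypothetical hooks, not part of
    any specific library API.
    """
    for _ in range(num_episodes):
        obs = env.reset()
        # With probability reset_prob, start the episode from a state sampled
        # along the planned path instead of the default start state.
        if random.random() < reset_prob:
            obs = env.set_state(random.choice(planned_states))

        trajectory, done = [], False
        while not done:
            action = policy.act(obs)
            next_obs, reward, done, info = env.step(action)
            trajectory.append((obs, action, reward, next_obs, done))
            obs = next_obs

        # Any standard RL update on the collected rollout goes here.
        policy.update(trajectory)
```

Resetting to states drawn from the prior shortens the effective exploration horizon of each episode, which is the intuition behind the sample-efficiency gain described above.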
