Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments

Deep reinforcement learning (RL) agents are able to learn contact-rich manipulation tasks by maximizing a reward signal, but they require large amounts of experience, especially in environments with many obstacles that complicate exploration. In contrast, motion planners use explicit models of the agent and environment to plan collision-free paths to faraway goals, but suffer from inaccurate models in tasks that require contact with the environment. To combine the benefits of both approaches, we propose motion planner augmented RL (MoPA-RL), which augments the action space of an RL agent with the long-horizon planning capabilities of motion planners. Based on the magnitude of the action, our approach smoothly transitions between directly executing the action and invoking a motion planner. We evaluate our approach on various simulated manipulation tasks and compare it to alternative action spaces in terms of learning efficiency and safety. The experiments demonstrate that MoPA-RL increases learning efficiency, leads to faster exploration, and results in safer policies that avoid collisions with the environment. Videos and code are available at this https URL.
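To make the action-space augmentation concrete, below is a minimal Python sketch of the magnitude-based switch the abstract describes. This is not the authors' released implementation: the threshold value, the planner's `plan` interface, and the environment accessors (`joint_positions`, `step`) are all illustrative assumptions.

```python
import numpy as np

def mopa_step(env, policy, planner, obs, threshold=0.1):
    """One step of the magnitude-based action switch sketched above.

    Small actions are executed directly as joint displacements; large
    actions are interpreted as target configurations for a motion planner.
    `env`, `policy`, `planner`, and all accessor names are assumptions
    made for illustration, not the paper's API.
    """
    action = policy(obs)        # joint-space displacement proposed by the agent
    q = env.joint_positions()   # current joint configuration (assumed accessor)

    if np.max(np.abs(action)) <= threshold:
        # Direct execution: behaves like an ordinary RL action space.
        return env.step(action)

    # Motion-planner execution: treat the action as a goal configuration
    # and follow a collision-free path to it, waypoint by waypoint.
    path = planner.plan(start=q, goal=q + action)  # assumed planner interface
    total_reward, done, info = 0.0, False, {}
    for waypoint in path:
        obs, reward, done, info = env.step(waypoint - env.joint_positions())
        total_reward += reward
        if done:
            break
    return obs, total_reward, done, info
```

Because a planned path executes many small collision-free steps under a single policy decision, one large action covers a long horizon, which is what makes exploration through obstructed regions cheaper for the agent.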
