Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation

Learning complex manipulation tasks in realistic, obstructed environments is challenging due to hard exploration in the presence of obstacles and high-dimensional visual observations. Prior work tackles the exploration problem by integrating motion planning and reinforcement learning. However, the motion planner augmented policy requires access to state information, which is often unavailable in real-world settings. To this end, we propose to distill a state-based motion planner augmented policy into a visual control policy via (1) visual behavioral cloning to remove the motion planner dependency along with its jittery motion, and (2) vision-based reinforcement learning guided by the smoothed trajectories from the behavioral cloning agent. We evaluate our method on three manipulation tasks in obstructed environments and compare it against various reinforcement learning and imitation learning baselines. The results demonstrate that our framework is highly sample-efficient and outperforms state-of-the-art algorithms. Moreover, coupled with domain randomization, our policy is capable of zero-shot transfer to unseen environment settings with distractors. Code and videos are available at https://clvrai.com/mopa-pd.
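
As a rough illustration of stage (1), the sketch below shows how a visual control policy might be regressed onto image-action pairs collected by rolling out the state-based motion planner augmented expert. This is a minimal PyTorch sketch under assumed details (84x84 RGB inputs, a continuous action space, and hypothetical names such as VisualPolicy and behavioral_cloning); it is not the authors' implementation.

```python
# Minimal sketch of stage (1): visual behavioral cloning from a
# state-based expert. All names and hyperparameters are hypothetical.
import torch
import torch.nn as nn

class VisualPolicy(nn.Module):
    """CNN policy mapping 84x84 RGB observations to continuous actions."""
    def __init__(self, action_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # 84x84 input -> 20x20 -> 9x9 feature maps, so 64 * 9 * 9 = 5184
        self.head = nn.Sequential(
            nn.Linear(5184, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # actions in [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(obs))

def behavioral_cloning(policy: VisualPolicy, demo_batches,
                       epochs: int = 50, lr: float = 3e-4) -> VisualPolicy:
    """Regress the visual policy onto (image, expert_action) batches
    collected from the motion planner augmented expert's rollouts."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for obs, expert_action in demo_batches:
            loss = loss_fn(policy(obs), expert_action)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy
```

In stage (2), one plausible way to use the cloned agent is to seed a vision-based off-policy learner, e.g., by filling its replay buffer with the BC agent's smoothed trajectories before training; the paper's exact guidance mechanism may differ from this sketch.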
