CLAMGen: Closed-Loop Arm Motion Generation via Multi-view Vision-Based RL

We propose a vision-based reinforcement learning (RL) approach for closed-loop trajectory generation in an arm reaching problem. Arm trajectory generation is a fundamental robotics problem that entails finding collision-free paths to move the robot's body (e.g., an arm) to satisfy a goal (e.g., placing the end-effector at a target point). While classical methods typically require a model of the environment to solve a planning, search, or optimization problem, learning-based approaches hold the promise of mapping directly from observations to robot actions. However, learning a collision-avoidance policy with RL remains challenging for several reasons, including, but not limited to, partial observability, poor exploration, low sample efficiency, and learning instabilities. To address these challenges, we present a residual-RL method that uses a greedy goal-reaching RL policy as the base to improve exploration, and augments this base policy with residual state-action values and residual actions learned from images to avoid obstacles. Furthermore, we introduce novel learning objectives and techniques that improve both 3D understanding from multiple image views and the sample efficiency of our algorithm. Our method achieves a superior success rate compared to RL baselines.
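The core composition is simple: a goal-reaching base controller proposes an action, and an image-conditioned network adds a learned correction on top of it. Below is a minimal sketch of that residual-action composition in PyTorch. All names (ResidualNet, act, residual_scale, and the base_policy interface) are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
# Hypothetical sketch of residual-action composition over a goal-reaching base
# policy, as described in the abstract. Not the authors' implementation.
import torch
import torch.nn as nn

class ResidualNet(nn.Module):
    """Image-conditioned residual: maps multi-view images plus robot state
    to a small, bounded corrective action."""
    def __init__(self, num_views: int, state_dim: int, action_dim: int):
        super().__init__()
        # Shared CNN encoder, applied independently to each camera view.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.policy_head = nn.Sequential(
            nn.Linear(64 * num_views + state_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # bounded residual action
        )

    def forward(self, views, state):
        # views: list of (B, 3, H, W) tensors, one per camera view.
        feats = torch.cat([self.encoder(v) for v in views] + [state], dim=-1)
        return self.policy_head(feats)

def act(base_policy, residual_net, views, state, goal, residual_scale=0.1):
    """Closed-loop action: greedy goal-reaching base action plus a learned,
    down-scaled residual correction for obstacle avoidance."""
    a_base = base_policy(state, goal)   # e.g., a greedy goal-reaching policy
    a_res = residual_net(views, state)  # image-conditioned correction
    return a_base + residual_scale * a_res
```

Down-scaling the residual (here with a hypothetical residual_scale of 0.1) is a common residual-RL choice that keeps early training close to the base policy's behavior; whether and how CLAMGen scales its residual is not specified in the abstract.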
