RMP2: A Structured Composable Policy Class for Robot Learning

We consider the problem of learning motion policies for acceleration-based robotic systems using a structured policy class. We leverage RMPflow, a multi-task control framework that has been successfully applied to many robotics problems. Using RMPflow as a structured policy class in learning has several benefits, including sufficient expressiveness, the flexibility to inject different levels of prior knowledge, and the ability to transfer policies between robots. However, implementing a system for end-to-end learning of RMPflow policies faces several computational challenges. In this work, we re-examine the RMPflow algorithm and propose a more practical alternative, called RMP2, that uses modern automatic differentiation tools (such as TensorFlow and PyTorch) to compute RMPflow policies. Our new design retains the strengths of RMPflow while bringing in the advantages of automatic differentiation: 1) simple programming interfaces for designing complex transformations; 2) support for general directed acyclic graph (DAG) transformation structures; 3) end-to-end differentiability for policy learning; and 4) improved computational efficiency. Because of these features, RMP2 can serve as a structured policy class for efficient robot learning that is well suited to encoding domain knowledge. Our experiments show that the structured policy class given by RMP2 improves policy performance and safety in reinforcement learning tasks for goal reaching in cluttered spaces. A video of our experimental results can be found at https://youtu.be/dliQ-jsYhgI, and the code is available at https://github.com/UWRobotLearning/rmp2.
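To make the underlying computation concrete, below is a minimal PyTorch sketch of the RMPflow pullback-resolve operation that RMP2 evaluates with automatic differentiation: each task map's Jacobian and curvature term come from autodiff rather than hand-derived expressions. The function name `rmp2_policy` and the `task_maps`/`rmp_evals` interfaces are illustrative assumptions, not the API of the released rmp2 repository, and this naive dense-Jacobian form is a sketch of the computed quantity rather than the paper's optimized algorithm.

```python
import torch
from torch.autograd.functional import jacobian


def rmp2_policy(q, qd, task_maps, rmp_evals):
    """Combine task-space RMPs into one configuration-space acceleration.

    q, qd      -- configuration position/velocity, tensors of shape (d,)
    task_maps  -- differentiable task maps psi_i: q -> x_i  (hypothetical API)
    rmp_evals  -- leaf evaluators (x_i, xd_i) -> (accel a_i, metric M_i)
    """
    d = q.shape[0]
    M_root = torch.zeros(d, d)   # pulled-back metric at the root
    f_root = torch.zeros(d)      # pulled-back force at the root
    for psi, rmp in zip(task_maps, rmp_evals):
        x = psi(q)
        # Task-map Jacobian via autodiff; create_graph keeps the whole
        # computation differentiable for end-to-end policy learning.
        J = jacobian(psi, q, create_graph=True)
        xd = J @ qd
        # Curvature term Jdot @ qd, computed as the directional
        # derivative of q -> J(q) @ qd along qd.
        Jd_qd = jacobian(lambda q_: jacobian(psi, q_, create_graph=True) @ qd,
                         q, create_graph=True) @ qd
        a, M = rmp(x, xd)                           # evaluate the leaf RMP
        M_root = M_root + J.T @ M @ J               # pullback of the metric
        f_root = f_root + J.T @ (M @ (a - Jd_qd))   # pullback of the force
    return torch.linalg.pinv(M_root) @ f_root      # resolve at the root
```

As a usage illustration, a simple goal-attractor leaf could be `lambda x, xd: (-kp * (x - x_goal) - kd * xd, torch.eye(x.shape[0]))` for some gains kp and kd. Note that forming every dense Jacobian as above scales poorly with the number of task spaces; the RMP2 algorithm proposed in the paper exploits automatic differentiation to evaluate the same root-space policy more efficiently.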
