Learning Stable Normalizing-Flow Control for Robotic Manipulation

Reinforcement learning (RL) of robotic manipulation skills, despite its impressive successes, stands to benefit from incorporating domain knowledge from control theory. One of the most important such properties is control stability. Ideally, one would like to achieve stability guarantees while staying within the framework of state-of-the-art deep RL algorithms. No general solution of this kind exists, especially one that scales to complex manipulation tasks. We contribute towards closing this gap by introducing a $\textit{normalizing-flow}$ control structure that can be deployed within any state-of-the-art deep RL algorithm. While stable exploration is not guaranteed, our method is designed to ultimately produce deterministic controllers with provable stability. In addition to demonstrating our method on challenging contact-rich manipulation tasks, we show that considerable exploration efficiency (reduced state-space coverage and actuation effort) can be achieved without sacrificing learning efficiency.
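The key property a normalizing flow brings to such a control structure is invertibility: the learned map is a diffeomorphism, so a provably stable base controller can be transported into the task's action space without destroying its stability analysis. The paper's implementation is not reproduced here; as an illustrative sketch (all names, sizes, and the toy numpy MLP below are our own assumptions, not the authors' architecture), a RealNVP-style affine coupling layer is invertible by construction:

```python
import numpy as np

class AffineCoupling:
    """RealNVP-style affine coupling layer: invertible by construction.

    The first half of the input passes through unchanged and parameterizes
    an affine transform of the second half, so the inverse is closed-form.
    Weights here are random placeholders for illustration, not trained.
    """

    def __init__(self, dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.half = dim // 2
        # Small random MLP producing log-scale s and translation t.
        self.W1 = rng.normal(0.0, 0.1, (hidden, self.half))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (2 * (dim - self.half), hidden))
        self.b2 = np.zeros(2 * (dim - self.half))

    def _scale_translate(self, x1):
        h = np.tanh(self.W1 @ x1 + self.b1)
        out = self.W2 @ h + self.b2
        s, t = np.split(out, 2)
        return np.tanh(s), t  # bounded log-scale keeps the map well-conditioned

    def forward(self, z):
        """Map latent action z (e.g. from a stable base controller) to x."""
        z1, z2 = z[:self.half], z[self.half:]
        s, t = self._scale_translate(z1)
        return np.concatenate([z1, z2 * np.exp(s) + t])

    def inverse(self, x):
        """Exact inverse: recover z from x using the same s, t."""
        x1, x2 = x[:self.half], x[self.half:]
        s, t = self._scale_translate(x1)
        return np.concatenate([x1, (x2 - t) * np.exp(-s)])
```

Because `inverse(forward(z)) == z` holds exactly (up to floating point), stacking such layers yields a flexible yet invertible policy parameterization, which is the structural ingredient that makes stability arguments for the resulting deterministic controller tractable.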
