Learning Deep Neural Policies with Stability Guarantees

Reinforcement learning (RL) has been successfully used to solve various robotic control tasks. However, most existing works do not address the issue of control stability. This is in sharp contrast to the control theory community, where the well-established norm is to prove stability whenever a control law is synthesized. Three factors make it difficult to guarantee stability during RL: non-interpretable neural network policies, unknown system dynamics, and random exploration. We contribute towards solving the stable RL problem in the context of robotic manipulation tasks that may involve physical contact with the environment. Our solution is derived from a physics-based prior rooted in Lagrangian mechanics and does not involve learning any dynamics model. We show how to parameterize the resulting energy-shaping policy as a deep neural network that consists of a convex potential function and a velocity-dependent damping component. Our experiments, which include a real-world peg insertion task on a 7-DOF robot, validate the proposed policy structure and demonstrate the benefits of stability in RL.
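To make the described policy class concrete, the sketch below shows one plausible way to realize it in PyTorch. It is an illustration under stated assumptions, not the authors' published architecture: the module names (`ICNNPotential`, `EnergyShapingPolicy`), layer sizes, and the diagonal damping parameterization are all hypothetical choices. The potential is an input-convex neural network (convex in its input, so the policy pulls the state toward a well-defined minimum), and the damping term is kept elementwise positive so that it only dissipates energy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNNPotential(nn.Module):
    """Input-convex network: convex in x because the z-path weights are
    clamped non-negative and softplus is convex and non-decreasing."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.Wx0 = nn.Linear(dim, hidden)                  # first layer: unconstrained
        self.Wz1 = nn.Linear(hidden, hidden, bias=False)   # z-path: clamped >= 0 in forward
        self.Wx1 = nn.Linear(dim, hidden)                  # skip connection from input
        self.Wz2 = nn.Linear(hidden, 1, bias=False)        # z-path: clamped >= 0 in forward
        self.Wx2 = nn.Linear(dim, 1)

    def forward(self, x):
        z = F.softplus(self.Wx0(x))
        z = F.softplus(F.linear(z, self.Wz1.weight.clamp(min=0)) + self.Wx1(x))
        return F.linear(z, self.Wz2.weight.clamp(min=0)) + self.Wx2(x)

class EnergyShapingPolicy(nn.Module):
    """u = -grad Phi(q) - D(q, qdot) * qdot, with Phi convex in q and
    D a positive diagonal damping matrix (here via softplus)."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.potential = ICNNPotential(dim)
        self.damping_net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, q, qdot):
        # Differentiate the scalar potential w.r.t. the joint positions.
        q = q.detach().requires_grad_(True)
        phi = self.potential(q).sum()
        grad_phi = torch.autograd.grad(phi, q, create_graph=True)[0]
        # Positive diagonal damping => the damping term is purely dissipative.
        d_diag = F.softplus(self.damping_net(torch.cat([q, qdot], dim=-1)))
        return -grad_phi - d_diag * qdot

# Usage with an (assumed) 7-DOF arm state:
policy = EnergyShapingPolicy(dim=7)
q, qdot = torch.randn(1, 7), torch.randn(1, 7)
u = policy(q, qdot)   # torque command, shape (1, 7)
```

The appeal of this structure is that the stability argument rests only on the shape constraints (a potential bounded below, strictly positive damping), which hold for any parameter values, so no dynamics model needs to be learned and exploration over the parameters cannot destabilize the closed loop.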
