Learning Deep Energy Shaping Policies for Stability-Guaranteed Manipulation

Deep reinforcement learning (DRL) has been successfully used to solve various robotic manipulation tasks. However, most existing works do not address control stability. This is in sharp contrast to the control theory community, where the well-established norm is to prove stability whenever a control law is synthesized. Traditional stability analysis is difficult for DRL because neural network policies are uninterpretable and the system dynamics are unknown. In this work, stability is obtained by deriving an interpretable deep policy structure based on the energy shaping control of Lagrangian systems. Stability during physical interaction with an unknown environment is then established based on passivity. The result is a model-free, stability-guaranteed DRL framework that is general enough for contact-rich manipulation tasks. With an experiment on a peg-in-hole task, we demonstrate, to the best of our knowledge, the first stability-guaranteed DRL on a real robotic manipulator.
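
To make the policy structure concrete, the sketch below illustrates one common form of energy shaping control: a learned potential Phi_theta(q), here parameterized as an input-convex neural network so that it has a single minimum, whose negative gradient plus damping injection gives the control torque u = -grad_q Phi_theta(q) - D*dq. This is a minimal illustration of the general technique under stated assumptions, not the authors' implementation; the class names, layer sizes, and damping value are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvexPotential(nn.Module):
    """Input-convex network Phi(q): convexity in q gives a unique minimum,
    so -grad Phi acts as a nonlinear spring with a single equilibrium.
    (Illustrative parameterization, not the paper's exact architecture.)"""

    def __init__(self, dim, hidden=64):
        super().__init__()
        self.Wq0 = nn.Linear(dim, hidden)                # first layer: unconstrained
        self.Wz = nn.Linear(hidden, hidden, bias=False)  # z-path: clamped >= 0 below
        self.Wq1 = nn.Linear(dim, hidden)                # skip connection from input
        self.out = nn.Linear(hidden, 1, bias=False)      # output: clamped >= 0 below

    def forward(self, q):
        z = F.relu(self.Wq0(q))
        # Nonnegative weights on the z-path preserve convexity in q.
        z = F.relu(F.linear(z, self.Wz.weight.clamp(min=0.0)) + self.Wq1(q))
        return F.linear(z, self.out.weight.clamp(min=0.0)).squeeze(-1)

def energy_shaping_control(phi, q, dq, damping=2.0):
    """u = -grad_q Phi(q) - D*dq: potential shaping plus damping injection.
    With gravity compensated, the damped Lagrangian closed loop is passive
    w.r.t. external (contact) forces, with storage function
    H = Phi(q) + kinetic energy."""
    q = q.detach().requires_grad_(True)
    (grad_phi,) = torch.autograd.grad(phi(q).sum(), q)
    return -grad_phi - damping * dq

# Example: torques for a 2-DoF arm at a given state (illustrative values).
phi = ConvexPotential(dim=2)
q = torch.tensor([[0.3, -0.1]])
dq = torch.tensor([[0.05, 0.0]])
u = energy_shaping_control(phi, q, dq)
print(u.shape)  # torch.Size([1, 2])
```

In this kind of scheme, the potential Phi_theta is the part the policy learns, while passivity of the damped closed loop supplies the interaction-stability argument; constraining Phi_theta to be convex is one way to keep the policy interpretable enough for such an argument.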
