Stability-Guaranteed Reinforcement Learning for Contact-Rich Manipulation

Reinforcement learning (RL) has seen considerable success in contact-rich manipulation tasks, yet it has been slow to benefit from advances in robot control theory such as impedance control and stability guarantees. Recently, variable impedance control (VIC) was adopted into RL with encouraging results; however, the more fundamental issue of stability remains unaddressed. To clarify the challenge of stable RL, we introduce the term all-the-time-stability, which unambiguously means that every possible rollout must be stability certified. Our contribution is a model-free RL method that not only adopts VIC but also achieves all-the-time-stability. Building on a recently proposed stable VIC controller as the policy parameterization, we introduce a novel policy search algorithm that is inspired by the Cross-Entropy Method and inherently guarantees stability. Our experimental studies confirm the feasibility and usefulness of the stability guarantee and feature, to the best of our knowledge, the first successful application of RL with all-the-time-stability to the benchmark peg-in-hole problem.
