Safe Learning in Robotics: From Learning-Based Control to Safe Reinforcement Learning

The last half decade has seen a steep rise in the number of contributions on safe learning methods for real-world robotic deployments from both the control and reinforcement learning communities. This article provides a concise but holistic review of the recent advances made in using machine learning to achieve safe decision-making under uncertainties, with a focus on unifying the language and frameworks used in control theory and reinforcement learning research. It includes learning-based control approaches that safely improve performance by learning the uncertain dynamics, reinforcement learning approaches that encourage safety or robustness, and methods that can formally certify the safety of a learned control policy. As data- and learning-based robot control methods continue to gain traction, researchers must understand when and how to best leverage them in real-world scenarios where safety is imperative, such as when operating in close proximity to humans. We highlight some of the open challenges that will drive the field of robot learning in the coming years, and emphasize the need for realistic physics-based benchmarks to facilitate fair comparisons between control and reinforcement learning approaches. Expected final online publication date for the Annual Review of Control, Robotics, and Autonomous Systems, Volume 5 is May 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
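One recurring idea surveyed here is the "safety filter": a supervisory layer that passes a learned policy's action through unchanged whenever it is provably safe, and minimally overrides it otherwise. The sketch below is a deliberately simplified, hypothetical illustration of that pattern (a 1D point mass that must always be able to brake before a wall), not the method of any particular paper in this review:

```python
import math

def safety_filter(p, v, a_rl, dt=0.02, a_max=1.0, p_max=1.0):
    """Minimally modify the learned policy's acceleration a_rl so the
    1D point mass (position p, velocity v) can always brake to a stop
    before the wall at p_max, given maximum deceleration a_max.
    All dynamics and parameter names here are illustrative assumptions."""
    v_next = v + a_rl * dt   # velocity after applying a_rl for one step
    p_next = p + v * dt      # position after one step (forward Euler)
    # Safety condition: the stopping distance from the next state,
    # v_next^2 / (2 a_max), must fit in the room left before the wall.
    if v_next > 0.0 and p_next + v_next**2 / (2.0 * a_max) > p_max:
        # Largest next velocity that still allows stopping in time.
        v_safe = math.sqrt(max(0.0, 2.0 * a_max * (p_max - p_next)))
        a_safe = (v_safe - v) / dt
        return max(a_safe, -a_max)  # respect the actuator limit
    return a_rl                     # proposed action is already safe
```

Far from the wall the filter is transparent and the learned policy acts freely; near the wall it substitutes the mildest braking action that restores the stopping condition. Note that if intervention is delayed too long, the actuator limit `-a_max` can make full correction impossible in one step; this is exactly why the certified methods in this review reason about control-invariant sets rather than a single lookahead step.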
