GoSafe: Globally Optimal Safe Robot Learning

When learning policies for robotic systems from data, safety is a major concern, as violation of safety constraints may cause hardware damage. SafeOpt is an efficient Bayesian optimization (BO) algorithm that can learn policies while guaranteeing safety with high probability. However, its search space is limited to an initially given safe region. We extend this method by exploring outside the initial safe area while still guaranteeing safety with high probability. This is achieved by learning a set of initial conditions from which we can recover safely using a learned backup controller in case of a potential failure. We derive conditions for guaranteed convergence to the global optimum and validate GoSafe in hardware experiments.
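The safe-set idea behind SafeOpt-style methods can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a one-dimensional policy parameter, a zero-mean Gaussian process with a squared-exponential kernel, and a constraint of the form g(a) >= threshold. The function names (`rbf_kernel`, `gp_posterior`, `safe_set`) and all hyperparameter values (`lengthscale`, `noise_var`, `beta`) are hypothetical choices made for illustration. A parameter is treated as safe only if the lower confidence bound of the constraint lies above the threshold. The sketch does not cover the learned backup controller or the exploration outside the initial safe area that GoSafe adds.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise_var=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP (assumed model)."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    # Solve K alpha = y via the Cholesky factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(v ** 2, axis=0)
    # Clip tiny negative values caused by round-off before taking the root.
    return mean, np.sqrt(np.maximum(var, 0.0))


def safe_set(x_train, g_train, candidates, threshold=0.0, beta=2.0):
    """Boolean mask of candidates whose lower confidence bound is >= threshold."""
    mean, std = gp_posterior(x_train, g_train, candidates)
    return mean - beta * std >= threshold


# One safe evaluation at a = 0.5 with constraint value g = 1.0 (made-up data).
x_train = np.array([0.5])
g_train = np.array([1.0])
candidates = np.linspace(0.0, 1.0, 11)
mask = safe_set(x_train, g_train, candidates)
print(candidates[mask])
```

With a single observation, only candidates close to the evaluated parameter are certified as safe. Parameters far from it keep a wide confidence interval and stay excluded until further safe evaluations shrink the uncertainty.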
