Bayesian optimization with safety constraints: safe and automatic parameter tuning in robotics

Selecting the right tuning parameters for algorithms is a prevalent problem in machine learning that can significantly affect their performance. Data-efficient optimization algorithms, such as Bayesian optimization, have been used to automate this process. However, during experiments on real-world systems such as robotic platforms, these methods can evaluate unsafe parameters that lead to safety-critical failures and may destroy the system. Recently, a safe Bayesian optimization algorithm, called SafeOpt, has been developed, which guarantees that the performance of the system never falls below a critical value; that is, safety is defined in terms of the performance function. However, coupling performance and safety is often undesirable in practice, since the two are frequently opposing objectives. In this paper, we present a generalized algorithm that allows for multiple safety constraints separate from the objective. Given an initial set of safe parameters, the algorithm maximizes performance while only evaluating parameters that satisfy all safety constraints with high probability. To this end, it carefully explores the parameter space by exploiting regularity assumptions encoded in a Gaussian process prior. Moreover, we show how context variables can be used to safely transfer knowledge to new situations and tasks. We provide a theoretical analysis and demonstrate in experiments on a quadrotor vehicle that the proposed algorithm enables fast, automatic, and safe optimization of tuning parameters.
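To make the selection rule concrete, below is a minimal sketch of safe Bayesian optimization with one safety constraint kept separate from the objective, in the spirit of the algorithm described above. It is an illustrative simplification, not the authors' implementation: the parameter space is a 1-D grid, the confidence scaling beta is a fixed constant, the objective f and constraint g are hypothetical toy functions, and the maximizer/expander acquisition is reduced to picking the most uncertain point in the current safe set.

```python
# Hedged sketch: safe BO with a separate safety constraint (g >= 0 is safe).
# Simplifications vs. the paper: fixed beta, gridded 1-D domain, and a
# max-uncertainty acquisition instead of the full maximizer/expander rule.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical unknown performance objective and safety margin.
f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x
g = lambda x: 1.0 - np.abs(x)

X_grid = np.linspace(-2, 2, 401).reshape(-1, 1)  # discretized parameters
beta = 2.0                                       # confidence scaling (assumed fixed)

# Initial safe seed parameter, as required by the algorithm.
X = np.array([[0.0]])
Yf, Yg = f(X[:, 0]), g(X[:, 0])

# Independent GP models for the objective and the safety constraint.
gp_f = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4)
gp_g = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4)

for _ in range(20):
    gp_f.fit(X, Yf)
    gp_g.fit(X, Yg)
    mu_f, sd_f = gp_f.predict(X_grid, return_std=True)
    mu_g, sd_g = gp_g.predict(X_grid, return_std=True)

    # Safe set: parameters whose lower confidence bound on g is non-negative.
    safe = (mu_g - beta * sd_g) >= 0.0
    if not safe.any():
        break

    # Simplified acquisition: among safe points, evaluate the one with the
    # widest combined confidence interval (stand-in for maximizers/expanders).
    width = np.where(safe, sd_f + sd_g, -np.inf)
    x_next = X_grid[np.argmax(width)]

    X = np.vstack([X, x_next])
    Yf = np.append(Yf, f(x_next[0]))
    Yg = np.append(Yg, g(x_next[0]))

# Report the best parameter certified safe by the final model.
lcb_f = np.where((mu_g - beta * sd_g) >= 0.0, mu_f - beta * sd_f, -np.inf)
print("best safe parameter:", X_grid[np.argmax(lcb_f)][0])
```

With multiple constraints, the safe set would be the intersection of the individual lower-confidence-bound conditions; a context variable could be handled, as the abstract suggests, by adding it as an extra GP input dimension that is fixed (not optimized over) during each experiment.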
