Safe controller optimization for quadrotors with Gaussian processes

One of the most fundamental problems when designing controllers for dynamic systems is the tuning of the controller parameters. Typically, a model of the system is used to obtain an initial controller, but ultimately the controller parameters must be tuned manually on the real system to achieve the best performance. To avoid this manual tuning step, methods from machine learning, such as Bayesian optimization, have been used. However, as these methods evaluate different controller parameters on the real system, safety-critical system failures may happen. In this paper, we overcome this problem by applying, for the first time, a recently developed safe optimization algorithm, SafeOpt, to the problem of automatic controller parameter tuning. Given an initial, low-performance controller, SafeOpt automatically optimizes the parameters of a control law while guaranteeing safety. It models the underlying performance measure as a Gaussian process and only explores new controller parameters whose performance lies above a safe performance threshold with high probability. Experimental results on a quadrotor vehicle indicate that the proposed method enables fast, automatic, and safe optimization of controller parameters without human intervention.
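The safe-exploration idea described above can be illustrated with a minimal sketch. It is not the SafeOpt implementation used in the paper. It assumes a single scalar controller parameter, a zero-mean Gaussian process with a squared-exponential kernel, and a fixed confidence scaling `beta`. The function names, kernel hyperparameters, and the selection rule (query the safe parameter with the largest posterior uncertainty) are illustrative assumptions; the actual algorithm uses a more refined selection among potential maximizers and expanders of the safe set.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential kernel between two 1-D parameter arrays (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise_var=1e-4, variance=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train, variance=variance)
    k_tt = k_tt + noise_var * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test, variance=variance)
    chol = np.linalg.cholesky(k_tt)
    # Solve K alpha = y using the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    var = variance - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def safe_set(x_train, y_train, x_grid, threshold, beta=2.0):
    """Mark grid points whose lower confidence bound lies above the threshold."""
    mean, std = gp_posterior(x_train, y_train, x_grid)
    lower = mean - beta * std
    return lower >= threshold, lower, std


def next_safe_query(x_train, y_train, x_grid, threshold, beta=2.0):
    """Simplified rule: evaluate the most uncertain parameter inside the safe set."""
    safe, _, std = safe_set(x_train, y_train, x_grid, threshold, beta)
    if not np.any(safe):
        raise ValueError("no parameter is known to be safe")
    masked_std = np.where(safe, std, -np.inf)
    return x_grid[np.argmax(masked_std)]


# Hypothetical setup: one initial, safe controller parameter with measured performance.
x_grid = np.linspace(0.0, 1.0, 101)
x_train = np.array([0.5])
y_train = np.array([1.0])
threshold = 0.0

safe, lower, std = safe_set(x_train, y_train, x_grid, threshold)
x_next = next_safe_query(x_train, y_train, x_grid, threshold)
```

In this sketch, only parameters close to the initial controller are classified as safe, because the posterior uncertainty grows with distance from the evaluated point. Each new measurement would be appended to `x_train` and `y_train`, which shrinks the uncertainty nearby and lets the safe set expand step by step.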
