Learning convex bounds for linear quadratic control policy synthesis

Learning to make decisions from observed data in dynamic environments remains a problem of fundamental importance in a number of fields, from artificial intelligence and robotics to medicine and finance. This paper concerns the problem of learning control policies for unknown linear dynamical systems so as to maximize a quadratic reward function. We present a method to optimize the expected value of the reward over the posterior distribution of the unknown system parameters, given data. The algorithm involves sequential convex programming and enjoys reliable local convergence and robust stability guarantees. Numerical simulations and stabilization of a real-world inverted pendulum are used to demonstrate the approach, with strong performance and robustness properties observed in both.
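To make the objective concrete, the following is a minimal sketch under a standard LQR reading of the abstract; the symbols A, B, K, Q, R, and D are introduced here for illustration only and are not taken from the paper. With unknown dynamics x_{t+1} = A x_t + B u_t + w_t, observed data D, a state-feedback policy u_t = K x_t, and a quadratic stage cost, the policy is chosen to minimize the posterior expectation of the infinite-horizon cost:

\[
K^{\star} \in \arg\min_{K} \; \mathbb{E}_{\theta \sim p(\theta \mid \mathcal{D})}\big[ J(K, \theta) \big],
\qquad
J(K, \theta) = \lim_{T \to \infty} \frac{1}{T} \, \mathbb{E}\left[ \sum_{t=1}^{T} x_t^{\top} Q x_t + u_t^{\top} R u_t \right],
\]

where \(\theta = (A, B)\) collects the unknown system parameters and the outer expectation would in practice be approximated by averaging over posterior samples. Consistent with the title and abstract, each iteration of the sequential convex program presumably minimizes a convex upper bound on this (generally nonconvex) objective and tightens it until convergence.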
