Regret Bound of Adaptive Control in Linear Quadratic Gaussian (LQG) Systems

We study adaptive control of partially observable linear quadratic Gaussian (LQG) systems whose model dynamics are unknown a priori. We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty, to minimize the overall control cost. We employ the predictor state evolution representation of the system dynamics and propose a new approach for closed-loop system identification, estimation, and confidence bound construction. LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence intervals, and deploys the controller of the most optimistic model for further exploration and exploitation. We provide stability guarantees for LqgOpt and prove a regret upper bound of O(√T) for adaptive control of LQG systems, where T is the time horizon of the problem.
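
For orientation, the objects named in the abstract can be written out in the standard textbook form. The sketch below is not taken from the abstract itself: the symbols A, B, C, F, Q, R, W, Z, the output-based cost, and the notation J_* for the optimal average cost are assumptions chosen to match common LQG conventions, and the paper's own definitions may differ in detail.

```latex
% Assumed state-space form of a partially observable LQG system
\begin{align*}
x_{t+1} &= A x_t + B u_t + w_t, & w_t &\sim \mathcal{N}(0, W), \\
y_t     &= C x_t + z_t,         & z_t &\sim \mathcal{N}(0, Z), \\
c_t     &= y_t^\top Q\, y_t + u_t^\top R\, u_t .
\end{align*}

% Predictor (innovations) form of the same system: \hat{x}_{t|t-1} is the
% steady-state Kalman prediction of x_t, F the Kalman predictor gain,
% and e_t the innovation sequence
\begin{align*}
\hat{x}_{t+1|t} &= (A - F C)\,\hat{x}_{t|t-1} + B u_t + F y_t, \\
y_t             &= C\,\hat{x}_{t|t-1} + e_t .
\end{align*}

% Regret over horizon T against the optimal average cost J_* of the true system
\begin{equation*}
\mathrm{Regret}(T) = \sum_{t=1}^{T} \left( c_t - J_* \right) = O(\sqrt{T}) .
\end{equation*}
```

In this form the next prediction depends only on the observed inputs and outputs through the stable matrix A - FC, which is what makes the representation convenient for identification from closed-loop data.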
