Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting

We study the problem of adaptive control in partially observable linear quadratic Gaussian (LQG) control systems, where the model dynamics are unknown a priori. We propose LQGOPT, a novel adaptive control algorithm based on the principle of optimism in the face of uncertainty, to minimize the overall control cost. We employ the predictor state evolution representation of the system dynamics and deploy a recently proposed closed-loop system identification method for model estimation and confidence bound construction. LQGOPT efficiently explores the system dynamics, estimates the model parameters up to their confidence intervals, and deploys the controller of the most optimistic model within those intervals for further exploration and exploitation. We provide stability guarantees for LQGOPT and prove the first $\tilde{O}(\sqrt{T})$ regret upper bound for adaptive control of LQG systems with convex cost, where $T$ is the time horizon of the problem.
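
For orientation, the objects named above can be written out as follows. This is a sketch in standard LQG notation, not notation taken from the text: the matrices $A$, $B$, $C$, the Kalman gain $F$ in predictor form, the innovation $e_t$, the per-step cost $c_t$, and the optimal average cost $J_\star$ are all assumed symbols introduced here for illustration.

$$
x_{t+1} = A x_t + B u_t + w_t, \qquad y_t = C x_t + z_t
$$

$$
\hat{x}_{t+1} = (A - FC)\,\hat{x}_t + B u_t + F y_t, \qquad y_t = C \hat{x}_t + e_t
$$

$$
\mathrm{Regret}(T) = \sum_{t=1}^{T} \left( c_t - J_\star \right)
$$

The first line is the unknown, partially observed system with Gaussian process noise $w_t$ and measurement noise $z_t$. The second is its predictor form, in which the state estimate $\hat{x}_t$ is driven only by the observed inputs and outputs, which is what makes identification from closed-loop data possible. The third compares the cost incurred by the adaptive controller over $T$ steps with that of the optimal controller for the true system; a $\tilde{O}(\sqrt{T})$ bound means the average per-step gap shrinks at rate roughly $1/\sqrt{T}$, up to logarithmic factors.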
