Regret Minimization in Partially Observable Linear Quadratic Control

We study the problem of regret minimization in partially observable linear quadratic control systems when the model dynamics are unknown a priori. We propose ExpCommit, an explore-then-commit algorithm that learns the model Markov parameters and then follows the principle of optimism in the face of uncertainty to design a controller. We propose a novel way to decompose the regret and provide an end-to-end sublinear regret upper bound for partially observable linear quadratic control. Finally, we provide stability guarantees and establish a regret upper bound of $\tilde{\mathcal{O}}(T^{2/3})$ for ExpCommit, where $T$ is the time horizon of the problem.

[1]  M. Rudelson,et al.  Hanson-Wright inequality and sub-gaussian concentration , 2013 .

[2]  S. Bittanti,et al.  ADAPTIVE CONTROL OF LINEAR TIME INVARIANT SYSTEMS: THE "BET ON THE BEST" PRINCIPLE ∗ , 2006 .

[3]  M. Phan,et al.  Integrated system identification and state estimation for control offlexible space structures , 1992 .

[4]  Michael I. Jordan,et al.  Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification , 2018, COLT.

[5]  R. Skelton,et al.  The data-based LQG control problem , 1994, Proceedings of 1994 33rd IEEE Conference on Decision and Control.

[6]  Nevena Lazic,et al.  Model-Free Linear Quadratic Control via Reduction to Expert Prediction , 2018, AISTATS.

[7]  Sanjeev Arora,et al.  Towards Provable Control for Unknown Linear Dynamical Systems , 2018, International Conference on Learning Representations.

[8]  Karan Singh,et al.  Learning Linear Dynamical Systems via Spectral Filtering , 2017, NIPS.

[9]  Yi Ouyang,et al.  Learning-based Control of Unknown Linear Systems with Thompson Sampling , 2017, ArXiv.

[10]  Han-Fu Chen,et al.  Optimal adaptive control and consistent parameter estimates for ARMAX model with quadratic cost , 1986, 1986 25th IEEE Conference on Decision and Control.

[11]  Ambuj Tewari,et al.  Input Perturbations for Adaptive Regulation and Learning , 2018, ArXiv.

[12]  Yishay Mansour,et al.  Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret , 2019, ArXiv.

[13]  Petre Stoica,et al.  Decentralized Control , 2018, The Control Systems Handbook.

[14]  T. L. Lai Andherbertrobbins Asymptotically Efficient Adaptive Allocation Rules , 2022 .

[15]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[16]  Holger Rauhut,et al.  Suprema of Chaos Processes and the Restricted Isometry Property , 2012, ArXiv.

[17]  Roman Vershynin,et al.  Introduction to the non-asymptotic analysis of random matrices , 2010, Compressed Sensing.

[18]  Biao Huang,et al.  System Identification , 2000, Control Theory for Physicists.

[19]  Alessandro Lazaric,et al.  Thompson Sampling for Linear-Quadratic Control Problems , 2017, AISTATS.

[20]  Peter Auer,et al.  Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..

[21]  M. Talagrand,et al.  Probability in Banach Spaces: Isoperimetry and Processes , 1991 .

[22]  T. Lai,et al.  Least Squares Estimates in Stochastic Regression Models with Applications to Identification and Control of Dynamic Systems , 1982 .

[23]  T. Lai,et al.  Self-Normalized Processes: Limit Theory and Statistical Applications , 2001 .

[24]  Kamyar Azizzadenesheli,et al.  Reinforcement Learning of POMDPs using Spectral Methods , 2016, COLT.

[25]  Mohamad Kazem Shirani Faradonbeh,et al.  Regret Analysis for Adaptive Linear-Quadratic Policies , 2017 .

[26]  Ambuj Tewari,et al.  Input perturbations for adaptive control and learning , 2018, Autom..

[27]  Claude-Nicolas Fiechter,et al.  PAC adaptive control of linear systems , 1997, COLT '97.

[28]  Yi Zhang,et al.  Spectral Filtering for General Linear Dynamical Systems , 2018, NeurIPS.

[29]  Benjamin Recht,et al.  Certainty Equivalent Control of LQR is Efficient , 2019, ArXiv.

[30]  Benjamin Recht,et al.  Certainty Equivalence is Efficient for Linear Quadratic Control , 2019, NeurIPS.

[31]  Nikolai Matni,et al.  Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator , 2018, NeurIPS.

[32]  Csaba Szepesvári,et al.  Regret Bounds for the Adaptive Control of Linear Quadratic Systems , 2011, COLT.

[33]  Ambuj Tewari,et al.  Optimism-Based Adaptive Regulation of Linear-Quadratic Systems , 2017, IEEE Transactions on Automatic Control.

[34]  Richard W. Longman,et al.  System identification from closed-loop data with known output feedback dynamics , 1994 .

[35]  Munther A. Dahleh,et al.  Finite-Time System Identification for Partially Observed LTI Systems of Unknown Order , 2019, ArXiv.

[36]  Tor Lattimore,et al.  On Explore-Then-Commit strategies , 2016, NIPS.

[37]  Doreen Meier,et al.  Introduction To Stochastic Control Theory , 2016 .

[38]  Alessandro Lazaric,et al.  Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems , 2018, ICML.

[39]  Han-Fu Chen,et al.  Optimal adaptive control and consistent parameter estimates for ARMAX model withquadratic cost , 1987 .

[40]  George J. Pappas,et al.  Finite Sample Analysis of Stochastic System Identification , 2019, 2019 IEEE 58th Conference on Decision and Control (CDC).

[41]  T. Lai,et al.  Asymptotically efficient self-tuning regulators , 1987 .

[42]  J. W. Nieuwenhuis,et al.  Boekbespreking van D.P. Bertsekas (ed.), Dynamic programming and optimal control - volume 2 , 1999 .

[43]  M. Phan,et al.  Identification of observer/Kalman filter Markov parameters: Theory and experiments , 1993 .

[44]  Samet Oymak,et al.  Non-asymptotic Identification of LTI Systems from a Single Trajectory , 2018, 2019 American Control Conference (ACC).

[45]  Β. L. HO,et al.  Editorial: Effective construction of linear state-variable models from input/output functions , 1966 .

[46]  T. Kailath,et al.  Indefinite-quadratic estimation and control: a unified approach to H 2 and H ∞ theories , 1999 .

[47]  Csaba Szepesvári,et al.  Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.

[48]  P. Kumar,et al.  Adaptive Linear Quadratic Gaussian Control: The Cost-Biased Approach Revisited , 1998 .