Paper Information - Regret Minimization in Partially Observable Linear Quadratic Control

Regret Minimization in Partially Observable Linear Quadratic Control

We study the problem of regret minimization in partially observable linear quadratic control systems when the model dynamics are unknown a priori. We propose ExpCommit, an explore-then-commit algorithm that learns the model Markov parameters and then follows the principle of optimism in the face of uncertainty to design a controller. We propose a novel way to decompose the regret and provide an end-to-end sublinear regret upper bound for partially observable linear quadratic control. Finally, we provide stability guarantees and establish a regret upper bound of $\tilde{\mathcal{O}}(T^{2/3})$ for ExpCommit, where $T$ is the time horizon of the problem.
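The $\tilde{\mathcal{O}}(T^{2/3})$ rate is typical of explore-then-commit schemes, and a rough balancing argument suggests where it comes from. The sketch below is a heuristic, not the paper's proof: the exploration length $T_{\mathrm{exp}}$, the constants $c_1, c_2$, and the assumption that the per-step regret of the committed controller scales like $1/\sqrt{T_{\mathrm{exp}}}$ are all introduced here for illustration and are not taken from the abstract.

```latex
% Heuristic only: T_exp, c_1, c_2 and the 1/sqrt(T_exp) error rate are assumptions.
\begin{align*}
\mathrm{Regret}(T)
  &\lesssim \underbrace{c_1\, T_{\mathrm{exp}}}_{\text{exploration phase}}
   + \underbrace{c_2\, \frac{T - T_{\mathrm{exp}}}{\sqrt{T_{\mathrm{exp}}}}}_{\text{commit phase}}
   \;\le\; c_1\, T_{\mathrm{exp}} + c_2\, \frac{T}{\sqrt{T_{\mathrm{exp}}}}, \\[4pt]
T_{\mathrm{exp}} = T^{2/3}
  \;&\Longrightarrow\;
  \mathrm{Regret}(T) \lesssim c_1\, T^{2/3} + c_2\, \frac{T}{T^{1/3}}
  = (c_1 + c_2)\, T^{2/3}.
\end{align*}
```

Under these assumptions, a shorter exploration phase leaves a larger estimation error during the commit phase, while a longer one spends more steps on exploratory inputs. Choosing $T_{\mathrm{exp}} = T^{2/3}$ makes the two terms the same order.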

Babak Hassibi | Kamyar Azizzadenesheli | Anima Anandkumar | Sahin Lale

[1] M. Rudelson,et al. Hanson-Wright inequality and sub-gaussian concentration , 2013 .

[2] S. Bittanti,et al. Adaptive Control of Linear Time Invariant Systems: The "Bet on the Best" Principle , 2006 .

[3] M. Phan,et al. Integrated system identification and state estimation for control of flexible space structures , 1992 .

[4] Michael I. Jordan,et al. Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification , 2018, COLT.

[5] R. Skelton,et al. The data-based LQG control problem , 1994, Proceedings of 1994 33rd IEEE Conference on Decision and Control.

[6] Nevena Lazic,et al. Model-Free Linear Quadratic Control via Reduction to Expert Prediction , 2018, AISTATS.

[7] Sanjeev Arora,et al. Towards Provable Control for Unknown Linear Dynamical Systems , 2018, International Conference on Learning Representations.

[8] Karan Singh,et al. Learning Linear Dynamical Systems via Spectral Filtering , 2017, NIPS.

[9] Yi Ouyang,et al. Learning-based Control of Unknown Linear Systems with Thompson Sampling , 2017, ArXiv.

[10] Han-Fu Chen,et al. Optimal adaptive control and consistent parameter estimates for ARMAX model with quadratic cost , 1986, 1986 25th IEEE Conference on Decision and Control.

[11] Ambuj Tewari,et al. Input Perturbations for Adaptive Regulation and Learning , 2018, ArXiv.

[12] Yishay Mansour,et al. Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret , 2019, ArXiv.

[13] Petre Stoica,et al. Decentralized Control , 2018, The Control Systems Handbook.

[14] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985 .

[15] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[16] Holger Rauhut,et al. Suprema of Chaos Processes and the Restricted Isometry Property , 2012, ArXiv.

[17] Roman Vershynin,et al. Introduction to the non-asymptotic analysis of random matrices , 2010, Compressed Sensing.

[18] Biao Huang,et al. System Identification , 2000, Control Theory for Physicists.

[19] Alessandro Lazaric,et al. Thompson Sampling for Linear-Quadratic Control Problems , 2017, AISTATS.

[20] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res.

[21] M. Talagrand,et al. Probability in Banach Spaces: Isoperimetry and Processes , 1991 .

[22] T. Lai,et al. Least Squares Estimates in Stochastic Regression Models with Applications to Identification and Control of Dynamic Systems , 1982 .

[23] T. Lai,et al. Self-Normalized Processes: Limit Theory and Statistical Applications , 2001 .

[24] Kamyar Azizzadenesheli,et al. Reinforcement Learning of POMDPs using Spectral Methods , 2016, COLT.

[25] Mohamad Kazem Shirani Faradonbeh,et al. Regret Analysis for Adaptive Linear-Quadratic Policies , 2017 .

[26] Ambuj Tewari,et al. Input perturbations for adaptive control and learning , 2018, Autom.

[27] Claude-Nicolas Fiechter,et al. PAC adaptive control of linear systems , 1997, COLT '97.

[28] Yi Zhang,et al. Spectral Filtering for General Linear Dynamical Systems , 2018, NeurIPS.

[29] Benjamin Recht,et al. Certainty Equivalent Control of LQR is Efficient , 2019, ArXiv.

[30] Benjamin Recht,et al. Certainty Equivalence is Efficient for Linear Quadratic Control , 2019, NeurIPS.

[31] Nikolai Matni,et al. Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator , 2018, NeurIPS.

[32] Csaba Szepesvári,et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems , 2011, COLT.

[33] Ambuj Tewari,et al. Optimism-Based Adaptive Regulation of Linear-Quadratic Systems , 2017, IEEE Transactions on Automatic Control.

[34] Richard W. Longman,et al. System identification from closed-loop data with known output feedback dynamics , 1994 .

[35] Munther A. Dahleh,et al. Finite-Time System Identification for Partially Observed LTI Systems of Unknown Order , 2019, ArXiv.

[36] Tor Lattimore,et al. On Explore-Then-Commit strategies , 2016, NIPS.

[37] Doreen Meier,et al. Introduction To Stochastic Control Theory , 2016 .

[38] Alessandro Lazaric,et al. Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems , 2018, ICML.

[39] Han-Fu Chen,et al. Optimal adaptive control and consistent parameter estimates for ARMAX model with quadratic cost , 1987 .

[40] George J. Pappas,et al. Finite Sample Analysis of Stochastic System Identification , 2019, 2019 IEEE 58th Conference on Decision and Control (CDC).

[41] T. Lai,et al. Asymptotically efficient self-tuning regulators , 1987 .

[42] J. W. Nieuwenhuis,et al. Boekbespreking van D.P. Bertsekas (ed.), Dynamic programming and optimal control - volume 2 , 1999 .

[43] M. Phan,et al. Identification of observer/Kalman filter Markov parameters: Theory and experiments , 1993 .

[44] Samet Oymak,et al. Non-asymptotic Identification of LTI Systems from a Single Trajectory , 2018, 2019 American Control Conference (ACC).

[45] B. L. Ho,et al. Editorial: Effective construction of linear state-variable models from input/output functions , 1966 .

[46] T. Kailath,et al. Indefinite-quadratic estimation and control: a unified approach to $H^2$ and $H^\infty$ theories , 1999 .

[47] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.

[48] P. Kumar,et al. Adaptive Linear Quadratic Gaussian Control: The Cost-Biased Approach Revisited , 1998 .