Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems

We study the problem of system identification and adaptive control in partially observable linear dynamical systems. Adaptive and closed-loop system identification is challenging because data collected under feedback are correlated with the controller's past actions. In this paper, we present the first model estimation method with finite-time guarantees in both open-loop and closed-loop system identification. Deploying this estimation method, we propose adaptive control online learning (AdaptOn), an efficient reinforcement learning algorithm that adaptively learns the system dynamics and continuously updates its controller through online learning steps. AdaptOn estimates the model dynamics by occasionally solving a linear regression problem on data gathered through interactions with the environment. Using policy re-parameterization and the estimated model, AdaptOn constructs counterfactual loss functions that are used to update the controller through online gradient descent. Over time, AdaptOn improves its model estimates and obtains more accurate gradient updates to improve the controller. We show that AdaptOn achieves a regret upper bound of $\text{polylog}\left(T\right)$ after $T$ time steps of agent-environment interaction. To the best of our knowledge, AdaptOn is the first algorithm that achieves $\text{polylog}\left(T\right)$ regret in adaptive control of unknown partially observable linear dynamical systems, a class that includes linear quadratic Gaussian (LQG) control.
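The linear-regression step mentioned in the abstract can be illustrated with a simplified open-loop sketch. This is not the paper's estimator, which is designed to also cover closed-loop data. It assumes a hypothetical system $x_{t+1} = A x_t + B u_t + w_t$, $y_t = C x_t + z_t$ driven by i.i.d. Gaussian inputs, and regresses each output on the last $H$ inputs to recover the Markov parameters $CB, CAB, \dots, CA^{H-1}B$. The matrices, the horizon `H`, the noise level, and all function names are illustrative assumptions.

```python
import numpy as np

def simulate(A, B, C, T, noise_std, rng):
    """Roll out x_{t+1} = A x_t + B u_t + w_t, y_t = C x_t + z_t with Gaussian inputs."""
    n, m = B.shape
    p = C.shape[0]
    x = np.zeros(n)
    U = rng.standard_normal((T, m))          # open-loop excitation (assumption)
    Y = np.zeros((T, p))
    for t in range(T):
        Y[t] = C @ x + noise_std * rng.standard_normal(p)
        x = A @ x + B @ U[t] + noise_std * rng.standard_normal(n)
    return U, Y

def estimate_markov(U, Y, H):
    """Least-squares regression of y_t on [u_{t-1}, ..., u_{t-H}]."""
    T = U.shape[0]
    Phi = np.stack([
        np.concatenate([U[t - 1 - k] for k in range(H)])
        for t in range(H, T)
    ])
    G, *_ = np.linalg.lstsq(Phi, Y[H:], rcond=None)
    return G.T                               # shape (p, H * m)

def true_markov(A, B, C, H):
    """Stack the true Markov parameters [CB, CAB, ..., CA^{H-1}B]."""
    blocks, Ak = [], np.eye(A.shape[0])
    for _ in range(H):
        blocks.append(C @ Ak @ B)
        Ak = Ak @ A
    return np.hstack(blocks)

# Hypothetical stable system, chosen only for illustration.
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1], [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])

U, Y = simulate(A, B, C, T=5000, noise_std=0.1, rng=rng)
G_hat = estimate_markov(U, Y, H=5)
G_true = true_markov(A, B, C, H=5)
max_err = float(np.max(np.abs(G_hat - G_true)))
print("max abs error in Markov parameters:", max_err)
```

With independent inputs the regressors are uncorrelated with the noise, which is why plain least squares works here. Under feedback that independence is lost, which is the difficulty the abstract attributes to closed-loop identification.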

Babak Hassibi | Kamyar Azizzadenesheli | Anima Anandkumar | Sahin Lale

[1] Munther A. Dahleh,et al. Finite-Time System Identification for Partially Observed LTI Systems of Unknown Order , 2019, ArXiv.

[2] T. Lai,et al. Asymptotically efficient self-tuning regulators , 1987 .

[3] T. Lai,et al. Self-Normalized Processes: Limit Theory and Statistical Applications , 2001 .

[4] Magnus Jansson,et al. Subspace Identification and ARX Modeling , 2003 .

[5] Nikolai Matni,et al. Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator , 2018, NeurIPS.

[6] Dante C. Youla,et al. Modern Wiener-Hopf Design of Optimal Controllers. Part I , 1976 .

[7] Ambuj Tewari,et al. Input Perturbations for Adaptive Regulation and Learning , 2018, ArXiv.

[8] Petre Stoica,et al. Decentralized Control , 2018, The Control Systems Handbook.

[9] Bruce Lee,et al. Non-asymptotic Closed-Loop System Identification using Autoregressive Processes and Hankel Model Reduction , 2019, 2020 59th IEEE Conference on Decision and Control (CDC).

[10] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .

[11] Sham M. Kakade,et al. The Nonstochastic Control Problem , 2020, ALT.

[12] Shie Mannor,et al. Online Learning for Adversaries with Memory: Price of Past Mistakes , 2015, NIPS.

[13] Holden Lee,et al. Robust guarantees for learning an autoregressive filter , 2019, ALT.

[14] Sham M. Kakade,et al. Online Control with Adversarial Disturbances , 2019, ICML.

[15] Si-Zhao Joe Qin,et al. An overview of subspace identification , 2006, Comput. Chem. Eng..

[16] B. L. Ho,et al. Editorial: Effective construction of linear state-variable models from input/output functions , 1966 .

[17] Babak Hassibi,et al. Regret Minimization in Partially Observable Linear Quadratic Control , 2020, ArXiv.

[18] Steven M. LaValle,et al. Planning algorithms , 2006 .

[19] Michel Verhaegen,et al. Identification of the deterministic part of MIMO state space models given in innovations form from input-output data , 1994, Autom..

[20] Bart De Moor,et al. N4SID: Subspace algorithms for the identification of combined deterministic-stochastic systems , 1994, Autom..

[21] Ambuj Tewari,et al. Optimism-Based Adaptive Regulation of Linear-Quadratic Systems , 2017, IEEE Transactions on Automatic Control.

[22] Lennart Ljung,et al. Closed-loop identification revisited , 1999, Autom..

[23] Yi Zhang,et al. Spectral Filtering for General Linear Dynamical Systems , 2018, NeurIPS.

[24] M. Phan,et al. Integrated system identification and state estimation for control of flexible space structures , 1992 .

[25] Kamyar Azizzadenesheli,et al. Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting , 2020, 2021 American Control Conference (ACC).

[26] B. Moor,et al. Closed loop subspace system identification , 1997 .

[27] J. W. Nieuwenhuis,et al. Boekbespreking van D.P. Bertsekas (ed.), Dynamic programming and optimal control - volume 2 , 1999 .

[28] Max Simchowitz,et al. Logarithmic Regret for Adversarial Online Control , 2020, ICML.

[29] Claude-Nicolas Fiechter,et al. PAC adaptive control of linear systems , 1997, COLT '97.

[30] Alon Cohen,et al. Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently , 2020, ICML.

[31] Max Simchowitz,et al. Improper Learning for Non-Stochastic Control , 2020, COLT.

[32] Csaba Szepesvári,et al. Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems , 2011, ArXiv.

[33] Lennart Ljung,et al. Closed-Loop Subspace Identification with Innovation Estimation , 2003 .

[34] Robert F. Stengel,et al. Optimal Control and Estimation , 1994 .

[35] Csaba Szepesvári,et al. Regret Bounds for the Adaptive Control of Linear Quadratic Systems , 2011, COLT.

[36] Avinatan Hassidim,et al. Online Linear Quadratic Control , 2018, ICML.

[37] Joel A. Tropp,et al. User-Friendly Tail Bounds for Sums of Random Matrices , 2010, Found. Comput. Math..

[38] Karan Singh,et al. Logarithmic Regret for Online Control , 2019, NeurIPS.

[39] Y. Halevi. Stable LQG controllers , 1994, IEEE Trans. Autom. Control..

[40] Nevena Lazic,et al. Model-Free Linear Quadratic Control via Reduction to Expert Prediction , 2018, AISTATS.

[41] Benjamin Recht,et al. Certainty Equivalent Control of LQR is Efficient , 2019, ArXiv.

[42] Han-Fu Chen,et al. Optimal adaptive control and consistent parameter estimates for ARMAX model with quadratic cost , 1986, 1986 25th IEEE Conference on Decision and Control.

[43] Thomas B. Schön,et al. Robust exploration in linear quadratic reinforcement learning , 2019, NeurIPS.

[44] M. Phan,et al. Identification of observer/Kalman filter Markov parameters: Theory and experiments , 1993 .

[45] L. Meng,et al. The optimal perturbation bounds of the Moore–Penrose inverse under the Frobenius norm , 2010 .

[46] Yishay Mansour,et al. Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret , 2019, ArXiv.

[47] Max Simchowitz,et al. Learning Linear Dynamical Systems with Semi-Parametric Least Squares , 2019, COLT.

[48] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[49] Ambuj Tewari,et al. Input perturbations for adaptive control and learning , 2018, Autom..

[50] Yi Zhang,et al. No-Regret Prediction in Marginally Stable Systems , 2020, COLT.

[51] Samet Oymak,et al. Non-asymptotic Identification of LTI Systems from a Single Trajectory , 2018, 2019 American Control Conference (ACC).

[52] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .

[53] P. Wedin. Perturbation theory for pseudo-inverses , 1973 .

[54] George J. Pappas,et al. Finite Sample Analysis of Stochastic System Identification , 2019, 2019 IEEE 58th Conference on Decision and Control (CDC).

[55] T. Lai,et al. Least Squares Estimates in Stochastic Regression Models with Applications to Identification and Control of Dynamic Systems , 1982 .

[56] Alessandro Chiuso,et al. Consistency analysis of some closed-loop subspace identification methods , 2005, Autom..

[57] Mohamad Kazem Shirani Faradonbeh,et al. Regret Analysis for Adaptive Linear-Quadratic Policies , 2017 .

[58] George J. Pappas,et al. Online Learning of the Kalman Filter With Logarithmic Regret , 2020, IEEE Transactions on Automatic Control.

[59] Alessandro Lazaric,et al. Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems , 2018, ICML.

[60] Biao Huang,et al. System Identification , 2000, Control Theory for Physicists.

[61] Max Simchowitz,et al. Naive Exploration is Optimal for Online LQR , 2020, ICML.

[62] Alessandro Lazaric,et al. Thompson Sampling for Linear-Quadratic Control Problems , 2017, AISTATS.

[63] Karan Singh,et al. Learning Linear Dynamical Systems via Spectral Filtering , 2017, NIPS.

[64] Varun Kanade,et al. Tracking Adversarial Targets , 2014, ICML.

[65] Sanjeev Arora,et al. Towards Provable Control for Unknown Linear Dynamical Systems , 2018, International Conference on Learning Representations.

[66] Richard W. Longman,et al. System identification from closed-loop data with known output feedback dynamics , 1994 .

[67] George J. Pappas,et al. Sample Complexity of Kalman Filtering for Unknown Systems , 2019, L4DC.

[68] Han-Fu Chen,et al. Optimal adaptive control and consistent parameter estimates for ARMAX model with quadratic cost , 1987 .

[69] Yishay Mansour,et al. Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret , 2019, ICML.

[70] Csaba Szepesvári,et al. Improved Algorithms for Linear Stochastic Bandits , 2011, NIPS.

[71] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..

[72] T. Başar,et al. A New Approach to Linear Filtering and Prediction Problems , 2001 .

[73] Lennart Ljung,et al. Subspace identification from closed loop data , 1996, Signal Process..

[74] Kamyar Azizzadenesheli,et al. Regret Bound of Adaptive Control in Linear Quadratic Gaussian (LQG) Systems , 2020, ArXiv.