Paper information - Statistical Learning Theory for Control: A Finite Sample Perspective

This tutorial survey provides an overview of recent non-asymptotic advances in statistical learning theory as relevant to control and system identification. While there has been substantial progress across all areas of control, the theory is most well-developed when it comes to linear system identification and learning for the linear quadratic regulator, which are the focus of this manuscript. From a theoretical perspective, much of the labor underlying these advances has been in adapting tools from modern high-dimensional statistics and learning theory. While highly relevant to control theorists interested in integrating tools from machine learning, the foundational material has not always been easily accessible. To remedy this, we provide a self-contained presentation of the relevant material, outlining all the key ideas and the technical machinery that underpin recent results. We also present a number of open problems and future directions.

George J. Pappas | N. Matni | Anastasios Tsiamis | Ingvar M. Ziemann
