Learning Without Mixing: Towards A Sharp Analysis of Linear System Identification

We prove that the ordinary least-squares (OLS) estimator attains nearly minimax optimal performance for the identification of linear dynamical systems from a single observed trajectory. Our upper bound relies on a generalization of Mendelson's small-ball method to dependent data, eschewing the use of standard mixing-time arguments. Our lower bounds reveal that these upper bounds match up to logarithmic factors. In particular, we capture the correct signal-to-noise behavior of the problem, showing that more unstable linear systems are easier to estimate. This behavior is qualitatively different from what arguments based on mixing-time calculations suggest, namely that unstable systems are more difficult to estimate. We generalize our technique to provide bounds for a more general class of linear response time-series.
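To make the estimation problem concrete, the following is a minimal sketch of OLS identification from a single trajectory. It uses a scalar system x_{t+1} = a x_t + w_t with standard Gaussian noise and x_0 = 0. The scalar setting, the noise model, and the function names (`simulate`, `ols_estimate`, `mean_abs_error`) are illustrative assumptions, not the paper's setup or code.

```python
import random


def simulate(a, T, seed=0):
    """Roll out x_{t+1} = a * x_t + w_t with x_0 = 0 and w_t ~ N(0, 1).

    Assumed toy model: scalar state, unit-variance Gaussian noise.
    Returns the trajectory [x_0, x_1, ..., x_T].
    """
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(T):
        xs.append(a * xs[-1] + rng.gauss(0.0, 1.0))
    return xs


def ols_estimate(xs):
    """Least-squares fit of a in x_{t+1} ~ a * x_t from one trajectory."""
    num = sum(x * y for x, y in zip(xs[:-1], xs[1:]))
    den = sum(x * x for x in xs[:-1])
    return num / den


def mean_abs_error(a, T=200, trials=200):
    """Average |a_hat - a| over independent trajectories (seeds 0..trials-1)."""
    errs = [abs(ols_estimate(simulate(a, T, seed=s)) - a) for s in range(trials)]
    return sum(errs) / len(errs)


if __name__ == "__main__":
    # Compare estimation error as the system moves toward marginal stability.
    for a in (0.5, 0.9, 1.0):
        print(f"a = {a}: mean |a_hat - a| = {mean_abs_error(a):.4f}")
```

In this toy experiment the average error tends to shrink as a approaches 1, which is consistent with the claim that less stable systems are easier to estimate: the state grows larger relative to the noise, so each sample carries more signal. This is a simulation under the stated assumptions and says nothing about the constants or rates in the paper's bounds.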
