Gradient Descent Learns Linear Dynamical Systems

We prove that gradient descent efficiently converges to the global optimizer of the maximum likelihood objective of an unknown linear time-invariant dynamical system from a sequence of noisy observations generated by the system. Even though the objective function is non-convex, we provide polynomial running time and sample complexity bounds under strong but natural assumptions. Linear systems identification has been studied for many decades, yet, to the best of our knowledge, these are the first polynomial guarantees for the problem we consider.
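
The abstract does not state the objective or the algorithm in detail. As a rough illustration only, the sketch below fits a scalar-state, single-input single-output system h_{t+1} = a h_t + x_t, y_t = c h_t + d x_t + noise by gradient descent on the squared prediction error. When the system starts at rest and only the observations carry i.i.d. Gaussian noise, this error is the negative log-likelihood up to constants. The input gain is fixed to 1, which removes the scaling ambiguity between input and output gains. The function names (simulate, loss_and_grad, fit), the step size, the clipping bound a_max and the "true" parameters (0.7, 1.5, 0.3) are hypothetical choices for this example. The clipping step is a simple stand-in for projecting onto stable systems. The sketch is not the authors' algorithm and does not reproduce their guarantees.

```python
import random


def simulate(a, c, d, xs, noise_std=0.0, rng=None):
    """Run h_{t+1} = a*h_t + x_t, y_t = c*h_t + d*x_t (+ Gaussian noise), starting from h_0 = 0."""
    h, ys = 0.0, []
    for x in xs:
        y = c * h + d * x
        if rng is not None:
            y += rng.gauss(0.0, noise_std)  # observation noise only; no process noise
        ys.append(y)
        h = a * h + x  # input gain fixed to 1 (hypothetical normalization)
    return ys


def loss_and_grad(a, c, d, xs, ys):
    """Loss (1/2T) * sum_t (yhat_t - y_t)^2 and its gradient with respect to (a, c, d).

    s holds dh_t/da and is propagated forward alongside the state
    (forward-mode sensitivities instead of backpropagation through time).
    """
    h, s = 0.0, 0.0
    loss = ga = gc = gd = 0.0
    T = len(xs)
    for x, y in zip(xs, ys):
        r = c * h + d * x - y       # residual yhat_t - y_t
        loss += 0.5 * r * r / T
        ga += r * c * s / T         # d yhat_t / da = c * s_t
        gc += r * h / T             # d yhat_t / dc = h_t
        gd += r * x / T             # d yhat_t / dd = x_t
        h, s = a * h + x, h + a * s  # h_{t+1} and s_{t+1} = h_t + a * s_t (old h, s on the right)
    return loss, (ga, gc, gd)


def fit(xs, ys, steps=2000, lr=0.01, a_max=0.9):
    """Gradient descent; after each step a is clipped to [-a_max, a_max] to keep the model stable."""
    a, c, d = 0.0, 0.1, 0.0  # hypothetical initial guess
    history = []
    for _ in range(steps):
        loss, (ga, gc, gd) = loss_and_grad(a, c, d, xs, ys)
        history.append(loss)
        a = min(max(a - lr * ga, -a_max), a_max)
        c -= lr * gc
        d -= lr * gd
    return (a, c, d), history


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
    ys = simulate(0.7, 1.5, 0.3, xs, noise_std=0.1, rng=rng)
    (a_hat, c_hat, d_hat), history = fit(xs, ys)
    print(f"a={a_hat:.3f} c={c_hat:.3f} d={d_hat:.3f} loss {history[0]:.3f} -> {history[-1]:.4f}")
```

The loss is non-convex in a because a enters the predictions through powers a^k. On this small instance, the iterates should still move from the initial guess toward the generating parameters. This is the kind of behavior the abstract's guarantees address, though the guarantees rest on the paper's own assumptions.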
