Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems

We consider the setting of vector-valued non-linear dynamical systems $X_{t+1} = \phi(A^* X_t) + \eta_t$, where $\eta_t$ is unbiased noise and $\phi : \mathbb{R} \to \mathbb{R}$ is a known link function that satisfies a certain expansivity property. The goal is to learn $A^*$ from a single trajectory $X_1, \ldots, X_T$ of dependent or correlated samples. While the problem is well-studied in the linear case, where $\phi$ is the identity, with optimal error rates even for non-mixing systems, existing results in the non-linear case hold only for mixing systems. In this work, we improve existing results for learning non-linear systems in a number of ways: a) we provide the first offline algorithm that can learn non-linear dynamical systems without the mixing assumption, b) we significantly improve upon the sample complexity of existing results for mixing systems, c) in the much harder one-pass, streaming setting, we study an SGD with Reverse Experience Replay (SGD-RER) method and demonstrate that for mixing systems it achieves the same sample complexity as our offline algorithm, d) we justify the expansivity assumption by showing that for the popular ReLU link function (which is non-expansive but easy to learn from i.i.d. samples), any method would require exponentially many samples (with respect to the dimension of $X_t$) from the dynamical system. We validate our results via simulations and demonstrate that a naive application of SGD can be highly sub-optimal. Indeed, our work demonstrates that for correlated data, specialized methods designed around the dependency structure of the data can significantly outperform standard SGD-based methods.
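
For concreteness, below is a minimal Python/NumPy sketch of the generative model and of the reverse-replay idea: the trajectory is split into buffers of consecutive transitions separated by small gaps, and stochastic updates within each buffer are applied in reverse time order. Everything here is an illustrative assumption rather than the paper's exact algorithm: the function names, the buffer size and gap, the step size, the leaky-ReLU link, and the GLM-tron-style update that drops the link derivative.

```python
import numpy as np

def simulate(A_star, phi, T, noise_std=0.1, seed=0):
    """Generate a single trajectory of X_{t+1} = phi(A_star X_t) + eta_t."""
    rng = np.random.default_rng(seed)
    d = A_star.shape[0]
    X = np.zeros((T, d))
    for t in range(T - 1):
        X[t + 1] = phi(A_star @ X[t]) + noise_std * rng.standard_normal(d)
    return X

def sgd_rer(X, phi, lr=1e-2, buffer_size=50, gap=10):
    """SGD with Reverse Experience Replay (illustrative sketch).

    The trajectory is split into buffers of consecutive transitions,
    separated by small gaps; within each buffer, updates run in reverse
    time order, which weakens the coupling between the current iterate
    and the sample it is about to consume.
    """
    T, d = X.shape
    A = np.zeros((d, d))
    start = 0
    while start + buffer_size + gap <= T - 1:
        # newest transition in the buffer is processed first
        for t in range(start + buffer_size - 1, start - 1, -1):
            residual = phi(A @ X[t]) - X[t + 1]   # prediction error
            # GLM-tron-style update (drops the link derivative); an
            # assumption here, not necessarily the paper's exact update
            A -= lr * np.outer(residual, X[t])
        start += buffer_size + gap  # gap reduces inter-buffer correlation
    return A

# Usage: recover a 5-dimensional system with a leaky-ReLU link.
d = 5
A_star = 0.3 * np.eye(d)
phi = lambda z: np.where(z > 0.0, z, 0.2 * z)  # 0.2-expansive leaky ReLU
X = simulate(A_star, phi, T=20_000)
A_hat = sgd_rer(X, phi)
print("estimation error:", np.linalg.norm(A_hat - A_star))
```

Running the buffers backwards is the design point: the last samples of a buffer are the ones most correlated with the upcoming data, so consuming them first keeps the accumulated gradient noise nearly independent of the iterate, which is what makes correlated, non-i.i.d. data tractable for a streaming method.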
