Machine Learning for Prediction with Missing Dynamics

Abstract. This article presents a general framework for recovering missing dynamical systems using available data and machine learning techniques. The proposed framework reformulates the prediction problem as a supervised learning problem that approximates a map taking the memories of the resolved and identifiable unresolved variables to the missing components in the resolved dynamics. We demonstrate the effectiveness of the proposed framework with a strong convergence error bound on the resolved variables up to finite time and with numerical tests on prototypical models from various scientific domains. These include the 57-mode barotropic stress model with multiscale interactions that mimics the blocked and unblocked patterns observed in the atmosphere; the nonlinear Schrödinger equation, which has found many applications in physics such as optics and Bose-Einstein condensates; and the Kuramoto-Sivashinsky equation, whose spatiotemporally chaotic pattern formation models the trapped-ion mode in plasma and phase dynamics in reaction-diffusion systems. While many machine learning techniques can be used to validate the proposed framework, we found that recurrent neural networks outperform kernel regression methods in recovering the trajectories of the resolved components and the equilibrium one-point and two-point statistics. This superior performance suggests that recurrent neural networks are an effective tool for recovering missing dynamics that involve the approximation of high-dimensional functions.
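To make the supervised-learning reformulation concrete, the sketch below shows one way such a map could be fit: an LSTM (one common recurrent architecture) is trained to take a memory window of past resolved states to the missing term in the resolved dynamics, here taken as the residual between the observed tendency and the known resolved vector field. This is a minimal illustration under stated assumptions, not the authors' implementation; the names ClosureLSTM, WINDOW, and train_closure, and all hyperparameters, are hypothetical.

```python
# Minimal sketch (NOT the authors' code): learn the missing component of the
# resolved dynamics as a map from a memory window of past resolved states.
# Names (ClosureLSTM, WINDOW, train_closure) and hyperparameters are
# illustrative assumptions.
import torch
import torch.nn as nn

WINDOW = 10  # memory length fed to the network (assumption)

class ClosureLSTM(nn.Module):
    """LSTM that maps a window of resolved states to the missing term."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=dim, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, dim)

    def forward(self, x_window):           # x_window: (batch, WINDOW, dim)
        out, _ = self.lstm(x_window)
        return self.head(out[:, -1, :])    # missing term at the window's end

def train_closure(x_traj, residuals, epochs=200, lr=1e-3):
    """x_traj: (T, dim) resolved trajectory; residuals: (T, dim) targets,
    e.g. the observed tendency minus the known resolved vector field."""
    model = ClosureLSTM(x_traj.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    # Slice the trajectory into (memory window, missing-term target) pairs.
    xs = torch.stack([x_traj[t - WINDOW:t]
                      for t in range(WINDOW, len(x_traj))])
    ys = residuals[WINDOW:]
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(xs), ys)
        loss.backward()
        opt.step()
    return model
```

At prediction time, such a trained network would be composed with the known part of the model, e.g. advancing the resolved state with the resolved vector field plus the network's output on the most recent window at each step.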
