Deterministic Inference of Neural Stochastic Differential Equations

Model noise is known to have detrimental effects on neural networks, such as training instability and predictive distributions with poorly calibrated uncertainty. These factors limit the expressive potential of Neural Stochastic Differential Equations (NSDEs), a model family whose drift and diffusion functions are both parameterized by neural networks. We introduce a novel algorithm that solves a generic NSDE using only deterministic approximation methods. Given a discretization, we estimate the marginal distribution of the Itô process implied by the NSDE with a recursive scheme that propagates deterministic approximations of the statistical moments across time steps. The proposed algorithm comes with theoretical guarantees on numerical stability and convergence to the true solution, enabling robust, accurate, and efficient prediction of long sequences. In experiments, the algorithm behaves interpretably on synthetic setups and improves the state of the art on two challenging real-world tasks.

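To make the idea of "propagating deterministic approximations of the statistical moments across time steps" concrete, the following is a minimal one-dimensional sketch. It assumes an Euler-Maruyama discretization of dx = f(x) dt + L(x) dW, a Gaussian assumed density N(m_k, P_k) at every step, and Gauss-Hermite quadrature for the expectations; the paper's actual recursion and approximation scheme may differ, and f, L, and all parameter names here are illustrative placeholders rather than the authors' interface.

import numpy as np


def gauss_hermite_expectation(g, mean, var, deg=20):
    """Deterministically approximate E[g(x)] for x ~ N(mean, var)."""
    nodes, weights = np.polynomial.hermite.hermgauss(deg)
    x = mean + np.sqrt(2.0 * var) * nodes              # change of variables
    return np.sum(weights * g(x)) / np.sqrt(np.pi)


def propagate_moments(f, L, m0, P0, dt, num_steps):
    """Recursively propagate mean and variance across Euler-Maruyama steps."""
    means, variances = [m0], [P0]
    m, P = m0, P0
    for _ in range(num_steps):
        # One-step map: x' = x + f(x) dt + L(x) sqrt(dt) eps, eps ~ N(0, 1)
        drift_step = lambda x: x + f(x) * dt
        m_new = gauss_hermite_expectation(drift_step, m, P)
        # Var[x + f(x) dt] + dt * E[L(x)^2]   (eps is independent of x)
        second_moment = gauss_hermite_expectation(lambda x: drift_step(x) ** 2, m, P)
        P_new = second_moment - m_new ** 2 + dt * gauss_hermite_expectation(
            lambda x: L(x) ** 2, m, P
        )
        m, P = m_new, max(P_new, 1e-12)                # guard against round-off
        means.append(m)
        variances.append(P)
    return np.array(means), np.array(variances)


# Example: Ornstein-Uhlenbeck-like dynamics, whose exact moments are known;
# in an NSDE both f and L would instead be neural networks.
f = lambda x: -0.5 * x
L = lambda x: 0.3 * np.ones_like(x)
means, variances = propagate_moments(f, L, m0=1.0, P0=0.01, dt=0.01, num_steps=500)

Because every quantity above is an expectation under a Gaussian evaluated by quadrature, the recursion is fully deterministic: no sampling noise enters training or prediction, which is the property the abstract attributes to the proposed solver.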