Neural Ordinary Differential Equations

We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can be trained by maximum likelihood without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver without access to its internal operations. This allows end-to-end training of ODEs within larger models.
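The two central ideas above can be illustrated together: a hidden state whose time derivative is given by a parameterized function, and gradients obtained by solving a second, augmented ODE backward in time (the adjoint method) instead of differentiating through the solver's steps. The following is a minimal sketch under assumptions that do not come from the text: a one-dimensional state, a hypothetical dynamics function f(z) = tanh(w*z + b) with scalar parameters w and b, hand-written derivatives in place of automatic differentiation, a squared-error loss, and a fixed-step fourth-order Runge-Kutta integrator standing in for a black-box adaptive solver. The names `rk4`, `forward`, `loss` and `adjoint_gradients` are illustrative only.

```python
import math


def rk4(f, y0, t0, t1, steps=200):
    """Integrate dy/dt = f(t, y) from t0 to t1 with fixed-step classical RK4.

    y is a list of floats. If t1 < t0 the step size is negative, so the same
    routine also integrates backward in time.
    """
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
        k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
        k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y


def forward(z0, w, b, T, steps=200):
    """Continuous-depth 'layer': solve dz/dt = tanh(w*z + b) from t=0 to t=T."""
    return rk4(lambda t, y: [math.tanh(w * y[0] + b)], [z0], 0.0, T, steps)[0]


def loss(z0, w, b, T, target, steps=200):
    """Squared error between the final hidden state and a target value."""
    return 0.5 * (forward(z0, w, b, T, steps) - target) ** 2


def adjoint_gradients(z0, w, b, T, target, steps=200):
    """Return (dL/dz0, dL/dw, dL/db) by solving one augmented ODE backward.

    Augmented state s = [z, a, g_w, g_b], where a(t) = dL/dz(t) is the adjoint
    and g_w, g_b accumulate the parameter gradients.
    """
    zT = forward(z0, w, b, T, steps)
    aT = zT - target  # dL/dz(T) for L = 0.5 * (z(T) - target)**2

    def augmented(t, s):
        z, a = s[0], s[1]
        u = math.tanh(w * z + b)
        du = 1.0 - u * u  # tanh'(w*z + b)
        return [u,            # dz/dt = f(z)
                -a * w * du,  # da/dt = -a * df/dz
                -a * z * du,  # dg_w/dt = -a * df/dw
                -a * du]      # dg_b/dt = -a * df/db

    # Integrate from t=T back to t=0; only the current state is kept in memory.
    _, dL_dz0, dL_dw, dL_db = rk4(augmented, [zT, aT, 0.0, 0.0], T, 0.0, steps)
    return dL_dz0, dL_dw, dL_db


if __name__ == "__main__":
    z0, w, b, T, target = 0.5, 0.8, -0.3, 1.0, 1.0
    grads = adjoint_gradients(z0, w, b, T, target)
    eps = 1e-6
    fd_w = (loss(z0, w + eps, b, T, target) - loss(z0, w - eps, b, T, target)) / (2 * eps)
    print("adjoint dL/dw:", grads[1], " finite difference:", fd_w)
```

The backward pass only calls the integrator on the augmented system and never inspects its internal steps, and it recomputes z(t) on the way back rather than storing intermediate activations, which is the source of the constant memory cost mentioned above. The finite-difference comparison in the `__main__` block is only a sanity check for this sketch.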