Neural Networks with Cheap Differential Operators

Gradients of neural networks can be computed efficiently for any architecture, but some applications require differential operators that are more expensive to compute. We describe a family of neural network architectures that give cheap access to a class of differential operators involving \emph{dimension-wise derivatives}, and we show how to modify the backward computation graph to compute them efficiently. We demonstrate the use of these operators for solving root-finding subproblems in implicit ODE solvers, for exact density evaluation in continuous normalizing flows, and for evaluating the Fokker-Planck equation when training stochastic differential equation models.
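To make "dimension-wise derivatives" concrete, the sketch below computes the diagonal of the Jacobian, $\partial f_i / \partial x_i$, for a toy function. It rests on an assumption that the abstract does not spell out: each output $f_i$ is built from $x_i$ and a context value $h_i$ that depends only on the other coordinates. The names `conditioner`, `transformer`, and `dimwise_derivative` are hypothetical and chosen for illustration, and the derivative is written by hand in plain Python rather than obtained from an autodiff backward pass.

```python
import math

def conditioner(x, i):
    # Hypothetical context h_i: depends on every coordinate except x[i].
    # Written naively for clarity; it loops over all other coordinates.
    return sum(math.sin(x[j]) for j in range(len(x)) if j != i)

def transformer(xi, hi):
    # Hypothetical map combining x_i with its context h_i.
    return math.tanh(xi * hi) + xi ** 2

def transformer_dxi(xi, hi):
    # Partial derivative of transformer w.r.t. xi, with hi held fixed.
    return hi * (1.0 - math.tanh(xi * hi) ** 2) + 2.0 * xi

def f(x):
    # Full function: f_i(x) = transformer(x_i, h_i(x without x_i)).
    return [transformer(x[i], conditioner(x, i)) for i in range(len(x))]

def dimwise_derivative(x):
    # Because h_i does not depend on x_i, d f_i / d x_i reduces to the
    # partial derivative of the transformer in its first argument.
    return [transformer_dxi(x[i], conditioner(x, i)) for i in range(len(x))]

def jacobian_diag_fd(x, eps=1e-6):
    # Reference check: central finite differences, which need a pair of
    # full evaluations of f for every dimension.
    diag = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        diag.append((f(xp)[i] - f(xm)[i]) / (2.0 * eps))
    return diag

x = [0.3, -1.2, 0.7, 2.0]
exact = dimwise_derivative(x)
approx = jacobian_diag_fd(x)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
```

Under this assumed structure, all dimension-wise derivatives come from a single sweep over the coordinates, whereas the finite-difference check needs work proportional to the dimension. In an autodiff framework, the analogous step would be to treat each $h_i$ as a constant during the backward pass; that is one reading of "modifying the backward computation graph", not a statement of the authors' exact procedure.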
