Distributional Gradient Matching for Learning Uncertain Neural Dynamics Models

Differential equations in general, and neural ODEs in particular, are an essential technique in continuous-time system identification. While many deterministic learning algorithms have been designed based on numerical integration via the adjoint method, downstream tasks such as active learning, exploration in reinforcement learning, robust control, and filtering require accurate estimates of predictive uncertainties. In this work, we propose a novel approach to estimating epistemically uncertain neural ODEs that avoids the numerical integration bottleneck. Instead of modeling uncertainty in the ODE parameters, we directly model uncertainties in the state space. Our algorithm, distributional gradient matching (DGM), jointly trains a smoother and a dynamics model and matches their gradients by minimizing a Wasserstein loss. Our experiments show that, compared to traditional approximate inference methods based on numerical integration, our approach is faster to train, faster at predicting previously unseen trajectories, and, in the context of neural ODEs, significantly more accurate.
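The core idea above, matching the smoother's derivative estimates against the dynamics model's predictions under a Wasserstein loss, can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes both models yield Gaussian marginals with diagonal covariance at shared support points, in which case the squared 2-Wasserstein distance between the two distributions has a simple closed form. All variable names and the toy numbers are hypothetical.

```python
import numpy as np

def w2_squared_diag_gaussians(mu_a, sigma_a, mu_b, sigma_b):
    """Squared 2-Wasserstein distance between two Gaussians with
    diagonal covariances: ||mu_a - mu_b||^2 + ||sigma_a - sigma_b||^2.
    Differentiable in all inputs, so it can serve as a training loss."""
    return np.sum((mu_a - mu_b) ** 2) + np.sum((sigma_a - sigma_b) ** 2)

# Hypothetical marginals at the same support points: the smoother's
# estimate of the state derivative, and the dynamics model's prediction.
mu_smoother = np.array([0.9, -0.1])
sigma_smoother = np.array([0.2, 0.3])
mu_dynamics = np.array([1.0, 0.0])
sigma_dynamics = np.array([0.1, 0.25])

# Gradient-matching loss: drives the two distributions together,
# without ever numerically integrating the ODE.
loss = w2_squared_diag_gaussians(mu_smoother, sigma_smoother,
                                 mu_dynamics, sigma_dynamics)
```

Because the loss only compares per-point derivative distributions, training never calls an ODE solver; numerical integration is needed only at prediction time, which is what makes this family of gradient-matching methods fast to train.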
