Learning Partially Known Stochastic Dynamics with Empirical PAC Bayes

We propose a novel scheme for fitting heavily parameterized non-linear stochastic differential equations (SDEs). We place a prior on the parameters of the SDE drift and diffusion functions to obtain a Bayesian model. We then perform inference in this model with the well-known local reparameterization trick, applied here for the first time to empirical Bayes, i.e. to integrate out the SDE parameters. The model is then fit by maximizing the likelihood of the resulting marginal with respect to a potentially large number of hyperparameters, which prohibits stable training. As the prior parameters are marginalized, the model also no longer provides a principled means to incorporate prior knowledge. We overcome both drawbacks by deriving a training loss that comprises the marginal likelihood of the predictor and a PAC-Bayesian complexity penalty. On synthetic as well as real-world time series prediction tasks, we observe that our method provides an improved model fit accompanied by favorable extrapolation properties when it is given a partial description of the environment dynamics. Hence, we view the outcome as a promising step towards building cutting-edge hybrid learning systems that effectively combine first-principles physics with data-driven approaches.
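
To make the notion of partially known dynamics concrete, the following is a minimal sketch, not the method of the paper. It simulates an SDE whose drift is the sum of a known physical term and a data-driven residual, using the standard Euler-Maruyama discretization. Every name here (`euler_maruyama`, `known_drift`, `learned_drift`, the weight matrix `W`, the constant diagonal diffusion) is a hypothetical choice made for illustration; in particular, the residual uses fixed weights in place of a trained network, and no prior, marginalization, or PAC-Bayesian penalty is implemented.

```python
import numpy as np


def euler_maruyama(drift, diffusion, x0, t_end, n_steps, rng):
    """Simulate dx = drift(x) dt + diffusion(x) dW (diagonal noise assumed)."""
    dt = t_end / n_steps
    path = np.empty((n_steps + 1, x0.shape[0]))
    path[0] = x0
    for k in range(n_steps):
        x = path[k]
        # Brownian increment: zero mean, standard deviation sqrt(dt).
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)
        path[k + 1] = x + drift(x) * dt + diffusion(x) * dw
    return path


# Assumption: the known part of the dynamics is a linear restoring force.
def known_drift(x):
    return -x


# Assumption: fixed weights stand in for a learned residual drift.
W = np.array([[0.0, 0.5], [-0.5, 0.0]])


def learned_drift(x):
    return np.tanh(W @ x)


def hybrid_drift(x):
    # Known physics plus data-driven correction.
    return known_drift(x) + learned_drift(x)


def diffusion(x):
    # Assumption: constant, state-independent noise level per dimension.
    return 0.1 * np.ones_like(x)


rng = np.random.default_rng(0)
path = euler_maruyama(hybrid_drift, diffusion, np.array([1.0, 0.0]),
                      t_end=1.0, n_steps=100, rng=rng)
```

Here `path` holds the simulated trajectory with one row per time step. In a learning setting the weights of the residual term would be fit to data while `known_drift` stays fixed.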
