Theoretical guarantees for sampling and inference in generative models with latent diffusions

We introduce and study a class of probabilistic generative models in which the latent object is a finite-dimensional diffusion process on a finite time interval, and the observed variable is drawn conditionally on the terminal point of the diffusion. We make three contributions. First, we provide a unified viewpoint on both sampling and variational inference in such generative models through the lens of stochastic control. Second, we quantify the expressiveness of diffusion-based generative models: we show that one can efficiently sample from a wide class of terminal target distributions by choosing the drift of the latent diffusion from the class of multilayer feedforward neural nets, with the accuracy of sampling measured by the Kullback-Leibler divergence to the target distribution. Finally, we present and analyze a scheme for unbiased simulation of generative models with latent diffusions, and we provide bounds on the variance of the resulting estimators. This scheme can be implemented as a deep generative model with a random number of layers.
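
To make the stochastic-control viewpoint concrete, one standard formula in this spirit is the Boué-Dupuis variational representation (stated here as background; the paper's exact formulation may differ): for a suitable terminal cost $f$ and a Brownian motion $W$ on $[0,T]$,

$$
-\log \mathbb{E}\left[e^{-f(W_T)}\right]
= \inf_{u}\, \mathbb{E}\left[\frac{1}{2}\int_0^T \lVert u_t \rVert^2\,\mathrm{d}t + f\bigl(X_T^u\bigr)\right],
\qquad \mathrm{d}X_t^u = u_t\,\mathrm{d}t + \mathrm{d}W_t, \quad X_0^u = 0,
$$

where the infimum runs over adapted controls $u$. The optimal control (the Föllmer drift) steers the Brownian motion so that the law of $X_T^u$ is the tilted terminal distribution proportional to $e^{-f}$, which is the sense in which sampling becomes a control problem.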

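As an illustration of the generative mechanism itself, the following is a minimal sketch (a toy setup, not the paper's construction) of sampling the terminal point $X_T$ of a latent diffusion $\mathrm{d}X_t = b(X_t, t)\,\mathrm{d}t + \mathrm{d}W_t$ whose drift $b$ is a one-hidden-layer feedforward ReLU net; the dimensions and the untrained random weights below are hypothetical placeholders for a learned drift.

```python
import numpy as np

rng = np.random.default_rng(0)

d, hidden = 2, 32  # latent dimension and hidden width (illustrative choices)

# One-hidden-layer ReLU network standing in for the learned drift b(x, t).
# The weights are random placeholders; in the model they would be trained.
W1 = rng.normal(scale=0.5, size=(hidden, d + 1))  # +1 input for time t
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.5, size=(d, hidden))
b2 = np.zeros(d)

def drift(x, t):
    h = np.maximum(W1 @ np.append(x, t) + b1, 0.0)
    return W2 @ h + b2

def sample_terminal(T=1.0, n_steps=200):
    """Euler-Maruyama discretization of dX = b(X, t) dt + dW; returns X_T."""
    dt = T / n_steps
    x = np.zeros(d)  # fixed initial point X_0 = 0
    for k in range(n_steps):
        x = x + drift(x, k * dt) * dt + np.sqrt(dt) * rng.normal(size=d)
    return x

# The observed variable would then be drawn conditionally on X_T;
# here we simply inspect the terminal samples themselves.
samples = np.stack([sample_terminal() for _ in range(1000)])
print("terminal mean:", samples.mean(axis=0))
```

Each Euler step plays the role of one layer of a deep network, which is the correspondence behind reading such a sampler as a deep generative model.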
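Finally, on unbiased simulation with a random number of layers: one standard route to an unbiased estimator of a terminal expectation, sketched below for illustration (a Rhee-Glynn-type randomized-truncation estimator, not necessarily the paper's scheme), draws a random discretization level $N$, couples Euler schemes at levels $N$ and $N-1$ through shared Brownian increments, and reweights their difference by the level probability. The randomness of $N$ makes the effective depth of the sampler random. The toy SDE and the geometric level distribution with parameter $p$ are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def euler_terminal(dW):
    """Euler scheme for the toy SDE dX = (1 - X) dt + dW on [0, 1], X_0 = 0,
    driven by the prescribed Brownian increments dW."""
    dt = 1.0 / len(dW)
    x = 0.0
    for inc in dW:
        x = x + (1.0 - x) * dt + inc
    return x

def single_term_estimator(p=0.6):
    """Single-term randomized-truncation estimator of E[X_1].

    The level N is geometric, so the number of Euler steps (the 'depth')
    is random; p in (1/2, 3/4) keeps both the variance and the expected
    cost finite for this smooth, additive-noise SDE."""
    N = rng.geometric(p)                      # random level => random depth
    prob_N = p * (1.0 - p) ** (N - 1)
    n_fine = 2 ** N                           # level N uses 2**N steps
    dW_fine = rng.normal(scale=np.sqrt(1.0 / n_fine), size=n_fine)
    fine = euler_terminal(dW_fine)
    if N == 1:
        coarse = 0.0                          # level 0: zero steps, X = X_0
    else:
        dW_coarse = dW_fine.reshape(-1, 2).sum(axis=1)  # shared noise
        coarse = euler_terminal(dW_coarse)
    return (fine - coarse) / prob_N           # telescoping sum => unbiased

est = np.mean([single_term_estimator() for _ in range(20000)])
print("estimate of E[X_1]:", est)  # exact value: 1 - exp(-1) ~ 0.632
```

In a full generative model, the closed-form toy drift would be replaced by a neural-net drift as in the earlier sketch; the finiteness of the estimator's variance then hinges on how fast the coupled levels contract.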