Particle-based energetic variational inference

We introduce a new variational inference (VI) framework, called energetic variational inference (EVI). It minimizes the VI objective function based on a prescribed energy-dissipation law. Using the EVI framework, we can derive many existing particle-based variational inference (ParVI) methods, including the popular Stein variational gradient descent (SVGD). More importantly, many new ParVI schemes can be created under this framework. For illustration, we propose a new particle-based EVI scheme, which first approximates the density with particles and then uses the approximated density in the variational procedure, or "Approximation-then-Variation" for short. Thanks to this order of approximation and variation, the new scheme maintains the variational structure at the particle level and can significantly decrease the KL-divergence in each iteration. Numerical experiments show that the proposed method outperforms some existing ParVI methods in terms of fidelity to the target distribution.
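
As a point of reference for the ParVI methods mentioned above, the following is a minimal sketch of one standard SVGD update with a fixed-bandwidth Gaussian (RBF) kernel. It illustrates the existing SVGD rule only, not the proposed Approximation-then-Variation scheme. The names `rbf_kernel`, `svgd_step`, `step`, and `h` are hypothetical and chosen for illustration, and the fixed bandwidth is an assumption (practical implementations often choose it adaptively).

```python
import numpy as np


def rbf_kernel(x, h):
    """Gaussian kernel matrix and pairwise differences for particles x of shape (n, d)."""
    diff = x[:, None, :] - x[None, :, :]      # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)      # squared distances, shape (n, n)
    k = np.exp(-sq_dist / (2.0 * h ** 2))     # k[i, j] = k(x_i, x_j), symmetric
    return k, diff


def svgd_step(x, grad_log_p, step=0.1, h=1.0):
    """One SVGD update: x_i <- x_i + step * phi(x_i).

    phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) * grad_log_p(x_j) + grad_{x_j} k(x_j, x_i) ]
    grad_log_p maps an (n, d) array to the (n, d) array of score values.
    """
    n = x.shape[0]
    k, diff = rbf_kernel(x, h)
    # Driving term: kernel-weighted average of the target score.
    drive = k @ grad_log_p(x)
    # Repulsive term: grad_{x_j} k(x_j, x_i) = k_ij * (x_i - x_j) / h^2.
    repulse = np.sum(k[:, :, None] * diff, axis=1) / h ** 2
    return x + step * (drive + repulse) / n
```

For example, with a one-dimensional Gaussian target of mean 2 and unit variance, the score is `lambda z: -(z - 2.0)`, and repeatedly applying `svgd_step` to particles initialized elsewhere moves them toward the target while the repulsive term keeps them spread out.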
