Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence

Variational Inference (VI) is a popular alternative to asymptotically exact sampling in Bayesian inference. Its main workhorse is optimization over a reverse Kullback-Leibler divergence (RKL), which typically underestimates the tails of the posterior, leading to miscalibration and potential degeneracy. Importance sampling (IS), on the other hand, is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures. The quality of IS crucially depends on the choice of the proposal distribution. Ideally, the proposal distribution has heavier tails than the target, which is rarely achievable by minimizing the RKL. We thus propose a novel combination of optimization and sampling techniques for approximate Bayesian inference that constructs an IS proposal distribution by minimizing a forward KL (FKL) divergence. This approach guarantees asymptotic consistency and fast convergence towards both the optimal IS estimator and the optimal variational approximation. We empirically demonstrate on real data that our method is competitive with variational boosting and MCMC.
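
The core recipe described above, fit a proposal by minimizing the forward KL divergence and then de-bias with self-normalized importance sampling, can be sketched as follows. This is a minimal illustration under assumptions of our own (a one-dimensional Gaussian variational family, a Student-t-like toy target, a fixed step size, and a self-normalized importance-weighted gradient estimator of the FKL); it is not the paper's actual algorithm or experimental setup.

```python
# Sketch: minimize KL(p || q_theta) to obtain an IS proposal, then use that
# proposal for self-normalized importance sampling (SNIS). All modeling choices
# below (target, family, optimizer) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def log_p_unnorm(x):
    """Unnormalized log target: a heavy-tailed, Student-t-like toy posterior."""
    return -2.5 * np.log1p(x ** 2 / 4.0)

def log_q(x, mu, log_sigma):
    """Log density of the Gaussian variational family q_theta."""
    sigma2 = np.exp(2.0 * log_sigma)
    return -0.5 * ((x - mu) ** 2 / sigma2 + 2.0 * log_sigma + np.log(2 * np.pi))

def snis_weights(x, mu, log_sigma):
    """Self-normalized importance weights w_i proportional to p(x_i) / q(x_i)."""
    log_w = log_p_unnorm(x) - log_q(x, mu, log_sigma)
    log_w -= log_w.max()                      # stabilize the exponentiation
    w = np.exp(log_w)
    return w / w.sum()

# Forward-KL fit: grad_theta KL(p || q_theta) = -E_p[grad_theta log q_theta(x)],
# estimated with samples from q_theta reweighted by SNIS weights.
mu, log_sigma, step, n = 0.0, 0.0, 0.05, 512
for _ in range(2000):
    x = mu + np.exp(log_sigma) * rng.standard_normal(n)
    w = snis_weights(x, mu, log_sigma)
    sigma2 = np.exp(2.0 * log_sigma)
    grad_mu = -np.sum(w * (x - mu) / sigma2)                 # d/d_mu
    grad_ls = -np.sum(w * ((x - mu) ** 2 / sigma2 - 1.0))    # d/d_log_sigma
    mu -= step * grad_mu
    log_sigma -= step * grad_ls

# Refinement step: use the fitted q_theta as the IS proposal to de-bias
# posterior expectations, here E_p[x^2] as an example test function.
x = mu + np.exp(log_sigma) * rng.standard_normal(100_000)
w = snis_weights(x, mu, log_sigma)
print("fitted proposal: mean =", mu, " std =", np.exp(log_sigma))
print("SNIS estimate of E_p[x^2]:", np.sum(w * x ** 2))
```

Because the forward KL penalizes regions where the target has mass but the proposal does not, the fitted approximation tends to be wider than an RKL fit, which is precisely the heavier-tailed behaviour one wants from an IS proposal.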
