Nonparametric variational inference

Variational methods are widely used for approximate posterior inference. However, their use is typically limited to families of distributions that enjoy particular conjugacy properties. To circumvent this limitation, we propose a family of variational approximations inspired by nonparametric kernel density estimation. The locations of these kernels and their bandwidths are treated as variational parameters and optimized to improve an approximate lower bound on the marginal likelihood of the data. Unlike most other variational approximations, using multiple kernels allows the approximation to capture multiple modes of the posterior. We demonstrate the efficacy of the nonparametric approximation on a hierarchical logistic regression model and a nonlinear matrix factorization model. We obtain predictive performance as good as or better than more specialized variational methods and MCMC approximations. The method is easy to apply to graphical models for which standard variational methods are difficult to derive.
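The idea above can be sketched in code. The following is a minimal, hypothetical illustration (not the paper's implementation): the approximating family is a uniform mixture of Gaussian kernels, the kernel means and log-bandwidths are the variational parameters, the expected log-joint is crudely approximated by evaluating the log-density at the kernel means, and the intractable mixture entropy is replaced by a Jensen lower bound (which is available in closed form because the convolution of two Gaussians is Gaussian). The bimodal target `log_p` is invented for demonstration.

```python
import numpy as np
from scipy.optimize import minimize

def log_p(theta):
    """Unnormalized log-density of a toy bimodal target (modes near -2 and +2)."""
    return np.logaddexp(-0.5 * (theta + 2.0) ** 2,
                        -0.5 * (theta - 2.0) ** 2)

def neg_elbo(params, N):
    """Negative approximate lower bound for a uniform mixture of N Gaussians."""
    mu = params[:N]                       # kernel locations
    s2 = np.exp(params[N:]) ** 2          # bandwidths^2, positive via log-parameterization
    # Crude approximation of E_q[log p]: evaluate log p at the kernel means.
    expected_log_p = np.mean(log_p(mu))
    # Jensen lower bound on the mixture entropy:
    # H[q] >= -(1/N) sum_n log (1/N) sum_m N(mu_n; mu_m, s2_n + s2_m).
    diff2 = (mu[:, None] - mu[None, :]) ** 2
    var_sum = s2[:, None] + s2[None, :]
    log_kernel = -0.5 * diff2 / var_sum - 0.5 * np.log(2 * np.pi * var_sum)
    entropy_lb = -np.mean(np.log(np.mean(np.exp(log_kernel), axis=1)))
    return -(expected_log_p + entropy_lb)

N = 4
rng = np.random.default_rng(0)
init = np.concatenate([rng.normal(0.0, 3.0, N), np.zeros(N)])
res = minimize(neg_elbo, init, args=(N,), method="L-BFGS-B")
mu_opt = res.x[:N]  # optimized kernel locations; expected to gather near the modes
```

Because the objective is the same regardless of the target's form, swapping in a different `log_p` (e.g. the log-joint of a hierarchical model) requires no conjugacy analysis, which is the practical appeal of the approach; the paper itself uses a more accurate second-order approximation of the expected log-joint rather than the zeroth-order one shown here.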
