A Geometric Variational Approach to Bayesian Inference

Abstract

We propose a novel Riemannian geometric framework for variational inference in Bayesian models based on the nonparametric Fisher–Rao metric on the manifold of probability density functions. Under the square-root density representation, this manifold can be identified with the positive orthant of the unit hypersphere in $\mathbb{L}^2$, and the Fisher–Rao metric reduces to the standard $\mathbb{L}^2$ metric. Exploiting this Riemannian structure, we formulate the task of approximating the posterior distribution as a variational problem on the hypersphere based on the α-divergence. Compared with approaches based on the Kullback–Leibler divergence, this yields a tighter lower bound on the marginal distribution of the data, together with a corresponding upper bound that those approaches cannot provide. We propose a novel gradient-based algorithm for this variational problem, based on Fréchet derivative operators motivated by the geometry of the hypersphere, and examine its properties. Through simulations and real-data applications, we demonstrate the utility of the proposed geometric framework and algorithm on several Bayesian models. Supplementary materials for this article are available online.
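The geometric facts the framework rests on can be checked numerically. The following is a minimal sketch, assuming densities on [0, 1] discretized on a uniform grid with trapezoidal quadrature standing in for the $\mathbb{L}^2$ inner product. It shows that $\psi = \sqrt{p}$ has unit norm, computes the geodesic distance on the unit sphere as $\arccos\langle \psi_1, \psi_2 \rangle$ (the convention without a factor of 2), and takes one generic step by projecting a direction onto the tangent space and applying the sphere's exponential map. The helper names, the Beta test densities, and the step direction are hypothetical illustrations; this is not the authors' algorithm, which uses Fréchet derivatives of an α-divergence objective.

```python
import numpy as np

# Uniform grid on [0, 1]; the grid size and trapezoidal weights are illustrative choices.
x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]
w = np.full_like(x, dx)
w[0] = w[-1] = dx / 2  # trapezoidal rule: half weight at the endpoints


def inner(f, g):
    """Approximate the L2 inner product <f, g> = int f(x) g(x) dx."""
    return float(np.sum(w * f * g))


def sqrt_density(p):
    """Square-root map p -> psi = sqrt(p), rescaled to exact unit norm on the grid."""
    psi = np.sqrt(p)
    return psi / np.sqrt(inner(psi, psi))


def fisher_rao_distance(psi1, psi2):
    """Arc-length distance on the unit sphere: arccos <psi1, psi2>."""
    c = np.clip(inner(psi1, psi2), -1.0, 1.0)  # guard against round-off outside [-1, 1]
    return float(np.arccos(c))


def project_to_tangent(psi, v):
    """Remove the component of v along psi, giving a tangent vector at psi."""
    return v - inner(v, psi) * psi


def exp_map(psi, v):
    """Sphere exponential map: cos(|v|) psi + sin(|v|) v / |v|."""
    nv = np.sqrt(inner(v, v))
    if nv < 1e-12:
        return psi.copy()
    return np.cos(nv) * psi + np.sin(nv) * v / nv


# Hypothetical test densities: Beta(2, 5) and Beta(5, 2).
p1 = 30 * x * (1 - x) ** 4
p2 = 30 * x ** 4 * (1 - x)
psi1, psi2 = sqrt_density(p1), sqrt_density(p2)
d = fisher_rao_distance(psi1, psi2)

# One generic Riemannian step from psi1 toward psi2 (a stand-in direction,
# not the alpha-divergence gradient used in the paper).
v = project_to_tangent(psi1, psi2 - psi1)
psi_new = exp_map(psi1, 0.5 * v)

print(inner(psi1, psi1))                  # unit norm: psi1 lies on the sphere
print(d)                                  # lies in (0, pi/2): positive orthant
print(inner(psi_new, psi_new))            # the step stays on the sphere
print(fisher_rao_distance(psi_new, psi2)) # closer to psi2 than d
```

Because every nonnegative $\psi$ has nonnegative inner products with the others, distances between square-root densities never exceed $\pi/2$, which is the numerical counterpart of the positive-orthant statement. A practical algorithm would also need to keep iterates nonnegative; this sketch sidesteps that by stepping only partway along the geodesic between two valid square-root densities.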
