A Geometric View of Posterior Approximation

Although Bayesian methods are robust and principled, their application in practice can be limited because they typically rely on computationally intensive Markov chain Monte Carlo (MCMC) algorithms. One possible solution is to find a fast approximation of the posterior distribution and use it for statistical inference. For commonly used approximation methods, such as the Laplace approximation and variational free energy, the objective is defined mainly for computational convenience rather than as a true distance measure between the target and approximating distributions. In this paper, we provide a geometric view of posterior approximation based on a valid distance measure derived from ambient Fisher geometry. Our proposed framework is easily generalizable and can inspire a new class of methods for approximate Bayesian inference.
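
To make the contrast between a convenience objective and a true distance concrete, the following is a minimal sketch for discrete distributions. It assumes the standard square-root embedding, under which a probability vector p maps to the point sqrt(p) on the unit sphere and the distance is the arc length between two such points. This is an illustration of the general idea, not necessarily the exact construction used in the paper; the function names and example vectors are hypothetical.

```python
import math


def fisher_sphere_distance(p, q):
    """Arc length between sqrt(p) and sqrt(q) on the unit sphere.

    Assumes p and q are probability vectors of equal length
    (non-negative entries that sum to 1).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Inner product of the square-root vectors (Bhattacharyya coefficient).
    inner = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] to guard against floating-point round-off.
    inner = min(1.0, max(0.0, inner))
    return math.acos(inner)


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


# Hypothetical target and approximating distributions.
target = [0.5, 0.3, 0.2]
approx = [0.4, 0.4, 0.2]

# The spherical distance is symmetric in its arguments ...
d_forward = fisher_sphere_distance(target, approx)
d_reverse = fisher_sphere_distance(approx, target)

# ... whereas the KL divergence generally is not.
kl_forward = kl_divergence(target, approx)
kl_reverse = kl_divergence(approx, target)

print("spherical distance:", d_forward, d_reverse)
print("KL divergence:     ", kl_forward, kl_reverse)
```

In this sketch the spherical distance is symmetric and bounded by pi/2 (attained when the two distributions have disjoint support), while the two KL values differ depending on the direction of the comparison.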
