Alpha-Beta Divergence For Variational Inference

This paper introduces a variational approximation framework using direct optimization of what is known as the scale-invariant Alpha-Beta divergence (sAB divergence). This new objective encompasses most variational objectives that use the Kullback-Leibler, the Rényi, or the gamma divergences. It also gives access to objective functions never exploited before in the context of variational inference. This is achieved via two easy-to-interpret control parameters, which allow for a smooth interpolation over the divergence space while trading off properties such as mass-covering of a target distribution and robustness to outliers in the data. Furthermore, the sAB variational objective can be optimized directly by repurposing existing methods for Monte Carlo computation of complex variational objectives, yielding estimates of the divergence itself rather than variational lower bounds. We show the advantages of this objective on Bayesian models for regression problems.
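Since the abstract's central computational claim is that the sAB objective admits direct Monte Carlo estimation from samples of the approximation q, the sketch below illustrates how such an estimator could look for an unnormalised target. It only assumes the three-log-term structure of the scale-invariant (log) Alpha-Beta divergence family of Cichocki et al.; the exact scaling constants, the function names (`sab_objective`, `log_int_term`), and the toy example are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.special import logsumexp


def log_int_term(log_q, log_p_tilde, a, b):
    """Monte Carlo estimate of log of the integral of q(z)**a * p_tilde(z)**b dz,
    using samples z_s ~ q and the identity
    integral q**a p_tilde**b dz = E_q[ q**(a-1) * p_tilde**b ]."""
    S = log_q.shape[0]
    return logsumexp((a - 1.0) * log_q + b * log_p_tilde) - np.log(S)


def sab_objective(log_q, log_p_tilde, alpha, beta):
    """Sketch of a scale-invariant Alpha-Beta style objective between q and an
    unnormalised posterior p_tilde(z) = p(x, z), estimated from samples z_s ~ q.

    The three-log-integral structure follows the (log) AB divergence family of
    Cichocki et al.; the scaling constants below are an assumption, and the
    limits alpha -> 0, beta -> 0, alpha + beta -> 0 (which recover KL-, Renyi-,
    and gamma-type objectives) would need dedicated handling.
    """
    assert alpha != 0 and beta != 0 and alpha + beta != 0, "special cases need limits"
    t_q = log_int_term(log_q, log_p_tilde, alpha + beta, 0.0)  # log int q**(alpha+beta)
    t_p = log_int_term(log_q, log_p_tilde, 0.0, alpha + beta)  # log int p_tilde**(alpha+beta)
    t_x = log_int_term(log_q, log_p_tilde, alpha, beta)        # log int q**alpha p_tilde**beta
    return (t_q / (beta * (alpha + beta))
            + t_p / (alpha * (alpha + beta))
            - t_x / (alpha * beta))


if __name__ == "__main__":
    # Toy check: q is N(0, 1) and the unnormalised target is 3 * N(0, 1);
    # a scale-invariant objective should be (numerically) zero despite the
    # arbitrary factor of 3.
    rng = np.random.default_rng(0)
    z = rng.normal(size=5000)
    log_q = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
    log_p_tilde = np.log(3.0) + log_q
    print(sab_objective(log_q, log_p_tilde, alpha=1.5, beta=-0.5))  # ~ 0.0
```

Working entirely in log space with `logsumexp` keeps the estimator stable when the density ratios span many orders of magnitude, which is the same trick commonly used in Monte Carlo implementations of Rényi-divergence variational objectives.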
