An Instability in Variational Inference for Topic Models

Topic models are Bayesian models that are frequently used to capture the latent structure of corpora of documents or images. Each data element in such a corpus (for instance, each article in a collection of scientific articles) is regarded as a convex combination of a small number of vectors corresponding to "topics" or "components". The weights are assumed to have a Dirichlet prior distribution. The standard approach to approximating the posterior is to use variational inference algorithms, and in particular a mean field approximation. We show that this approach suffers from an instability that can produce misleading conclusions. Namely, for certain regimes of the model parameters, variational inference outputs a non-trivial decomposition into topics. However, for the same parameter values, the data contain no actual information about the true decomposition, and hence the output of the algorithm is uncorrelated with the true topic decomposition. Among other consequences, the estimated posterior mean is significantly wrong, and estimated Bayesian credible regions do not achieve the nominal coverage. We discuss how this instability is remedied by more accurate mean field approximations.
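As a rough illustration of the generative structure described above, the following sketch samples a synthetic corpus in which each data element is a convex combination of a few topic vectors with Dirichlet-distributed weights. The function name `sample_corpus`, the Gaussian topic vectors, the additive Gaussian noise, and all default parameter values are assumptions made for illustration only; they are not taken from the paper and need not match its exact model or scaling.

```python
import numpy as np

def sample_corpus(n_docs=200, dim=50, n_topics=3, alpha=1.0, noise=0.1, seed=0):
    """Sample a toy corpus from a Dirichlet mixed-membership model (illustrative)."""
    rng = np.random.default_rng(seed)
    # Topic vectors: i.i.d. standard Gaussian entries (an assumed choice).
    topics = rng.standard_normal((n_topics, dim))
    # Per-element weights from a symmetric Dirichlet prior. Each row is
    # nonnegative and sums to one, so it defines a convex combination.
    weights = rng.dirichlet(alpha * np.ones(n_topics), size=n_docs)
    # Each data element is a convex combination of topics plus assumed Gaussian noise.
    data = weights @ topics + noise * rng.standard_normal((n_docs, dim))
    return data, weights, topics

data, weights, topics = sample_corpus()
print(data.shape, weights.shape, topics.shape)
```

Keeping the true `weights` and `topics` alongside `data` makes it possible to compare the output of any inference procedure with the ground-truth decomposition, which is the kind of comparison the abstract refers to when it says the algorithm's output can be uncorrelated with the true topics.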
