Scalable Inference for Logistic-Normal Topic Models

Logistic-normal topic models can effectively discover correlation structures among latent topics. However, inference in these models remains challenging because the logistic-normal prior is not conjugate to the multinomial topic mixing proportions. Existing algorithms either make restrictive mean-field assumptions or do not scale to large applications. This paper presents a partially collapsed Gibbs sampling algorithm that approaches the provably correct distribution by drawing on ideas from data augmentation. To improve time efficiency, we further present a parallel implementation that can handle large-scale applications and learn the correlation structures of thousands of topics from millions of documents. Extensive empirical results demonstrate the promise of this approach.
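To make the prior concrete, the following is a minimal sketch of drawing topic mixing proportions from a logistic-normal distribution: a Gaussian vector is sampled and mapped to the simplex with a softmax. It illustrates only the prior, not the paper's sampler. The function name, the mean vector, and the Cholesky factor values are hypothetical choices for illustration and do not come from the paper.

```python
import math
import random


def sample_logistic_normal(mu, chol, rng):
    """Draw theta = softmax(eta) with eta ~ N(mu, L L^T).

    mu   : list of K means (hypothetical values)
    chol : K x K lower-triangular Cholesky factor L of the covariance
    rng  : a random.Random instance
    """
    k = len(mu)
    # Independent standard normal draws
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    # eta = mu + L z introduces correlation between topics
    eta = [mu[i] + sum(chol[i][j] * z[j] for j in range(i + 1)) for i in range(k)]
    # Softmax, subtracting the max for numerical stability
    m = max(eta)
    weights = [math.exp(e - m) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
mu = [0.0, 0.0, 0.0]
# Assumed covariance factor: topics 0 and 1 are positively correlated,
# topic 2 is independent of both.
chol = [
    [1.0, 0.0, 0.0],
    [0.9, 0.4, 0.0],
    [0.0, 0.0, 1.0],
]
theta = sample_logistic_normal(mu, chol, rng)
print(theta)
```

Because the softmax of a Gaussian has no closed-form conjugate relationship with the multinomial likelihood, the posterior over eta cannot be sampled directly, which is the non-conjugacy the abstract refers to.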
