Rethinking LDA: Why Priors Matter

Implementations of topic models typically use symmetric Dirichlet priors with fixed concentration parameters, with the implicit assumption that such "smoothing parameters" have little practical effect. In this paper, we explore several classes of structured priors for topic models. We find that an asymmetric Dirichlet prior over the document-topic distributions has substantial advantages over a symmetric prior, while an asymmetric prior over the topic-word distributions provides no real benefit. Approximation of this prior structure through simple, efficient hyperparameter optimization steps is sufficient to achieve these performance gains. The prior structure we advocate substantially increases the robustness of topic models to variations in the number of topics and to the highly skewed word frequency distributions common in natural language. Since this prior structure can be implemented using efficient algorithms that add negligible cost beyond standard inference techniques, we recommend it as a new standard for topic modeling.
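
To make the abstract's "simple, efficient hyperparameter optimization steps" concrete, the following is a minimal sketch, not the authors' reference implementation, of the standard fixed-point update due to Minka for learning an asymmetric document-topic prior α from the per-document topic counts maintained by a collapsed Gibbs sampler. The function name optimize_alpha, its arguments, and the toy counts at the bottom are illustrative assumptions.

```python
import numpy as np
from scipy.special import digamma

def optimize_alpha(doc_topic_counts, alpha, n_iter=20, tol=1e-6):
    """Fixed-point update for an asymmetric Dirichlet prior alpha over
    document-topic distributions (Minka-style Polya MLE update).

    doc_topic_counts : (D, K) array of per-document topic counts n_dk,
        e.g. read off the current state of a collapsed Gibbs sampler.
    alpha            : (K,) array of current hyperparameters, all > 0.
    """
    counts = np.asarray(doc_topic_counts, dtype=float)
    doc_lengths = counts.sum(axis=1)            # n_d, total tokens per document
    alpha = np.asarray(alpha, dtype=float).copy()
    for _ in range(n_iter):
        alpha_sum = alpha.sum()                 # alpha_0
        # numerator: sum over documents of psi(n_dk + alpha_k) - psi(alpha_k)
        num = (digamma(counts + alpha) - digamma(alpha)).sum(axis=0)
        # denominator: sum over documents of psi(n_d + alpha_0) - psi(alpha_0)
        den = (digamma(doc_lengths + alpha_sum) - digamma(alpha_sum)).sum()
        new_alpha = alpha * num / den
        if np.max(np.abs(new_alpha - alpha)) < tol:
            alpha = new_alpha
            break
        alpha = new_alpha
    return alpha

# Toy usage: counts from a hypothetical 3-document, 4-topic sampler state.
counts = np.array([[5, 0, 1, 0],
                   [2, 3, 0, 0],
                   [0, 0, 4, 2]])
alpha = optimize_alpha(counts, np.full(4, 0.5))
```

Interleaving an update like this with sampling sweeps touches only the D-by-K count matrix, so its cost is negligible next to the sampler itself, consistent with the abstract's efficiency claim. Off-the-shelf toolkits expose the same idea, for example MALLET's --optimize-interval option and gensim's LdaModel with alpha='auto'.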
