Scalable Hyperparameter Selection for Latent Dirichlet Allocation

Abstract

Latent Dirichlet allocation (LDA) is a widely used Bayesian hierarchical model for high-dimensional sparse count data, such as text documents, in machine learning. As a Bayesian model, it places a prior on a set of latent variables. The prior is indexed by hyperparameters that have a substantial impact on inference under the model. The ideal estimate of the hyperparameters is the empirical Bayes estimate, which by definition maximizes the marginal likelihood of the data with all latent variables integrated out. This estimate cannot be obtained in closed form. In practice, the hyperparameters are chosen either in an ad hoc manner or through variants of the EM algorithm whose theoretical basis is weak. We propose an MCMC-based, fully Bayesian method for obtaining the empirical Bayes estimate of the hyperparameters. We compare our method with existing approaches on both synthetic and real data. The comparative experiments demonstrate that the LDA model with hyperparameters specified by our method outperforms models whose hyperparameters are estimated by other methods. Supplementary materials for this article are available online.
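As a minimal sketch of the objective the abstract describes, using standard LDA notation that is assumed here rather than taken from the abstract (w denotes the observed words, z the latent topic assignments, θ the document-level topic proportions, β the topic-word distributions, and h the vector of Dirichlet hyperparameters), the empirical Bayes estimate is

\[
\hat{h} \;=\; \arg\max_{h}\, m(\mathbf{w} \mid h),
\qquad
m(\mathbf{w} \mid h) \;=\; \sum_{\mathbf{z}} \iint
p(\mathbf{w} \mid \mathbf{z}, \boldsymbol{\beta})\,
p(\mathbf{z} \mid \boldsymbol{\theta})\,
p(\boldsymbol{\theta} \mid h)\,
p(\boldsymbol{\beta} \mid h)\,
d\boldsymbol{\theta}\, d\boldsymbol{\beta},
\]

which has no closed form and is the quantity that the proposed MCMC-based method is designed to maximize.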
