Efficient Training of LDA on a GPU by Mean-for-Mode Estimation

We introduce Mean-for-Mode estimation, a variant of an uncollapsed Gibbs sampler that we use to train LDA on a GPU. The algorithm combines the benefits of both collapsed and uncollapsed Gibbs samplers. Like a collapsed Gibbs sampler, and unlike an uncollapsed one, it has good statistical performance and can exploit sampling-complexity reduction techniques such as sparsity. Meanwhile, like an uncollapsed Gibbs sampler, and unlike a collapsed one, it is embarrassingly parallel and can use approximate counters.
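To make the setup concrete, below is a minimal NumPy sketch of an uncollapsed Gibbs-style sweep for LDA in which the usual Dirichlet parameter draws are replaced by the corresponding Dirichlet posterior means, which is our reading of the "mean-for-mode" step from the abstract. All names (docs, ndk, nkv, theta, phi), the hyperparameter defaults, and the exact update order are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mean_for_mode_lda(docs, V, K, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Sketch: uncollapsed Gibbs-style training of LDA where the Dirichlet
    parameter draws are replaced by Dirichlet posterior means (mean-for-mode).
    docs is a list of word-id arrays, V the vocabulary size, K the topic count."""
    rng = np.random.default_rng(seed)
    D = len(docs)
    # Random initial topic assignment for every token.
    z = [rng.integers(K, size=len(d)) for d in docs]
    for _ in range(iters):
        # Accumulate document-topic and topic-word counts from the assignments.
        ndk = np.zeros((D, K))
        nkv = np.zeros((K, V))
        for d, (words, zs) in enumerate(zip(docs, z)):
            for w, k in zip(words, zs):
                ndk[d, k] += 1
                nkv[k, w] += 1
        # Mean-for-mode step: use posterior means instead of sampling
        # theta ~ Dir(ndk + alpha) and phi ~ Dir(nkv + beta).
        theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
        phi = (nkv + beta) / (nkv + beta).sum(axis=1, keepdims=True)
        # Resample every token's topic given theta and phi; each token is
        # conditionally independent here, hence embarrassingly parallel.
        for d, words in enumerate(docs):
            p = theta[d][None, :] * phi[:, words].T   # shape (len(doc), K)
            p /= p.sum(axis=1, keepdims=True)
            z[d] = np.array([rng.choice(K, p=row) for row in p])
    return theta, phi
```

Because the token-level resampling step depends only on the fixed theta and phi of the current iteration, it can be distributed across GPU threads without coordination, which is the embarrassingly-parallel property claimed above; the count-based parameter update is also where sparsity and approximate counters would apply.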
