Extreme Stochastic Variational Inference: Distributed Inference for Large Scale Mixture Models

Mixtures of exponential family models are among the most fundamental and widely used statistical models. Stochastic variational inference (SVI), the state-of-the-art algorithm for parameter estimation in such models, is inherently serial. Moreover, it requires the parameters to fit in the memory of a single processor, which poses serious limitations on scalability when the number of parameters runs into the billions. In this paper, we present extreme stochastic variational inference (ESVI), a distributed, asynchronous, and lock-free algorithm for variational inference in mixture models on massive real-world datasets. ESVI overcomes the limitations of SVI by requiring each processor to access only a subset of the data and a subset of the parameters, thereby providing data and model parallelism simultaneously. Our empirical study demonstrates that ESVI not only outperforms VI and SVI in wall-clock time, but also achieves a better-quality solution. To further speed up computation and save memory when fitting a large number of topics, we propose a variant, ESVI-TOPK, which maintains only the top-k most important topics. Empirically, we find that keeping the top 25% of topics suffices to achieve the same accuracy as storing all topics.
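To make the ESVI-TOPK idea concrete, the short Python/NumPy sketch below shows one plausible way to truncate a data point's topic responsibilities to the k most probable topics and renormalize; the function name topk_responsibilities and the toy sizes are our own illustration, not the paper's implementation, and the paper's actual update rules may differ.

    import numpy as np

    def topk_responsibilities(log_weights, k):
        # Keep only the k most probable topics for one data point and
        # renormalize: a sketch of the top-k pruning idea behind ESVI-TOPK.
        idx = np.argpartition(log_weights, -k)[-k:]    # indices of the k largest entries
        w = log_weights[idx] - log_weights[idx].max()  # subtract max for numerical stability
        p = np.exp(w)
        return idx, p / p.sum()                        # renormalized probabilities over kept topics

    # Toy example: 20 topics, keep the top 25% (k = 5).
    rng = np.random.default_rng(0)
    idx, probs = topk_responsibilities(rng.normal(size=20), k=5)
    print(idx, probs)

In a full inference loop, only these k (topic index, probability) pairs would need to be stored and fed into the subsequent parameter updates, which is where the memory and compute savings of a top-k variant would come from.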
