Scaling up Bayesian variational inference using distributed computing clusters

Abstract: In this paper, we present an approach for scaling up Bayesian learning with variational methods by exploiting distributed computing clusters managed by modern big data processing tools such as Apache Spark or Apache Flink, which efficiently support iterative map-reduce operations. Our approach is defined as a distributed projected natural gradient ascent algorithm, has excellent convergence properties, and covers a wide range of conjugate exponential family models. We evaluate the proposed algorithm on three real-world datasets from different domains (the Pubmed abstracts dataset, a GPS trajectory dataset, and a financial dataset) using several models (LDA, factor analysis, mixture of Gaussians, and linear regression models). Our approach compares favorably to stochastic variational inference and streaming variational Bayes, two of the main current proposals for scaling up variational methods. For the scalability analysis, we evaluate our approach on a network with more than one billion nodes, approximately 75% of which are latent variables, using an AWS computer cluster with 128 processing units. The proposed methods are released as part of an open-source toolbox for scalable probabilistic machine learning (http://www.amidsttoolbox.com), Masegosa et al. (2017) [29].
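To make the map-reduce structure mentioned in the abstract concrete, the following is a minimal sketch, not the paper's algorithm: it assumes a Beta-Bernoulli model with no latent variables, so the projection step and the local variational updates are omitted. The names `local_stats`, `add_stats`, and `natural_gradient_step` are hypothetical, and plain Python `map` and `reduce` stand in for the corresponding cluster operations. For a conjugate model, the natural gradient of the variational objective with respect to the global parameters is the prior plus the summed sufficient statistics minus the current parameters, so each iteration only needs an additive aggregate from the shards.

```python
from functools import reduce


def local_stats(shard):
    """Map step: sufficient statistics (ones, zeros) of one data shard."""
    ones = sum(shard)
    return (ones, len(shard) - ones)


def add_stats(a, b):
    """Reduce step: sufficient statistics are additive across shards."""
    return (a[0] + b[0], a[1] + b[1])


def natural_gradient_step(lam, prior, shards, rho):
    """One step lam <- lam + rho * (prior + total statistics - lam).

    lam and prior are Beta pseudo-counts (a, b); these differ from the
    natural parameters only by a constant shift, so the update is the same.
    """
    total = reduce(add_stats, map(local_stats, shards))
    target = (prior[0] + total[0], prior[1] + total[1])
    return tuple(l + rho * (t - l) for l, t in zip(lam, target))


# Toy data split into three shards (illustrative values only).
shards = [[1, 0, 1, 1], [0, 0, 1], [1, 1, 1, 0, 1]]
prior = (1.0, 1.0)

# Damped steps converge to the exact posterior Beta(9, 5).
lam = prior
for _ in range(50):
    lam = natural_gradient_step(lam, prior, shards, rho=0.5)
```

With a step size of 1, a single step lands on the exact conjugate posterior; smaller step sizes approach it geometrically. In models with latent variables, the map step would also have to compute expected sufficient statistics under local variational distributions, which this sketch leaves out.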

[1] A. Asuncion, et al. UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences, 2007.

[2] 이주연, et al. Analysis of Research Activities and Trends Related to Artificial Intelligence (A.I.) Technology Based on the Latent Dirichlet Allocation (LDA) Model, 2018.

[3] Ameet Talwalkar, et al. MLlib: Machine Learning in Apache Spark, 2015, J. Mach. Learn. Res.

[4] Thomas Hofmann, et al. Map-Reduce for Machine Learning on Multicore, 2007.

[5] Andrés R. Masegosa, et al. d-VMP: Distributed Variational Message Passing, 2016, Probabilistic Graphical Models.

[6] Seif Haridi, et al. Apache Flink™: Stream and Batch Processing in a Single Engine, 2015, IEEE Data Eng. Bull.

[7] Z.-Q. Luo, et al. Error bounds and convergence analysis of feasible descent methods: a general approach, 1993, Ann. Oper. Res.

[8] Xing Xie, et al. Mining interesting locations and travel sequences from GPS trajectories, 2009, WWW '09.

[9] Charles M. Bishop, et al. Variational Message Passing, 2005, J. Mach. Learn. Res.

[10] N. B. Anuar, et al. The rise of "big data" on cloud computing: Review and open research issues, 2015, Inf. Syst.

[11] H. Robbins. A Stochastic Approximation Method, 1951.

[12] Andrés R. Masegosa, et al. Modeling Concept Drift: A Probabilistic Graphical Model Based Approach, 2015, IDA.

[13] Harold J. Kushner, et al. Stochastic Approximation Algorithms and Applications, 1997, Applications of Mathematics.

[14] Mark W. Schmidt, et al. Convergence of Proximal-Gradient Stochastic Variational Inference under Non-Decreasing Step-Size Sequence, 2015, ArXiv.

[15] Scott Shenker, et al. Spark: Cluster Computing with Working Sets, 2010, HotCloud.

[16] Xing Xie, et al. GeoLife: A Collaborative Social Networking Service among User, Location and Trajectory, 2010, IEEE Data Eng. Bull.

[17] Masa-aki Sato, et al. Online Model Selection Based on the Variational Bayes, 2001, Neural Computation.

[18] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[19] Matthew J. Beal. Variational algorithms for approximate Bayesian inference, 2003.

[20] Felix Naumann, et al. The Stratosphere platform for big data analytics, 2014, The VLDB Journal.

[21] Nando de Freitas, et al. An Introduction to Sequential Monte Carlo Methods, 2001, Sequential Monte Carlo Methods in Practice.

[22] G. Casella, et al. Statistical Inference, 2003, Encyclopedia of Social Network Analysis and Mining.

[23] James R. Foulds, et al. Stochastic collapsed variational Bayesian inference for latent Dirichlet allocation, 2013, KDD.

[24] Jonathan P. How, et al. Approximate Decentralized Bayesian Inference, 2014, UAI.

[25] Andrés R. Masegosa, et al. Financial Data Analysis with PGMs Using AMIDST, 2016, 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW).

[26] Andrés R. Masegosa, et al. Dynamic Bayesian modeling for risk prediction in credit operations, 2015, SCAI.

[27] David M. Blei, et al. Smoothed Gradients for Stochastic Variational Inference, 2014, NIPS.

[28] Anders L. Madsen, et al. AMIDST: a Java Toolbox for Scalable Probabilistic Machine Learning, 2017, Knowl. Based Syst.

[29] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.

[30] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.

[31] Andre Wibisono, et al. Streaming Variational Bayes, 2013, NIPS.

[32] Andrés R. Masegosa, et al. Probabilistic Graphical Models on Multi-Core CPUs Using Java 8, 2016, IEEE Computational Intelligence Magazine.

[33] Chong Wang, et al. Stochastic variational inference, 2012, J. Mach. Learn. Res.

[34] Cheng Soon Ong, et al. Multivariate Spearman's ρ for aggregating ranks using copulas, 2016.

[35] Wei-Ying Ma, et al. Understanding mobility based on GPS data, 2008, UbiComp.

[36] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.

[37] W. Michael Conklin, et al. Monte Carlo Methods in Bayesian Computation, 2001, Technometrics.