d-VMP: Distributed Variational Message Passing

Motivated by a real-world financial dataset, we propose a distributed variational message passing scheme for learning conjugate exponential models. We show that the method can be seen as a projected natural gradient ascent algorithm, and that it therefore has good convergence properties. This is supported experimentally: we show that the approach is robust with respect to common problems such as imbalanced data, heavy-tailed empirical distributions, and a high degree of missing values. The scheme is based on map-reduce operations and uses the memory management of modern big data frameworks such as Apache Flink to obtain a time-efficient and scalable implementation. The proposed algorithm compares favourably to stochastic variational inference in terms of both speed and quality of the learned models. For the scalability analysis, we evaluate our approach on a network with more than one billion nodes (approximately 75% of which are latent variables) using a computer cluster with 128 processing units.
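As a rough illustration of why conjugate exponential models fit a map-reduce pattern, the sketch below updates a Beta prior over a Bernoulli parameter from data split into shards. It is not the d-VMP algorithm itself: the model has no latent variables, and the names `local_stats`, `combine` and `global_update` are hypothetical, chosen only for this example. The point is that each shard contributes additive sufficient statistics, so the global parameter update reduces to a sum.

```python
from functools import reduce

def local_stats(shard):
    """Map step: sufficient statistics (successes, failures) of one shard."""
    ones = sum(shard)
    return (ones, len(shard) - ones)

def combine(a, b):
    """Reduce step: sufficient statistics are additive across shards."""
    return (a[0] + b[0], a[1] + b[1])

def global_update(prior, shards):
    """Add the aggregated statistics to the Beta prior's (alpha, beta)."""
    total = reduce(combine, map(local_stats, shards), (0, 0))
    return (prior[0] + total[0], prior[1] + total[1])

# Toy data: three shards of binary observations (illustrative values only).
shards = [[1, 0, 1], [1, 1], [0, 0, 1, 1]]
posterior = global_update((1.0, 1.0), shards)
print(posterior)
```

Because the statistics are additive, the result does not depend on how the observations are partitioned. In a model with latent variables, the map step would presumably also have to update local variational distributions before emitting its statistics.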
