Asynchronous Stochastic Variational Inference

Stochastic variational inference (SVI) employs stochastic optimization to scale up Bayesian computation to massive data. Since SVI is at its core a stochastic gradient-based algorithm, horizontal parallelism can be harnessed to enable larger-scale inference. We propose a lock-free parallel implementation of SVI that distributes computation over multiple slaves in an asynchronous fashion. We show that our implementation yields linear speed-up while guaranteeing an asymptotic ergodic convergence rate of $O(1/\sqrt{T})$, provided that the number of slaves is bounded by $\sqrt{T}$, where $T$ is the total number of iterations. The implementation runs in a high-performance computing (HPC) environment using the Message Passing Interface (MPI) for Python (MPI4py). An extensive empirical evaluation shows that our parallel SVI is lossless, performing comparably to its serial counterpart while achieving linear speed-up.
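
To make the asynchronous scheme concrete, the following is a minimal MPI4py sketch of a master-worker pattern in the spirit described above: slaves compute stochastic natural-gradient estimates of the global variational parameter on independent minibatches and push them to the master without synchronizing with one another, while the master applies each (possibly stale) update with a Robbins-Monro step size as it arrives. The model-specific computation is abstracted into a hypothetical placeholder `noisy_natural_gradient`, and the dimensions and step-size constants are illustrative; this is a sketch of the general technique, not the authors' implementation.

```python
# Minimal sketch of an asynchronous master-worker SVI loop with MPI4py.
# The variational model is abstracted away: `noisy_natural_gradient` is a
# hypothetical placeholder for the per-minibatch intermediate update.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

DIM = 10                 # dimension of the global variational parameter (illustrative)
T = 1000                 # total number of stochastic updates applied by the master
TAU, KAPPA = 1.0, 0.6    # Robbins-Monro step sizes rho_t = (t + tau)^(-kappa)

def noisy_natural_gradient(lam, rng):
    """Placeholder for the slave-side computation: sample a minibatch,
    optimize the local variational parameters, and return an unbiased
    stochastic estimate of the natural gradient of the global parameter."""
    return rng.standard_normal(DIM) - lam   # stand-in for lambda_hat - lambda

if rank == 0:
    # Master: apply updates in arrival order from any slave, with no barrier.
    lam = np.zeros(DIM)
    for t in range(1, T + 1):
        status = MPI.Status()
        grad = comm.recv(source=MPI.ANY_SOURCE, tag=1, status=status)
        rho = (t + TAU) ** (-KAPPA)
        lam = lam + rho * grad               # possibly stale (delayed) gradient step
        comm.send(lam, dest=status.Get_source(), tag=2)
    # Drain one in-flight gradient per slave and signal shutdown.
    for _ in range(size - 1):
        status = MPI.Status()
        comm.recv(source=MPI.ANY_SOURCE, tag=1, status=status)
        comm.send(None, dest=status.Get_source(), tag=2)
    print("final lambda:", lam)
else:
    # Slave: repeatedly push an update and pull the latest global parameters.
    rng = np.random.default_rng(rank)
    lam = np.zeros(DIM)
    while True:
        grad = noisy_natural_gradient(lam, rng)
        comm.send(grad, dest=0, tag=1)
        lam_new = comm.recv(source=0, tag=2)
        if lam_new is None:                  # shutdown signal from the master
            break
        lam = lam_new
```

Run with, for example, `mpirun -n 4 python async_svi_sketch.py`. Because the master serves requests in arrival order, slaves never block on one another, mirroring the asynchronous, lock-free style of parallelism described in the abstract.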
