Decentralized Stochastic Gradient Langevin Dynamics and Hamiltonian Monte Carlo

Stochastic gradient Langevin dynamics (SGLD) and stochastic gradient Hamiltonian Monte Carlo (SGHMC) are two popular Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference that can scale to large datasets, allowing one to sample from the posterior distribution of a machine learning (ML) model given the input data and the prior distribution over the model parameters. However, these algorithms do not apply to the decentralized learning setting, in which a network of agents works collaboratively to learn the parameters of an ML model without sharing their individual data, whether for privacy reasons or due to communication constraints. We study two algorithms, Decentralized SGLD (DE-SGLD) and Decentralized SGHMC (DE-SGHMC), which adapt the SGLD and SGHMC methods to allow scalable Bayesian inference in the decentralized setting. We show that when the posterior distribution is strongly log-concave, the iterates of these algorithms converge linearly to a neighborhood of the target distribution in the 2-Wasserstein metric. We illustrate the results on decentralized Bayesian linear regression and Bayesian logistic regression problems.
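To make the decentralized update concrete, below is a minimal sketch (not the authors' reference implementation) of a DE-SGLD-style iteration for decentralized Bayesian linear regression. It assumes a ring network of five agents, a doubly stochastic mixing matrix W, a standard Gaussian prior split evenly across agents, and full local gradients in place of mini-batch stochastic gradients; all names and parameter values (local_grad, eta, the network topology) are illustrative assumptions rather than quantities taken from the paper.

```python
import numpy as np

# Illustrative sketch of a decentralized SGLD-style update for Bayesian
# linear regression. Each agent keeps its data private: it only averages
# iterates with its neighbors and takes a noisy local gradient step.

rng = np.random.default_rng(0)

n_agents, dim, n_local = 5, 3, 50
true_theta = rng.normal(size=dim)

# Each agent holds a private local dataset (X_i, y_i).
data = []
for _ in range(n_agents):
    X = rng.normal(size=(n_local, dim))
    y = X @ true_theta + 0.5 * rng.normal(size=n_local)
    data.append((X, y))

def local_grad(theta, X, y, noise_var=0.25):
    """Gradient of the local negative log-posterior term f_i(theta)."""
    # Gaussian likelihood term plus an even share of the N(0, I) prior.
    return (X.T @ (X @ theta - y)) / noise_var + theta / n_agents

# Doubly stochastic mixing matrix for a ring: average with both neighbors.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

eta = 1e-3                                 # step size (illustrative choice)
theta = rng.normal(size=(n_agents, dim))   # one iterate per agent

for k in range(2000):
    mixed = W @ theta                      # gossip/averaging step with neighbors
    grads = np.stack([local_grad(theta[i], *data[i]) for i in range(n_agents)])
    noise = rng.normal(size=theta.shape)   # injected Gaussian noise, as in SGLD
    theta = mixed - eta * grads + np.sqrt(2.0 * eta) * noise

print("true theta:            ", true_theta)
print("agent-averaged sample: ", theta.mean(axis=0))
```

The structural point is that each agent only combines the iterates of its immediate neighbors (the W @ theta step) before taking a noisy local gradient step, so no raw data ever leaves an agent; substituting mini-batch gradients for local_grad would turn the same loop into the stochastic-gradient variant.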
