QLSD: Quantised Langevin stochastic dynamics for Bayesian federated learning

Federated learning aims at conducting inference when data are decentralised and locally stored on several clients, under two main constraints: data ownership and communication overhead. In this paper, we address these issues under the Bayesian paradigm. To this end, we propose a novel Markov chain Monte Carlo algorithm coined QLSD, built upon quantised versions of stochastic gradient Langevin dynamics. To improve performance in a big-data regime, we introduce variance-reduced alternatives to our methodology, referred to as QLSD⋆ and QLSD++. We provide both non-asymptotic and asymptotic convergence guarantees for the proposed algorithms and illustrate their benefits on several federated learning benchmarks.
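
The update underlying QLSD can be illustrated with a minimal Python/NumPy sketch: each client sends back an unbiasedly quantised stochastic gradient of its local potential, and the server performs a Langevin step on the aggregate. The names stochastic_quantize, qlsd_sketch and grad_fns are ours, the quantiser follows the QSGD-style construction, and the loop omits the variance-reduction mechanisms of QLSD⋆ and QLSD++; this is an illustration under those assumptions, not the authors' reference implementation.

    import numpy as np

    def stochastic_quantize(v, s=16):
        # Unbiased stochastic quantiser in the QSGD style: each
        # coordinate of v is mapped to one of s levels of ||v||, with
        # the rounding randomised so that E[Q(v)] = v.
        norm = np.linalg.norm(v)
        if norm == 0.0:
            return v
        level = np.abs(v) * s / norm            # real-valued level in [0, s]
        lower = np.floor(level)
        round_up = np.random.rand(v.size) < (level - lower)
        return np.sign(v) * (lower + round_up) * norm / s

    def qlsd_sketch(grad_fns, theta0, step, n_iter, s=16):
        # One quantised Langevin loop: the server broadcasts theta,
        # each client returns a quantised stochastic gradient of its
        # local potential, and the server takes an SGLD step on the
        # sum, injecting Gaussian noise of variance 2*step.
        theta = theta0.copy()
        samples = []
        for _ in range(n_iter):
            g = sum(stochastic_quantize(grad(theta), s) for grad in grad_fns)
            noise = np.sqrt(2.0 * step) * np.random.randn(theta.size)
            theta = theta - step * g + noise
            samples.append(theta.copy())
        return samples

In this sketch, grad_fns would hold one (stochastic) gradient function per client, each covering that client's share of the negative log-posterior. As with plain SGLD, the chain targets a biased approximation of the posterior, with the bias governed by the step size and the quantisation level s; the paper's non-asymptotic guarantees quantify this trade-off.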
