Channel-Driven Monte Carlo Sampling for Bayesian Distributed Learning in Wireless Data Centers

Conventional frequentist learning, as assumed by existing federated learning protocols, is limited in its ability to quantify uncertainty, incorporate prior knowledge, guide active learning, and enable continual learning. Bayesian learning provides a principled approach to address all of these limitations, at the cost of increased computational complexity. This paper studies distributed Bayesian learning in a wireless data center setting comprising a central server and multiple distributed workers. Prior work on wireless distributed learning has focused exclusively on frequentist learning, and has introduced the idea of leveraging uncoded transmission to enable “over-the-air” computing. Unlike frequentist learning, Bayesian learning aims at evaluating approximations of, or samples from, a global posterior distribution in the model parameter space. This work investigates for the first time the design of distributed one-shot, or “embarrassingly parallel”, Bayesian learning protocols in wireless data centers via consensus Monte Carlo (CMC). Uncoded transmission is introduced not only as a way to implement “over-the-air” computing, but also as a mechanism to deploy channel-driven MC sampling: rather than treating channel noise as a nuisance to be mitigated, channel-driven sampling uses channel noise as an integral part of the MC sampling process. A simple wireless CMC scheme is first proposed that is asymptotically optimal under Gaussian local posteriors. Then, for arbitrary local posteriors, a variational optimization strategy is introduced. Simulation results demonstrate that, if properly accounted for, channel noise can indeed contribute to MC sampling and does not necessarily reduce accuracy.
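
To make the two ideas in the abstract concrete, the following is a minimal toy sketch, not the scheme proposed in the paper. It assumes a scalar parameter, Gaussian local posteriors with known means and variances, an ideal noiseless-fading channel that simply sums the transmitted values and adds Gaussian noise, and no transmit power constraint. The function names (`consensus_combine`, `channel_driven_samples`) and the scaling factor `alpha` are hypothetical choices made for illustration. The first function is the standard precision-weighted CMC combination; the second lets workers shrink their own sampling noise so that the channel noise supplies the remaining variance of the global posterior.

```python
import numpy as np


def consensus_combine(local_samples, local_vars):
    """Precision-weighted average of per-worker samples (scalar parameter).

    local_samples: array of shape (K, S), one row of S samples per worker.
    local_vars:    array of shape (K,), variance of each local posterior.
    """
    weights = 1.0 / np.asarray(local_vars, dtype=float)
    return weights @ np.asarray(local_samples, dtype=float) / weights.sum()


def channel_driven_samples(local_means, local_vars, noise_var, num_samples, rng):
    """Toy channel-driven sampling (illustrative assumption, not the paper's scheme).

    Each worker transmits a precision-weighted sample; the channel sums the
    transmissions and adds Gaussian noise of variance `noise_var`; the server
    divides by the total precision.
    """
    means = np.asarray(local_means, dtype=float)
    variances = np.asarray(local_vars, dtype=float)
    weights = 1.0 / variances
    total_precision = weights.sum()

    # Variance of the global Gaussian posterior that the samples should match.
    target_var = 1.0 / total_precision
    # Variance contributed by channel noise after the server's normalisation.
    eff_noise_var = noise_var / total_precision**2
    # Fraction of the target variance that workers still generate locally
    # (clipped at zero when the channel noise alone exceeds the target).
    alpha = max(0.0, 1.0 - eff_noise_var / target_var)

    local = means[:, None] + np.sqrt(alpha * variances)[:, None] * rng.standard_normal(
        (means.size, num_samples)
    )
    received = weights @ local + np.sqrt(noise_var) * rng.standard_normal(num_samples)
    return received / total_precision


rng = np.random.default_rng(0)
samples = channel_driven_samples([0.0, 2.0], [1.0, 1.0], noise_var=1.0,
                                 num_samples=200_000, rng=rng)
print(samples.mean(), samples.var())
```

In this toy setting the empirical mean and variance of `samples` should be close to those of the global posterior (mean 1 and variance 0.5 for the values above), even though half of that variance comes from the channel. When the normalised channel noise exceeds the target variance, `alpha` is clipped to zero and the samples become over-dispersed, which is the regime where a more careful design is needed.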
