Decentralized Bayesian Learning over Graphs

We propose a decentralized learning algorithm over a general social network. The algorithm leaves the training data distributed on the mobile devices while utilizing a peer-to-peer model aggregation method. It allows agents with local data to learn a shared model explaining the global training data in a decentralized fashion. The proposed algorithm can be viewed as a Bayesian, peer-to-peer variant of federated learning in which each agent keeps a "posterior probability distribution" over the global model parameters. Each agent updates its "posterior" based on 1) its local training data and 2) asynchronous communication and model aggregation with its 1-hop neighbors. This Bayesian formulation allows for a systematic treatment of model aggregation over any connected graph. Furthermore, it provides strong analytic guarantees on convergence in the realizable case, as well as a closed-form characterization of the rate of convergence. We also show that our methodology can be combined with efficient Bayesian inference techniques to train Bayesian neural networks in a decentralized manner. Through empirical studies, we show that our theoretical analysis can guide the design of network/social interactions and data partitioning to achieve convergence.
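To make the two-step update concrete, the following is a minimal sketch of this style of peer-to-peer Bayesian aggregation, under illustrative assumptions that are not taken from the paper: a scalar parameter with Gaussian posteriors, a fixed four-agent ring with self-loops, synchronous rounds (the paper's scheme is asynchronous), and log-linear (precision-weighted) pooling of neighbor posteriors as the aggregation rule.

```python
# Minimal sketch (not the paper's exact algorithm): each agent keeps a Gaussian
# "posterior" over a scalar model parameter, updates it on its local data, and
# then aggregates with its 1-hop neighbors via log-linear (precision-weighted)
# pooling. Topology, mixing rule, and the Gaussian family are illustrative.
import numpy as np

rng = np.random.default_rng(0)
true_theta = 2.0      # parameter generating every agent's data (realizable case)
noise_std = 1.0

# Undirected ring of 4 agents: adjacency with self-loops, row-normalized weights
A = np.array([[1, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]], dtype=float)
W = A / A.sum(axis=1, keepdims=True)

n_agents = A.shape[0]
mu = np.zeros(n_agents)    # posterior means, one per agent
prec = np.ones(n_agents)   # posterior precisions (1/variance), one per agent

for t in range(200):
    # 1) Local Bayesian update on one fresh Gaussian observation per agent
    for i in range(n_agents):
        x = true_theta + noise_std * rng.standard_normal()
        lik_prec = 1.0 / noise_std**2
        new_prec = prec[i] + lik_prec
        mu[i] = (prec[i] * mu[i] + lik_prec * x) / new_prec
        prec[i] = new_prec

    # 2) Aggregation with 1-hop neighbors: log-linear pooling of Gaussians is
    #    again Gaussian, obtained by mixing the natural parameters
    #    (precision, precision * mean) with the graph weights
    eta = prec * mu            # natural parameter: precision-weighted mean
    prec = W @ prec
    eta = W @ eta
    mu = eta / prec

print("posterior means:", np.round(mu, 3))  # all agents concentrate near true_theta
```

Run as-is, every agent's posterior mean converges to the shared true parameter, illustrating how local Bayesian updates combined with neighbor aggregation can drive agreement on a connected graph; the paper's actual update rule, posterior family, and convergence analysis are what the abstract refers to.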
