Decentralized Federated Learning: A Segmented Gossip Approach

The emerging concern about data privacy and security has motivated the proposal of federated learning, which allows nodes to synchronize only their locally trained models instead of their original data. Conventional federated learning architectures, inherited from the parameter server design, rely on a highly centralized topology and the assumption of large node-to-server bandwidth. However, in real-world federated learning scenarios, network capacity is distributed much more uniformly across node-to-node links, and each link is smaller than those in a datacenter. It is therefore challenging for conventional federated learning approaches to utilize the bandwidth between nodes efficiently. In this paper, we propose model-segment-level decentralized federated learning to tackle this problem. In particular, we propose a segmented gossip approach, which not only makes full use of node-to-node bandwidth but also achieves good training convergence. The experimental results show that the training time can be greatly reduced compared with centralized federated learning.
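
To make the idea concrete, below is a minimal Python sketch of segment-wise gossip aggregation, assuming each worker holds a flat parameter vector that is split into equal segments, and each segment is averaged with copies pulled from a few randomly chosen peers. The names (gossip_round, split_model) and all parameter choices are illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

def split_model(model, num_segments):
    # Split a flat parameter vector into (nearly) equal segments.
    return np.array_split(model, num_segments)

def gossip_round(models, num_segments=4, peers_per_segment=2):
    # One synchronization round: every worker mixes each of its model
    # segments with the same segment pulled from a few random peers,
    # so no single link ever carries the full model.
    n = len(models)
    segmented = [split_model(m, num_segments) for m in models]
    new_models = []
    for i in range(n):
        mixed = []
        for s in range(num_segments):
            peers = rng.choice([j for j in range(n) if j != i],
                               size=peers_per_segment, replace=False)
            copies = [segmented[i][s]] + [segmented[j][s] for j in peers]
            mixed.append(np.mean(copies, axis=0))  # segment-wise average
        new_models.append(np.concatenate(mixed))
    return new_models

# Toy usage: 8 workers with random "models"; repeated gossip rounds
# shrink the spread across workers, i.e. they approach consensus.
models = [rng.normal(size=16) for _ in range(8)]
for _ in range(10):
    models = gossip_round(models)
print("max per-parameter spread:", np.std(np.stack(models), axis=0).max())

Because each segment travels to different peers, every link carries only a fraction of the model, which is how the approach spreads synchronization traffic across the available node-to-node bandwidth.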
