Decentralized Federated Learning: A Segmented Gossip Approach

The emerging concern about data privacy and security has motivated the proposal of federated learning, which allows nodes to synchronize only their locally trained models instead of their original data. Conventional federated learning architectures, inherited from the parameter server design, rely on a highly centralized topology and the assumption of large node-to-server bandwidth. However, in real-world federated learning scenarios, network capacity is distributed much more uniformly across node-to-node links, and each link is smaller than those in a datacenter. It is therefore challenging for conventional federated learning approaches to utilize the bandwidth between nodes efficiently. In this paper, we propose model-segment-level decentralized federated learning to tackle this problem. In particular, we propose a segmented gossip approach, which not only makes full use of node-to-node bandwidth but also achieves good training convergence. The experimental results show that the training time can be greatly reduced compared with centralized federated learning.
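
To make the idea concrete, below is a minimal Python sketch of segment-wise gossip aggregation, assuming each worker holds a flat parameter vector that is split into equal segments, and each segment is averaged with copies pulled from a few randomly chosen peers. The names (gossip_round, split_model) and all parameter choices are illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

def split_model(model, num_segments):
    # Split a flat parameter vector into (nearly) equal segments.
    return np.array_split(model, num_segments)

def gossip_round(models, num_segments=4, peers_per_segment=2):
    # One synchronization round: every worker mixes each of its model
    # segments with the same segment pulled from a few random peers,
    # so no single link ever carries the full model.
    n = len(models)
    segmented = [split_model(m, num_segments) for m in models]
    new_models = []
    for i in range(n):
        mixed = []
        for s in range(num_segments):
            peers = rng.choice([j for j in range(n) if j != i],
                               size=peers_per_segment, replace=False)
            copies = [segmented[i][s]] + [segmented[j][s] for j in peers]
            mixed.append(np.mean(copies, axis=0))  # segment-wise average
        new_models.append(np.concatenate(mixed))
    return new_models

# Toy usage: 8 workers with random "models"; repeated gossip rounds
# shrink the spread across workers, i.e. they approach consensus.
models = [rng.normal(size=16) for _ in range(8)]
for _ in range(10):
    models = gossip_round(models)
print("max per-parameter spread:", np.std(np.stack(models), axis=0).max())

Because each segment travels to different peers, every link carries only a fraction of the model, which is how the approach spreads synchronization traffic across the available node-to-node bandwidth.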
