Decentralized trustless gossip training of deep neural networks

Recent machine learning techniques apply decentralized model training to mitigate data volume and privacy issues. Current approaches assume (a) node performance homogeneity and (b) simultaneous training. These assumptions also imply that the predictive performance of the distributed models evolves uniformly. A different approach is required, since a decentralized distributed network is heterogeneous and nonstationary: nodes can join or leave the network at any point in time (churn). We propose a novel protocol for exchanging model knowledge between peers using a gossip algorithm combined with stochastic gradient descent (SGD). Our method has the advantage of being fully asynchronous, decentralized, trustless, and independent of network size and churn ratio. We validated the proposed algorithm by running network simulations in various scenarios.
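
The abstract does not spell out the protocol details, so the following is only a minimal sketch of the general gossip-plus-SGD idea it refers to, not the authors' actual method: each node takes a local SGD step on its own data shard and then averages its parameters with one randomly chosen peer. The task (least-squares regression), the helper names local_sgd_step and gossip_average, and all constants are hypothetical choices made for illustration.

```python
# Illustrative sketch of gossip-averaged SGD (not the paper's exact protocol).
# Each node holds its own weights and data shard; in one round a node takes a
# local SGD step on its shard and then averages weights with one random peer.
import numpy as np

rng = np.random.default_rng(0)

def local_sgd_step(w, X, y, lr=0.01):
    # One SGD step on a least-squares objective for a linear model (toy task).
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def gossip_average(w_a, w_b):
    # Pairwise gossip: both peers move to the average of their parameters.
    avg = 0.5 * (w_a + w_b)
    return avg, avg.copy()

# Toy setup: 4 nodes, each with its own data shard and its own weight vector.
d, n_nodes = 5, 4
true_w = rng.normal(size=d)
shards, weights = [], []
for _ in range(n_nodes):
    X = rng.normal(size=(32, d))
    shards.append((X, X @ true_w + 0.1 * rng.normal(size=32)))
    weights.append(rng.normal(size=d))

for _ in range(200):              # asynchronous rounds, simulated sequentially
    i = rng.integers(n_nodes)     # a node wakes up, trains locally...
    X, y = shards[i]
    weights[i] = local_sgd_step(weights[i], X, y)
    j = rng.integers(n_nodes)     # ...then gossips with a randomly chosen peer
    if j != i:
        weights[i], weights[j] = gossip_average(weights[i], weights[j])

print("mean parameter error:",
      np.mean([np.linalg.norm(w - true_w) for w in weights]))
```

Because each round involves only one node and one peer, the scheme needs no global synchronization barrier, which is the property the abstract emphasizes (asynchrony and tolerance to churn); nodes that join or leave simply stop being selected as gossip partners.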
