Random Walk Gradient Descent for Decentralized Learning on Graphs

We design a new variant of the stochastic gradient descent algorithm for learning a global model from data distributed over the nodes of a network. Motivated by settings such as decentralized learning, we suppose that one special node in the network, which we call node 1, is interested in learning the global model. We seek a decentralized and distributed algorithm for several reasons, including privacy and fault tolerance. A natural candidate here is Gossip-style SGD. However, it suffers from slow convergence and high communication cost, mainly because, in the end, all nodes (not only the special node) learn the model. We propose a distributed SGD algorithm that uses a weighted random walk to sample the nodes. The Markov chain is designed to have a stationary probability distribution proportional to the smoothness bound L_i of the local loss function at node i. We study the convergence rate of this algorithm and prove that it depends on the average of the smoothness bounds, denoted L̄. This outperforms the uniform-sampling algorithm obtained via a Metropolis-Hastings random walk (MHRW), whose rate depends on the supremum of all the L_i's, which is at least L̄. We present numerical simulations that substantiate our theoretical findings and show that our algorithm outperforms random walk and gossip-style algorithms.
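To make the sampling scheme concrete, the following is a minimal toy sketch (not the paper's implementation) of the idea: a Metropolis-Hastings random walk whose stationary distribution is proportional to the smoothness bounds L_i, combined with importance-reweighted SGD steps on the local losses. The graph (a triangle), the quadratic local losses f_i(x) = (L_i/2)(x − b_i)², and the step-size schedule are all illustrative assumptions.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical toy setup: 3 nodes on a complete graph (triangle), each
# holding a quadratic local loss f_i(x) = L_i/2 * (x - b_i)^2 whose
# smoothness bound is L_i. All values here are illustrative.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
L = [1.0, 2.0, 4.0]
b = [0.0, 1.0, 2.0]
n = len(L)
L_sum = sum(L)
pi = [Li / L_sum for Li in L]  # target stationary distribution, pi_i ∝ L_i

def grad(i, x):
    """Stochastic gradient of the average loss (1/n) * sum_i f_i(x):
    the local gradient L_i*(x - b_i) is reweighted by 1/(n * pi_i)
    so that its expectation under pi equals the true gradient."""
    return L[i] * (x - b[i]) / (n * pi[i])

x, node = 0.0, 0
mu = L_sum / n            # curvature of the average quadratic loss
visits = Counter()
for t in range(50000):
    visits[node] += 1
    x -= grad(node, x) / (mu * (t + 1))  # decaying step eta_t = 1/(mu*(t+1))
    # Metropolis-Hastings move targeting pi: propose a uniform neighbor j,
    # accept with probability min(1, pi_j * deg_i / (pi_i * deg_j)).
    j = random.choice(neighbors[node])
    accept = min(1.0, pi[j] * len(neighbors[node]) / (pi[node] * len(neighbors[j])))
    if random.random() < accept:
        node = j

# Minimizer of the average loss, for comparison with the iterate x.
x_star = sum(Li * bi for Li, bi in zip(L, b)) / L_sum
print(round(x, 3), round(x_star, 3))
```

After enough steps the walk visits node i with frequency close to pi_i, and the importance-weighted updates drive x toward the minimizer of the average loss even though each update uses a single node's data.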