Random Walk Gradient Descent for Decentralized Learning on Graphs

We design a new variant of the stochastic gradient descent algorithm for learning a global model from data distributed over the nodes of a network. Motivated by settings such as decentralized learning, we suppose that one special node in the network, which we call node 1, is interested in learning the global model. We seek a decentralized and distributed algorithm for several reasons, including privacy and fault tolerance. A natural candidate here is Gossip-style SGD. However, it suffers from slow convergence and high communication cost, mainly because, in the end, all nodes (not only the special node) learn the model. We propose a distributed SGD algorithm that uses a weighted random walk to sample the nodes. The Markov chain is designed to have a stationary probability distribution proportional to the smoothness bound L_i of the local loss function at node i. We study the convergence rate of this algorithm and prove that it depends on the average of the smoothness bounds, denoted L̄. This outperforms the uniform-sampling algorithm obtained via a Metropolis-Hastings random walk (MHRW), whose rate depends on the supremum of all the L_i's, which is at least L̄. We present numerical simulations that substantiate our theoretical findings and show that our algorithm outperforms random walk and gossip-style algorithms.
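To make the sampling scheme concrete, the following is a minimal toy sketch (not the paper's implementation) of the idea: a Metropolis-Hastings random walk whose stationary distribution is proportional to the smoothness bounds L_i, combined with importance-reweighted SGD steps on the local losses. The graph (a triangle), the quadratic local losses f_i(x) = (L_i/2)(x − b_i)², and the step-size schedule are all illustrative assumptions.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical toy setup: 3 nodes on a complete graph (triangle), each
# holding a quadratic local loss f_i(x) = L_i/2 * (x - b_i)^2 whose
# smoothness bound is L_i. All values here are illustrative.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
L = [1.0, 2.0, 4.0]
b = [0.0, 1.0, 2.0]
n = len(L)
L_sum = sum(L)
pi = [Li / L_sum for Li in L]  # target stationary distribution, pi_i ∝ L_i

def grad(i, x):
    """Stochastic gradient of the average loss (1/n) * sum_i f_i(x):
    the local gradient L_i*(x - b_i) is reweighted by 1/(n * pi_i)
    so that its expectation under pi equals the true gradient."""
    return L[i] * (x - b[i]) / (n * pi[i])

x, node = 0.0, 0
mu = L_sum / n            # curvature of the average quadratic loss
visits = Counter()
for t in range(50000):
    visits[node] += 1
    x -= grad(node, x) / (mu * (t + 1))  # decaying step eta_t = 1/(mu*(t+1))
    # Metropolis-Hastings move targeting pi: propose a uniform neighbor j,
    # accept with probability min(1, pi_j * deg_i / (pi_i * deg_j)).
    j = random.choice(neighbors[node])
    accept = min(1.0, pi[j] * len(neighbors[node]) / (pi[node] * len(neighbors[j])))
    if random.random() < accept:
        node = j

# Minimizer of the average loss, for comparison with the iterate x.
x_star = sum(Li * bi for Li, bi in zip(L, b)) / L_sum
print(round(x, 3), round(x_star, 3))
```

After enough steps the walk visits node i with frequency close to pi_i, and the importance-weighted updates drive x toward the minimizer of the average loss even though each update uses a single node's data.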