Random Walking Snakes for Decentralized Learning at Edge Networks

Random walk learning (RWL) has recently attracted significant attention thanks to its potential for reducing communication and computation over edge networks in a decentralized fashion. In RWL, a single activated node in a graph updates a global model with its local data, selects one of its neighbors randomly, and sends the updated model to that neighbor. The selected neighbor becomes the newly activated node and updates the global model using its own local data; this process repeats until convergence. Despite its promise, RWL faces two challenges: (i) training time is long, and (ii) every node must store the complete model. Thus, in this paper, we design Random Walking Snakes (RWS), where a set of nodes, rather than a single node, is activated for each model update, and each node in the set trains a part of the model. Thanks to model partitioning and parallel processing across the activated nodes, RWS reduces both the training time and the fraction of the model that each node needs to store. We also design a novel policy that determines the set of activated nodes by taking into account the computing power of the nodes. Simulation results show that RWS significantly reduces the convergence time as compared to RWL.
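Since the abstract only summarizes the mechanism, the following is a minimal sketch of the baseline RWL loop on a toy least-squares problem; the graph, the local datasets, the learning rate, and the helper names are illustrative assumptions, not the paper's actual setup or code. RWS would replace the single activated node below with a "snake" of nodes, each storing and updating one partition of the model in parallel.

```python
# Minimal sketch of baseline random walk learning (RWL) on a toy problem.
# Everything here (graph, data, model, step size) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

# Toy edge network: node -> list of neighbors.
graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}

# Each node holds a small private dataset (A_i, b_i) for least squares.
d = 5
local_data = {i: (rng.normal(size=(10, d)), rng.normal(size=10)) for i in graph}

def local_sgd_step(w, node, lr=0.01):
    """One gradient step on the activated node's local least-squares loss."""
    A, b = local_data[node]
    grad = A.T @ (A @ w - b) / len(b)
    return w - lr * grad

# Random walk: the currently activated node updates the full model with its
# local data, then forwards the model to a randomly chosen neighbor.
w = np.zeros(d)
node = 0
for _ in range(2000):
    w = local_sgd_step(w, node)
    node = rng.choice(graph[node])

# Report the final local loss at every node.
print({i: float(np.mean((A @ w - b) ** 2) / 2) for i, (A, b) in local_data.items()})
```

In this sketch the entire model vector `w` travels with the walk, which is exactly the storage and serial-training bottleneck the abstract points out; partitioning `w` across a set of simultaneously activated nodes is the idea RWS builds on.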
