Decentralized Optimization Over Noisy, Rate-Constrained Networks: How We Agree By Talking About How We Disagree

In decentralized optimization, multiple nodes in a network collaborate to minimize the sum of their local loss functions. The information exchange between nodes required for this task is often limited by network connectivity. We consider a generalization of this setting, in which communication is further hindered by (i) a finite data-rate constraint on the signal transmitted by any node, and (ii) additive noise corrupting the signal received by any node. We develop a novel algorithm for this scenario: Decentralized Lazy Mirror Descent with Differential Exchanges (DLMD-DiffEx), which guarantees convergence of the local estimates to the optimal solution. A salient feature of DLMD-DiffEx is the introduction of additional proxy variables that are maintained by the nodes to account for the disagreement in their estimates due to channel noise and data-rate constraints. We investigate the performance of DLMD-DiffEx both theoretically and through numerical evaluations.
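To make the setting concrete, the following is a minimal sketch of decentralized optimization with quantized, noisy differential exchanges. It is not the DLMD-DiffEx algorithm itself: the quadratic local losses, ring topology, mixing weights, dithered uniform quantizer, and step sizes are illustrative assumptions introduced here; only the idea of transmitting the difference between a node's current estimate and a shared proxy is taken from the description above.

# Hedged sketch: decentralized gradient descent with rate-constrained, noisy
# differential exchanges. NOT the authors' DLMD-DiffEx; all modeling choices
# below (losses, topology, quantizer, step sizes) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Problem: n nodes jointly minimize sum_i 0.5 * ||x - a_i||^2 (optimum = mean of a_i).
n, d = 5, 3
a = rng.normal(size=(n, d))
x_opt = a.mean(axis=0)

# Ring topology with uniform mixing weights (assumed for this sketch).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i + 1) % n] = 0.25
    W[i, (i - 1) % n] = 0.25

def quantize(v, step=0.05):
    """Dithered uniform quantizer: stands in for the finite data-rate constraint."""
    dither = rng.uniform(-step / 2, step / 2, size=v.shape)
    return step * np.round((v + dither) / step)

noise_std = 0.01          # additive channel noise at the receiver
x = np.zeros((n, d))      # local estimates, one row per node
proxy = np.zeros((n, d))  # proxy variables: the network's (noisy) view of each estimate
                          # (kept as a single shared copy here, a simplification)

for t in range(1, 501):
    # Each node transmits only the quantized *difference* between its current
    # estimate and the proxy already held for it (a differential exchange).
    diff = quantize(x - proxy)
    received = diff + noise_std * rng.normal(size=diff.shape)
    proxy = proxy + received  # proxies absorb the quantized, noise-corrupted updates

    # Consensus on the proxies plus a local gradient step with diminishing step size.
    grad = x - a              # gradient of 0.5 * ||x_i - a_i||^2
    eta = 1.0 / np.sqrt(t)
    x = W @ proxy - eta * grad

print("distance to optimum per node:", np.linalg.norm(x - x_opt, axis=1))

Because nodes only ever see the proxies, the quantization error and channel noise enter the iteration through the proxy updates; this is the kind of disagreement between local estimates that the proxy variables in DLMD-DiffEx are introduced to track.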
