Decentralized Gradient Tracking with Local Steps

We develop a novel decentralized tracking mechanism, K-GT, that enables communication-efficient local updates in gradient tracking (GT) while inheriting the data-independence property of GT. We prove a convergence rate for K-GT on smooth non-convex functions and show that it reduces the communication overhead asymptotically by a linear factor K, where K denotes the number of local steps. We illustrate the robustness and effectiveness of this heterogeneity correction on convex and non-convex benchmark problems and on a non-convex neural network training task with the MNIST dataset.
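To make the mechanism concrete, below is a minimal numpy sketch of the general idea: each node takes K cheap local gradient steps along a corrected direction, communicates once per round, and updates a tracking correction so that heterogeneous local gradients do not pull the nodes apart. Everything in the sketch (the function and variable names such as kgt_sketch and local_grad, the quadratic objectives, the ring topology, the step size, and the exact form of the correction update) is an illustrative assumption, not a verbatim transcription of the paper's K-GT pseudocode.

```python
import numpy as np


def local_grad(x, A, b):
    """Gradient of one node's quadratic objective f_i(x) = 0.5 x^T A x - b^T x."""
    return A @ x - b


def kgt_sketch(n=4, d=10, K=5, rounds=300, eta=0.05, seed=0):
    """Toy simulation of gradient tracking with K local steps per round.

    Hypothetical setup: node i holds a heterogeneous quadratic f_i, and the
    nodes gossip over a ring. This is a sketch of the mechanism described in
    the abstract, under the stated assumptions.
    """
    rng = np.random.default_rng(seed)
    A = [(i + 1) * np.eye(d) for i in range(n)]   # heterogeneous curvature per node
    b = [rng.normal(size=d) for _ in range(n)]    # heterogeneous local optima
    # Doubly stochastic mixing matrix for a ring topology.
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

    x = np.zeros((n, d))  # one model copy per node
    c = np.zeros((n, d))  # tracking corrections; they sum to zero across nodes
    for _ in range(rounds):
        y = x.copy()
        for _ in range(K):  # K cheap local steps along the corrected direction
            g = np.stack([local_grad(y[i], A[i], b[i]) for i in range(n)])
            y = y - eta * (g + c)
        delta = y - x      # progress accumulated locally during this round
        mixed = W @ delta  # a single gossip communication per K local steps
        # Steer each node's local direction toward the network-average
        # descent direction observed over the last K local steps.
        c = c + (mixed - delta) / (K * eta)
        x = x + mixed
    return x.mean(axis=0)


if __name__ == "__main__":
    # Sanity check against the closed-form minimizer of the averaged
    # quadratic, i.e. the x* solving (1/n) sum_i A_i x = (1/n) sum_i b_i.
    n, d = 4, 10
    x_hat = kgt_sketch(n=n, d=d)
    rng = np.random.default_rng(0)
    A = [(i + 1) * np.eye(d) for i in range(n)]
    b = [rng.normal(size=d) for _ in range(n)]
    x_star = np.linalg.solve(sum(A) / n, sum(b) / n)
    print("distance to optimum:", np.linalg.norm(x_hat - x_star))
```

The correction variables c act like SCAFFOLD-style control variates in a decentralized setting: because they sum to zero across nodes when W is doubly stochastic, the averaged iterate still follows the averaged gradient, while each node's drift toward its own heterogeneous optimum is cancelled during the local phase.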
