A Stochastic Proximal Gradient Framework for Decentralized Non-Convex Composite Optimization: Topology-Independent Sample Complexity and Communication Efficiency

Decentralized optimization is a promising parallel computation paradigm for large-scale data analytics and machine learning problems defined over a network of nodes. This paper is concerned with decentralized non-convex composite problems under population or empirical risk. In particular, the networked nodes are tasked with finding an approximate stationary point of the average of local, smooth, possibly non-convex risk functions plus a possibly non-differentiable, extended-valued convex regularizer. Under this general formulation, we propose the first provably efficient stochastic proximal gradient framework, called ProxGT. Specifically, we construct and analyze several instances of ProxGT, each tailored to a different problem class of interest. Remarkably, we show that the sample complexities of these instances are independent of the network topology and achieve linear speedups over those of the corresponding optimal centralized methods implemented on a single node.
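
For concreteness, the problem class described above can be stated compactly. A minimal sketch of the formulation, with the symbols n, f_i, and h chosen here for illustration (the paper's own notation may differ):

    \min_{x \in \mathbb{R}^d} \; \Phi(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x) + h(x),

where f_i is the smooth, possibly non-convex risk held by node i (a population or empirical expectation accessed through stochastic gradients), and h is the convex, possibly non-differentiable, extended-valued regularizer, e.g., h(x) = \lambda \|x\|_1 or the indicator function of a closed convex constraint set.

The following Python sketch illustrates the generic decentralized stochastic proximal gradient-tracking template that a framework of this kind builds on. It is an illustration under stated assumptions, not the paper's exact ProxGT recursions: the mixing matrix W, the step size alpha, and the choice of the l1 proximal operator are all assumptions made here for the example.

import numpy as np

def prox_l1(v, t):
    # Proximal operator of t * ||.||_1, i.e., elementwise soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def decentralized_prox_gt(W, stoch_grads, x0, alpha, lam, num_iters):
    # W           : (n, n) doubly stochastic mixing matrix of the network graph
    # stoch_grads : list of n callables; stoch_grads[i](x) returns a stochastic
    #               gradient estimate of node i's local risk f_i at x
    # x0          : (n, d) initial iterates, one row per node
    # alpha, lam  : step size and l1-regularization weight
    x = x0.copy()
    v = np.stack([g(x[i]) for i, g in enumerate(stoch_grads)])  # local estimates
    y = v.copy()                                                # gradient trackers
    for _ in range(num_iters):
        # Mix iterates with neighbors, then take a local proximal gradient step.
        x = prox_l1(W @ x - alpha * y, alpha * lam)
        v_new = np.stack([g(x[i]) for i, g in enumerate(stoch_grads)])
        # Gradient tracking: mix the trackers and correct them with fresh
        # estimates, so each row of y tracks the network-average gradient.
        y = W @ y + v_new - v
        v = v_new
    return x.mean(axis=0)  # network average as the returned approximate solution

In this template, the spectral gap of W governs how fast consensus errors decay; the topology-independent sample complexity claimed above suggests that the paper's instances decouple the stochastic-gradient budget from this spectral gap, for example via multiple mixing rounds per iteration or variance-reduced gradient estimators, though the precise mechanism is given in the paper itself.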
