Communication Efficient Primal-Dual Algorithm for Nonconvex Nonsmooth Distributed Optimization

Decentralized optimization frequently arises in large-scale machine learning, yet few works have addressed decentralized problems in the difficult nonconvex nonsmooth setting. In this paper, we propose a distributed primal-dual algorithm that solves this class of problems in a decentralized manner and achieves an O(1/ε) iteration complexity to attain an ε-solution, matching the well-known lower iteration complexity bound for nonconvex optimization. Furthermore, to reduce communication overhead, we modify the algorithm by compressing the vectors exchanged between nodes; the iteration complexity of the compressed variant remains O(1/ε). To our knowledge, this is the first algorithm achieving this rate in the nonconvex nonsmooth decentralized setting with compression. Finally, we apply the proposed algorithms to a nonconvex linear regression problem and to training a deep learning model, both of which demonstrate their efficacy.
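
The abstract describes the approach only at a high level; to make the mechanism concrete, the following Python sketch shows one plausible way to organize a decentralized primal-dual loop in which nodes exchange compressed vectors. It is a minimal illustration, not the paper's algorithm: the top-k compressor, the difference-tracking copies x_hat, the step sizes, and the toy quadratic objective are all assumptions made for the example.

import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def compressed_primal_dual(grads, W, d, T=300, alpha=0.05, beta=0.1, k=2):
    """Sketch of a decentralized primal-dual loop with compressed exchange.

    grads : list of per-node gradient oracles grad_i(x)
    W     : symmetric, doubly stochastic mixing matrix of the network
    d     : problem dimension; k : number of entries kept by the compressor
    """
    n = len(grads)
    x = np.zeros((n, d))       # primal iterates, one row per node
    y = np.zeros((n, d))       # dual variables for the consensus constraint
    x_hat = np.zeros((n, d))   # publicly shared (compressed) copies of the iterates
    for _ in range(T):
        # Each node transmits only a compressed difference, so the shared
        # copies x_hat track the true iterates at low communication cost.
        for i in range(n):
            x_hat[i] += top_k(x[i] - x_hat[i], k)
        mixed = W @ x_hat                        # neighbor averaging of the shared copies
        for i in range(n):
            y[i] += beta * (x_hat[i] - mixed[i])             # dual ascent on local disagreement
            x[i] -= alpha * (grads[i](x[i]) + y[i]           # primal step: local gradient,
                             + (x_hat[i] - mixed[i]))        # dual term, and consensus pull
    return x

if __name__ == "__main__":
    # Toy problem: 4 nodes, each with a smooth local quadratic f_i(x) = 0.5*||x - b_i||^2.
    rng = np.random.default_rng(0)
    b = rng.normal(size=(4, 5))
    grads = [lambda x, bi=bi: x - bi for bi in b]    # default arg pins each node's data
    W = np.full((4, 4), 0.25)                        # fully connected, uniform mixing
    x = compressed_primal_dual(grads, W, d=5)
    print("max disagreement across nodes:", np.abs(x - x.mean(axis=0)).max())

Compressing the difference x_i - x_hat_i rather than the iterate itself is a common device in compressed decentralized methods, since the residual shrinks as the iterates stabilize and the shared copies then track the true variables despite the lossy channel.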
