Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solution for Nonconvex Distributed Optimization Over Networks

In this work, we study two first-order primal-dual algorithms, the Gradient Primal-Dual Algorithm (GPDA) and the Gradient Alternating Direction Method of Multipliers (GADMM), for solving a class of linearly constrained non-convex optimization problems. We show that, with random initialization of the primal and dual variables, both algorithms compute second-order stationary solutions (ss2) with probability one. This is the first result showing that a primal-dual algorithm can find ss2 using only first-order information; it also extends existing results for first-order, primal-only algorithms. An important implication is that our result yields the first global convergence guarantee to ss2 for two classes of unconstrained distributed non-convex learning problems over multi-agent networks.
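For context, the problem class and a GPDA-style iteration can be summarized as follows. This is a minimal sketch written in the standard augmented-Lagrangian form used in this literature; the stepsize $\alpha$ and penalty $\rho$ are generic placeholders, and the precise update rules and parameter conditions analyzed in the paper may differ.

% Sketch only: linearly constrained non-convex problem and a generic
% gradient primal-dual (augmented-Lagrangian) iteration.
\[
  \min_{x} \; f(x) \quad \text{s.t.} \quad Ax = b,
  \qquad f \ \text{smooth, possibly non-convex},
\]
with augmented Lagrangian
\[
  L_\rho(x,\lambda) \;=\; f(x) \;+\; \langle \lambda,\, Ax - b\rangle \;+\; \tfrac{\rho}{2}\,\|Ax - b\|^2 ,
\]
one iteration takes a gradient step on the primal variable and an ascent step on the dual variable:
\[
  x^{k+1} \;=\; x^{k} - \alpha\, \nabla_x L_\rho\!\left(x^{k}, \lambda^{k}\right),
  \qquad
  \lambda^{k+1} \;=\; \lambda^{k} + \rho\left(A x^{k+1} - b\right).
\]
In the distributed setting, $A$ encodes the network's consensus constraints, so each agent can carry out these updates using only local gradients and neighbor communication.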
