Decomposing Linearly Constrained Nonconvex Problems by a Proximal Primal Dual Approach: Algorithms, Convergence, and Applications

In this paper, we propose a new decomposition approach, named the proximal primal dual algorithm (Prox-PDA), for smooth nonconvex linearly constrained optimization problems. The proposed approach is primal-dual based: the primal step minimizes an approximation of the augmented Lagrangian of the problem, and the dual step performs an approximate dual ascent. The approximation used in the primal step decouples the variable blocks, which makes it possible to obtain simple subproblems by exploiting the problem structure. Theoretically, we show that whenever the penalty parameter in the augmented Lagrangian is larger than a given threshold, the Prox-PDA converges to the set of stationary solutions, globally and at a sublinear rate (i.e., a certain measure of stationarity decreases at a rate of $\mathcal{O}(1/r)$, where $r$ is the iteration counter). Interestingly, when a variant of the Prox-PDA is applied to the problem of distributed nonconvex optimization (over a connected undirected graph), the resulting algorithm coincides with the popular EXTRA algorithm [Shi et al 2014], which was previously known to work only in convex cases. Our analysis implies that EXTRA and its variants converge globally and sublinearly to stationary solutions of certain nonconvex distributed optimization problems. There are many possible extensions of the Prox-PDA, and we present one particular extension to a nonconvex distributed matrix factorization problem.
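The primal-dual structure described above can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the problem has the form $\min_x f(x)$ subject to $Ax = b$, that the primal step linearizes $f$ at the current iterate, and that the proximal term is weighted by $cI - A^T A$ with $c \ge \|A\|_2^2$, so the primal update has a closed form. The function name `prox_pda`, the toy objective, the starting point, and the value `beta = 10` are all hypothetical choices made for illustration.

```python
import numpy as np

def prox_pda(grad_f, A, b, x0, beta, num_iters=1000):
    """Sketch of a linearized proximal primal-dual iteration for
    min f(x) subject to A x = b (assumed problem form)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(x0, dtype=float).copy()
    mu = np.zeros(A.shape[0])  # dual variable, one entry per constraint
    # Assumption: proximal weight c*I - A^T A, so the primal subproblem
    # reduces to a gradient step on the augmented Lagrangian.
    c = max(1.0, np.linalg.norm(A, 2) ** 2)
    step = 1.0 / (beta * c)
    for _ in range(num_iters):
        residual = A @ x - b
        # Primal step: closed-form minimizer of the linearized,
        # proximally regularized augmented Lagrangian.
        x = x - step * (grad_f(x) + A.T @ (mu + beta * residual))
        # Dual step: ascent on the constraint residual at the new x.
        mu = mu + beta * (A @ x - b)
    return x, mu

def toy_grad(x):
    # Gradient of the smooth nonconvex toy objective sum(log(1 + x_i^2)).
    return 2.0 * x / (1.0 + x ** 2)

A = np.array([[1.0, 1.0]])   # single constraint: x_1 + x_2 = 1
b = np.array([1.0])
x, mu = prox_pda(toy_grad, A, b, x0=np.array([0.9, -0.3]), beta=10.0)
print("x =", x, "mu =", mu)
print("constraint violation:", np.linalg.norm(A @ x - b))
print("stationarity residual:", np.linalg.norm(toy_grad(x) + A.T @ mu))
```

On this toy problem the only point satisfying the stationarity conditions is $x = (0.5, 0.5)$, so both printed residuals should shrink toward zero. The penalty `beta` here is an arbitrary value that works for the toy instance and is not the threshold derived in the paper.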

[1] Guoyin Li, et al. Global Convergence of Splitting Methods for Nonconvex Composite Optimization, 2014, SIAM J. Optim.

[2] Dimitri P. Bertsekas, et al. Constrained Optimization and Lagrange Multiplier Methods, 1982.

[3] P. Toint, et al. A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds, 1991.

[4] Mikhail V. Solodov, et al. Local Convergence of Exact and Inexact Augmented Lagrangian Methods under the Second-Order Sufficient Optimality Condition, 2012, SIAM J. Optim.

[5] Zhi-Quan Luo, et al. A Unified Convergence Analysis of Block Successive Minimization Methods for Nonsmooth Optimization, 2012, SIAM J. Optim.

[6] Michael P. Friedlander, et al. A Globally Convergent Linearly Constrained Lagrangian Method for Nonlinear Optimization, 2005, SIAM J. Optim.

[7] M. Hestenes. Multiplier and gradient methods, 1969.

[8] Jack Yurkiewicz, et al. Constrained optimization and Lagrange multiplier methods, by D. P. Bertsekas, Academic Press, New York, 1982, 395 pp. Price: $65.00, 1985, Networks.

[9] Patrick L. Combettes, et al. Proximal Splitting Methods in Signal Processing, 2009, Fixed-Point Algorithms for Inverse Problems in Science and Engineering.

[10] Mingyi Hong, et al. A Distributed, Asynchronous, and Incremental Algorithm for Nonconvex Optimization: An ADMM Approach, 2014, IEEE Transactions on Control of Network Systems.

[11] Yin Zhang, et al. An Alternating Direction Algorithm for Nonnegative Matrix Factorization, 2010.

[12] Yurii Nesterov, et al. Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.

[13] Stephen P. Boyd, et al. Distributed average consensus with least-mean-square deviation, 2007, J. Parallel Distributed Comput.

[14] Sanjo Zlobec, et al. On the Liu–Floudas Convexification of Smooth Programs, 2005, J. Glob. Optim.

[15] Yurii Nesterov, et al. Smooth minimization of non-smooth functions, 2005, Math. Program.

[16] Stephen P. Boyd, et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, 2011, Found. Trends Mach. Learn.

[17] Gonzalo Mateos, et al. Distributed Sparse Linear Regression, 2010, IEEE Transactions on Signal Processing.

[18] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.

[19] B. Mercier, et al. A dual algorithm for the solution of nonlinear variational problems via finite element approximation, 1976.

[20] Xiangfeng Wang, et al. Multi-Agent Distributed Optimization via Inexact Consensus ADMM, 2014, IEEE Transactions on Signal Processing.

[21] Nicholas I. M. Gould, et al. On the Complexity of Steepest Descent, Newton's and Regularized Newton's Methods for Nonconvex Unconstrained Optimization Problems, 2010, SIAM J. Optim.

[22] Dimitri P. Bertsekas, et al. On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators, 1992, Math. Program.

[23] Qing Ling, et al. EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization, 2014, arXiv:1404.6264.

[24] Guoyin Li, et al. Splitting methods for nonconvex composite optimization, 2014, ArXiv.

[25] Gesualdo Scutari, et al. NEXT: In-Network Nonconvex Optimization, 2016, IEEE Transactions on Signal and Information Processing over Networks.

[26] Alejandro Ribeiro, et al. Consensus in Ad Hoc WSNs With Noisy Links - Part I: Distributed Estimation of Deterministic Signals, 2008, IEEE Transactions on Signal Processing.

[27] Georgios B. Giannakis, et al. Distributed consensus-based demodulation: algorithms and error analysis, 2010, IEEE Transactions on Wireless Communications.

[28] Shuguang Cui, et al. Dynamic Resource Allocation in Cognitive Radio Networks, 2010, IEEE Signal Processing Magazine.

[29] John N. Tsitsiklis, et al. Problems in decentralized decision making and computation, 1984.

[30] Saeed Ghadimi, et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, 2013, Mathematical Programming.

[31] Nicholas I. M. Gould, et al. Adaptive augmented Lagrangian methods: algorithms and practical numerical experience, 2014, Optim. Methods Softw.

[32] R. Glowinski, et al. Numerical Methods for Nonlinear Variational Problems, 1985.

[33] James T. Kwok, et al. Asynchronous Distributed ADMM for Consensus Optimization, 2014, ICML.

[34] Georgios B. Giannakis, et al. Distributed Clustering Using Wireless Sensor Networks, 2011, IEEE Journal of Selected Topics in Signal Processing.

[35] Zhi-Quan Luo, et al. Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems, 2014, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[36] Nikos D. Sidiropoulos, et al. Robust volume minimization-based matrix factorization via alternating optimization, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[37] R. Glowinski, et al. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires, 1975.

[38] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[39] Qing Ling, et al. Decentralized low-rank matrix completion, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[40] Chao Yang, et al. Alternating direction methods for classical and ptychographic phase retrieval, 2012.

[41] Behrouz Touri, et al. Non-Convex Distributed Optimization, 2015, IEEE Transactions on Automatic Control.

[42] Zhi-Quan Luo, et al. Semi-asynchronous routing for large scale hierarchical networks, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[43] Asuman E. Ozdaglar, et al. On the O(1/k) convergence of asynchronous distributed alternating Direction Method of Multipliers, 2013, 2013 IEEE Global Conference on Signal and Information Processing.

[44] Nikos D. Sidiropoulos, et al. Parallel Algorithms for Constrained Tensor Factorization via Alternating Direction Method of Multipliers, 2014, IEEE Transactions on Signal Processing.

[45] Cédric Févotte, et al. Alternating direction method of multipliers for non-negative matrix factorization with the beta-divergence, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[46] D K Smith, et al. Numerical Optimization, 2001, J. Oper. Res. Soc.

[47] A. Goldsmith, et al. Sum power iterative water-filling for multi-antenna Gaussian broadcast channels, 2002, Conference Record of the Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, 2002.

[48] Jonathan Eckstein. Splitting methods for monotone operators with applications to parallel optimization, 1989.

[49] Musa A. Mammadov, et al. An inexact modified subgradient algorithm for nonconvex optimization, 2010, Comput. Optim. Appl.

[50] Wotao Yin, et al. On the Global and Linear Convergence of the Generalized Alternating Direction Method of Multipliers, 2016, J. Sci. Comput.

[51] Xiaodong Li, et al. Stable Principal Component Pursuit, 2010, 2010 IEEE International Symposium on Information Theory.

[52] Bingsheng He, et al. Solving Large-Scale Least Squares Semidefinite Programming by Alternating Direction Methods, 2011, SIAM J. Matrix Anal. Appl.

[53] Zhi-Quan Luo, et al. On the linear convergence of the alternating direction method of multipliers, 2012, Mathematical Programming.

[54] Renato D. C. Monteiro, et al. Iteration-complexity of first-order augmented Lagrangian methods for convex programming, 2015, Mathematical Programming.