A Globally Convergent Algorithm for Nonconvex Optimization Based on Block Coordinate Update

Nonconvex optimization arises in many areas of computational science and engineering. However, most nonconvex optimization algorithms are only known to have local convergence or subsequence convergence properties. In this paper, we propose an algorithm for nonconvex optimization and establish its global convergence (of the whole sequence) to a critical point. In addition, we give its asymptotic convergence rate and numerically demonstrate its efficiency. In our algorithm, the variables of the underlying problem are treated either as one block or as multiple disjoint blocks. Each nondifferentiable component of the objective function, and each constraint, is assumed to involve only one block of variables; the differentiable components of the objective function, however, can couple multiple blocks. Our algorithm updates one block of variables at a time by minimizing a prox-linear surrogate, combined with an extrapolation step to accelerate convergence. The update order can be either deterministically cyclic or randomly shuffled in each cycle; in fact, our convergence analysis only requires that each block be updated at least once within every fixed number of iterations. This global convergence holds under fairly loose conditions, including, in particular, the Kurdyka–Łojasiewicz condition, which is satisfied by a broad class of nonconvex/nonsmooth applications. These results, of course, remain valid when the underlying problem is convex. We apply our convergence results to the coordinate descent iteration for nonconvex regularized linear regression and to a modified rank-one residue iteration for nonnegative matrix factorization, and show that both iterations converge globally. Numerically, we tested our algorithm on nonnegative matrix and tensor factorization problems, where random shuffling clearly improves the chance of avoiding low-quality local solutions.
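The block prox-linear update with extrapolation can be illustrated on the nonnegative matrix factorization problem min over X >= 0, Y >= 0 of 0.5 * ||M - X Y||_F^2, with X and Y as the two blocks. Because each block's nonsmooth part is the indicator of the nonnegative orthant, minimizing the prox-linear surrogate reduces to a projected gradient step from the extrapolated point. The following is a minimal NumPy sketch under stated assumptions: the function name `bpl_nmf`, the fixed extrapolation weight `omega`, and the rule of retrying without extrapolation whenever the objective would increase are illustrative choices, not the paper's exact parameter rules.

```python
import numpy as np


def bpl_nmf(M, rank, iters=300, omega=0.5, shuffle=True, seed=0):
    """Hypothetical sketch: block prox-linear updates with extrapolation for
        min_{X >= 0, Y >= 0} 0.5 * ||M - X @ Y||_F^2,
    with X and Y as the two blocks. The fixed weight `omega` and the
    retry-without-extrapolation rule are assumptions for illustration."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X, Y = rng.random((m, rank)), rng.random((rank, n))
    X_prev, Y_prev = X.copy(), Y.copy()

    def f(X, Y):
        return 0.5 * np.linalg.norm(M - X @ Y) ** 2

    def update(B, B_prev, grad, L, obj):
        """Minimize <grad(B_hat), B' - B_hat> + (L/2)||B' - B_hat||^2 over B' >= 0,
        i.e. a projected gradient step from the extrapolated point B_hat."""
        current = obj(B)
        for w in (omega, 0.0):                      # retry with w = 0 if f increases
            B_hat = B + w * (B - B_prev)            # extrapolation
            B_new = np.maximum(B_hat - grad(B_hat) / L, 0.0)
            if w == 0.0 or obj(B_new) <= current:
                return B_new

    history = [f(X, Y)]
    for _ in range(iters):
        order = rng.permutation(2) if shuffle else (0, 1)   # shuffled or cyclic
        for b in order:
            if b == 0:   # X-block: grad = (X Y - M) Y^T, Lipschitz const ||Y Y^T||_2
                L = max(np.linalg.norm(Y @ Y.T, 2), 1e-12)
                X_new = update(X, X_prev, lambda Z: (Z @ Y - M) @ Y.T, L,
                               lambda Z: f(Z, Y))
                X_prev, X = X, X_new
            else:        # Y-block: grad = X^T (X Y - M), Lipschitz const ||X^T X||_2
                L = max(np.linalg.norm(X.T @ X, 2), 1e-12)
                Y_new = update(Y, Y_prev, lambda Z: X.T @ (X @ Z - M), L,
                               lambda Z: f(X, Z))
                Y_prev, Y = Y, Y_new
        history.append(f(X, Y))
    return X, Y, history
```

Setting `shuffle=False` gives the deterministic cyclic order (X, then Y), while `shuffle=True` draws a fresh random order in every cycle; both update each block at least once per cycle. The fallback to w = 0 keeps the objective nonincreasing, since a projected gradient step of length 1/L cannot increase a function whose gradient is L-Lipschitz.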
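In the regularized linear regression application, each block is a single coordinate, and the one-dimensional subproblem can be solved in closed form. The sketch below uses the minimax concave penalty (MCP) as one example of a nonconvex regularizer. The choice of MCP, the 1/(2n) scaling of the loss, and the names `cd_mcp` and `mcp_univariate` are assumptions for illustration. The sketch also assumes a_j * gamma > 1 for every column, where a_j = ||A_j||^2 / n; this makes each univariate subproblem strictly convex, so the closed form is its unique minimizer.

```python
import numpy as np


def mcp_univariate(z, a, lam, gamma):
    """Minimizer of (a/2) b^2 - z b + MCP(b; lam, gamma), assuming a * gamma > 1."""
    if abs(z) <= a * gamma * lam:
        soft = np.sign(z) * max(abs(z) - lam, 0.0)   # soft-thresholding S(z, lam)
        return soft / (a - 1.0 / gamma)
    return z / a                                     # MCP is flat here: no shrinkage


def cd_mcp(A, y, lam, gamma=3.0, cycles=100, shuffle=False, seed=0):
    """Hypothetical sketch: coordinate descent (cyclic or shuffled) for
        min_x (1/(2n)) ||y - A x||^2 + sum_j MCP(x_j; lam, gamma)."""
    rng = np.random.default_rng(seed)
    n, p = A.shape
    col_sq = (A ** 2).sum(axis=0) / n               # a_j = ||A_j||^2 / n
    if np.any(col_sq * gamma <= 1.0):
        raise ValueError("this sketch needs a_j * gamma > 1 for every column")
    x = np.zeros(p)
    r = y.astype(float).copy()                      # residual y - A x
    for _ in range(cycles):
        order = rng.permutation(p) if shuffle else range(p)
        for j in order:
            # z = A_j^T (residual with coordinate j removed) / n
            z = A[:, j] @ r / n + col_sq[j] * x[j]
            x_new = mcp_univariate(z, col_sq[j], lam, gamma)
            r -= A[:, j] * (x_new - x[j])           # keep residual in sync with x
            x[j] = x_new
    return x
```

For coefficients with |z| > a_j * gamma * lam the update returns the unshrunk least-squares value z / a_j, which reflects the "nearly unbiased" behavior that motivates MCP over the l1 penalty.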
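The rank-one residue iteration for nonnegative matrix factorization updates one column x_i of X and the matching row y_i of Y at a time. Each is obtained by a closed-form nonnegative least-squares step against the residual that excludes the i-th rank-one term. The sketch below implements the plain, unmodified iteration and does not reproduce the paper's modification. The name `rri_nmf` is hypothetical, and the `eps` floor on the denominators is only a crude guard against components that have collapsed to zero, chosen here as an assumption.

```python
import numpy as np


def rri_nmf(M, rank, cycles=200, eps=1e-12, seed=0):
    """Hypothetical sketch of the plain rank-one residue iteration for
        min_{X >= 0, Y >= 0} 0.5 * ||M - X @ Y||_F^2.
    The eps floor on denominators is an assumed safeguard, not the paper's fix."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X, Y = rng.random((m, rank)), rng.random((rank, n))
    R = M - X @ Y                                   # current full residual
    errs = [0.5 * np.linalg.norm(R) ** 2]
    for _ in range(cycles):
        for i in range(rank):
            Ri = R + np.outer(X[:, i], Y[i])        # residual without the i-th term
            # exact minimizer of ||Ri - x y_i||^2 over x >= 0
            X[:, i] = np.maximum(Ri @ Y[i], 0.0) / max(Y[i] @ Y[i], eps)
            # exact minimizer of ||Ri - x_i y||^2 over y >= 0, using the new x_i
            Y[i] = np.maximum(X[:, i] @ Ri, 0.0) / max(X[:, i] @ X[:, i], eps)
            R = Ri - np.outer(X[:, i], Y[i])
        errs.append(0.5 * np.linalg.norm(R) ** 2)
    return X, Y, errs
```

Because every vector update is an exact block minimization, the recorded objective is nonincreasing. A column of X (or row of Y) can, however, become identically zero and stay there, which is one weakness a modified iteration must address.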
