Convergence guarantees for a class of non-convex and non-smooth optimization problems

We consider the problem of finding critical points of functions that are non-convex and non-smooth. Studying a fairly broad class of such problems, we analyze the behavior of three gradient-based methods: gradient descent, the proximal update, and the Frank-Wolfe update. For each of these methods, we establish rates of convergence for general problems, and also prove faster rates for continuous subanalytic functions. We also show that our algorithms can escape strict saddle points for a class of non-smooth functions, thereby generalizing known results for smooth functions. Our analysis leads to a simplification of the popular CCCP algorithm, which is used for optimizing functions that can be written as a difference of two convex functions. Our simplified algorithm retains all the convergence properties of CCCP while incurring a significantly lower cost per iteration. We illustrate our methods and theory via applications to the problems of best subset selection, robust estimation, mixture density estimation, and shape-from-shading reconstruction.
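
To make the CCCP discussion concrete, the following is a minimal Python sketch of the classical CCCP iteration for a difference-of-convex objective f(x) = g(x) - h(x), with both g and h convex: each step linearizes h at the current iterate x_k and exactly minimizes the convex surrogate g(x) - h'(x_k) x. The specific choices g(x) = x^4 and h(x) = 2x^2, the helper name cccp, and the iteration count are illustrative assumptions, not details from the paper.

import numpy as np

def cccp(x0, num_iters=50):
    """Classical CCCP iteration for f(x) = g(x) - h(x), here with
    g(x) = x**4 and h(x) = 2*x**2 (both convex), so that
    f(x) = x**4 - 2*x**2 has critical points at x = -1, 0, +1."""
    x = x0
    for _ in range(num_iters):
        grad_h = 4.0 * x           # h'(x) for h(x) = 2*x**2
        # The surrogate g(x) - grad_h * x has derivative 4*x**3 - grad_h,
        # so its exact minimizer is x = cbrt(grad_h / 4).
        x = np.cbrt(grad_h / 4.0)
    return x

print(cccp(0.5))  # iterates approach the critical point x = 1

Each iteration here requires an exact solve of a convex subproblem, which has a closed form only for this toy objective; it is this per-iteration subproblem cost that makes cheaper CCCP variants, such as the simplification described above, attractive.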
