Gradient descent with nonconvex constraints: local concavity determines convergence

Many problems in high-dimensional statistics and optimization involve minimization over nonconvex constraints (for instance, a rank constraint in a matrix estimation problem), but little is known about the theoretical properties of such optimization problems for a general nonconvex constraint set. In this paper we study the interplay between the geometric properties of the constraint set and the convergence behavior of gradient descent for minimization over this set. We develop the notion of local concavity coefficients of the constraint set, which measure the extent to which convexity is violated and govern the behavior of projected gradient descent over this set. We demonstrate the versatility of these concavity coefficients by computing them for a range of problems in low-rank estimation, sparse estimation, and other examples. Building on this understanding of the role of these geometric properties in optimization, we then provide a convergence analysis for the case where projections are calculated only approximately, leading to a more efficient method for projected gradient descent in low-rank estimation problems.
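As an illustration of the setting described above, the following is a minimal sketch of projected gradient descent over a rank constraint, assuming exact projection by truncated SVD. The function names (`project_rank`, `projected_gradient_descent`), the masked least-squares loss, the step size, and the test matrix are all assumptions made for this example; they are not taken from the paper, and the sketch does not implement the approximate projections the paper analyzes.

```python
import numpy as np

def project_rank(X, r):
    """Euclidean projection onto matrices of rank at most r (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the r largest singular values and their singular vectors.
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def projected_gradient_descent(grad, X0, r, step, n_iter=100):
    """Iterate X <- P_r(X - step * grad(X)) for a fixed number of steps."""
    X = X0.copy()
    for _ in range(n_iter):
        X = project_rank(X - step * grad(X), r)
    return X

# Hypothetical toy problem: recover a rank-2 matrix M from its entries.
rng = np.random.default_rng(0)
M = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
mask = np.ones_like(M, dtype=bool)  # all entries observed in this toy case

# Gradient of f(X) = 0.5 * ||mask * (X - M)||_F^2.
grad = lambda X: mask * (X - M)

X_hat = projected_gradient_descent(grad, np.zeros_like(M), r=2, step=1.0, n_iter=20)
```

The constraint set here (matrices of rank at most 2) is nonconvex, so the usual convex-analysis guarantees for projected gradient descent do not apply directly; that is the gap the local concavity coefficients are meant to address. Replacing `mask` with a partially observed pattern turns the toy problem into a matrix completion instance, for which convergence is no longer immediate.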
