Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees

Optimization problems with rank constraints arise in many applications, including matrix regression, structured PCA, matrix completion and matrix decomposition problems. An attractive heuristic for solving such problems is to factorize the low-rank matrix and to run projected gradient descent on the resulting nonconvex factorized optimization problem. The goal of this paper is to provide a general theoretical framework for understanding when such methods work well, and to characterize the nature of the resulting fixed point. We provide a simple set of conditions under which projected gradient descent, when given a suitable initialization, converges geometrically to a statistically useful solution. Our results apply even when the initial solution lies outside any region of local convexity, and even when the problem is globally concave. Working in a non-asymptotic framework, we show that our conditions are satisfied for a wide range of concrete models, including matrix regression, structured PCA, matrix completion with real and quantized observations, matrix decomposition, and graph clustering problems. Simulation results show excellent agreement with the theoretical predictions.
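As a rough illustration of the factorized approach described above, the following NumPy sketch runs projected gradient descent on a factor F with M = F F^T. Everything specific here is an assumption made for illustration and is not taken from the paper: the squared-error loss 0.5 * ||F F^T - Y||_F^2 on a fully observed noisy matrix Y, the row-norm constraint set, the spectral initialization, the step-size rule, and all function and variable names.

```python
import numpy as np

def project_rows(F, max_row_norm):
    """Euclidean projection onto {F : each row has l2 norm <= max_row_norm}."""
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_row_norm / np.maximum(norms, 1e-12))
    return F * scale

def spectral_init(Y, r):
    """Starting factor built from the top-r eigenpairs of the symmetrized Y."""
    vals, vecs = np.linalg.eigh((Y + Y.T) / 2)
    top = np.argsort(vals)[::-1][:r]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def factored_pgd(Y, F0, step, max_row_norm, iters=300):
    """Projected gradient descent on f(F) = 0.5 * ||F F^T - Y||_F^2."""
    F = project_rows(F0, max_row_norm)
    for _ in range(iters):
        grad_M = F @ F.T - Y               # gradient of the loss in M = F F^T
        grad_F = (grad_M + grad_M.T) @ F   # chain rule through M = F F^T
        F = project_rows(F - step * grad_F, max_row_norm)
    return F

def rel_error(F, M_star):
    """Relative Frobenius error of F F^T against the target matrix."""
    return np.linalg.norm(F @ F.T - M_star) / np.linalg.norm(M_star)

# Hypothetical synthetic instance: rank-3 PSD matrix plus symmetric noise.
rng = np.random.default_rng(0)
d, r = 50, 3
F_star = rng.normal(size=(d, r))
M_star = F_star @ F_star.T
G = rng.normal(size=(d, d))
Y = M_star + 0.1 * (G + G.T) / 2

# Perturb the spectral initializer so the iterations have work to do.
F0 = spectral_init(Y, r) + 0.3 * rng.normal(size=(d, r))
step = 0.1 / np.linalg.norm(Y, 2)          # assumed conservative step size
max_row_norm = 2.0 * np.linalg.norm(F_star, axis=1).max()

F_hat = factored_pgd(Y, F0, step, max_row_norm)
err_init = rel_error(project_rows(F0, max_row_norm), M_star)
err_final = rel_error(F_hat, M_star)
print(f"relative error: initial {err_init:.3f}, final {err_final:.3f}")
```

In this toy setting the row-norm bound is set generously, so the projection is rarely active; it is included only to show where a problem-specific constraint set would enter the iteration. The oracle use of F_star to pick that bound is a convenience of the demo, not something available in practice.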
