Provable accelerated gradient method for nonconvex low rank optimization

Optimization over low rank matrices has broad applications in machine learning. For large-scale problems, an attractive heuristic is to factorize the low rank matrix into a product of two much smaller matrices. In this paper, we study the nonconvex problem $$\min_{\mathbf{U}\in \mathbb{R}^{n\times r}} g(\mathbf{U})=f(\mathbf{U}\mathbf{U}^T)$$ under the assumption that $$f(\mathbf{X})$$ is restricted $$\mu$$-strongly convex and $$L$$-smooth on the set $$\{\mathbf{X}:\mathbf{X}\succeq 0,\ \text{rank}(\mathbf{X})\le r\}$$. We propose an accelerated gradient method with alternating constraint that operates directly on the $$\mathbf{U}$$ factors, and we show that the method has a local linear convergence rate with the optimal $$\sqrt{L/\mu}$$ dependence on the condition number $$L/\mu$$. Globally, our method converges to a critical point, that is, a point with zero gradient, from any initializer. Our method also applies to the problem with the asymmetric factorization $$\mathbf{X}={\widetilde{\mathbf{U}}}{\widetilde{\mathbf{V}}}^T$$, for which the same convergence result can be obtained. Extensive experimental results verify the advantage of our method.
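To make the factored objective concrete, the following is a minimal sketch of plain (non-accelerated) gradient descent on $$g(\mathbf{U})=f(\mathbf{U}\mathbf{U}^T)$$. It is not the accelerated method with alternating constraint proposed in the paper, whose details the abstract does not give. The choice $$f(\mathbf{X})=\tfrac{1}{2}\Vert \mathbf{X}-\mathbf{M}\Vert_F^2$$, the target matrix `M`, the step size, the initialization near the solution, and all function names are assumptions made only for illustration. The gradient follows from the chain rule: $$\nabla g(\mathbf{U})=(\nabla f(\mathbf{X})+\nabla f(\mathbf{X})^T)\mathbf{U}$$ with $$\mathbf{X}=\mathbf{U}\mathbf{U}^T$$.

```python
import numpy as np


def f(X, M):
    # Illustrative choice (an assumption): f(X) = 0.5 * ||X - M||_F^2,
    # which is 1-strongly convex and 1-smooth.
    return 0.5 * np.linalg.norm(X - M, "fro") ** 2


def grad_f(X, M):
    # Gradient of the illustrative f with respect to X.
    return X - M


def g(U, M):
    # Factored objective g(U) = f(U U^T).
    return f(U @ U.T, M)


def grad_g(U, M):
    # Chain rule: grad g(U) = (grad f(X) + grad f(X)^T) U with X = U U^T.
    G = grad_f(U @ U.T, M)
    return (G + G.T) @ U


def factored_gd(U0, M, step, iters):
    # Plain gradient descent on the n x r factor; no momentum and no constraint.
    U = U0.copy()
    for _ in range(iters):
        U = U - step * grad_g(U, M)
    return U


rng = np.random.default_rng(0)
n, r = 20, 3
U_star = rng.standard_normal((n, r))   # hypothetical ground-truth factor
M = U_star @ U_star.T                   # rank-r positive semidefinite target
U0 = U_star + 0.01 * rng.standard_normal((n, r))  # start close to a solution
step = 0.1 / np.linalg.norm(M, 2)       # conservative step, scaled by ||M||_2
U_hat = factored_gd(U0, M, step, 2000)

print("objective ratio g(U_hat) / g(U0):", g(U_hat, M) / g(U0, M))
```

Each iteration works only with the $$n\times r$$ factor, which is the appeal of the factorization when $$r \ll n$$. This sketch still forms the full $$n\times n$$ product for clarity; a large-scale implementation would avoid doing so.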
