Provable accelerated gradient method for nonconvex low rank optimization

Optimization over low rank matrices has broad applications in machine learning. For large-scale problems, an attractive heuristic is to factorize the low rank matrix into a product of two much smaller matrices. In this paper, we study the nonconvex problem $$\min_{\mathbf{U}\in\mathbb{R}^{n\times r}} g(\mathbf{U})=f(\mathbf{U}\mathbf{U}^T)$$ under the assumptions that $$f(\mathbf{X})$$ is restricted $$\mu$$-strongly convex and $$L$$-smooth on the set $$\{\mathbf{X}:\mathbf{X}\succeq 0,\ \text{rank}(\mathbf{X})\le r\}$$. We propose an accelerated gradient method with alternating constraint that operates directly on the $$\mathbf{U}$$ factors and show that the method has local linear convergence rate with the optimal dependence on the condition number of $$\sqrt{L/\mu}$$. Globally, our method converges to the critical point with zero gradient from any initializer. Our method also applies to the problem with the asymmetric factorization of $$\mathbf{X}=\widetilde{\mathbf{U}}\widetilde{\mathbf{V}}^T$$ and the same convergence result can be obtained. Extensive experimental results verify the advantage of our method.
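To make the factored objective concrete, the following is a minimal sketch of gradient descent with a generic Nesterov-style extrapolation step applied directly to $$\mathbf{U}$$. It is not the paper's alternating-constraint method, which the abstract does not spell out. The test function $$f(\mathbf{X})=\tfrac{1}{2}\|\mathbf{X}-\mathbf{M}\|_F^2$$, the step size `eta`, the momentum weight `beta`, and the names `grad_g` and `momentum_descent` are all assumptions chosen for illustration. The only fact used is the chain rule $$\nabla g(\mathbf{U})=(\nabla f(\mathbf{X})+\nabla f(\mathbf{X})^T)\mathbf{U}$$ with $$\mathbf{X}=\mathbf{U}\mathbf{U}^T$$.

```python
import numpy as np


def grad_g(U, grad_f):
    """Gradient of g(U) = f(U U^T) via the chain rule."""
    G = grad_f(U @ U.T)
    return (G + G.T) @ U


def momentum_descent(U0, grad_f, eta, beta, iters):
    """Generic extrapolated gradient descent on the factor U.

    Illustrative only: a fixed momentum weight `beta` is assumed here,
    and no alternating constraint is enforced.
    """
    U_prev = U0.copy()
    U = U0.copy()
    for _ in range(iters):
        V = U + beta * (U - U_prev)                  # extrapolation step
        U_prev, U = U, V - eta * grad_g(V, grad_f)   # gradient step at V
    return U


# Hypothetical test problem: f(X) = 0.5 * ||X - M||_F^2 with M = U* U*^T,
# so f is 1-strongly convex and 1-smooth.
rng = np.random.default_rng(0)
n, r = 20, 2
U_star = rng.standard_normal((n, r))
M = U_star @ U_star.T


def f(X):
    return 0.5 * np.linalg.norm(X - M, "fro") ** 2


def grad_f(X):
    return X - M


# Start close to a minimizer, mirroring the local convergence setting.
U0 = U_star + 0.05 * rng.standard_normal((n, r))
# Conservative step size based on the spectral norm of the initial factor.
eta = 1.0 / (8.0 * np.linalg.norm(U0, 2) ** 2)

U_hat = momentum_descent(U0, grad_f, eta=eta, beta=0.5, iters=500)
initial_obj = f(U0 @ U0.T)
final_obj = f(U_hat @ U_hat.T)
print(initial_obj, final_obj)
```

Working on the $$n\times r$$ factor keeps each iterate at $$nr$$ entries instead of $$n^2$$, which is the motivation the abstract gives for the factorization heuristic. The sketch forms $$\mathbf{U}\mathbf{U}^T$$ explicitly only for readability.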

Huan Li | Zhouchen Lin
