Accelerated Factored Gradient Descent for Low-Rank Matrix Factorization

We study the low-rank matrix estimation problem, where the objective function L(M) is defined over the space of positive semidefinite matrices with rank at most r. A fast approach to this problem is matrix factorization, which reparameterizes M as the product of two smaller matrices, M = UU^T, and then performs gradient descent on U directly, a procedure known as factored gradient descent. Since the resulting problem is nonconvex, whether Nesterov's acceleration scheme can be adapted to it has remained a long-standing question. In this paper, we answer this question affirmatively by proposing a novel and practical accelerated factored gradient descent method motivated by Nesterov's accelerated gradient descent. The proposed method enjoys better iteration complexity and computational complexity than state-of-the-art algorithms in a wide regime. The key idea of our algorithm is to restrict all of its iterates to a special convex set, which enables the acceleration. Experimental results demonstrate the faster convergence of our algorithm and corroborate our theory.
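
To make the factored gradient descent baseline concrete, the following is a minimal NumPy sketch of the unaccelerated method on a toy objective L(M) = (1/2)||M - M_star||_F^2. The loss, step size eta, dimensions, and iteration count are illustrative assumptions; the sketch does not include the paper's momentum step or its projection onto the special convex set.

import numpy as np

# Minimal sketch of plain (unaccelerated) factored gradient descent:
# minimize L(M) over rank-r PSD matrices by writing M = U U^T and
# running gradient descent on U. The loss here is an illustrative
# choice, L(M) = 0.5 * ||M - M_star||_F^2.

rng = np.random.default_rng(0)
n, r = 50, 3

# Ground-truth rank-r PSD matrix M_star = U_star U_star^T.
U_star = rng.standard_normal((n, r))
M_star = U_star @ U_star.T

def grad_L(M):
    # Gradient of the example loss L(M) = 0.5 * ||M - M_star||_F^2.
    return M - M_star

U = rng.standard_normal((n, r))  # random initialization
eta = 0.01                       # step size (hypothetical choice)

for t in range(500):
    M = U @ U.T
    # Chain rule: the gradient of L(U U^T) with respect to U is
    # (grad_L(M) + grad_L(M)^T) @ U, which equals 2 * grad_L(M) @ U
    # when grad_L(M) is symmetric, as it is here.
    U = U - eta * (grad_L(M) + grad_L(M).T) @ U

print("final error:", np.linalg.norm(U @ U.T - M_star, "fro"))

Note that even when L is convex in M, the reparameterized objective f(U) = L(UU^T) is nonconvex in U, which is precisely why adapting Nesterov's acceleration to this setting is nontrivial.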
