Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent

Low-rank matrix estimation is a canonical problem that finds numerous applications in signal processing, machine learning, and imaging science. A popular approach in practice is to factorize the matrix into two compact low-rank factors and then optimize these factors directly via simple iterative methods such as gradient descent and alternating minimization. Despite nonconvexity, recent literature has shown that these simple heuristics in fact achieve linear convergence when initialized properly for a growing number of problems of interest. However, upon closer examination, existing approaches can still be computationally expensive, especially for ill-conditioned matrices: the convergence rate of gradient descent depends linearly on the condition number of the low-rank matrix, while the per-iteration cost of alternating minimization is often prohibitive for large matrices. The goal of this paper is to set forth a competitive algorithmic approach dubbed Scaled Gradient Descent (ScaledGD), which can be viewed as preconditioned or diagonally-scaled gradient descent, where the preconditioners are adaptive and iteration-varying with minimal computational overhead. With tailored variants for low-rank matrix sensing, robust principal component analysis, and matrix completion, we theoretically show that ScaledGD achieves the best of both worlds: it converges linearly at a rate independent of the condition number of the low-rank matrix, similar to alternating minimization, while maintaining the low per-iteration cost of gradient descent. Our analysis is also applicable to general loss functions that are restricted strongly convex and smooth over low-rank matrices. To the best of our knowledge, ScaledGD is the first algorithm that provably has such properties over a wide range of low-rank matrix estimation tasks.
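To make the preconditioned update concrete, below is a minimal NumPy sketch of the ScaledGD iteration on a toy, fully observed least-squares objective f(L, R) = (1/2)||L R^T - M||_F^2. The function name scaled_gd, the step size, and the initialization details are illustrative assumptions for this toy setting, not the paper's exact implementation for sensing, robust PCA, or completion.

```python
import numpy as np

def scaled_gd(M, r, eta=0.5, num_iters=500):
    """Illustrative ScaledGD sketch for f(L, R) = 0.5 * ||L R^T - M||_F^2.

    Each factor's gradient step is right-multiplied by an adaptive,
    iteration-varying preconditioner: (R^T R)^{-1} for L and (L^T L)^{-1}
    for R. These are r-by-r matrices, so the overhead over plain gradient
    descent is negligible when r is small.
    """
    # Spectral initialization: top-r SVD of the observed matrix,
    # with the singular values split evenly between the two factors.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    L = U[:, :r] * np.sqrt(s[:r])
    R = Vt[:r, :].T * np.sqrt(s[:r])

    for _ in range(num_iters):
        residual = L @ R.T - M
        grad_L = residual @ R        # gradient with respect to L
        grad_R = residual.T @ L      # gradient with respect to R
        # Scaled (preconditioned) updates, computed from the same iterate.
        L_new = L - eta * grad_L @ np.linalg.inv(R.T @ R)
        R_new = R - eta * grad_R @ np.linalg.inv(L.T @ L)
        L, R = L_new, R_new
    return L, R
```

Compared with plain gradient descent on the factors, the only additional per-iteration work is forming and inverting the two r-by-r Gram matrices R^T R and L^T L, which is what keeps the preconditioning adaptive yet cheap.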
