Finding Low-rank Solutions to Matrix Problems, Efficiently and Provably

A rank-r matrix X \in R^{m x n} can be written as a product UV', where U \in R^{m x r} and V \in R^{n x r}. One can exploit this observation in optimization: e.g., consider the minimization of a convex function f(X) over rank-r matrices, where the set of rank-r matrices is modeled via the factorization in the U and V variables. This heuristic has been widely used for specific problem instances where the solution sought is (approximately) low-rank. Though the parameterization reduces the number of variables and is more efficient in computation time and memory (of particular interest is the case r << min{m, n}), it comes at a cost: f(UV') becomes a non-convex function w.r.t. U and V. In this paper, we study this parameterization for the optimization of a generic convex f and focus on first-order, gradient descent algorithmic solutions. We propose the Bi-Factored Gradient Descent (BFGD) algorithm, an efficient first-order method that operates on the U, V factors. We show that when f is smooth, BFGD has local sublinear convergence, and that it has local linear convergence when f is both smooth and strongly convex. Moreover, for several key applications, we provide simple and efficient initialization schemes that yield approximate solutions good enough for the above convergence results to hold.
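To make the factored parameterization concrete, here is a minimal sketch of plain gradient descent on the U, V factors. It is not the paper's BFGD algorithm, whose step-size rule and other details are not given in the abstract. The objective f(X) = 0.5 ||X - M||_F^2, the function name factored_gradient_descent, the step size, and the perturbed-SVD initialization are all assumptions chosen for illustration. By the chain rule, the gradient of f(UV') with respect to U is grad f(X) V, and with respect to V it is grad f(X)' U.

```python
import numpy as np

def factored_gradient_descent(grad_f, U0, V0, step, iters):
    """Plain gradient descent on f(U V') over the factors (illustrative only)."""
    U, V = U0.copy(), V0.copy()
    for _ in range(iters):
        G = grad_f(U @ V.T)  # gradient of f at X = U V', shape (m, n)
        # Chain rule: d/dU f(UV') = G V and d/dV f(UV') = G' U.
        # Both factors are updated from the same (old) iterate.
        U, V = U - step * (G @ V), V - step * (G.T @ U)
    return U, V

# Assumed toy problem: recover a rank-r matrix M under f(X) = 0.5 * ||X - M||_F^2.
rng = np.random.default_rng(0)
m, n, r = 30, 20, 3
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
grad_f = lambda X: X - M

# Assumed initialization: rank-r truncated SVD of a noisy copy of M,
# standing in for "an approximate solution good enough" to start from.
A, s, Bt = np.linalg.svd(M + 0.1 * rng.standard_normal((m, n)), full_matrices=False)
U0 = A[:, :r] * np.sqrt(s[:r])
V0 = Bt[:r].T * np.sqrt(s[:r])

err0 = np.linalg.norm(U0 @ V0.T - M)
U, V = factored_gradient_descent(grad_f, U0, V0, step=0.25 / s[0], iters=500)
err = np.linalg.norm(U @ V.T - M)
print(f"initial error {err0:.3e}, final error {err:.3e}")
```

The sketch stores only (m + n) r numbers for the factors, which is the memory saving the abstract refers to. The dense gradient G is formed here only for simplicity; a practical implementation would exploit its structure.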
