Paper information - Finding Low-rank Solutions to Matrix Problems, Efficiently and Provably


A rank-r matrix X \in R^{m x n} can be written as a product UV', where U \in R^{m x r} and V \in R^{n x r}. One could exploit this observation in optimization: e.g., consider the minimization of a convex function f(X) over rank-r matrices, where the set of rank-r matrices is modeled via the factorization into the variables U and V. Such a heuristic has been widely used for specific problem instances in which the solution sought is (approximately) low-rank. Though such a parameterization reduces the number of variables and is more efficient in both computation time and memory (of particular interest is the case r << min{m, n}), it comes at a cost: f(UV') becomes a non-convex function w.r.t. U and V. In this paper, we study this parameterization for the optimization of a generic convex f and focus on first-order, gradient descent algorithmic solutions. We propose the Bi-Factored Gradient Descent (BFGD) algorithm, an efficient first-order method that operates on the U, V factors. We show that when f is smooth, BFGD has local sublinear convergence, and that it has local linear convergence when f is both smooth and strongly convex. Moreover, for several key applications, we provide simple and efficient initialization schemes that yield approximate solutions good enough for the above convergence results to hold.
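To make the factored parameterization concrete, the following is a minimal sketch of plain gradient descent on the factors U and V. It is not the paper's exact BFGD procedure. The objective f(X) = 0.5 * ||X - M||_F^2, the helper names `factored_gradient_descent` and `svd_init`, the noisy-SVD initialization, and the step-size rule are all assumptions chosen for illustration. The only ingredient taken from the abstract is the substitution X = UV', which by the chain rule gives the factor gradients grad_f(UV') V and grad_f(UV')' U.

```python
import numpy as np


def factored_gradient_descent(grad_f, U0, V0, step, num_iters=1000):
    """Plain gradient descent on the factors of X = U V^T (illustrative only).

    grad_f(X) returns the gradient of the convex objective f at X.
    By the chain rule, the gradients with respect to the factors are
        grad_U = grad_f(U V^T) @ V,    grad_V = grad_f(U V^T).T @ U.
    """
    U, V = U0.copy(), V0.copy()
    for _ in range(num_iters):
        G = grad_f(U @ V.T)  # m x n gradient at the current X
        # Simultaneous update: both right-hand sides use the old U and V.
        U, V = U - step * (G @ V), V - step * (G.T @ U)
    return U, V


def svd_init(X0, r):
    """Hypothetical helper: balanced rank-r factors from a truncated SVD."""
    A, s, Bt = np.linalg.svd(X0, full_matrices=False)
    root = np.sqrt(s[:r])
    return A[:, :r] * root, Bt[:r, :].T * root


rng = np.random.default_rng(0)
m, n, r = 30, 20, 3
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r target

# Assumed objective: f(X) = 0.5 * ||X - M||_F^2, smooth and strongly convex.
def grad_f(X):
    return X - M

# Assumed initialization: truncated SVD of a noisy copy of M, so that the
# starting point is close to the solution (convergence is only local).
U0, V0 = svd_init(M + 0.1 * rng.standard_normal((m, n)), r)

# Assumed step size: a conservative constant scaled by the spectral norm
# of the initial iterate.
step = 1.0 / (4.0 * np.linalg.norm(U0 @ V0.T, 2))

U, V = factored_gradient_descent(grad_f, U0, V0, step)
rel_err = np.linalg.norm(U @ V.T - M) / np.linalg.norm(M)
print(f"relative error: {rel_err:.2e}")
```

The sketch stores only (m + n) * r numbers for the iterate instead of m * n, which is the saving the abstract refers to when r << min{m, n}. The price is that the objective is no longer convex in (U, V), which is why the example starts from an initialization near the solution rather than from an arbitrary point.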
