Optimal Sample Complexity for Matrix Completion and Related Problems via $\ell_2$-Regularization

We study the strong duality of non-convex matrix factorization: we show that, under certain dual conditions, non-convex matrix factorization and its dual have the same optimum. Strong duality is well understood for convex optimization, but little was known for matrix factorization. We formalize the strong duality of matrix factorization through a novel analytical framework and show that the duality gap is zero for a wide class of matrix factorization problems. Although matrix factorization problems are hard to solve in full generality, under certain conditions the optimal solution of the non-convex program coincides with that of its bi-dual, so global optimality of the non-convex program can be achieved by solving its bi-dual. We apply our framework to matrix completion and robust Principal Component Analysis (PCA). While a long line of work has studied these problems, even for basic problems in this area such as matrix completion the information-theoretically optimal sample complexity was not known, and the known sample complexity bounds for computationally efficient algorithms are even larger. In this work, we show that exact recoverability and strong duality hold with optimal sample complexity guarantees for matrix completion, and with nearly-optimal guarantees for exact recoverability in robust PCA. For matrix completion, under the standard incoherence assumption that the underlying rank-$r$ matrix $X^*\in \mathbb{R}^{n\times n}$ with skinny SVD $U \Sigma V^T$ satisfies $\max\{\|U^Te_i\|_2^2, \|V^Te_i\|_2^2\} \leq \frac{\mu r}{n}$ for all $i$, we give, to the best of our knowledge, (1) the first algorithm (albeit computationally inefficient) achieving the optimal $O(\mu n r \log n)$ sample complexity, and (2) the first efficient algorithm achieving $O(\kappa^2\mu n r \log n)$ sample complexity, which matches the known $\Omega(\mu n r \log n)$ information-theoretic lower bound when the condition number $\kappa$ is constant.
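To make the incoherence assumption concrete, here is a minimal NumPy sketch (our own illustration, not code from the paper; the function name `incoherence` is hypothetical) that computes the smallest $\mu$ for which a given rank-$r$ matrix satisfies the condition above, stated for general $n \times m$ matrices:

```python
import numpy as np

def incoherence(X, r):
    """Smallest mu such that a rank-r matrix X is mu-incoherent.

    With skinny SVD X = U Sigma V^T (U: n x r, V: m x r), the standard
    incoherence condition requires, for every standard basis vector e_i,
        ||U^T e_i||_2^2 <= mu * r / n  and  ||V^T e_j||_2^2 <= mu * r / m,
    so the smallest valid mu is returned here.
    """
    n, m = X.shape
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    U, V = U[:, :r], Vt[:r, :].T                    # skinny factors
    mu_U = (n / r) * np.max(np.sum(U**2, axis=1))   # max row leverage of U, rescaled
    mu_V = (m / r) * np.max(np.sum(V**2, axis=1))   # max row leverage of V, rescaled
    return max(mu_U, mu_V)

# Example: a random rank-3 matrix is incoherent (small mu) with high probability,
# whereas a matrix with a single large entry would have mu close to n / r.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 100))
print(incoherence(A, r=3))
```

Intuitively, small $\mu$ means the mass of the singular vectors is spread across all coordinates, so a uniformly sampled set of $O(\mu n r \log n)$ entries carries enough information to identify $X^*$.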
