Matrix Completion and Related Problems via Strong Duality

This work studies strong duality for non-convex matrix factorization problems: we show that under certain dual conditions, these problems and their duals attain the same optimum. Strong duality is well understood for convex optimization, but little was known in the non-convex setting. We propose a novel analytical framework and show that, under these dual conditions, the optimal solution of the matrix factorization program coincides with that of its bi-dual; global optimality of the non-convex program can therefore be achieved by solving its bi-dual, which is convex. Although matrix factorization problems are hard to solve in full generality, the dual conditions are satisfied by a wide class of them. The analytical framework may be of independent interest to non-convex optimization more broadly. We apply it to two prototypical matrix factorization problems: matrix completion and robust principal component analysis (PCA), both of which ask for efficient recovery of a hidden matrix from limited reliable observations. Our framework shows that exact recoverability and strong duality hold for matrix completion and robust PCA with nearly optimal sample complexity guarantees.
