Matrix Completion has No Spurious Local Minimum

Matrix completion is a basic machine learning problem with wide applications, especially in collaborative filtering and recommender systems. Simple non-convex optimization algorithms are popular and effective in practice. Despite recent progress in proving that various non-convex algorithms converge from a good initial point, it remains unclear why random or arbitrary initialization suffices in practice. We prove that the commonly used non-convex objective function for \textit{positive semidefinite} matrix completion has no spurious local minima --- all local minima must also be global. Therefore, many popular optimization algorithms such as (stochastic) gradient descent can provably solve positive semidefinite matrix completion with \textit{arbitrary} initialization in polynomial time. The result can be generalized to the setting where the observed entries contain noise. We believe that our main proof strategy can be useful for understanding geometric properties of other statistical problems involving partial or noisy observations.
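
To make the setting concrete, here is a minimal sketch (not code from the paper) of the objective the abstract refers to: factor the rank-r PSD target M* = ZZ^T as XX^T, minimize the squared error on the observed entries Omega, and run plain gradient descent from a random start. The dimensions, step size, and iteration count below are illustrative assumptions.

```python
# Minimal sketch of non-convex PSD matrix completion by gradient descent.
# Assumed setup: M* = Z Z^T is the ground-truth rank-r PSD matrix, Omega is a
# symmetric mask of observed entries, X is the d x r factor being optimized.
import numpy as np

rng = np.random.default_rng(0)
d, r, p = 50, 3, 0.5                 # dimension, rank, observation probability

Z = rng.standard_normal((d, r))
M_star = Z @ Z.T                     # ground-truth PSD matrix
Omega = rng.random((d, d)) < p       # mask of observed entries
Omega = Omega | Omega.T              # make the observation pattern symmetric

def loss(X):
    """f(X) = (1/2) * sum_{(i,j) in Omega} ((X X^T)_{ij} - M*_{ij})^2."""
    R = Omega * (X @ X.T - M_star)   # residual on observed entries only
    return 0.5 * np.sum(R ** 2)

def grad(X):
    """Gradient of f: 2 * (Omega ∘ (X X^T - M*)) X, using symmetry of the mask."""
    R = Omega * (X @ X.T - M_star)
    return 2.0 * R @ X

X = rng.standard_normal((d, r))      # arbitrary (random) initialization
eta = 1e-3                           # step size; untuned, for illustration only
for _ in range(20000):
    X -= eta * grad(X)

print(f"final loss: {loss(X):.2e}")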
