Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent

Matrix completion, where we wish to recover a low-rank matrix from a few observed entries, is a widely studied problem in both theory and practice, with many applications. Most provable algorithms for this problem to date are restricted to the offline setting, where they estimate the unknown matrix using all observations simultaneously. In many applications, however, the online version, where we observe one entry at a time and dynamically update our estimate, is more appealing. While existing algorithms are efficient in the offline setting, they can be highly inefficient in the online setting. In this paper, we propose the first provable, efficient online algorithm for matrix completion. Our algorithm starts from an initial estimate of the matrix and then performs non-convex stochastic gradient descent (SGD). After every observation, it performs a fast update involving only one row of each of two tall factor matrices, giving near-linear total runtime. Our algorithm can also be used naturally in the offline setting, where its sample complexity and runtime are competitive with state-of-the-art algorithms. Our proofs introduce a general framework showing that SGD updates tend to stay away from saddle surfaces; this framework may be of broader interest for proving tight rates for other non-convex problems.
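To make the per-observation update concrete, the following is a minimal sketch of an online SGD step for a factorization M ≈ U V^T under a squared loss on the observed entry. It only illustrates the "touch one row of each tall matrix" structure described above; the function name, step size, and random initialization are illustrative assumptions and not the paper's exact algorithm, which also specifies a particular initialization and step-size choice.

import numpy as np

def online_sgd_update(U, V, i, j, m_ij, eta):
    # One online step after observing entry (i, j) with value m_ij.
    # Only row i of U and row j of V are read and written, so the step
    # costs O(k) for rank-k factors; over all observations this gives
    # near-linear total runtime.
    residual = U[i] @ V[j] - m_ij      # prediction error on the observed entry
    grad_u = residual * V[j]           # gradient of 0.5 * residual**2 w.r.t. U[i]
    grad_v = residual * U[i]           # gradient of 0.5 * residual**2 w.r.t. V[j]
    U[i] -= eta * grad_u
    V[j] -= eta * grad_v
    return U, V

# Illustrative usage: start from some initial estimate of the factors,
# then process observed entries one at a time as they stream in.
d1, d2, k = 1000, 800, 5
rng = np.random.default_rng(0)
U = rng.normal(scale=1.0 / np.sqrt(k), size=(d1, k))
V = rng.normal(scale=1.0 / np.sqrt(k), size=(d2, k))
for (i, j, m_ij) in [(3, 7, 0.42), (15, 2, -0.1)]:   # hypothetical observation stream
    U, V = online_sgd_update(U, V, i, j, m_ij, eta=0.05)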
